Data Validation and Schema Interoperability


Validating RDF data is necessary to ensure that the data complies with the conceptual model it follows, e.g., the schema or ontology behind the data, and to improve data consistency and completeness. There are different approaches to validating RDF data, for instance, JSON Schema, particularly for data in JSON-LD format, as well as Shape Expressions (ShEx) and the Shapes Constraint Language (SHACL), which can also be used with other serializations, e.g., RDF/XML or Turtle. Currently, no validation approach prevails over the others; the choice commonly depends on data characteristics, background knowledge, and personal preferences. In some cases the approaches are interchangeable; however, that is not always so, making it necessary to identify a subset among them that can be seamlessly translated from one to another. During the NBDC/DBCLS 2019 BioHackathon, we worked on a variety of topics related to RDF data validation, including (i) development of ShEx shapes for a number of datasets, (ii) development of a tool to semi-automatically create ShEx shapes, (iii) improvements to the RDFShape tool, and (iv) enabling conversion of validation schemas from one format to another. Here we report on our BioHackathon achievements.

BioHackrXiv Preprint