Describe and Validate your RDF Data with Data Shapes


RDF is a fundamental part of the web, championed by the Semantic Web and knowledge graph efforts. Its versatile data model enables automatic integration of data from different sources. The RDF ecosystem includes data stores that can be queried through SPARQL endpoints, as well as knowledge representation languages that can be used to infer new data. However, this versatility comes at a price. Although RDF producers usually have an implicit schema for their data, it is rarely documented explicitly or followed rigorously, so consumers must program defensively or write elaborate queries to work around inconsistencies. Shape Expressions (ShEx) was created as a concise, human-readable language to describe and validate RDF data. ShEx schemas declare expectations about the topology of RDF data that can be verified automatically. ShEx is used in large-scale modeling efforts such as FHIR and Wikidata. For instance, the Wikidata Entity Schemas extension has given rise to a whole ecosystem of ShEx schemas, and ShEx is increasingly being adopted by the community. In this talk, we will describe the language and some of its applications and tools.
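
To make the idea of declaring and automatically verifying expectations more concrete, here is a minimal sketch in plain Python. It is not ShEx syntax and not a ShEx implementation: the triple representation, the `Constraint` type, the `USER_SHAPE` shape, and the `violations` function are all hypothetical names chosen for illustration, and the data is made up. It only shows the general pattern of stating once which properties a node should have, and with what values, instead of scattering defensive checks through consumer code.

```python
from typing import Callable, NamedTuple

# A triple is (subject, predicate, object); prefixed names are plain strings here.
Triple = tuple[str, str, object]


class Constraint(NamedTuple):
    """Hypothetical constraint: a value test plus a cardinality range."""
    check: Callable[[object], bool]
    min_count: int = 1
    max_count: int = 1


# Illustrative shape: exactly one string name, at most one integer age.
USER_SHAPE = {
    "foaf:name": Constraint(lambda v: isinstance(v, str)),
    "foaf:age": Constraint(lambda v: isinstance(v, int), min_count=0),
}


def violations(node: str, triples: list[Triple], shape: dict[str, Constraint]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the node conforms."""
    problems = []
    for predicate, constraint in shape.items():
        values = [o for s, p, o in triples if s == node and p == predicate]
        if not constraint.min_count <= len(values) <= constraint.max_count:
            problems.append(
                f"{predicate}: expected {constraint.min_count}..{constraint.max_count} "
                f"values, found {len(values)}"
            )
        problems.extend(
            f"{predicate}: unexpected value {v!r}" for v in values if not constraint.check(v)
        )
    return problems


# Made-up data: ex:alice matches the shape, ex:bob does not.
DATA = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:age", 34),
    ("ex:bob", "foaf:age", "unknown"),
]

for node in ("ex:alice", "ex:bob"):
    print(node, violations(node, DATA, USER_SHAPE))
```

A real ShEx schema expresses the same kind of expectation declaratively and is checked by a ShEx validator, which also handles features this sketch omits, such as references between shapes.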

Lotico talks