Knowledge graphs: description, validation and subsetting


Knowledge graphs such as Wikidata have been highly successful at representing and integrating vast amounts of information from different domains. Their flexible data models, which have no predefined schemas, make it easy to aggregate data from heterogeneous sources. However, the lack of schemas also undermines the quality of the data available in knowledge graphs. In many cases, data curators have an implicit schema in mind that describes the data and could be used to check its conformance. The Shape Expressions (ShEx) language was created to describe and validate RDF data, and Wikidata has adopted it in its Entity Schemas namespace. In this talk, we will briefly present the ShEx language and show some of its applications. More specifically, we will show how ShEx can be used to describe and generate Wikidata subsets, which can facilitate the use of Wikidata as a source for research.

Invited talk at the International Center for Computational Logic in Dresden