Wikipedia-Wikidata sentence-level relation annotations
Daniil Sorokin and Iryna Gurevych (2017)
Context-Aware Representations for Knowledge Base Relation Extraction, In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), p. (to appear), September 2017
We provide a subcorpus of Wikipedia that was annotated with Wikidata relations using a distant supervision procedure. The corpus contains two types of annotations: entities and relations. Entity annotations were extracted from the Wikipedia links in the article text. Each link was converted to a Wikidata identifier using the mappings provided by Wikidata itself. Additional entities were recognized using a named entity recognizer and were later linked to Wikidata. For each pair of entities in each sentence, we searched for Wikidata relations that connect the pair and stored only the unambiguous instances, that is, pairs for which exactly one relation is possible.
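The pair-wise filtering step described above can be illustrated with a minimal sketch. It is not the authors' code: the toy knowledge base `TOY_KB`, the function `annotate_sentence`, and the identifiers used are hypothetical and purely illustrative, and treating entity pairs as ordered (subject, object) is an assumption.

```python
from itertools import permutations

# Hypothetical toy knowledge base (an assumption for illustration only):
# maps an ordered (subject, object) pair to the set of relation IDs linking them.
TOY_KB = {
    ("Q937", "Q3012"): {"P19"},          # exactly one relation: unambiguous
    ("Q937", "Q183"): {"P27", "P551"},   # two relations: ambiguous, skipped
}


def annotate_sentence(entity_ids, kb):
    """Return (subject, relation, object) triples for unambiguous entity pairs."""
    triples = []
    # Assumption: pairs are ordered, so (a, b) and (b, a) are looked up separately.
    for subj, obj in permutations(entity_ids, 2):
        relations = kb.get((subj, obj), set())
        # Keep the pair only if exactly one relation connects it.
        if len(relations) == 1:
            triples.append((subj, next(iter(relations)), obj))
    return triples


# Entities found in one (hypothetical) sentence.
sentence_entities = ["Q937", "Q3012", "Q183"]
annotations = annotate_sentence(sentence_entities, TOY_KB)
print(annotations)
```

In this sketch, the pair with two candidate relations is dropped entirely rather than resolved, which mirrors the "only one relation is possible" criterion stated above.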
The zipped archive (115 MB, v1.0, 2017-08-25) contains the following files in JSON format:
- The training set of sentences that was used for model training.
- The development set of sentences for parameter tuning.
- The held-out set of sentences and relations for the final evaluation.