Wikipedia Edit-Turn-Pair Corpus

Corresponding and Non-Corresponding Edit-Turn-Pairs from the English Wikipedia

For the edit-turn-pair detection task:

Johannes Daxenberger and Iryna Gurevych

Automatically Detecting Corresponding Edit-Turn-Pairs in Wikipedia

In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Short Papers. June 2014. Baltimore, MD, USA.

For the crowdsourced annotation:

Emily K. Jamison and Iryna Gurevych

Needle in a Haystack: Reducing the Costs of Annotating Rare-Class Instances in Imbalanced Datasets

In: Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing. December 2014. Phuket, Thailand.

Resource Download

The ETP-gold corpus is based on article edits and discussion page turns from the English Wikipedia and is therefore available under the Creative Commons Attribution/Share-Alike License (CC-BY-SA).

The ETP-gold-labels MTurk dataset contains the labels and metadata from the crowdsourced annotation task. We release this dataset under the Creative Commons Attribution/Share-Alike License (CC-BY-SA).

In case of questions, please contact Johannes Daxenberger (ETP-gold) or Emily Jamison (ETP-gold-labels).