
Text Analytics: Crowdsourcing

Course Description

What would it be like if hundreds of people worked for you?

You can find out in the 2012/13 edition of the seminar in the Text Analytics series.

As a seminar project, we will create linguistic annotations using crowdsourcing. Crowdsourcing allows us to distribute simple tasks over a large number of crowdworkers who are paid for their work. Funds are available for about 10 projects.

These annotations will be used in language technology software to create processing components based on machine learning. Here, the quality of the component depends on the size and quality of the annotated dataset.

By creating and using datasets, students get to know the entire process chain of statistical natural language processing: in practice, creating the datasets is a particular challenge.

This seminar is held in the format of a mini-workshop: after an introductory lecture, individual topics are assigned. Introductory literature for each topic, a basic software environment, and a manual for crowdsourcing are provided. Students write a paper consisting of a literature overview and a description of their own experiment. Papers are peer-reviewed by fellow students. In a final workshop, the work is presented in a 20-minute presentation.


Research papers will be distributed by topic.


Each student is expected to

  • write a term paper
  • review other student papers
  • give a 20 min. talk in class + 10 min. Q&A afterwards

Introduction Session: 16.10.2011


Prof. Dr. Chris Biemann

Workshop Proceedings

The papers and presentation slides of all participants who gave permission to publish their materials, as well as all data obtained via crowdsourcing during the seminar, are listed below.

Please download all of the following files at once here: [download].

  • Chris Biemann, Dominik Fischer: Introduction to the Workshop on TextAnalytics (slides1) (slides2) (slides3)
  • Lukas Georgieff: Improving Ontology Construction Using Crowdsourcing and Machine Learning (paper) (slides) (data)
  • Dennis Werner: Crowdsourcing for Pattern-Based Information Extraction on Football News (paper) (slides) (data)
  • Leo Swiezinski: Crowdsourcing Mongolian Suffix Boundaries (paper) (data)
  • Qi Shao: Sentiment Analysis for Popular Brands (paper) (slides) (data)
  • Benjamin Milde: Crowdsourcing Slang Identification and Transcription in Twitter Language (paper) (slides) (data)
  • Stefan Glotzbach: Hyphenation via Crowdsourcing and Options for Customizing User Interfaces on CrowdFlower (paper) (slides) (data)
  • Christian Fahr: A Crowdsourcing Approach to Alignment of Dictionary Definitions (paper) (slides) (data)
  • Christian Hollubetz: Assigning Dictionary Definitions with the Help of Crowdsourcing (paper) (slides) (data)
  • Olga Popova: Paraphrase Detection with the Help of Crowdsourcing (data)
  • Gerold Hintz & Martin Tschirsich: Leveraging Crowdsourcing for Paraphrase Recognition (data)

One paper was accepted for oral presentation at an international workshop:

Tschirsich, Martin and Hintz, Gerold (2013): Leveraging Crowdsourcing for Paraphrase Recognition. In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, Sofia, Bulgaria, pp. 205-213. Association for Computational Linguistics (pdf)