CLARIN-D: Implementation of a web-based annotation platform for linguistic annotations (F-AG 7)

We are developing a web-based annotation tool that runs in a web browser and requires no further installation. It supports annotation on several linguistic layers within the same user interface. In addition, we are implementing an interface to crowdsourcing platforms so that simple annotation tasks can be scaled to a large number of annotators. The annotation platform will be connected to the CLARIN-D infrastructure so that it is interoperable with the processing pipelines in WebLicht. Development of the tool is supported by a second, concurrent curation project, which defines ‘best practices’ for linguistic annotation on several linguistic layers and for annotators of differing status.

The platform is aimed at all communities that systematically annotate textual material, that is, tag text with a closed set of labels defined in annotation guidelines. It is especially relevant to computational linguistics, language technology and quantitative linguistics.

The project is scheduled to run from 1 September 2012 to 30 November 2013.

The software is available as open source under the Apache License 2.0 at

Project architecture


  • Prof. Chris Biemann, investigator
  • Prof. Iryna Gurevych, investigator
  • Richard Eckart de Castilho, investigator
  • Seid Muhie Yimam, executive staff
  • WebLicht Team, Universität Tübingen, executive staff


  • Hinrichs, Marie; Thomas Zastrow and Erhard Hinrichs (2010): WebLicht: Web-based LRT Services in a Distributed eScience Infrastructure. Proceedings of LREC 2010, Malta.
  • Stenetorp, Pontus; Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou and Jun'ichi Tsujii (2012): brat: a Web-based Tool for NLP-Assisted Text Annotation. Proceedings of the Demonstrations Session at EACL 2012, Avignon, France.
  • Eckart de Castilho, Richard; Sabine Bartsch and Iryna Gurevych (2012): CSniper – Annotation-by-query for non-canonical constructions in large corpora. Proceedings of the 50th Meeting of the Association for Computational Linguistics (ACL) 2012 (Demo section), Jeju, South Korea.
  • Eckart de Castilho, Richard and Iryna Gurevych (2009): DKPro-UGD: A Flexible Data-Cleansing Approach to Processing User-Generated Discourse. Proceedings of LINA CNRS UMR 6241, Nantes, France.