LKE/KDSL Reading Group

2013/06/11

The LKE/KDSL Reading Group will discuss three papers on Tuesday, June 18, 11:40-13:10, in room S1/03 223.

Event: The Language and Knowledge Engineering / Knowledge Discovery in Scientific Literature (LKE/KDSL) reading group

Professors: Prof. Iryna Gurevych, Prof. Marc Rittberger, Prof. Karsten Weihe et al.

Coordination: Dr. Judith Eckle-Kohler, Emily Jamison

Place / Time: Tuesday June 18, 11:40-13:10 S1/03 223

The following papers will be presented:

Emily Jamison (moderator):

Pilehvar, Mohammad Taher and Navigli, Roberto. Paving the Way to a Large-scale Pseudosense-annotated Dataset. In Proceedings of NAACL 2013. www.aclweb.org/anthology/N13-1130

Pseudo-sense words are artificial words that model real polysemous words. They are generated by combining the contexts of multiple monosemous words. The benefit of pseudowords is that their sense ambiguity is completely controlled; the correct “sense” is known. However, for pseudowords to accurately model real sense distributions, there must be semantic connections between the senses. This can be challenging to model, so it may not be possible to generate pseudowords in large quantities.
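As a rough illustration of the basic idea of combining the contexts of monosemous words, the following Python sketch replaces each occurrence of a component word with a single pseudoword token and keeps the replaced word as the known "sense". It is not the method proposed in the paper: it ignores the semantic connections between senses mentioned above, and the function name, the toy corpus and the word pair are all invented for illustration.

```python
import re


def make_pseudoword_corpus(sentences, monosemous_words, pseudoword=None):
    """Replace the first component word in each sentence with one pseudoword.

    Returns (rewritten sentence, gold sense) pairs, where the gold sense is
    the original word that was replaced. Hypothetical helper, not from the paper.
    """
    pseudoword = pseudoword or "*".join(monosemous_words)
    # Match any component word as a whole word, ignoring case.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in monosemous_words) + r")\b",
        re.IGNORECASE,
    )
    examples = []
    for sentence in sentences:
        match = pattern.search(sentence)
        if match is None:
            continue  # sentence contains none of the component words
        gold_sense = match.group(1).lower()
        # A function replacement avoids any escape processing of the token.
        rewritten = pattern.sub(lambda m: pseudoword, sentence, count=1)
        examples.append((rewritten, gold_sense))
    return examples


# Toy corpus and word pair, assumed for illustration only.
corpus = [
    "She peeled a banana for breakfast.",
    "The kalashnikov was found in the truck.",
    "It rained all afternoon.",
]
examples = make_pseudoword_corpus(corpus, ["banana", "kalashnikov"])
for text, sense in examples:
    print(sense, "->", text)
```

Each rewritten sentence now contains the ambiguous token, while the stored label records which component word it stood for, which is what makes the ambiguity fully controlled.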

Points to discuss:

What are the pros and cons of this method of generating pseudowords? Can we expect them to model real polysemous words?

Data acquisition is always a bottleneck in NLP research. Can we draw conclusions from this research about corpus creation in other subfields?

Tristan Miller (moderator):

Karën Fort, Adeline Nazarenko and Sophie Rosset. Modeling the complexity of manual annotation tasks: A grid of analysis. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pp. 895–910, December 2012. www.aclweb.org/anthology/C12-1055

Manual annotation of text is costly. If we could discover what the most expensive (i.e., difficult, complex, time-consuming) parts of an annotation task are, perhaps we could (re)design or (re)implement it in such a way as to minimize the cost. In this paper, the authors identify which factors contribute to the cost of annotation tasks, and propose an analytical framework for assessing and comparing them. Their eventual goal is to produce a plugin for annotation management tools which would perform this analysis automatically.

The work seems potentially useful, but is the set of factors they identify correct and complete? And is the grid of analysis they propose realistically useful?

Kostadin Cholakov (moderator):

Eiji Aramaki, Sachiko Maskawa and Mizuki Morita. Twitter Catches The Flu: Detecting Influenza Epidemics Using Twitter. In Proceedings of EMNLP 2011. www.aclweb.org/anthology-new/D/D11/D11-1145.pdf

Why this paper was chosen:

- The catchy title

- Twitter and text classification techniques will be of interest to UKP members working on those topics

- Using NLP techniques for a large and important social task

Points to discuss:

- Better text classification techniques, maybe some UKP inventions?

- Detecting epidemics based on Twitter alone is very challenging, so how about including other types of data sources in the mix?

- Why text data only? How about getting other types of data in the mix?

About the LKE/KDSL Reading Group:

The Language and Knowledge Engineering/Knowledge Discovery in Scientific Literature (LKE/KDSL) reading group is part of the LKE/KDSL doctoral program. This program was established in 2013 at the German Institute for Educational Research and Educational Information (DIPF) in Frankfurt/Main and the Computer Science Department at Technische Universität Darmstadt. The goals of the reading group are:

- to stay up-to-date about recent emerging topics and outstanding papers on graph-based algorithms, machine learning and knowledge engineering, computational linguistics and semantics, Web 2.0 technologies and information management in the (shared) context of language technologies;

- to practice reviewing, argumentation and negotiation skills;

- to learn about paper writing and best practices in research.

The Language and Knowledge Engineering/Knowledge Discovery in Scientific Literature reading group is the successor of the LKE reading group. The LKE/KDSL reading group meets every four weeks. Each time, papers chosen by the session moderators are collectively read, reviewed and then discussed by the reading group participants.