LKE/KDSL Reading Group and Talks Event


On December 2nd, 2014, there will be a combined LKE/KDSL Reading Group and Talks event. It will feature the following talks:

(Practice Talk) Emily Jamison

Title: Needle in a Haystack: Reducing the Costs of Annotating Rare-Class Instances in Imbalanced Datasets

Abstract: Crowdsourced data annotation is noisier than annotation from trained workers. Previous work has shown that redundant annotations can eliminate the agreement gap between crowdsource workers and trained workers.

Redundant annotation is usually unproblematic on a class-balanced dataset, because individual crowdsource judgments are inconsequentially cheap.

However, redundant annotation on class-imbalanced datasets requires many more labels per instance.

In this work, using three class-imbalanced corpora, we show that annotation redundancy for noise reduction is very expensive on a class-imbalanced dataset, and should be discarded for instances receiving a single common-class label. We also show that this simple technique produces annotations at approximately the same cost as a metadata-trained, supervised cascading machine classifier, or about 70% cheaper than 5-vote majority-vote aggregation.
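To get an intuition for why discarding redundancy after a single common-class label saves so much on imbalanced data, here is a toy cost simulation. It is not the paper's experimental setup (the dataset, the 5% rarity rate, and the simplifying assumption that the first label matches the true class are all invented for illustration); it only shows the arithmetic behind the savings.

```python
import random

random.seed(0)

# Toy simulation (not the paper's setup): a 95/5 class-imbalanced dataset
# where each instance is labelled by crowdworkers until a stopping rule fires.
N = 10_000
RARE_RATE = 0.05
instances = ["rare" if random.random() < RARE_RATE else "common" for _ in range(N)]

def five_vote_cost(instances):
    """Baseline: always collect 5 redundant labels per instance."""
    return 5 * len(instances)

def early_discard_cost(instances):
    """Stop after a single common-class label; only suspected rare-class
    instances receive the full 5 redundant labels. Simplification: the
    first label is assumed to match the true class."""
    cost = 0
    for true_label in instances:
        cost += 1                 # the first label is always purchased
        if true_label == "rare":  # only rare-class instances get more labels
            cost += 4
    return cost

baseline = five_vote_cost(instances)
reduced = early_discard_cost(instances)
print(f"5-vote cost: {baseline}, early-discard cost: {reduced}, "
      f"saving: {1 - reduced / baseline:.0%}")
```

With a 5% rare-class rate, the expected cost is N + 4 × (rare count) ≈ 12,000 labels instead of 50,000, i.e. a saving in the same ballpark as the roughly 70% reported in the abstract.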

(Reading Group) Christian Stab

Gabor Angeli and Chris Manning. 2014. NaturalLI: Natural Logic Inference for Common Sense Reasoning. In Empirical Methods in Natural Language Processing (EMNLP), pages 534-545, Doha, Qatar.

Is it possible to assess the truth of a given statement without using predefined domain knowledge, learned inference rules or fuzzy database lookups? The authors of this paper present an inference system for inferring common sense facts from a “very large database” of known facts. The approach is based on Natural Logic and uses Natural Language as an “inference engine”. The authors claim that their system is able to capture strict Natural Logic inferences by relying on lexical mutations. The controversial questions include: How far can we go with Natural Logic? Is it really possible to replace strict inference rules by this approach? What is a good “seed fact base” for getting a good coverage of common facts?
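As a rough illustration of inference by lexical mutation, the following sketch searches from a tiny fact base through single-word generalizations to a query. This is a drastic simplification of NaturalLI (the hypernym lexicon, the facts, and the breadth-first search direction are all assumptions made for this toy example, not the paper's actual system, which handles monotonicity and many more mutation types):

```python
from collections import deque

# Toy natural-logic-style inference by lexical mutation.
# Lexicon and fact base are invented for illustration only.
HYPERNYMS = {            # word -> a more general word (one "mutation" upward)
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}
FACTS = {("cat", "has", "whiskers")}   # the known-fact database (tiny here)

def mutations(fact):
    """Yield facts reachable by generalizing one word of `fact`."""
    for i, word in enumerate(fact):
        if word in HYPERNYMS:
            yield fact[:i] + (HYPERNYMS[word],) + fact[i + 1:]

def entailed(query, facts):
    """Breadth-first search from the known facts through lexical
    mutations; the query counts as entailed if it is reachable."""
    frontier = deque(facts)
    seen = set(facts)
    while frontier:
        fact = frontier.popleft()
        if fact == query:
            return True
        for nxt in mutations(fact):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

print(entailed(("mammal", "has", "whiskers"), FACTS))  # reachable via cat -> feline -> mammal
print(entailed(("animal", "eats", "fish"), FACTS))     # not reachable from any known fact
```

Even this toy version shows the appeal and the open questions above: inference reduces to a graph search over word-level edits, but the quality of the result hinges entirely on the coverage of the fact base and the lexicon.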

(Master's Thesis Final Presentation) Frerik Koch