LKE/KDSL Research Seminar

2014/06/05

On Tuesday, 3rd June 2014, the LKE/KDSL Research Seminar featured two talks:

Johannes Daxenberger

Title: DKPro TC: A Java-based Framework for Supervised Learning Experiments on Textual Data

Abstract: This is a practice talk for the poster presentation (and possibly a short system demonstration) at the upcoming ACL 2014.

We present DKPro TC, a framework for supervised learning experiments on textual data. The main goal of DKPro TC is to enable researchers to focus on the actual research task behind the learning problem and let the framework handle the rest. It enables rapid prototyping of experiments by relying on an easy-to-use workflow engine and standardized document preprocessing based on the Apache Unstructured Information Management Architecture. It ships with standard feature extraction modules, while at the same time allowing the user to add customized extractors. The extensive reporting and logging facilities make DKPro TC experiments fully replicable.

Martin Riedl

Title: Evaluating Unsupervised Parsers for Distributional Similarity

Abstract: In this work, we address the role of syntactic parsing for distributional similarity.

On the one hand, we are exploring distributional similarities as an extrinsic test bed for unsupervised parsers. On the other hand, we explore whether single unsupervised parsers, or their combination, can contribute to better distributional similarities, or even replace supervised parsing as a preprocessing step for word similarity.

We evaluate distributional thesauri against manually created taxonomies for both English and German, using five unsupervised parsers. While a supervised parser is the best single parser for English in this evaluation, we find that an unsupervised parser works best for German. For both languages, we show significant improvements in word similarity when combining features from supervised and unsupervised parsers.