LKE/KDSL Research Seminar


The LKE/KDSL Research Seminar on November 18th, 2014, at 9:50am in S2|02 C110 will feature two talks, with details as follows:

Steffen Remus

Title: Introducing LtBot, an application for focused crawling. Also, discussing distributional features for relation classification.

Abstract: Have you ever had domain-specific text data, but not enough of it for your algorithms to produce reasonable results? In this talk I am going to present LtBot, a focused crawling application whose goal is to collect web data from the same topical domain as the provided text data. It thereby avoids downloading irrelevant documents and retrieves an extended in-domain corpus faster and with fewer computing resources. If time permits, I also want to talk about the usefulness and applicability of distributional features, i.e. a word and its context, for relation classification. The presented work is ongoing and unpublished, hence a lively and open discussion is welcome and appreciated.

Carsten Schnober

Title: Children and their World

Abstract: In the context of “Welt der Kinder”, a historical research project, natural language processing methods are developed and adapted for a specific text type: German textbooks from the 19th century. In order to provide insights from that corpus to the targeted users, historians, NLP techniques such as topic modelling and opinion mining are applied and adapted. However, there are currently no NLP tools that have been designed for and tested against the particularities found in the “Welt der Kinder” corpus. The corpus contains noise from digitization and OCR errors, as well as variations in orthography and semantics typical of this era and genre. One of the research questions is thus: how can we apply and adapt NLP techniques for the given corpus, which has never been exploited in a large-scale project before?

This is a practice talk for the presentation I am going to give at the joint CLARIN-NeDiMAH workshop “Exploring Historical Sources with Language Technology”. I am going to present the “Welt der Kinder” project in general and discuss its NLP component in particular, which is work in progress.