Text Analytics

Text Analytics: Active Learning

Course Description

Active learning (not to be confused with active learning in the educational field) addresses the question of how machine learning algorithms can achieve equal or even better performance with less training data. Supervised learning methods require labeled data, which is often costly to annotate and frequently requires experts. However, not all data points contribute equally to the performance of a machine learning algorithm. For example, when training a classifier to distinguish pictures of cats, dogs, and alligators, adding more cat pictures helps little if the classifier is already very good at recognizing cats. The main research question in active learning is therefore which unlabeled data points would yield the largest gain if they were labeled. To answer it, we look for strategies that reduce the amount of training data required, independent of the underlying machine learning algorithm.
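The idea of asking for labels where they help most can be illustrated with a minimal sketch of one common sampling strategy, uncertainty sampling by predictive entropy. This is only one of many possible strategies and is not necessarily the one treated in the seminar. The function names and the class probabilities below are hypothetical and chosen for illustration only.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(pool_probs, k):
    """Return the indices of the k pool items with the highest predictive entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical class probabilities (cat, dog, alligator) that some
# classifier assigns to four unlabeled pictures.
pool_probs = [
    [0.98, 0.01, 0.01],  # confidently a cat: a label adds little
    [0.40, 0.35, 0.25],  # unsure between all three classes
    [0.05, 0.90, 0.05],  # confidently a dog
    [0.10, 0.45, 0.45],  # unsure between dog and alligator
]

# Ask an annotator to label the two most uncertain pictures.
query = select_most_uncertain(pool_probs, k=2)
print(query)
```

In this toy pool, the confidently classified cat picture is ranked last, which matches the intuition from the example above: more cat pictures would add little when the classifier already recognizes cats well.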

The increasing popularity of crowd-sourcing platforms raises another question: how do we optimally distribute annotation tasks among annotators with different skill sets? It would make sense to give easy data points to unskilled annotators and hard ones to experienced annotators. At the same time, we can use those strategies to identify data points that are unsuited for crowd-sourcing platforms.
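As a rough sketch of such a distribution scheme, the snippet below routes data points by a difficulty score. It assumes that a score between 0 and 1 is already available for each data point (for example, a normalized model uncertainty). The thresholds, the route names, and the function `route` are hypothetical and would have to be tuned or replaced in practice.

```python
def route(difficulty, easy_max=0.3, hard_max=0.7):
    """Assign a data point to an annotator group based on its difficulty.

    difficulty: assumed score in [0, 1]; higher means harder to annotate.
    easy_max, hard_max: hypothetical thresholds, not taken from the seminar.
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must lie in [0, 1]")
    if difficulty <= easy_max:
        return "unskilled_annotator"
    if difficulty <= hard_max:
        return "experienced_annotator"
    # Too hard for the crowd: flag the data point instead of sending it out.
    return "unsuited_for_crowdsourcing"

# Hypothetical difficulty scores for three data points.
scores = {"doc_a": 0.1, "doc_b": 0.5, "doc_c": 0.9}
assignment = {name: route(score) for name, score in scores.items()}
print(assignment)
```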

This seminar covers sampling strategies for active learning, evaluation metrics, and common problems that arise in active learning. It also examines practical use cases of active learning in the field of text analytics.

Teaching Staff

  • Prof. Dr. Iryna Gurevych
  • Ji-Ung Lee

We do not have fixed office hours. Please contact us via email if you need an appointment.

Literature

Will be announced during the seminar.

Timetable

The first sessions will consist of introductory lectures to cover the basics of active learning. The program for the remainder of the seminar will be determined according to the number of participants and will cover the following topics (not necessarily in this order):

  • Active Learning and crowd-sourcing
  • Active Learning and recommender systems
  • Deep Active Learning
  • Active Transfer Learning

Note that active learning can be seen as a meta-machine learning method. Thus, each topic offers a rich variety of ongoing research.