Text Analytics: Machine Learning for Text

Course Description

Text analytics is the extraction of useful knowledge from text. The ubiquity of text on the Web and in social networks, emails, and digital libraries makes text processing approaches indispensable. In recent years, applying machine learning methods to text analysis has received a lot of attention. But what makes learning from text distinctive? How should texts be represented for machine learning models? How can machine learning models discover hidden semantic information within a text?

This seminar answers these questions by coherently introducing various text-centric machine learning approaches, from basic algorithms (e.g., rule-based methods) to advanced models (e.g., deep neural networks). The seminar covers a wide spectrum of fascinating topics such as text clustering, text classification, heterogeneous data, and feature selection. We also discuss exciting applications of these approaches, including information extraction, text summarization, sentiment analysis, and text segmentation. By the end of the seminar, students will know how to apply machine learning approaches to solve text-centric problems.


Seminar: Tuesday 15:20-17:00, Room S202 / C120

The first class will be held on October 16th, 2018.

Additional material will be distributed via the Moodle eLearning platform. The required passcode will be announced during the first lecture.

Teaching Staff

  • Mohsen Mesgar
  • Prof. Dr. Iryna Gurevych

We do not have fixed office hours. Please email us if you need an appointment.


Most of the content of this seminar is drawn from the following book:

“Machine Learning for Text” by Charu C. Aggarwal, published 2018.

Other literature will be announced during the seminar.


The first session will be an introduction to text processing tasks and the machine learning approaches used to solve them. The program for the remainder of the seminar will be determined according to the number of participants and will cover the following topics (not necessarily in this order):

  • Text similarity computation
  • Topic modeling
  • Text clustering
  • Text classification
  • Linear regression for text
  • Deep learning
  • Text summarization
  • Sentiment analysis