Text Analytics: Text Summarization


As the amount of unstructured text on the web keeps growing, automatic summarization methods become increasingly important for coping with the flood of information. Such methods take one or more input texts and transform them into a shorter text, the so-called summary. A summary should be informative and readable, and it should preserve the meaning of the original texts.

Simple methods produce a summary by choosing the first paragraph, counting word frequencies, or looking for cue words. More sophisticated methods use techniques from Natural Language Processing (e.g., lexical chains or Rhetorical Structure Theory) and machine learning (e.g., Naïve Bayes or decision trees).
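To make the simple methods concrete, here is a minimal Python sketch of a lead baseline (take the first sentences) and a word-frequency extractor. It is an illustration under stated assumptions, not a method prescribed by the seminar: the function names, the tiny stop-word list, and the naive sentence splitter are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
              "that", "for", "on", "are", "as", "with"}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphabetic tokens with stop words removed.
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOP_WORDS]


def lead_summary(text, n_sentences=1):
    # Lead baseline: simply keep the first n sentences.
    return " ".join(split_sentences(text)[:n_sentences])


def frequency_summary(text, n_sentences=1):
    # Score each sentence by the average document frequency of its words,
    # then return the top-scoring sentences in their original order.
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Averaging the word frequencies rather than summing them keeps long sentences from winning by length alone; a cue-word method would instead add a bonus for sentences containing phrases such as "in conclusion".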

The seminar reviews the most important methods for single-document and multi-document summarization, including pre- and post-processing methods that improve the quality of the resulting summaries. We will also look at how to evaluate summaries and how to create summaries manually, which may be a prerequisite for evaluation.


Students are expected to perform the following tasks:

  • attend the seminars
  • prepare a presentation on a topic relevant for the seminar
  • give the presentation and answer questions from the audience
  • prepare a term paper on the topic


  • Lectures take place on Thursdays, 13:30-15:10, in D017.
  • All other information will be provided during the course and added here.


  • Karen Spärck Jones: “Automatic summarising: a review and discussion of the state of the art”, Technical Report Number 679, University of Cambridge, 2007.
  • Inderjeet Mani, Mark T. Maybury: “Advances in Automatic Text Summarization”, The MIT Press, London, 1999.


Introductory sessions on Natural Language Processing and Summarization will be held in the first three sessions of the seminar on Thursdays (17.10., 24.10. and 31.10.), 13:30-15:10, in D017. The program for the remainder of the seminar will be announced once the number of participants and the topics to be discussed are known.


  • Dr. Margot Mieskes (office hours will be announced in the first session; please register by e-mail)
  • Prof. Dr. Iryna Gurevych