Automatic Text Summarization

Content

As the amount of unstructured text on the web keeps growing, methods for automatic summarization become increasingly important for coping with the flood of information. Such methods take one or more input texts and transform them into a shorter text, the so-called summary. A summary should be informative and readable, and it should preserve the meaning of the original texts.

Simple methods produce a summary by choosing the first paragraph, counting word frequencies, or looking for cue words. More sophisticated methods use techniques from Natural Language Processing (e.g., lexical chains or Rhetorical Structure Theory) and machine learning (e.g., Naïve Bayes or decision trees).
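To make the word-frequency idea concrete, here is a minimal sketch of an extractive summarizer in Python. It is only an illustration under simplifying assumptions, not a method taken from the seminar literature: the stop-word list, the naive sentence splitter, and the function names (`split_sentences`, `tokenize`, `summarize`) are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
              "it", "on", "for", "with", "that", "this"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOP_WORDS]


def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences in their original order.

    A sentence's score is the average document-wide frequency of its
    content words, so sentences built from frequent words rank higher.
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)
```

Averaging the word frequencies, rather than summing them, is a design choice that keeps long sentences from winning on length alone. A "first paragraph" baseline would simply return the leading sentences without any scoring.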

The seminar reviews the most important methods along three dimensions: single- vs. multi-document, generic vs. user-specific, and abstractive vs. extractive summarization. A connecting theme will be the methodology of evaluation, since only quantitative and qualitative measurements make it possible to judge summaries or to compare different methods.
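As an illustration of a quantitative measurement, the following sketch computes an n-gram recall score in the spirit of ROUGE-N, the measure discussed in the evaluation session. It assumes a single reference summary and plain whitespace tokenization; the function names are hypothetical, and the sketch is not a substitute for the ROUGE package itself.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also occur in the candidate.

    Matches are clipped: an n-gram occurring k times in the reference
    is credited at most as often as it occurs in the candidate.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0  # reference too short to contain any n-gram
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

For the reference "the cat sat on the mat" and the candidate "the cat lay on the mat", five of the six reference unigrams are matched, so the unigram recall is 5/6; with n=2, three of the five reference bigrams are matched, giving 0.6.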

Core literature

  • Karen Spärck Jones: “Automatic summarising: a review and discussion of the state of the art”, Technical Report Number 679, University of Cambridge, 2007.
  • Inderjeet Mani, Mark T. Maybury: “Advances in Automatic Text Summarization”, The MIT Press, London, 1999.

Further literature will be announced.

Timetable

  • Introduction 1: April 13th 2010, 15:20 – 17:00, Room S202/A313, general introduction, organization and topic assignments
  • Introduction 2: April 20th 2010, 15:20 – 17:00, Room S202/E102, introduction to NLP
  • Introduction 3: April 27th 2010, 15:20 – 17:00, Room S202/E102, introduction to automatic text summarization
  • Presentations: May 25th – July 13th 2010, 15:20 – 17:00 in Room S202/E102.
  • Office hours: For up-to-date information, please look at the Moodle news forum.

Presentations

May 25th 2010, S202/E102: Abstractive methods

  • Sebastian Kasten: Hahn and Reimer: “Knowledge-based Text Summarization: Salience and Generalization Operators for Knowledge Base Abstraction”, in: Advances in Automatic Text Summarization, Mani & Maybury (Eds.), 1999
  • Amir Naseri: Fiszman et al.: “Abstraction Summarization for Managing the Biomedical Research Literature”, in: Proceedings of the HLT/NAACL, 2004

June 1st 2010, S202/E102: Evaluation and summarization challenges

  • André Schaller: Lin: “ROUGE: a Package for Automatic Evaluation of Summaries”, in: Proceedings of the Workshop on Text Summarization Branches Out, 2004
  • Henning Koes: Dang and Owczarzak: “Overview of the TAC 2008 Update Summarization Task”, in: Proceedings of the TAC 2008

June 8th 2010, S202/E102: Linguistic methods

  • Ella Syndikus: Marcu: “Discourse trees are good indicators of importance in text”, in: Advances in Automatic Text Summarization, Mani & Maybury (Eds.), 1999

June 15th 2010, S202/E102: Linguistic & graph-based methods

  • Madieha Taddbier: Barzilay and Elhadad: “Using lexical chains for text summarization”, in: Proceedings of the ISTS, 1997
  • Johannes Beutel: Erkan & Radev: “LexRank: Graph-based Lexical Centrality as Salience in Text Summarization”, in: Journal of Artificial Intelligence Research, Vol. 22, 2004, pp. 457-479

June 22nd 2010, S202/E102: Machine learning methods

  • Bouchra Elfakir: Kupiec et al.: “A trainable document summarizer”, in: Proceedings of the SIGIR 1995
  • Steffen Remus: Chuang and Yang: “Extracting Sentence Segments for Text Summarization: A Machine Learning Approach”, in: Proceedings of the ACM SIGIR 2000

July 13th 2010, S202/E102: Invited talk

  • Leonhard Hennig (DAI-Labor Berlin): Hennig, De Luca, and Albayrak: “Learning Summary Content Units with Topic Modeling”, in: Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010) [to appear]

Materials and forum

Slides, student presentations, the forum, and additional materials will be available on the Moodle eLearning platform. The required enrolment key will be distributed during the lecture.

For general advice on presenting your topic, please have a look at these guidelines.

Teaching Staff

  • Prof. Dr. Iryna Gurevych
  • Joachim Caspar, Dipl.-Inf.

Please contact Joachim Caspar for any organizational issues concerning this seminar.