Guiding Theme B2: Content comparison through large-scale inference and reasoning

Guiding Theme B2 deals with the selection of statements from heterogeneous documents that should be included in a summary. This involves developing methods for measuring the importance of statements.

Research results of the first Ph.D. cohort

The main focus of the first phase was on content selection methods for multi-document summarization. Content selection is the first step in any summarization system, whether abstractive or extractive; its goal is to identify and extract the key elements from the source documents. We focused on optimization-based content selection, as it yields very promising results and allows us to cooperate closely with Guiding Theme D2, which is concerned with analyzing and defining suitable evaluation metrics for multi-document summarization.

In particular, we formulated content selection as an optimization problem whose goal is to choose a set of information nuggets that have certain desired properties while meeting a length constraint. The objective function of the optimization method should approximate the quality judgment of a summary as closely as possible. To this end, we developed objective functions that approximate the ROUGE metric (Peyrard and Eckle-Kohler, 2016a) and the Jensen-Shannon divergence (Peyrard and Eckle-Kohler, 2016b). We also explored the use of genetic algorithms to generate training data (Peyrard and Eckle-Kohler, 2017a) and used this data to optimize towards the recent Automatic Pyramid metric (Peyrard and Eckle-Kohler, 2017b) and, in a collaboration with C3, towards human judgments (Peyrard et al., 2017).
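As a minimal illustration of this style of optimization, the following sketch greedily selects sentences that maximize bigram coverage, a simple stand-in for a ROUGE-like objective, under a word-count budget. The greedy strategy and the bigram-coverage objective are simplifying assumptions for exposition, not the published method.

```python
def bigrams(tokens):
    """Return the set of adjacent token pairs in a sentence."""
    return set(zip(tokens, tokens[1:]))

def greedy_select(sentences, budget):
    """Greedily pick sentences that add the most new bigrams per word
    until the word budget is exhausted or no sentence adds coverage.
    A simplification: it stops at the first sentence that does not fit."""
    summary, covered, length = [], set(), 0
    remaining = list(sentences)
    while remaining:
        def gain(s):
            toks = s.split()
            return len(bigrams(toks) - covered) / max(len(toks), 1)
        best = max(remaining, key=gain)
        toks = best.split()
        if length + len(toks) > budget or gain(best) == 0:
            break
        summary.append(best)
        covered |= bigrams(toks)
        length += len(toks)
        remaining.remove(best)
    return summary
```

A real system would replace the greedy loop with a global solver (e.g. integer linear programming or the metaheuristics explored in the cited work) and a learned objective, but the structure of the problem, maximizing an objective subject to a length constraint, is the same.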

Ongoing project of the second Ph.D. cohort

In the second phase, we broaden content selection to the pairwise comparison and evaluation of content, e.g. regarding the similarity or entailment of statements. This makes it possible to distinguish between similar content extracted from different documents, which should be summarized together, and completely independent content, which cannot be summarized as one. Content comparison thus enables a second content selection phase, in which entailed content is no longer considered for summarization. Since predicting the similarity or entailment of statements is a difficult task, additional background knowledge is required, which can be learned or extracted at large scale.
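A minimal sketch of such a pairwise comparison, using bag-of-words count vectors as a stand-in for learned sentence representations (an assumption for illustration only): statements whose cosine similarity exceeds a threshold are grouped, so near-duplicates can be summarized together.

```python
import math
from collections import Counter

def sentence_vector(sentence):
    """Bag-of-words count vector; a learned embedding would go here."""
    return Counter(sentence.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def group_similar(statements, threshold=0.5):
    """Assign each statement to the first group whose representative
    it resembles; otherwise open a new group."""
    groups = []
    for s in statements:
        v = sentence_vector(s)
        for g in groups:
            if cosine(v, sentence_vector(g[0])) >= threshold:
                g.append(s)
                break
        else:
            groups.append([s])
    return groups
```

Surface similarity of this kind fails exactly where the project aims: two statements can entail each other with no lexical overlap, which is why background knowledge and reasoning are needed on top of vector comparison.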

In order to apply reasoning to natural language statements in combination with knowledge bases, two connected problems have to be investigated:

  • 1) Finding a suitable representation of the two statements that can be used with an appropriate reasoning paradigm. This may be a logical representation, e.g. the parse structures of the statements obtained using semantic parsing methods, or a numerical representation based on distributional semantics, such as word or sentence vectors. Accordingly, purely logical or probabilistic reasoning methods may be applied.
  • 2) Choosing a knowledge base, or a combination of knowledge bases, e.g. Wikidata, DBpedia, Freebase, WordNet, etc., that provides background knowledge usable as a resource for the chosen reasoning paradigm.

In addition to advancing the state of the art for pairwise classification problems by injecting additional knowledge and applying reasoning methods, the research of the second Ph.D. cohort aims at making the classification explainable: for example, explaining why one statement entails another in terms of the reasoning steps and background knowledge involved.
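The idea of an explainable entailment decision can be sketched with a toy example: a tiny hand-coded hypernym table stands in for a real knowledge base such as WordNet, and the chain of hypernym steps found is itself the explanation. The table entries and function names below are illustrative assumptions, not part of the project's actual method.

```python
# Toy hypernym table standing in for a real knowledge base such as
# WordNet or Wikidata; these entries are purely illustrative.
HYPERNYMS = {
    "poodle": "dog",
    "dog": "animal",
    "cat": "animal",
}

def entailment_chain(word, target):
    """Follow hypernym links from `word` toward `target`.
    Returns the chain of reasoning steps if `word` entails `target`,
    or None otherwise; the returned chain doubles as the explanation
    of the decision."""
    chain = [word]
    seen = {word}
    while chain[-1] != target:
        nxt = HYPERNYMS.get(chain[-1])
        if nxt is None or nxt in seen:
            return None
        chain.append(nxt)
        seen.add(nxt)
    return chain
```

For instance, `entailment_chain("poodle", "animal")` yields the chain `["poodle", "dog", "animal"]`, which can be read back to a user as "a poodle is a dog, and a dog is an animal". A full system would reason over statement-level representations rather than single words, but the principle of reporting the traversed knowledge-base path is the same.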

This project provides opportunities for collaboration with Area C (especially semantic role labelling in C3) as well as with Area A in terms of semantic relation extraction (in particular A2).

People

  • PI (First Cohort): Prof. Dr. Iryna Gurevych
  • PI (Second Cohort): Dr. Claudia Schulz
  • Former PI: Dr. Judith Eckle-Kohler
  • First Cohort PhD student: Maxime Peyrard
  • Second Cohort PhD student: Prasetya Ajie Utama

Publications

Peyrard, Maxime ; Gurevych, Iryna (2018):
Objective Function Learning to Match Human Judgements for Optimization-Based Summarization.
In: Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, New Orleans, USA, June 2018, [Online edition: http://aclweb.org/anthology/N18-2103],
[Conference publication]

P. V. S., Avinesh ; Peyrard, Maxime ; Meyer, Christian M. (2018):
Live Blog Corpus for Summarization.
In: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC), European Language Resources Association, Miyazaki, Japan, [Online edition: http://www.lrec-conf.org/proceedings/lrec2018/summaries/317....],
[Conference publication]

Rücklé, Andreas ; Eger, Steffen ; Peyrard, Maxime ; Gurevych, Iryna (2018):
Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations.
In: arXiv:1803.01400, [Online edition: https://arxiv.org/abs/1803.01400],
[Article]

Utama, Prasetya ; Weir, Nathaniel ; Basik, Fuat ; Binnig, Carsten ; Cetintemel, Ugur ; Hättasch, Benjamin ; Ilkhechi, Amir ; Ramaswamy, Shekar ; Usta, Arif (2018):
An End-to-end Neural Natural Language Interface for Databases.
In: arXiv preprint arXiv:1804.00401, [Article]

Peyrard, Maxime ; Botschen, Teresa ; Gurevych, Iryna (2017):
Learning to Score System Summaries for Better Content Selection Evaluation.
In: Proceedings of the EMNLP workshop "New Frontiers in Summarization", Association for Computational Linguistics, Copenhagen, Denmark, September 2017, [Online edition: http://www.aclweb.org/anthology/W17-4510],
[Conference publication]

Peyrard, Maxime ; Eckle-Kohler, Judith (2017):
A Principled Framework for Evaluating Summarizers: Comparing Models of Summary Quality against Human Judgments.
In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Association for Computational Linguistics, Vancouver, Canada, July 2017, Volume 2: Short Papers, DOI: 10.18653/v1/P17-2005,
[Online edition: http://aclweb.org/anthology/P/P17/P17-2005.pdf],
[Conference publication]

Peyrard, Maxime ; Eckle-Kohler, Judith (2017):
Supervised Learning of Automatic Pyramid for Optimization-Based Multi-Document Summarization.
In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Association for Computational Linguistics, Vancouver, Canada, July 2017, Volume 1: Long Papers, [Online edition: http://aclweb.org/anthology/P17-1100],
[Conference publication]

Peyrard, Maxime ; Eckle-Kohler, Judith (2016):
A General Optimization Framework for Multi-Document Summarization Using Genetic Algorithms and Swarm Intelligence.
In: Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016), The COLING 2016 Organizing Committee, Osaka, Japan, December 2016, [Online edition: http://aclweb.org/anthology/C16-1024],
[Conference publication]

Zopf, Markus ; Peyrard, Maxime ; Eckle-Kohler, Judith (2016):
The Next Step for Multi-Document Summarization: A Heterogeneous Multi-Genre Corpus Built with a Novel Construction Approach.
In: Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016), The COLING 2016 Organizing Committee, Osaka, Japan, [Online edition: http://aclweb.org/anthology/C16-1145],
[Conference publication]

Peyrard, Maxime ; Eckle-Kohler, Judith (2016):
Optimizing an Approximation of ROUGE - a Problem-Reduction Approach to Extractive Multi-Document Summarization.
In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), Association for Computational Linguistics, Berlin, Germany, August 2016, Volume 1: Long Papers, DOI: 10.18653/v1/P16-1172,
[Online edition: http://www.aclweb.org/anthology/P16-1172],
[Conference publication]
