Guiding Theme C2: Methods for contextual and constraint-based ranking

The goal of this guiding theme is to develop ranking algorithms for use in other parts of the project, with a particular focus on multi-document summarization. Specifically, we aim to develop algorithms that learn to rank information items, such as sentences, according to criteria such as their importance, a key ingredient of extractive summarization algorithms.

Research results of the first Ph.D. cohort

The main focus of the first phase was to support multi-document summarization by ranking sentences according to their importance. In particular, we developed supervised and unsupervised machine learning methods for estimating the intrinsic importance of text units and used them as the backbone of the CPSum summarization system. Unlike conventional approaches, CPSum does not rely on centrality or structural features as indicators of information importance, but learns to rank sentences according to their perceived importance directly from a background corpus (Zopf et al., 2016a). Furthermore, we developed a methodology for evaluating automatically generated summaries without reference summaries (Zopf, 2018a). The basic summarization algorithm of CPSum generates contextual rankings of sentences that address importance and redundancy jointly (Zopf, 2015; Zopf et al., 2016b). To estimate the importance of text units, it can use not only learned information importance but also a wide variety of annotation types contributed by other guiding themes, such as named entities (A1), events and relations between them (A2), opinions (A3), concepts (B2), motifs (C1), and frames (C3). We published a joint paper that integrates the results from these work packages into our summarization system (Zopf et al., 2018b).
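
To illustrate what a contextual ranking that treats importance and redundancy jointly might look like, the following minimal sketch greedily re-scores the remaining sentences against those already ranked. It is not the CPSum algorithm: the importance scores, the Jaccard token overlap used as a redundancy measure, and the `redundancy_weight` parameter are all assumptions chosen for illustration, and sentences are assumed to be distinct strings.

```python
def tokenize(sentence):
    """Lowercase whitespace tokenization; a stand-in for real preprocessing."""
    return set(sentence.lower().split())


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def contextual_ranking(sentences, importance, redundancy_weight=0.5):
    """Greedily rank sentences, re-scoring candidates at every step
    against the sentences already ranked (the context).

    `importance` maps each sentence to a hypothetical importance score,
    for example the output of a learned model; higher is better.
    """
    tokens = {s: tokenize(s) for s in sentences}
    remaining = list(sentences)
    ranked = []
    while remaining:
        def score(s):
            # Redundancy: largest overlap with any sentence ranked so far.
            redundancy = max(
                (jaccard(tokens[s], tokens[r]) for r in ranked), default=0.0
            )
            return importance[s] - redundancy_weight * redundancy

        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


if __name__ == "__main__":
    # Made-up sentences and scores, purely for demonstration.
    demo = {
        "the flood hit the city": 0.9,
        "the flood hit the town": 0.85,
        "rescue teams arrived quickly": 0.5,
    }
    for sentence in contextual_ranking(list(demo), demo, redundancy_weight=1.0):
        print(sentence)
```

With a redundancy weight of zero the ranking reduces to sorting by importance alone; a larger weight pushes near-duplicates of already selected sentences further down the list.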

Ongoing project of the second Ph.D. cohort

In the second phase, we continue this work and extend it in two ways. First, in the first phase of AIPHES we partly relied on crowd-sourcing to obtain training signals for information importance. We now intend to replace these with incidental signals that can be obtained from background corpora or other sources, an approach that we have already started to explore in (Zopf et al., 2016a) and (Zopf, 2018a). Second, instead of focusing solely on information importance, we want to develop a unified framework for exploiting incidental signals to rank sentences according to different criteria, such as importance, validity, and trustworthiness.

Our framework will use preference learning and ranking as the underlying formalism that allows us to encode and aggregate incidental supervision signals. In this setting, several challenges need to be addressed: (i) identifying informative cues that can generate incidental supervision signals from available resources; (ii) exploiting these signals to produce good models for ranking sentences according to different criteria, such as validity, importance, style, or redundancy; (iii) learning to adequately aggregate local rankings into a final ranking that takes into account the preferences determined by the different criteria. This problem can be seen as constraint-based ranking.
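
As a rough illustration of challenge (iii), the sketch below aggregates several per-criterion rankings into a single ranking by weighted voting over the pairwise preferences that each ranking induces. For complete rankings this coincides with a weighted Borda count. It is only one possible aggregation scheme under stated assumptions, not the framework described above: the criterion names, the sentence identifiers, and the hand-set weights are hypothetical, whereas the project aims to learn the aggregation.

```python
from itertools import combinations


def aggregate_rankings(rankings, weights):
    """Combine per-criterion rankings by weighted pairwise voting.

    rankings: dict mapping a criterion name to a list of items, best first;
              every ranking must contain the same items.
    weights:  dict mapping a criterion name to a non-negative weight
              (hypothetical and hand-set in this sketch).
    """
    items = list(next(iter(rankings.values())))
    votes = {item: 0.0 for item in items}
    for criterion, ranking in rankings.items():
        position = {item: i for i, item in enumerate(ranking)}
        # Each pair contributes one weighted vote to the preferred item.
        for a, b in combinations(items, 2):
            winner = a if position[a] < position[b] else b
            votes[winner] += weights[criterion]
    # Most votes first; ties are broken by item name for determinism.
    return sorted(items, key=lambda item: (-votes[item], item))


if __name__ == "__main__":
    # Made-up local rankings over three sentence identifiers.
    local = {
        "importance": ["s1", "s2", "s3"],
        "validity": ["s2", "s3", "s1"],
        "style": ["s2", "s1", "s3"],
    }
    uniform = {"importance": 1.0, "validity": 1.0, "style": 1.0}
    print(aggregate_rankings(local, uniform))
```

Changing the weights shifts the final ranking toward the criteria that matter most for a given task, which is where hard or soft constraints on the result could enter.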

This guiding theme will benefit from various information identification methods as inputs, such as those to be developed in A1, A3, B2, and C1. A2 can additionally provide information about relations between events, such as causal or temporal relations, that can be interpreted as incidental signals. Moreover, the results of this guiding theme can be used to support C3, D1, and D2.

People

  • PI: Prof. Dr. Johannes Fürnkranz
  • Co-Supervisor: Dr. Eneldo Loza Mencía
  • First Cohort PhD student: Markus Zopf
  • Second Cohort PhD student: Aissatou Diallo

References

  • Fürnkranz, J. and Hüllermeier, E., editors (2011). Preference Learning. Springer-Verlag.
  • Fürnkranz, J., Hüllermeier, E., Loza Mencía, E., and Brinker, K. (2008). Multilabel Classification via Calibrated Label Ranking. Machine Learning, 73(2):133–153.
  • Hüllermeier, E. and Fürnkranz, J. (2010). On Predictive Accuracy and Risk Minimization in Pairwise Label Ranking. Journal of Computer and System Sciences, 76(1):49–62.
  • Hüllermeier, E., Fürnkranz, J., Cheng, W., and Brinker, K. (2008). Label Ranking by Learning Pairwise Preferences. Artificial Intelligence, 172(16-17):1897–1916.

Publications

Zopf, Markus ; Botschen, Teresa ; Falke, Tobias ; Heinzerling, Benjamin ; Marasovic, Ana ; Mihaylov, Todor ; P. V. S., Avinesh ; Loza Mencía, Eneldo ; Fürnkranz, Johannes ; Frank, Anette (2018):
What's Important in a Text? An Extensive Evaluation of Linguistic Annotations for Summarization.
[Online-Edition: https://ieeexplore.ieee.org/document/8554853],
[Conference publication]

Zopf, Markus (2018):
Estimating Summary Quality with Pairwise Preferences.
In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, New Orleans, Louisiana, USA, [Online-Edition: http://aclweb.org/anthology/N18-1152],
[Conference publication]

Zopf, Markus ; Loza Mencía, Eneldo ; Fürnkranz, Johannes (2018):
Which Scores to Predict in Sentence Regression for Text Summarization?
In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, [Online-Edition: http://aclweb.org/anthology/N18-1161],
[Conference publication]

Zopf, Markus (2018):
auto-hMDS: Automatic Construction of a Large Heterogeneous Multilingual Multi-Document Summarization Corpus.
In: Proceedings of the 11th edition of the Language Resources and Evaluation Conference (LREC 2018), [Online-Edition: http://www.lrec-conf.org/proceedings/lrec2018/pdf/1018.pdf],
[Conference publication]

Zopf, Markus ; Loza Mencía, Eneldo ; Fürnkranz, Johannes (2016):
Sequential Clustering and Contextual Importance Measures for Incremental Update Summarization.
In: Proceedings of the 26th International Conference on Computational Linguistics, Osaka, Japan, [Online-Edition: http://www.aclweb.org/anthology/C16-1102],
[Conference publication]

Zopf, Markus ; Peyrard, Maxime ; Eckle-Kohler, Judith (2016):
The Next Step for Multi-Document Summarization: A Heterogeneous Multi-Genre Corpus Built with a Novel Construction Approach.
In: Proceedings of the 26th International Conference on Computational Linguistics, The COLING 2016 Organizing Committee, Osaka, Japan, [Online-Edition: http://aclweb.org/anthology/C16-1145],
[Conference publication]

Zopf, Markus ; Loza Mencía, Eneldo ; Fürnkranz, Johannes (2016):
Beyond Centrality and Structural Features: Learning Information Importance for Text Summarization.
In: Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning (CoNLL 2016), Association for Computational Linguistics, Berlin, Germany, [Online-Edition: http://www.aclweb.org/anthology/K16-1009],
[Conference publication]

Zopf, Markus (2015):
SeqCluSum: Combining Sequential Clustering and Contextual Importance Measuring to Summarize Developing Events over Time.
In: Proceedings of the 24th Text Retrieval Conference, National Institute of Standards and Technology, Gaithersburg, Maryland, USA, [Online-Edition: https://trec.nist.gov/pubs/trec24/papers/AIPHES-TS.pdf],
[Conference publication]
