Information Consolidation
(Funding Period: 2014 - 2021)

A New Paradigm in Knowledge Search (DIP project)

Motivation

Although existing search engines are effective at identifying relevant documents among billions of non-relevant ones, they remain weak at isolating the facts that interest users within these documents, let alone organizing and presenting this knowledge intuitively and concisely. Searchers have to laboriously skim through all retrieved documents and collect the statements that are relevant to their information needs.

For example, a public decision maker in the domain of education may want to learn about positive and negative experiences with a particular policy across countries, its impact on various populations, etc. So far, such information has had to be consolidated by field experts, which is costly and time-consuming.

Goals

This project targets the next big step in information access technology by

  • Automatically identifying relevant statements
  • Consolidating the information and inferring relations between the statements
  • Enabling users to explore the consolidated information

Methods

The project follows an iterative methodology that encompasses the following:

  • Corpus – a large, partially annotated data set on educational topics, acquired using focused crawling and web-based annotation tools
  • Linguistic annotation on various levels (syntax, semantic roles, word senses, named entities, co-reference resolution, truth values, and other domain-specific ones) using state-of-the-art automatic annotation methods
  • Extracting atomic statements – by adapting and extending open information extraction techniques
  • Reflecting relationships between statements – by applying textual entailment and semantic similarity methods
  • Knowledge exploration – effective and efficient user interfaces for interactively displaying statements relevant to user queries

Results

In the project, the following corpora were created:

Team

  • Prof. Dr. Iryna Gurevych, Principal Investigator
  • Dr.-Ing. Nils Reimers
  • M.Sc. Michael Bugert, Doctoral Researcher
  • M.Sc. Yevgeniy Puzikov, Doctoral Researcher
  • M.Sc. Max Glockner, Doctoral Researcher

Former staff:

  • Dr. Eugenio Martínez Cámara, Postdoctoral Researcher
  • Dr. Judith Eckle-Kohler, Senior Researcher
  • Dr. Ivan Habernal
  • M.Sc. Maria Sukhareva

Partners

Department of Computer Science, Bar-Ilan University, Israel

Department of Information Science, Bar-Ilan University, Israel

Faculty of Industrial Engineering and Management, Technion – Israel Institute of Technology

Teaching

Teaching activities of the current team of the project:

Student theses

Master theses:

  • Can Diehl. Automatic Aggregation of Argument Components. 2016. Supervised by: Dr. Christian Stab and Prof. Iryna Gurevych

Bachelor theses:

  • Michelle Peters. Broad-coverage distantly supervised verb sense disambiguation. 2016. Supervised by: Dr. Judith Eckle-Kohler and Prof. Iryna Gurevych

Funding

This project is funded by:

  • Funder: Deutsche Forschungsgemeinschaft (German Research Foundation)
  • Programme: DIP Programme; 17th round of the German-Israeli Project Cooperation
  • Grant code: GU 798/17-1 and DA 1600/1-1
  • More information: funder web page of the project

Publications

Bugert, Michael ; Gurevych, Iryna (2021):
Event Coreference Data (Almost) for Free: Mining Hyperlinks from Online News.
In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 471-491,
ACL, 2021 Conference on Empirical Methods in Natural Language Processing, virtual Conference and Punta Cana, Dominican Republic, 07.-11.11.2021, ISBN 978-1-955917-09-4,
[Conference or Workshop Item]

Bugert, Michael ; Reimers, Nils ; Gurevych, Iryna (2021):
Generalizing Cross-Document Event Coreference Resolution Across Multiple Corpora.
In: Computational Linguistics, MIT Press, ISSN 0891-2017,
DOI: 10.1162/coli_a_00407,
[Article]

Reimers, Nils ; Gurevych, Iryna (2021):
The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes.
In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 605-611,
Association for Computational Linguistics, 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), virtual Conference, 01.-06.08.2021, [Conference or Workshop Item]

Geigle, Gregor ; Reimers, Nils ; Rücklé, Andreas ; Gurevych, Iryna (2021):
TWEAC: Transformer with Extendable QA Agent Classifiers.
In: arXiv, Computer Science: Computation and Language, (Preprint), [Article]

Thakur, Nandan ; Reimers, Nils ; Daxenberger, Johannes ; Gurevych, Iryna (2021):
Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks.
In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 296-310,
ACL, 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics, virtual Conference, 06.-11.06.2021, ISBN 978-1-954085-46-6,
[Conference or Workshop Item]

Mesgar, Mohsen ; Simpson, Edwin ; Gurevych, Iryna (2021):
Improving Factual Consistency Between a Response and Persona Facts.
In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp. 549-562,
16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021), virtual conference, 21.-23.04.2021, [Conference or Workshop Item]

Reimers, Nils ; Gurevych, Iryna (2020):
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation.
pp. 4512-4525, Association for Computational Linguistics, The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), virtual Conference, 16.-20.11.2020, [Conference or Workshop Item]

Mesgar, Mohsen ; Bücker, Sebastian ; Gurevych, Iryna (2020):
Dialogue Coherence Assessment Without Explicit Dialogue Act Labels.
pp. 1439-1450, The 58th annual meeting of the Association for Computational Linguistics (ACL 2020), virtual Conference, 05.-10.07.2020, [Conference or Workshop Item]

Şahin, Gözde Gül ; Kementchedjhieva, Yova ; Rust, Phillip ; Gurevych, Iryna (2020):
PuzzLing Machines: A Challenge on Learning From Small Data.
pp. 1241-1254, The 58th annual meeting of the Association for Computational Linguistics (ACL 2020), virtual Conference, 05.-10.07.2020, [Conference or Workshop Item]

Bugert, Michael ; Reimers, Nils ; Barhom, Shany ; Dagan, Ido ; Gurevych, Iryna (2020):
Breaking the Subtopic Barrier in Cross-Document Event Coreference Resolution.
Text2Story@ECIR'20 - 3rd International Workshop on Narrative Extraction from Texts, Lisbon, Portugal, 14.04.2020, [Conference or Workshop Item]

Simpson, Edwin ; Gurevych, Iryna (2020):
Scalable Bayesian Preference Learning for Crowds.
In: Machine Learning, 109, pp. 689-718. Springer, [Article]

Puzikov, Yevgeniy ; Gardent, Claire ; Dagan, Ido ; Gurevych, Iryna (2019):
Revisiting the Binary Linearization Technique for Surface Realization.
pp. 268-278, The 12th International Conference on Natural Language Generation (INLG 2019), Tokyo, Japan, 29.10.2019-01.11.2019, [Conference or Workshop Item]

Böhm, Florian ; Gao, Yang ; Meyer, Christian M. ; Shapira, Ori ; Dagan, Ido ; Gurevych, Iryna (2019):
Better Rewards Yield Better Summaries: Learning to Summarise Without References.
pp. 3101-3111, The 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019), Hong Kong, China, 03.11.2019-07.11.2019, [Conference or Workshop Item]

Reimers, Nils ; Gurevych, Iryna (2019):
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.
pp. 3973-3983, The 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019), Hong Kong, China, 03.11.2019-07.11.2019, [Conference or Workshop Item]

Barhom, Shany ; Shwartz, Vered ; Eirew, Alon ; Bugert, Michael ; Reimers, Nils ; Dagan, Ido (2019):
Revisiting Joint Modeling of Cross-document Entity and Event Coreference Resolution.
pp. 4179-4189, The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy, 28.07.2019-02.08.2019, [Conference or Workshop Item]

Puzikov, Yevgeniy ; Gurevych, Iryna (2018):
E2E NLG Challenge: Neural Models vs. Templates.
In: Proceedings of the 11th International Conference on Natural Language Generation (INLG 2018), pp. 463-471,
The 11th International Conference on Natural Language Generation (INLG 2018), Tilburg, Netherlands, 05.11.2018-08.11.2018, [Conference or Workshop Item]

Puzikov, Yevgeniy ; Gurevych, Iryna (2018):
BinLin: A Simple Method of Dependency Tree Linearization.
In: Proceedings of the Multilingual Surface Realization Workshop 2018 (ACL 2018), pp. 13-28,
Surface Realization Shared Task 2018, Melbourne, Australia, 15.07.2018-20.07.2018, [Conference or Workshop Item]

Martínez Cámara, Eugenio ; Shwartz, Vered ; Gurevych, Iryna ; Dagan, Ido (2017):
Neural Disambiguation of Causal Lexical Markers Based on Context.
Volume 2: Short papers, In: Proceedings of the 12th International Conference on Computational Semantics (IWCS 2017),
Association for Computational Linguistics, Montpellier, France, [Conference or Workshop Item]

Stanovsky, Gabriel ; Eckle-Kohler, Judith ; Puzikov, Yevgeniy ; Dagan, Ido ; Gurevych, Iryna (2017):
Integrating Deep Linguistic Features in Factuality Prediction over Unified Datasets.
Volume 2: Short Papers, In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), pp. 352-357,
Association for Computational Linguistics, The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vancouver, Canada, 30.07.2017-04.08.2017, [Conference or Workshop Item]

Bugert, Michael ; Puzikov, Yevgeniy ; Rücklé, Andreas ; Eckle-Kohler, Judith ; Martin, Teresa ; Martínez Cámara, Eugenio ; Sorokin, Daniil ; Peyrard, Maxime ; Gurevych, Iryna (2017):
LSDSem 2017: Exploring Data Generation Methods for the Story Cloze Test.
In: Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics, pp. 56-61,
Association for Computational Linguistics, The 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics, Valencia, Spain, 03.04.2017-04.04.2017, ISBN 978-1-945626-40-1,
[Conference or Workshop Item]

Wities, Rachel ; Shwartz, Vered ; Stanovsky, Gabriel ; Adler, Meni ; Shapira, Ori ; Upadhyay, Shyam ; Roth, Dan ; Martínez Cámara, Eugenio ; Gurevych, Iryna ; Dagan, Ido (2017):
A Consolidated Open Knowledge Representation for Multiple Texts.
In: Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics, pp. 12-24,
Association for Computational Linguistics, Valencia, ISBN 978-1-945626-40-1,
[Conference or Workshop Item]

Levy, Omer ; Dagan, Ido ; Stanovsky, Gabriel ; Eckle-Kohler, Judith ; Gurevych, Iryna (2016):
Modeling Extractive Sentence Intersection via Subtree Entailment.
In: Proceedings of the 26th International Conference on Computational Linguistics (COLING), pp. 2891-2901,
Osaka, Japan, [Conference or Workshop Item]

Falke, Tobias ; Stanovsky, Gabriel ; Gurevych, Iryna ; Dagan, Ido (2016):
Porting an Open Information Extraction System from English to German.
In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 892-898,
Association for Computational Linguistics, Austin, TX, USA, [Conference or Workshop Item]

Eckle-Kohler, Judith (2016):
Verbs Taking Clausal and Non-Finite Arguments as Signals of Modality – Revisiting the Issue of Meaning Grounded in Syntax.
Volume 1: Long Papers, In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), pp. 811-822,
Association for Computational Linguistics, Berlin, Germany, [Conference or Workshop Item]

Habernal, Ivan ; Sukhareva, Maria ; Raiber, Fiana ; Shtok, Anna ; Kurland, Oren ; Ronen, Hadar ; Bar-Ilan, Judit ; Gurevych, Iryna (2016):
New Collection Announcement: Focused Retrieval Over the Web.
In: SIGIR '16, In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 701-704,
ACM, Pisa, Italy, ISBN 978-1-4503-4069-4,
DOI: 10.1145/2911451.2914682,
[Conference or Workshop Item]

Sukhareva, Maria ; Eckle-Kohler, Judith ; Habernal, Ivan ; Gurevych, Iryna (2016):
Crowdsourcing a Large Dataset of Domain-Specific Context-Sensitive Semantic Verb Relations.
In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pp. 2131-2137,
European Language Resources Association (ELRA), Portoroz, Slovenia, [Conference or Workshop Item]