INCEpTION

Information Consolidation: A New Paradigm in Knowledge Search (DIP project)

Motivation

Although existing search engines are effective at identifying relevant documents among billions of non-relevant ones, they remain weak at isolating the facts of interest to users within these documents, let alone organizing and presenting this knowledge intuitively and concisely. Searchers have to laboriously skim through all retrieved documents and collect the statements that are relevant to their information needs.

For example, a public decision maker in the domain of education may want to learn about positive and negative experiences with a particular policy across countries, its impact on various populations, and so on. So far, such information has had to be consolidated by field experts, which is costly and time-consuming.

Goals

This project targets the next big step in information access technology by

  • Automatically identifying relevant statements
  • Consolidating the information and inferring relations between the statements
  • Enabling users to explore the consolidated information
Figure 1: An example of how atomic statements are created from their original documents

Methods

The project follows an iterative methodology that encompasses the following:

  • Corpus – a large, partially annotated data set on educational topics, acquired using focused crawling and web-based annotation tools
  • Linguistic annotation on various levels (syntax, semantic roles, word senses, named entities, co-reference resolution, truth values, and other domain-specific ones) using state-of-the-art automatic annotation methods
  • Extracting atomic statements – by adapting and extending open information extraction techniques
  • Reflecting relationships between statements – by applying textual entailment and semantic similarity methods
  • Knowledge exploration – effective and efficient user interfaces for interactively displaying statements relevant to user queries
Figure 2: Architecture overview
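
As a rough illustration of the "relationships between statements" step, the following minimal sketch scores pairs of atomic statements with a bag-of-words cosine similarity and keeps the pairs above a threshold. This is a deliberately simple stand-in chosen for illustration, not the project's actual method (the project names textual entailment and semantic similarity methods without specifying them here). The function names, the threshold of 0.5, and the example statements are all assumptions invented for this sketch.

```python
import math
import re
from collections import Counter


def bag_of_words(statement: str) -> Counter:
    """Lowercase the statement and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", statement.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine between two token-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_pairs(statements, threshold=0.5):
    """Return (i, j, score) for each statement pair scoring at or above the threshold."""
    vectors = [bag_of_words(s) for s in statements]
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            score = cosine_similarity(vectors[i], vectors[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


# Hypothetical example statements, invented for illustration only.
statements = [
    "Smaller classes improve reading scores",
    "Reading scores improve in smaller classes",
    "School uniforms reduce bullying",
]

if __name__ == "__main__":
    for i, j, score in related_pairs(statements):
        print(f"{statements[i]!r} <-> {statements[j]!r}: {score:.2f}")
```

Word overlap only captures surface similarity; it cannot tell whether one statement entails or contradicts another, which is why the project lists textual entailment as a separate ingredient.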

Results

In the project, the following corpora were created:

Team

  • Prof. Dr. Iryna Gurevych, Principal Investigator
  • Dr.-Ing. Nils Reimers
  • M.Sc. Michael Bugert, Doctoral Researcher
  • M.Sc. Yevgeniy Puzikov, Doctoral Researcher
  • M.Sc. Max Glockner, Doctoral Researcher

Former staff:

  • Dr. Eugenio Martínez Cámara, Postdoctoral Researcher
  • Dr. Judith Eckle-Kohler, Senior Researcher
  • Dr. Ivan Habernal
  • M.Sc. Maria Sukhareva

Partners

Department of Computer Science, Bar-Ilan University, Israel

Department of Information Science, Bar-Ilan University, Israel

Faculty of Industrial Engineering and Management, Technion – Israel Institute of Technology

Teaching

Teaching activities of the current team of the project:

Student theses

Master's theses:

  • Can Diehl. Automatic Aggregation of Argument Components. 2016. Supervised by: Dr. Christian Stab and Prof. Iryna Gurevych

Bachelor's theses:

  • Michelle Peters. Broad-coverage distantly supervised verb sense disambiguation. 2016. Supervised by: Dr. Judith Eckle-Kohler and Prof. Iryna Gurevych

Funding

This project is funded by:

  • Funder: Deutsche Forschungsgemeinschaft (German Research Foundation)
  • Programme: DIP Programme; 17th round of the German-Israeli Project Cooperation
  • Grant code: GU 798/17-1 and DA 1600/1-1
  • More information: funder web page of the project

Publications

Puzikov, Yevgeniy ; Gardent, Claire ; Dagan, Ido ; Gurevych, Iryna (2019):
Revisiting the Binary Linearization Technique for Surface Realization.
In: The 12th International Conference on Natural Language Generation (INLG 2019), Tokyo, Japan, 29.10.2019-01.11.2019, [Online edition: https://public.ukp.informatik.tu-darmstadt.de/UKP_Webpage/pu...],
[Conference publication]

Böhm, Florian ; Gao, Yang ; Meyer, Christian M. ; Shapira, Ori ; Dagan, Ido ; Gurevych, Iryna (2019):
Better Rewards Yield Better Summaries: Learning to Summarise Without References.
In: The 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019), Hong Kong, China, 03.11.2019-07.11.2019, pp. 3101-3111, [Online edition: https://www.aclweb.org/anthology/D19-1307.pdf],
[Conference publication]

Reimers, Nils ; Gurevych, Iryna (2019):
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.
In: The 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019), Hong Kong, China, 03.11.2019-07.11.2019, pp. 3973-3983, [Online edition: https://www.aclweb.org/anthology/D19-1410.pdf],
[Conference publication]

Barhom, Shany ; Shwartz, Vered ; Eirew, Alon ; Bugert, Michael ; Reimers, Nils ; Dagan, Ido (2019):
Revisiting Joint Modeling of Cross-document Entity and Event Coreference Resolution.
In: The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), Florence, Italy, 28.07.2019-02.08.2019, pp. 4179-4189, [Online edition: https://www.aclweb.org/anthology/P19-1409],
[Conference publication]

Puzikov, Yevgeniy ; Gurevych, Iryna (2018):
E2E NLG Challenge: Neural Models vs. Templates.
In: Proceedings of the 11th International Conference on Natural Language Generation (INLG 2018), Tilburg, Netherlands, 05.11.2018-08.11.2018, pp. 463-471, [Online edition: http://aclweb.org/anthology/W18-6557],
[Conference publication]

Puzikov, Yevgeniy ; Gurevych, Iryna (2018):
BinLin: A Simple Method of Dependency Tree Linearization.
In: Proceedings of the Multilingual Surface Realization Workshop 2018 (ACL 2018), Surface Realization Shared Task 2018, Melbourne, Australia, 15.07.2018-20.07.2018, pp. 13-28, [Online edition: http://aclweb.org/anthology/W18-3602],
[Conference publication]

Martínez Cámara, Eugenio ; Shwartz, Vered ; Gurevych, Iryna ; Dagan, Ido (2017):
Neural Disambiguation of Causal Lexical Markers Based on Context.
In: Proceedings of the 12th International Conference on Computational Semantics (IWCS 2017), Volume 2: Short Papers, Association for Computational Linguistics, Montpellier, France, [Online edition: http://aclweb.org/anthology/W17-6927],
[Conference publication]

Stanovsky, Gabriel ; Eckle-Kohler, Judith ; Puzikov, Yevgeniy ; Dagan, Ido ; Gurevych, Iryna (2017):
Integrating Deep Linguistic Features in Factuality Prediction over Unified Datasets.
In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Volume 2: Short Papers, Association for Computational Linguistics, Vancouver, Canada, 30.07.2017-04.08.2017, pp. 352-357, [Online edition: http://aclweb.org/anthology/P17-2056],
[Conference publication]

Bugert, Michael ; Puzikov, Yevgeniy ; Rücklé, Andreas ; Eckle-Kohler, Judith ; Martin, Teresa ; Martínez Cámara, Eugenio ; Sorokin, Daniil ; Peyrard, Maxime ; Gurevych, Iryna (2017):
LSDSem 2017: Exploring Data Generation Methods for the Story Cloze Test.
In: Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics, Association for Computational Linguistics, Valencia, Spain, 03.04.2017-04.04.2017, pp. 56-61, ISBN 978-1-945626-40-1,
[Online edition: http://aclweb.org/anthology/W17-0908],
[Conference publication]

Wities, Rachel ; Shwartz, Vered ; Stanovsky, Gabriel ; Adler, Meni ; Shapira, Ori ; Upadhyay, Shyam ; Roth, Dan ; Martínez Cámara, Eugenio ; Gurevych, Iryna ; Dagan, Ido (2017):
A Consolidated Open Knowledge Representation for Multiple Texts.
In: Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics, Association for Computational Linguistics, Valencia, pp. 12-24, ISBN 978-1-945626-40-1,
[Online edition: http://aclweb.org/anthology/W17-0902],
[Conference publication]

Levy, Omer ; Dagan, Ido ; Stanovsky, Gabriel ; Eckle-Kohler, Judith ; Gurevych, Iryna (2016):
Modeling Extractive Sentence Intersection via Subtree Entailment.
In: Proceedings of the 26th International Conference on Computational Linguistics (COLING), Osaka, Japan, pp. 2891-2901, [Online edition: http://aclweb.org/anthology/C16-1272],
[Conference publication]

Falke, Tobias ; Stanovsky, Gabriel ; Gurevych, Iryna ; Dagan, Ido (2016):
Porting an Open Information Extraction System from English to German.
In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Austin, TX, USA, pp. 892-898, [Online edition: http://aclweb.org/anthology/D16-1086],
[Conference publication]

Eckle-Kohler, Judith (2016):
Verbs Taking Clausal and Non-Finite Arguments as Signals of Modality – Revisiting the Issue of Meaning Grounded in Syntax.
In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), Volume 1: Long Papers, Association for Computational Linguistics, Berlin, Germany, pp. 811-822, [Online edition: http://aclweb.org/anthology/P/P16/P16-1077.pdf],
[Conference publication]

Habernal, Ivan ; Sukhareva, Maria ; Raiber, Fiana ; Shtok, Anna ; Kurland, Oren ; Ronen, Hadar ; Bar-Ilan, Judit ; Gurevych, Iryna (2016):
New Collection Announcement: Focused Retrieval Over the Web.
In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '16), ACM, Pisa, Italy, pp. 701-704, ISBN 978-1-4503-4069-4,
DOI: 10.1145/2911451.2914682,
[Online edition: https://dl.acm.org/citation.cfm?id=2914682&dl=ACM&coll=DL],
[Conference publication]

Sukhareva, Maria ; Eckle-Kohler, Judith ; Habernal, Ivan ; Gurevych, Iryna (2016):
Crowdsourcing a Large Dataset of Domain-Specific Context-Sensitive Semantic Verb Relations.
In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA), Portoroz, Slovenia, pp. 2131-2137, [Online edition: http://www.lrec-conf.org/proceedings/lrec2016/pdf/494_Paper....],
[Conference publication]
