UKP-SQuARE: Software for Question Answering Research

(Funding Period: 2020 - 2023)


Researchers in NLP have devoted significant resources to creating more powerful machine learning models for question answering (QA) and to collecting high-quality QA datasets. Combined with the recent breakthroughs in large pretrained language models, this has led to rapid progress in the field across many different kinds of QA. As a result, state-of-the-art models often become obsolete only a few months after they have been developed.

Therefore, it is essential for researchers to explore, compare, and combine these models as quickly as possible to identify the strengths and weaknesses of the current state of the art. When QA systems are accompanied by tools to analyse intermediate results of models and pipelines, they can also provide important insights to understand the ‘black box’ of modern neural networks.

Even though a large variety of QA systems and frameworks exists, present approaches cannot be easily extended with additional kinds of QA that use different types of data sources, models, pipelines, and answer types. For instance, no framework allows researchers to easily integrate a skill for community QA (cQA) and a skill for yes/no QA for claim validation within the same platform so that the two can interoperate, e.g., to validate claims made in the user-generated answers found in cQA. This considerably limits the applicability and re-use of existing frameworks across the diverse, rapidly progressing area of QA research, making it infeasible for researchers to quickly integrate novel models and QA pipelines.


In this project, we aim to create a flexible, scalable and interpretable QA platform to enable researchers to:

  • Share their custom QA agents by integrating them into our platform using easy-to-use common interfaces,
  • Study the strengths and weaknesses of existing models by comparing them on a wide range of tasks and datasets that are already provided within our framework,
  • Explore the existing models and datasets to answer more specific research questions using integrated interpretability tools.

For public users, we also aim to develop a QA demonstrator that can combine answers of different QA agents to provide better answers.


  • Prof. Dr. Iryna Gurevych, Principal Investigator
  • Dr. Gözde Gül Şahin, Postdoctoral Researcher
  • Tim Baumgärtner, MSc, Doctoral Researcher
  • Rachneet Sadeva, MSc, Doctoral Researcher – to join in September
  • Kexin Wang, MSc, Doctoral Researcher
  • Haritz Puerto, MSc, Researcher
  • Nandan Thakur, MSc, Software Engineer
  • Gregor Geigle, Student Assistant
  • Clifton Alexander Poth, Student Assistant
  • Hannah Sterz, Student Assistant

Associated Researchers

  • Jonas Pfeiffer, MSc, Doctoral Researcher
  • Dr. Nils Reimers, HuggingFace
  • Leonardo Ribeiro, MSc, Doctoral Researcher


This project is funded by the Deutsche Forschungsgemeinschaft (German Research Foundation).


Wang, Kexin ; Reimers, Nils ; Gurevych, Iryna (2021):
TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning.
In: Findings of the Association for Computational Linguistics: EMNLP 2021,
Association for Computational Linguistics, The 2021 Conference on Empirical Methods in Natural Language Processing, Online and in the Barceló Bávaro Convention Centre, Punta Cana, Dominican Republic, 07.11.2021 - 11.11.2021, [Conference or Workshop Item]

Geigle, Gregor ; Reimers, Nils ; Rücklé, Andreas ; Gurevych, Iryna (2021):
TWEAC: Transformer with Extendable QA Agent Classifiers.
In: arXiv, Computer Science: Computation and Language, (Preprint), [Article]