ATHENE REVISE: Reliable and Verifiable Information through Secure Media
(Funding Period: 2025 - 2028)

SafeLLMs: Safeguarding LLMs against Misleading Evidence Attacks

Motivation

Retrieval-augmented systems, where an LLM is provided with external evidence to answer questions, have come a long way in boosting factual accuracy. But they also open the door to a new threat: “data-void” attacks. In these attacks, bad actors find questions with little reliable coverage, inject misleading (yet technically true) text or charts, and steer models and even human readers toward the wrong conclusion. Because the planted evidence isn’t outright false, it slips past both the LLM’s reasoning and standard fact-checks, which makes these attacks hard to detect. We need to understand how these attacks work and build defenses that keep our AI honest and factually reliable.

This work is part of ATHENE’s REVISE research area, which develops reverse-content-search techniques and other verification methods to reliably spot manipulated or repurposed media, so we can ensure that only trustworthy evidence makes it into AI responses.

Example of a threat scenario where a data void is exploited in RAG.

Goals

  • Robust Factual Retrieval: Develop retrieval methods across tables, text, and documents that prioritize evidence quality and direct support for factual answers.
  • Spot Misleading Evidence: Build simple multimodal checks to flag distorted charts (e.g., truncated axes) and misleading text, helping users avoid poor decisions and wrong conclusions.
  • Enrich Context for Verifiability: Automatically attach source metadata and reliability cues to every piece of retrieved evidence, so models and humans can judge its trustworthiness before acting.

Methods

  • Textual Checks: We extend fallacy-detection tools to spot information in text or tables that could lead to wrong conclusions. We’ll train lightweight classifiers and develop novel LLM prompts to transparently detect and clarify such information.
  • Visual Corrections: We convert charts into alternative representations, such as code, to detect common misleaders like missing axis ranges or accumulated values, and automatically generate clear, corrected plots with explanations.
  • Complementary Evidence Gathering: We develop novel retrieval methods that combine textual and tabular data to collect complementary evidence for verifying complex claims and contextualizing potentially misleading information. This reduces the risk of cherry-picked or misinterpreted data and provides models and users with more reliable context.
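
To make the chart-checking idea concrete, here is a minimal sketch of how a truncated axis or a missing axis range could be flagged once a chart has been converted into a structured form. It is an illustration under stated assumptions, not the project's implementation: the `ChartSpec` type, the `find_misleaders` function, the flag names, and the `max_exaggeration` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ChartSpec:
    """Hypothetical structured form of a chart (e.g. recovered from an image)."""
    kind: str                      # chart type, e.g. "bar" or "line"
    values: Sequence[float]        # plotted data values
    y_min: Optional[float] = None  # lower y-axis limit, None if not recoverable
    y_max: Optional[float] = None  # upper y-axis limit, None if not recoverable


def find_misleaders(spec: ChartSpec, max_exaggeration: float = 1.5) -> List[str]:
    """Return illustrative flags for two common chart misleaders.

    max_exaggeration is an assumed threshold: how much larger the drawn
    bar-height ratio may be than the true value ratio before we flag it.
    """
    # Without axis limits a reader cannot judge the scale at all.
    if spec.y_min is None or spec.y_max is None:
        return ["missing_axis_range"]

    flags: List[str] = []
    # A bar chart whose baseline starts above zero exaggerates differences.
    if spec.kind == "bar" and spec.y_min > 0 and len(spec.values) >= 2:
        lo, hi = min(spec.values), max(spec.values)
        if lo <= spec.y_min:
            # The smallest bar has no visible height: clearly truncated.
            flags.append("truncated_axis")
        else:
            true_ratio = hi / lo
            drawn_ratio = (hi - spec.y_min) / (lo - spec.y_min)
            if drawn_ratio / true_ratio > max_exaggeration:
                flags.append("truncated_axis")
    return flags


# Usage: values 95 and 100 drawn on an axis that starts at 90.
suspicious = ChartSpec(kind="bar", values=[95, 100], y_min=90, y_max=100)
print(find_misleaders(suspicious))
```

In this example the taller bar is drawn twice as high as the shorter one although the underlying values differ by about 5%, so the chart is flagged. A corrected plot would redraw the same values with the y-axis starting at zero.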

Team

  • Prof. Dr. Iryna Gurevych, Principal Investigator
  • German Ortiz, Doctoral Researcher
  • Hassan Soliman, Doctoral Researcher
  • Jonathan Tonglet, Doctoral Researcher
  • Justus-Jonas Erker, Doctoral Researcher
  • Leon Engländer, Master's Student
  • Manisha Venkat, Intern
  • Max Glockner, Doctoral Researcher
  • Shivam Sharma (Junior), Doctoral Researcher
  • Shivam Sharma, Postdoctoral Researcher

Funding

This research work is funded from 2025 to 2028 by the German Federal Ministry of Research, Technology and Space and the Hessian Ministry of Higher Education, Research, Science and the Arts within their joint support of the National Research Center for Applied Cybersecurity ATHENE.
