Document Structure in LLMs

Bachelor Thesis

The goal is to investigate how LLMs represent document structure. There will be two kinds of experiments:

1. Checking how well LLMs represent document structure by asking them to retrieve specific sections of a document or to identify the relations between certain segments.
2. Checking whether adding structural markers (such as Markdown) to the document input improves performance on long-document downstream tasks.
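
As a rough illustration of how the inputs for both kinds of experiments might be constructed, the following minimal sketch renders the same document with and without Markdown headings and builds a section-retrieval prompt. All names here (`Section`, `render`, `retrieval_prompt`, `exact_match`), the prompt wording, and the exact-match metric are hypothetical choices made for illustration, not part of the thesis description; the call to the LLM itself is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """One document segment (hypothetical representation)."""
    title: str
    body: str
    level: int = 1  # heading depth, 1 = top level


def render(sections, with_markers):
    """Join sections into one string, with or without Markdown headings."""
    parts = []
    for s in sections:
        if with_markers:
            # Structural marker: '#' repeated according to heading depth.
            parts.append("#" * s.level + " " + s.title)
        else:
            # Same title text, but without any explicit marker.
            parts.append(s.title)
        parts.append(s.body)
    return "\n\n".join(parts)


def retrieval_prompt(document, target_title):
    """Build a section-retrieval prompt (the wording is an assumption)."""
    return (
        f"{document}\n\n"
        f'Question: Reproduce the text of the section titled "{target_title}" verbatim.'
    )


def exact_match(prediction, section):
    """Whitespace-insensitive exact match against the gold section body."""
    return " ".join(prediction.split()) == " ".join(section.body.split())
```

Rendering one list of sections twice, once with `with_markers=True` and once with `with_markers=False`, gives two inputs that differ only in their structural markers, so any difference in task performance can be attributed to the markers rather than to the content. Exact match is only one possible score; a softer overlap measure may be more appropriate for long sections.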