This seminar is about how AI can be used for data management. This year, the seminar focuses on two topics: learned DBMS components and AI for data engineering tasks. The course starts with a mini lecture series to provide the necessary background for the two practical tasks that follow.
- Task 1 (Learned Query Optimization): You will develop a learned query optimizer that selects the best plan from a set of candidate plans. To this end, you will build a learned cost model that predicts the execution cost of a given query plan; the optimizer then uses these predictions to choose the best plan for a given query. Training data and stencil code with the necessary framework for building the learned query optimizer will be provided.
- Task 2 (LLMs for Data Engineering): You will explore how LLMs can solve classical data engineering tasks. Each student reproduces one approach from existing literature and extends it with their own ideas.
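To make the idea behind Task 1 more concrete, here is a minimal sketch of cost-based plan selection. It is not the stencil code provided in the course: the plan features (estimated rows scanned, number of joins), the training runtimes, and all function names are hypothetical, and a simple least-squares linear model stands in for whatever learned cost model you end up building.

```python
import numpy as np


def add_bias(features):
    """Append a constant column so the linear model can learn an offset."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_cost_model(features, runtimes):
    """Fit a linear cost model with least squares (stand-in for a learned model)."""
    weights, *_ = np.linalg.lstsq(add_bias(features), runtimes, rcond=None)
    return weights


def predict_cost(weights, features):
    """Predict the execution cost of each plan from its feature vector."""
    return add_bias(features) @ weights


def select_best_plan(weights, candidate_features):
    """Return the index of the candidate with the lowest predicted cost."""
    costs = predict_cost(weights, candidate_features)
    return int(np.argmin(costs)), costs


# Hypothetical training data: one row per executed plan,
# columns are [estimated rows scanned, number of joins].
train_features = np.array(
    [[1000.0, 1.0], [5000.0, 2.0], [200.0, 0.0], [8000.0, 3.0], [3000.0, 1.0]]
)
# Made-up runtimes in milliseconds (generated as 0.01 * rows + 2 * joins).
train_runtimes = np.array([12.0, 54.0, 2.0, 86.0, 32.0])

weights = fit_cost_model(train_features, train_runtimes)

# Three hypothetical candidate plans for the same query.
candidates = np.array([[4000.0, 2.0], [1500.0, 1.0], [6000.0, 1.0]])
best_index, predicted_costs = select_best_plan(weights, candidates)
print(f"Chosen plan: {best_index}, predicted costs: {predicted_costs}")
```

The optimizer itself is just an argmin over predicted costs, so the quality of the chosen plan depends entirely on how well the cost model ranks the candidates.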
Organization
Last offered | Winter Semester (25/26) |
Lecturer | Prof. Carsten Binnig |
Assistants | Johannes Wehrstein, Jan-Micha Bodensohn |
Contact | aidm(at)lists.systems.informatik.tu-darmstadt.de |
Examination | See Moodle |
Kick-Off | October 14th 2025, 9:50-11:30 AM (S103/113) |
Course Information
Below you will find some general information about the seminar. For all information regarding this year’s seminar (including important dates), please check the Moodle course linked above. Also make sure that you are registered in TUCaN.
Prerequisites:
You should have basic knowledge of machine learning and of programming in Python. Advanced knowledge of data management and database systems from courses such as SDMS or ADMS, as well as from machine learning courses, is also helpful.
Seminar Topic:
Database management systems (DBMS) in the cloud are the backbone for managing large volumes of data efficiently and thus play a central role in business and science today. To provide high performance, many of the most complex DBMS components, such as query optimizers or schedulers, have to solve non-trivial problems.
To tackle such problems, recent work has outlined a new direction of so-called learned DBMS components, where AI-based methods are used to replace or enhance core DBMS components. This approach has been shown to provide significant performance benefits. It is particularly interesting because cloud vendors such as Google, Amazon, and Microsoft are already applying these techniques to optimize the performance of their cloud data systems.
Besides learned DBMS components, AI has been used to improve many other tasks related to data management. For example, classical data engineering tasks like error detection, missing value imputation, and data augmentation typically require substantial manual effort and can be automated with AI. Finally, AI has also been used to extend databases through better data access interfaces (e.g., natural language querying and chatbots for data) or by supporting data beyond structured tabular data (i.e., text and images).
This seminar is designed to introduce students to the foundational concepts of using AI for data management. The course will include a mini lecture series that provides the necessary background on AI in data management, preparing students for the seminar tasks. The seminar is divided into two parts, each focusing on key themes as introduced above: learned DBMS components and the application of AI for data engineering. Students will engage in practical tasks related to these topics.