Carsten Binnig gave an invited talk at TRL@NeurIPS 2022
Prof. Binnig gave a talk titled “Pre-trained Models for Learned DBMS Components” at the first Table Representation Learning workshop, hosted at NeurIPS 2022.
Database management systems (DBMSs) are the backbone for managing large volumes of data efficiently and thus play a central role in business and science today. To provide high performance, many of the most complex DBMS components, such as query optimizers and schedulers, must solve non-trivial problems such as query cost estimation. To tackle such problems, recent work has outlined a new direction of so-called learned DBMS components, in which core parts of a DBMS are replaced by machine learning (ML) models. While this line of work has been shown to provide significant performance benefits for DBMSs, a major drawback of the current workload-driven learning approaches is that they incur a very high and repeated overhead for training data collection. Hence, in this talk, I will discuss a new direction of so-called zero-shot DBMS models: pre-trained models that avoid this repeated overhead. As a concrete first step, we have realized a zero-shot cost model that can predict query execution cost, a core DBMS task, on an unseen database (i.e., a new set of tables with data) out of the box. Furthermore, I will discuss more recent results on how the general idea of zero-shot DBMS models can be applied to other DBMS components, and even beyond DBMSs to other data systems.