Natural language has long been a promising alternative query interface to databases, one that enables non-expert users to formulate complex questions in a more concise manner. Recently, deep learning techniques have gained traction as a way to translate natural language to SQL, since similar ideas have been successful in related domains (e.g., English to Spanish). However, the core problem with existing deep learning approaches is that they require an enormous number of training examples in order to provide accurate translations. Such training data is extremely expensive to curate, since it generally requires humans to manually annotate natural language with SQL queries.

Based on these observations, we propose DBPal, a new approach that augments existing deep learning techniques in order to improve the performance of natural language to SQL translation. More specifically, we present a novel training pipeline that automatically generates synthetic training data in order to improve translation accuracy and create a model that is tailor-made for the target database. As we show, our training pipeline, applied to existing deep learning techniques, is able to improve accuracy on state-of-the-art natural language to SQL translation tasks.
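To make the idea of automatically generated, database-specific training data more concrete, here is a minimal sketch in Python. It is not the DBPal implementation: the schema, the paired templates, the `@VALUE` placeholder, and the function `generate_pairs` are all hypothetical names chosen for illustration. It assumes only that synthetic pairs can be produced by filling matching natural language and SQL templates with table and column names taken from the target schema.

```python
import itertools

# Hypothetical schema for illustration: table name -> column names.
SCHEMA = {
    "patients": ["name", "age", "diagnosis"],
}

# Hypothetical paired templates: the natural-language pattern and the SQL
# pattern share the same slots ({table}, {select_col}, {filter_col}).
# @VALUE is an assumed placeholder standing in for a constant.
TEMPLATES = [
    ("show the {select_col} of all {table} where {filter_col} is @VALUE",
     "SELECT {select_col} FROM {table} WHERE {filter_col} = @VALUE"),
    ("what is the {select_col} of {table} with {filter_col} @VALUE",
     "SELECT {select_col} FROM {table} WHERE {filter_col} = @VALUE"),
]


def generate_pairs(schema, templates):
    """Fill every template with every (table, select, filter) combination."""
    pairs = []
    for table, columns in schema.items():
        # permutations(..., 2) yields ordered pairs of distinct columns.
        for select_col, filter_col in itertools.permutations(columns, 2):
            slots = {
                "table": table,
                "select_col": select_col,
                "filter_col": filter_col,
            }
            for nl_template, sql_template in templates:
                pairs.append(
                    (nl_template.format(**slots), sql_template.format(**slots))
                )
    return pairs


if __name__ == "__main__":
    # Print a few generated (question, query) pairs.
    for question, query in generate_pairs(SCHEMA, TEMPLATES)[:3]:
        print(question, "->", query)
```

Because the pairs are derived from the schema itself, swapping in a different schema yields training data for a different database without manual annotation, which is the property the paragraph above describes.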


Contact
Name: Benjamin Hättasch (Doctoral Researcher)
Office: S2|02 D110
Phone: -25603


Weir, Nathaniel ; Crotty, Andrew ; Galakatos, Alex ; Ilkhechi, Amir ; Ramaswamy, Shekar ; Bhushan, Rohin ; Cetintemel, Ugur ; Utama, Prasetya ; Geisler, Nadja ; Hättasch, Benjamin ; Eger, Steffen ; Binnig, Carsten (2019):
DBPal: Weak Supervision for Learning a Natural Language Interface to Databases.
1st International Workshop on Conversational Access to Data (CAST), in conjunction with the 45th International Conference on Very Large Data Bases (VLDB), Los Angeles, California, USA, [Conference publication]

Weir, Nathaniel ; Utama, Prasetya ; Galakatos, Alex ; Crotty, Andrew ; Ilkhechi, Amir ; Ramaswamy, Shekar ; Bhushan, Rohin ; Geisler, Nadja ; Hättasch, Benjamin ; Eger, Steffen ; Cetintemel, Ugur ; Binnig, Carsten
Maier, David ; Pottinger, Rachel ; Doan, AnHai ; Tan, Wang-Chiew ; Alawini, Abdussalam ; Ngo, Hung Q. (eds.) (2020):
DBPal: A Fully Pluggable NL2SQL Training Pipeline.
In: SIGMOD'20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 2347-2361,
ACM, SIGMOD/PODS '20: International Conference on Management of Data, virtual conference, 14-19 June 2020, ISBN 978-1-4503-6735-6,
DOI: 10.1145/3318464.3380589

Basik, Fuat ; Hättasch, Benjamin ; Ilkhechi, Amir ; Usta, Arif ; Ramaswamy, Shekar ; Utama, Prasetya ; Weir, Nathaniel ; Binnig, Carsten ; Cetintemel, Ugur (2018):
DBPal: A Learned NL-Interface for Databases.
In: SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data, pp. 1765-1768, New York, NY, USA, ACM, ISBN 978-1-4503-4703-7,
DOI: 10.1145/3183713.3193562
