Natural language has long been a promising alternative query interface to databases, one that enables non-expert users to formulate complex questions concisely. Recently, deep learning techniques have gained traction as a way to translate natural language to SQL, since similar ideas have been successful in related domains (e.g., English-to-Spanish translation). However, the core problem with existing deep learning approaches is that they require an enormous number of training examples in order to provide accurate translations. Such training data is extremely expensive to curate, since it generally requires humans to manually annotate natural language with SQL queries.
Based on these observations, we propose DBPal, a new approach that augments existing deep learning techniques in order to improve the performance of natural language to SQL translation. More specifically, we present a novel training pipeline that automatically generates synthetic training data in order to improve translation accuracy and to create a model that is tailor-made for the target database. As we show, applying our training pipeline to existing deep learning techniques improves the accuracy of state-of-the-art natural language to SQL translation models.
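To make the idea of schema-driven synthetic training data concrete, the following is a minimal sketch of one way such data could be generated: paired natural-language and SQL templates are instantiated with the table and column names of a target database. This illustrates the general technique only and is not DBPal's actual pipeline; the schema, the templates, the `@VALUE` placeholder, and the function name `generate_pairs` are all assumptions made for this example.

```python
from itertools import permutations

# Hypothetical target schema (assumption): table name -> column names.
SCHEMA = {"patients": ["name", "age", "diagnosis"]}

# Hypothetical paired templates (assumption): each natural-language
# pattern is aligned with the SQL pattern it should translate to.
# "@VALUE" stands in for a constant so the pair stays database-agnostic.
TEMPLATES = [
    ("show the {col} of all {table}",
     "SELECT {col} FROM {table}"),
    ("what is the {col} of {table} where {filter_col} is @VALUE",
     "SELECT {col} FROM {table} WHERE {filter_col} = @VALUE"),
]


def generate_pairs(schema, templates):
    """Instantiate every template with every table/column combination."""
    pairs = []
    for table, columns in schema.items():
        # permutations(..., 2) yields ordered pairs of distinct columns.
        for col, filter_col in permutations(columns, 2):
            for nl_template, sql_template in templates:
                slots = {"table": table, "col": col, "filter_col": filter_col}
                # str.format ignores keyword arguments a template does not use.
                pairs.append((nl_template.format(**slots),
                              sql_template.format(**slots)))
    # Templates without a filter slot produce repeats; drop them, keeping order.
    return list(dict.fromkeys(pairs))


if __name__ == "__main__":
    for question, query in generate_pairs(SCHEMA, TEMPLATES):
        print(f"{question}  ->  {query}")
```

Because the pairs are derived from the schema itself, the resulting training set is specific to the target database and requires no manual annotation. A realistic pipeline would need far richer templates and some linguistic variation, which this sketch leaves out.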