Efficient schema-less text-to-SQL conversion using large language models

Youssef Mellah¹*, Veysel Kocaman¹, Hasham Ul Haq¹, David Talby¹
¹ John Snow Labs, Coastal Highway, Lewes, Delaware, United States of America
AIH 2024, 1(2), 96–106
Submitted: 6 January 2024 | Accepted: 23 February 2024 | Published: 4 April 2024
© 2024 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License.

Large language models (LLMs) are increasingly applied to a range of tasks, including text-to-SQL (the conversion of natural language questions into SQL queries). While most studies train LLMs on large SQL corpora for better generalization and then rely on prompt engineering during inference, we investigate training LLMs for schema-less prompting. In particular, our approach takes simple natural language questions as input, without any additional knowledge about the database schema. We demonstrate that smaller models paired with simpler prompts yield considerable performance improvements in SQL query generation. Our model, based on the Flan-T5 architecture, achieves a logical form accuracy (LFA) of 0.85 on the MIMICSQL dataset, significantly outperforming current state-of-the-art models such as Defog-SQL-Coder, GPT-3.5-Turbo, LLaMA-2-7B, and GPT-4. This approach reduces model size, which lowers the amount of data and the infrastructure cost required for training and serving, and improves performance enough to enable the generation of much more complex SQL queries.
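To make the two central notions concrete, the following minimal Python sketch contrasts a conventional schema-aware prompt with a schema-less one and computes a logical form accuracy score. It is illustrative only and is not the authors' implementation: the function names, the prompt template, and the normalization rules are assumptions, and LFA is taken here to mean an exact match between the normalized predicted and reference SQL strings.

```python
import re


def schema_aware_prompt(question: str, schema: str) -> str:
    """Conventional prompt: the schema travels with every question.

    The template is hypothetical; real systems use many variants.
    """
    return f"Schema:\n{schema}\n\nQuestion: {question}\nSQL:"


def schema_less_prompt(question: str) -> str:
    """Schema-less prompt: the question alone is the model input."""
    return question.strip()


def normalize_sql(sql: str) -> str:
    """Drop a trailing semicolon, collapse whitespace, and lowercase.

    Assumption: lowercasing also affects string literals, which is
    acceptable only when literal case does not matter for evaluation.
    """
    sql = sql.strip().rstrip(";").strip()
    return re.sub(r"\s+", " ", sql).lower()


def logical_form_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions whose normalized text equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    matches = sum(
        normalize_sql(pred) == normalize_sql(ref)
        for pred, ref in zip(predictions, references)
    )
    return matches / len(references)


if __name__ == "__main__":
    # Hypothetical question and queries; the table and column names are made up.
    question = "How many patients are older than 60?"
    print(schema_less_prompt(question))
    predicted = ["SELECT COUNT(*)  FROM patients WHERE age > 60;"]
    reference = ["select count(*) from patients where age > 60"]
    print(logical_form_accuracy(predicted, reference))
```

Because exact string matching penalizes queries that are semantically equivalent but written differently, a score computed this way is a conservative estimate of correctness.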

Large language models
Logical form accuracy
