DSpace@MIT

Refining Zero-Shot Text-to-SQL Benchmarks via Prompt Strategies with Large Language Models

Author(s)
Zhou, Ruikang; Zhang, Fan
Download: applsci-15-05306.pdf (7.771 MB)
Terms of use
Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/
Abstract
Text-to-SQL leverages large language models (LLMs) for natural-language database querying, yet existing benchmarks such as BIRD (12,751 question–SQL pairs, 95 databases) suffer from inconsistencies (e.g., 30% of queries misalign with their SQL outputs) and ambiguities that impair LLM evaluation. This study refines such datasets by distilling logically sound question–SQL pairs and enhancing table schemas, yielding a benchmark of 146 high-complexity tasks across 11 domains. We assess GPT-4o, GPT-4o-Mini, Qwen-2.5-Instruct, Llama-3-70B, DeepSeek-V3 (DPSK-v3), and O1-Preview in zero-shot scenarios; the first five achieve average accuracies of 51.23%, 41.65%, 44.25%, 47.80%, and 49.10%, respectively, with O1-Preview peaking at 78.08%. Prompt-based strategies improve performance by up to 4.78%, addressing issues such as poor domain adaptability and inconsistent interpretation of training data. Error-annotated datasets further reveal LLM limitations. This refined benchmark enables robust evaluation of logical reasoning, supporting reliable NLP-driven database systems.
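The zero-shot setting described in the abstract can be illustrated with a minimal prompt-construction sketch. The schema, question, and template below are illustrative assumptions for demonstration only, not the authors' actual benchmark prompt or data:

```python
# Minimal sketch of a zero-shot text-to-SQL prompt: the model receives only
# the table schema and the natural-language question, with no examples.
# All names here (tables, question, wording) are hypothetical.

def build_zero_shot_prompt(schema: str, question: str) -> str:
    """Assemble a zero-shot prompt pairing a database schema with a
    natural-language question, asking for SQL as the only output."""
    return (
        "Given the following database schema:\n"
        f"{schema}\n\n"
        f"Question: {question}\n"
        "Answer with a single SQL query and nothing else."
    )

schema = (
    "CREATE TABLE orders (id INT, customer_id INT, total REAL);\n"
    "CREATE TABLE customers (id INT, name TEXT);"
)
question = "What is the total order value per customer name?"

prompt = build_zero_shot_prompt(schema, question)
print(prompt)
```

The prompt-based strategies evaluated in the paper would modify this template (e.g., adding schema descriptions or reasoning instructions) while still providing no in-context examples.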
Date issued
2025-05-09
URI
https://hdl.handle.net/1721.1/159384
Department
MIT Kavli Institute for Astrophysics and Space Research
Journal
Applied Sciences
Publisher
Multidisciplinary Digital Publishing Institute
Citation
Zhou, R.; Zhang, F. Refining Zero-Shot Text-to-SQL Benchmarks via Prompt Strategies with Large Language Models. Appl. Sci. 2025, 15, 5306.
Version: Final published version

Collections
  • MIT Open Access Articles
