
dc.contributor.author  Zhou, Ruikang
dc.contributor.author  Zhang, Fan
dc.date.accessioned  2025-06-10T17:28:11Z
dc.date.available  2025-06-10T17:28:11Z
dc.date.issued  2025-05-09
dc.identifier.uri  https://hdl.handle.net/1721.1/159384
dc.description.abstract  Text-to-SQL leverages large language models (LLMs) for natural language database queries, yet existing benchmarks like BIRD (12,751 question–SQL pairs, 95 databases) suffer from inconsistencies (e.g., 30% of queries misalign with their SQL outputs) and ambiguities that impair LLM evaluation. This study refines such datasets by distilling logically sound question–SQL pairs and enhancing table schemas, yielding a benchmark of 146 high-complexity tasks across 11 domains. We assess GPT-4o, GPT-4o-Mini, Qwen-2.5-Instruct, Llama-3-70B, DeepSeek-V3, and O1-Preview in zero-shot scenarios; the first five achieve average accuracies of 51.23%, 41.65%, 44.25%, 47.80%, and 49.10%, respectively, while O1-Preview peaks at 78.08%. Prompt-based strategies improve performance by up to 4.78%, addressing issues such as poor domain adaptability and inconsistent interpretation of training data. Error-annotated datasets further reveal LLM limitations. The refined benchmark enables robust evaluation of logical reasoning, supporting reliable NLP-driven database systems.  en_US
dc.publisher  Multidisciplinary Digital Publishing Institute  en_US
dc.relation.isversionof  http://dx.doi.org/10.3390/app15105306  en_US
dc.rights  Creative Commons Attribution  en_US
dc.rights.uri  https://creativecommons.org/licenses/by/4.0/  en_US
dc.source  Multidisciplinary Digital Publishing Institute  en_US
dc.title  Refining Zero-Shot Text-to-SQL Benchmarks via Prompt Strategies with Large Language Models  en_US
dc.type  Article  en_US
dc.identifier.citation  Zhou, R.; Zhang, F. Refining Zero-Shot Text-to-SQL Benchmarks via Prompt Strategies with Large Language Models. Appl. Sci. 2025, 15, 5306.  en_US
dc.contributor.department  MIT Kavli Institute for Astrophysics and Space Research  en_US
dc.relation.journal  Applied Sciences  en_US
dc.identifier.mitlicense  PUBLISHER_CC
dc.eprint.version  Final published version  en_US
dc.type.uri  http://purl.org/eprint/type/JournalArticle  en_US
eprint.status  http://purl.org/eprint/status/PeerReviewed  en_US
dc.date.updated  2025-05-27T12:54:14Z
dspace.date.submission  2025-05-27T12:54:14Z
mit.journal.volume  15  en_US
mit.journal.issue  10  en_US
mit.license  PUBLISHER_CC
mit.metadata.status  Authority Work and Publication Information Needed  en_US
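
The abstract above evaluates models "in zero-shot scenarios," i.e., each model receives only the table schema and the natural language question, with no in-context examples. The Python sketch below is a minimal illustration of that setup, not code from the paper: the schema, question, and prompt wording are placeholder assumptions, and the comment about where the LLM call would go names GPT-4o and O1-Preview only because the abstract lists them.

    # Minimal sketch (assumed, not from the paper) of a zero-shot
    # text-to-SQL prompt: schema + question only, no worked examples.

    def build_zero_shot_prompt(schema: str, question: str) -> str:
        """Compose a single zero-shot prompt from a schema and a question."""
        return (
            "Given the following SQLite table schema, write one SQL query "
            "that answers the question. Return only the SQL.\n\n"
            f"Schema:\n{schema}\n\n"
            f"Question: {question}\nSQL:"
        )

    if __name__ == "__main__":
        # Placeholder schema and question, purely illustrative.
        schema = (
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
            "customer TEXT, total REAL, placed_at TEXT);"
        )
        question = "What is the average order total per customer?"
        print(build_zero_shot_prompt(schema, question))
        # The resulting prompt would be sent to a model such as GPT-4o or
        # O1-Preview; the paper's prompt-based strategies layer schema
        # enhancements and extra instructions on top of a bare template
        # like this one.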

