dc.contributor.author | Zhou, Ruikang | |
dc.contributor.author | Zhang, Fan | |
dc.date.accessioned | 2025-06-10T17:28:11Z | |
dc.date.available | 2025-06-10T17:28:11Z | |
dc.date.issued | 2025-05-09 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/159384 | |
dc.description.abstract | Text-to-SQL leverages large language models (LLMs) for natural language database queries, yet existing benchmarks like BIRD (12,751 question–SQL pairs, 95 databases) suffer from inconsistencies (e.g., 30% of queries misalign with SQL outputs) and ambiguities that impair LLM evaluation. This study refines such datasets by distilling logically sound question–SQL pairs and enhancing table schemas, yielding a benchmark of 146 high-complexity tasks across 11 domains. We assess GPT-4o, GPT-4o-Mini, Qwen-2.5-Instruct, Llama-3 70B, DPSK-v3, and O1-Preview in zero-shot scenarios, achieving average accuracies of 51.23%, 41.65%, 44.25%, 47.80%, and 49.10%, respectively, with O1-Preview reaching a peak of 78.08%. Prompt-based strategies improve performance by up to 4.78%, addressing issues like poor domain adaptability and inconsistent interpretation of training data. Error-annotated datasets further reveal LLM limitations. This refined benchmark ensures robust evaluation of logical reasoning, supporting reliable NLP-driven database systems. | en_US |
dc.publisher | Multidisciplinary Digital Publishing Institute | en_US |
dc.relation.isversionof | http://dx.doi.org/10.3390/app15105306 | en_US |
dc.rights | Creative Commons Attribution | en_US |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | en_US |
dc.source | Multidisciplinary Digital Publishing Institute | en_US |
dc.title | Refining Zero-Shot Text-to-SQL Benchmarks via Prompt Strategies with Large Language Models | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Zhou, R.; Zhang, F. Refining Zero-Shot Text-to-SQL Benchmarks via Prompt Strategies with Large Language Models. Appl. Sci. 2025, 15, 5306. | en_US |
dc.contributor.department | MIT Kavli Institute for Astrophysics and Space Research | en_US |
dc.relation.journal | Applied Sciences | en_US |
dc.identifier.mitlicense | PUBLISHER_CC | |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
dc.date.updated | 2025-05-27T12:54:14Z | |
dspace.date.submission | 2025-05-27T12:54:14Z | |
mit.journal.volume | 15 | en_US |
mit.journal.issue | 10 | en_US |
mit.license | PUBLISHER_CC | |
mit.metadata.status | Authority Work and Publication Information Needed | en_US |