Practical And Realistic BenchmaRk for crOss-system SQL Translation
The first comprehensive benchmark for evaluating cross-system SQL translation systems
Leaderboard • Documentation • Submit Results • Paper
- 09/2025: Our paper "PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation" has been accepted by NeurIPS 2025!
- 05/2025: We have released PARROT-1.0 (28,003 translation pairs from 38 open-source benchmarks for extensive syntax testing) and published the leaderboard.
| Comprehensive | Production-Ready | Well-Tested | Multi-Dialect |
|---|---|---|---|
| 598 curated pairs from 38+ benchmarks | Real-world workloads & production data | Built-in validators & parsers | 10+ SQL dialects supported |
- 598 Translation Pairs from 38+ public benchmarks and production-derived workloads
- Broad Dialect Coverage: PostgreSQL, MySQL, SQLite, Oracle, SQL Server, Db2, DuckDB, Trino, Hive, Snowflake, and more
- Built-in Validators: Comprehensive parsers and executability checks for multiple engines
- Complete Toolkit: Preprocessing utilities and baseline translation tools included
- Rigorous Evaluation: Multi-dimensional scoring (syntax and execution)
- Live Leaderboard: Track your progress and compete with the community
- Prepare Outputs
  - Follow the example in Submission_Example/20250928_LLMTranslator_ExampleTeam.zip
  - Ensure proper folder structure and file formats
- Read Guidelines
  - Review Submission_Example/PARROT Submission Guidelines.md
  - Check format requirements and naming conventions
- Include System Description
  - Approach and methodology
  - Models and versions used
  - Rules and heuristics applied
  - Training data sources
  - Compute resources
- Submit
  - Upload via the leaderboard site
  - Wait for evaluation results
- Consistent model versions and random seeds
- Clear indication of supported dialect pairs
- Valid UTF-8 text file outputs
- Exact versions of LLM prompts/rule files included
- System description document included
- Reproducibility instructions provided
Important: Include exact versions of all dependencies, prompts, and rule files for reproducibility.
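The checklist above asks for valid UTF-8 text outputs. A minimal pre-submission check could look like the sketch below. The function name `find_non_utf8_files` and the idea of scanning a whole folder are illustrative assumptions; the authoritative layout is the one described in the submission guidelines.

```python
from pathlib import Path


def find_non_utf8_files(root):
    """Return the files under `root` whose bytes do not decode as UTF-8.

    Hypothetical helper: it does not check folder structure or naming,
    only the encoding of every regular file it finds.
    """
    bad = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            try:
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return bad
```

Running it on the unpacked submission folder before zipping gives a list of files to re-encode; an empty list means every file decoded cleanly.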
| Rule | Description |
|---|---|
| Frequency | One submission per team per month (TBD) |
| Transparency | Disclose all training data and public resources |
| Documentation | Clearly mark manual rules or prompts |
| Fairness | No test set contamination or hand-tuning |
| Verification | Results may be verified; additional materials may be requested |
We recommend CrackSQL as an LLM-based baseline.
CrackSQL is a SQL dialect translation tool that combines rule-based strategies with LLMs to improve translation accuracy. It converts between dialects (e.g., from PostgreSQL to MySQL) and is available through a Python API, a command line, and a web interface.
Goal: Translate SQL from one database dialect to another while preserving semantic equivalence.
Input: (source_dialect, target_dialect, source_sql)
Output: target_sql
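The task signature above can be sketched in Python as follows. This is only an illustration of the input/output contract: `TranslationTask`, `translate`, and the single regex rule are hypothetical names and not part of the PARROT toolkit or of CrackSQL, and a real system would need far more than one rewrite.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationTask:
    """One benchmark input: (source_dialect, target_dialect, source_sql)."""
    source_dialect: str
    target_dialect: str
    source_sql: str


# Toy rule (assumption for illustration): EXTRACT(YEAR FROM expr) -> YEAR(expr).
# It only handles arguments without nested parentheses.
_EXTRACT_YEAR = re.compile(
    r"EXTRACT\s*\(\s*YEAR\s+FROM\s+([^()]+?)\s*\)", re.IGNORECASE
)


def translate(task: TranslationTask) -> str:
    """Return target_sql for the task, or raise if the pair is unsupported."""
    pair = (task.source_dialect.lower(), task.target_dialect.lower())
    if pair == ("postgresql", "mysql"):
        return _EXTRACT_YEAR.sub(r"YEAR(\1)", task.source_sql)
    raise NotImplementedError(f"no rules for dialect pair {pair}")
```

Raising on unsupported pairs, rather than returning the input unchanged, keeps it explicit which dialect pairs a system actually covers.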
```sql
-- Source (PostgreSQL)
SELECT EXTRACT(YEAR FROM created_at) AS year, COUNT(*)
FROM users
WHERE age > 25
GROUP BY EXTRACT(YEAR FROM created_at);

-- Target (MySQL)
SELECT YEAR(created_at) AS year, COUNT(*)
FROM users
WHERE age > 25
GROUP BY YEAR(created_at);
```

| Metric | Count |
|---|---|
| Translation Pairs | 598 |
| Source Benchmarks | 38+ |
| SQL Dialects | 10+ |
| Supported Engines | 15+ |
| Domain Types | Single & Cross-domain |
```
PARROT/
├── benchmark/            # Source datasets from 38+ benchmarks
│   ├── Spider/           # Cross-domain SQL queries
│   ├── SParC/            # Multi-turn conversations
│   ├── BIRD/             # Complex real-world queries
│   ├── TPC-H FROID/      # UDF-heavy workloads
│   └── ...               # 34+ more benchmarks
├── validator/            # Grammar parsers & validators
│   ├── pg_parser/        # PostgreSQL parser
│   ├── mysql_parser/     # MySQL parser
│   ├── oracle_parser/    # Oracle parser
│   └── ...               # 10+ more dialect parsers
├── processor/            # Preprocessing utilities
├── translator/           # Baseline translation tools
└── Submission_Example/   # Submission templates
```
View all 38+ benchmarks
| Benchmark | Year | SQL Dialects | Language | Domain Type | Turn Round | Collection |
|---|---|---|---|---|---|---|
| ATIS | 1994 | SQLite, MySQL | English | Single-domain | Single | Manual |
| GeoQuery | 1996 | MySQL, SQLite | English | Single-domain | Single | Manual |
| Restaurants | 2000 | SQLite | English | Single-domain | Single | Manual |
| Academic | 2014 | Unspecified | English | Single-domain | Single | Manual |
| IMDb | 2017 | Unspecified | English | Single-domain | Single | Manual |
| Yelp | 2017 | Unspecified | English | Single-domain | Single | Manual |
| Scholar | 2017 | Unspecified | English | Single-domain | Single | Manual |
| WikiSQL | 2017 | SQLite | English | Cross-domain | Single | Manual |
| Advising | 2018 | SQLite, MySQL | English | Single-domain | Single | Manual |
| Spider | 2018 | SQLite | English | Cross-domain | Single | Manual |
| SParC | 2019 | SQLite | English | Cross-domain | Multiple | Manual |
| CoSQL | 2019 | SQLite | English | Cross-domain | Multiple | Manual |
| CSpider | 2019 | SQLite | Chinese | Cross-domain | Single | Manual |
| MIMICSQL | 2020 | SQLite | English | Single-domain | Single | Hybrid† |
| SQUALL | 2020 | SQLite | English | Cross-domain | Single | Manual |
| FIBEN | 2020 | IBM Db2, PostgreSQL | English | Single-domain | Single | Manual |
| ViText2SQL | 2020 | General SQL | Vietnamese | Cross-domain | Single | Manual |
| DuSQL | 2020 | Unspecified | Chinese | Cross-domain | Single | Hybrid† |
| PortugueseSpider | 2021 | SQLite | Portuguese | Cross-domain | Single | Hybrid† |
| CHASE | 2021 | SQLite | Chinese | Cross-domain | Multiple | Manual |
| Spider-Syn | 2021 | SQLite | English | Cross-domain | Single | Manual |
| Spider-DK | 2021 | SQLite | English | Cross-domain | Single | Manual |
| Spider-Realistic | 2021 | SQLite | English | Cross-domain | Single | Manual |
| KaggleDBQA | 2021 | SQLite | English | Cross-domain | Single | Manual |
| SEDE | 2021 | T-SQL | English | Single-domain | Single | Manual |
| MT-TEQL | 2021 | SQLite | English | Cross-domain | Single | Automatic |
| PAUQ | 2022 | SQLite | Russian | Cross-domain | Single | Manual |
| knowSQL | 2022 | Unspecified | Chinese | Cross-domain | Single | Manual |
| Dr.Spider | 2023 | SQLite | English | Cross-domain | Single | Hybrid† |
| BIRD | 2023 | SQLite | English | Cross-domain | Single | Manual |
| AmbiQT | 2023 | SQLite | English | Cross-domain | Single | LLM-aided |
| ScienceBenchmark | 2024 | General SQL | English | Single-domain | Single | Hybrid† |
| BookSQL | 2024 | SQLite | English | Single-domain | Single | Manual |
| Archer | 2024 | SQLite | English/Chinese | Cross-domain | Single | Manual |
| BULL | 2024 | SQLite | English/Chinese | Single-domain | Single | Manual |
| Spider2 | 2024 | SQLite, DuckDB, PostgreSQL | English | Cross-domain | Single | Manual |
| TPC-H FROID | 2018 | T-SQL, PostgreSQL | English | Cross-domain | Single | Hybrid† |
| DSB | 2021 | T-SQL, PostgreSQL | English | Decision Support | Single | Hybrid† |
| TPC-DS | 2005 | T-SQL, PostgreSQL | English | Decision Support | Single | Hybrid† |
| SQL-ProcBench | 2021 | SQL Server, PostgreSQL, IBM Db2 | English | Single-domain | Single | Production-derived |
† Hybrid means the dataset was created using both automatic generation and manual annotation.

PARROT evaluates systems across two key dimensions:
| Dimension | Description |
|---|---|
| Syntax Validity | Can the SQL be parsed by the target dialect? |
| Execution Checks | Result equivalence when data is available |
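The two dimensions in the table can be illustrated with a small Python sketch. It is not PARROT's evaluator: it uses the standard-library `sqlite3` module as a stand-in for a target engine, and the function names `is_syntax_valid` and `results_match` are hypothetical. In a real evaluation the reference and translated queries would run on different engines loaded with the same data.

```python
import sqlite3
from collections import Counter


def is_syntax_valid(conn, sql):
    """Syntax check: ask the engine to compile the statement without running it.

    Note: with SQLite, EXPLAIN also fails on unknown tables or columns,
    so this is a compile-against-schema check, not a pure grammar check.
    """
    try:
        conn.execute(f"EXPLAIN {sql}")
        return True
    except sqlite3.Error:
        return False


def results_match(conn, reference_sql, candidate_sql):
    """Execution check: compare result sets as multisets, ignoring row order."""
    reference = Counter(conn.execute(reference_sql).fetchall())
    candidate = Counter(conn.execute(candidate_sql).fetchall())
    return reference == candidate


def demo_connection():
    """Build a tiny in-memory database to try the two checks on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (age INTEGER, created_at TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(30, "2023-01-05"), (41, "2023-06-11"), (22, "2024-02-01")],
    )
    return conn
```

Comparing multisets rather than ordered lists avoids penalizing a translation for row order when the query has no ORDER BY; an order-sensitive comparison would be needed for queries that do specify one.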
If you use PARROT in your research, please cite:
```bibtex
@inproceedings{zhou2025parrot,
  author    = {Wei Zhou and Guoliang Li and Haoyu Wang and Yuxing Han and Xufei Wu and Fan Wu and Xuanhe Zhou},
  title     = {PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2025}
}

@article{zhou2025cracksql,
  author  = {Wei Zhou and Yuyang Gao and Xuanhe Zhou and Guoliang Li},
  title   = {Cracking SQL Barriers: An LLM-based Dialect Translation System},
  journal = {Proceedings of the ACM on Management of Data},
  volume  = {3},
  number  = {3 (SIGMOD)},
  year    = {2025}
}

@article{zhou2025cracksqldemo,
  author  = {Wei Zhou and Yuyang Gao and Xuanhe Zhou and Guoliang Li},
  title   = {CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models},
  journal = {arXiv Preprint},
  url     = {https://arxiv.org/abs/2504.00882},
  year    = {2025}
}
```

This project is released under the MIT License. See the LICENSE file for details.
Questions? Feedback? Want to submit?
Email: weizhoudb@sjtu.edu.cn
Contributions: Issues and PRs are welcome!
