CodeTransOcean is a large-scale, comprehensive benchmark for code translation that supports the largest variety of programming languages. It consists of three novel multilingual datasets: MultilingualTrans, which supports translation between multiple popular programming languages; NicheTrans, for translating between niche programming languages and popular ones; and LLMTrans, for evaluating the executability of code translated by large language models (LLMs). CodeTransOcean also includes a novel cross-framework dataset, DLTrans, for translating deep learning code across different frameworks.
Experiments on the MultilingualTrans, NicheTrans, and DLTrans datasets were run with CodeT5+; the code is in the CodeT5+ file.
Experiments on the LLMTrans dataset were run with GPT-3.5; the code is in the ChatGPT file.
Citation
Please cite the paper if you use the data or code from CodeTransOcean.
@article{yan2023codetransocean,
title={CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation},
author={Yan, Weixiang and Tian, Yuchen and Li, Yunzhe and Chen, Qian and Wang, Wen},
journal={arXiv preprint arXiv:2310.04951},
year={2023}
}
Contact
For questions, please feel free to reach out via email at yanweixiang.ywx@gmail.com.
About
[EMNLP 2023] CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation