G-TransEval

This artifact for our ASE 2023 paper "On the Evaluation of Neural Code Translation: Taxonomy and Benchmark" includes benchmark suite, results, materials and source code of our automatic unit test tool. We hope this artifact can motivate and help future research on code translation.

What's inside the artifact:

A benchmark suite of 400 code translation pairs between 5 languages, i.e., Python, C++, Java, C#, and JavaScript (Section V). Located in ./G-TransEval
Empirical study material (Section II and III). Located in ./EmpiricalStudy
Taxonomy examples and experiment results (Section IV). Located in ./Taxonomy
Our automatic unit test tool for G-TransEval. Located in ./TestRunner

Taxonomy

We develop a taxonomy that categorizes code translation tasks into four primary types according to their complexity and knowledge dependence:

Token Level (Type 1): Map trivial tokens to their equivalent in the target
Syntactic Level (Type 2): Migrate syntactic structures based on linguistic rules
Library Level (Type 3): Migrate library to their equivalent in the target language
Algorithm Level (Type 4): Reimplement the program in the target language using a different algorithm

More detailed information is at ./Taxonomy.

Benchmark

G-TransEval is the first categorized test set designed to provide fine-grained and extensive evaluations of code translation models. It comprises a total of 400 code translation pairs between 5 language, i.e., Python, C++, Java, C#, and JavaScript. Each test sample are augmented with unit test cases.

Type 1: 125 pairs. Located in ./G-TransEval/Type1

Type 2: 125 pairs. Located in ./G-TransEval/Type2

Type 3: 125 pairs. Located in ./G-TransEval/Type3

Type 4: 25 pairs. Located in ./G-TransEval/Type4

More detailed information is at ./G-TransEval

Evaluation Results

We evaluate CodeBERT, CodeT5, TransCoder, TransCoder-ST, gpt-3.5-turbo and StarCoderBase on G-TransEval.

Models checkpoint

Models	Source
CodeBERT	https://huggingface.co/microsoft/codebert-base
CodeT5	https://huggingface.co/Salesforce/codet5-base
TransCoder	https://github.com/facebookresearch/CodeGen
TransCoder-ST	https://github.com/facebookresearch/CodeGen

Their translation results are located in ./G-TransEval/Results.

Unit Test Runner

The automatic unit test tool is placed in ./TestRunner. See detailed instruction for usages.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

G-TransEval

What's inside the artifact:

Taxonomy

Benchmark

Evaluation Results

Models checkpoint

Unit Test Runner

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
EmpiricalStudy		EmpiricalStudy
G-TransEval		G-TransEval
Taxonomy		Taxonomy
TestRunner		TestRunner
README.md		README.md

polyeval/g-transeval

Folders and files

Latest commit

History

Repository files navigation

G-TransEval

What's inside the artifact:

Taxonomy

Benchmark

Evaluation Results

Models checkpoint

Unit Test Runner

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages