Pipeline created as part of an engineering thesis at Warsaw University of Technology under the supervision of Professor Robert Nowak.
Full topic of the dissertation: Emacs text editor package for integration with Tabby coding assistant
- Link to Tabby’s official guide for installation and documentation
- A Tabby authorization token must be set in a .env file or as a local environment variable during execution
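A minimal sketch of reading the token at runtime, assuming the variable name TABBY_AUTH_TOKEN and the python-dotenv package (both are illustrative, not necessarily what the scripts use):

```python
import os

# Hypothetical: pull variables from a .env file when python-dotenv is available.
try:
    from dotenv import load_dotenv
    load_dotenv()  # reads .env in the working directory into os.environ
except ImportError:
    pass

# TABBY_AUTH_TOKEN is an assumed name; check the scripts for the actual one.
token = os.environ.get("TABBY_AUTH_TOKEN")
if token is None:
    raise RuntimeError("Tabby authorization token not found in .env or the environment")
```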
```sh
python -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
chmod +x get_dataset.sh
./get_dataset.sh
```
```sh
python src/sort_data-1.py
python src/query_server-2.py
python src/static_tester-3.py
python src/similarity_tester-3.py
python src/make_plot-4.py
```
This testing environment gathers data on Tabby’s performance in the task of generating code-completion suggestions. The testing outcomes serve as the groundwork for analysis in the engineering thesis titled “Quality evaluation of Tabby coding assistant and Tabby integration with Emacs text editor”. The project’s outcomes support the motivation for implementing a Tabby plugin for Emacs and demonstrate the potential of the applied methodology.
The data directory, which holds intermediate samples and final outcomes, is created with the help of the get_dataset.sh script located at the repository root; the script downloads a part of the Algorithms repository, which is used as the benchmark codebase.
The src directory contains all scripts constituting the actual pipeline:
sort_data-1.py discards files that do not match the file-extension criteria, as well as empty ones, and reconstructs the sorted structure in data/sorted.
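A sketch of the filtering step (the source directory and the extension list are assumptions; directory names otherwise follow this README):

```python
import shutil
from pathlib import Path

SRC = Path("data/raw")    # assumed location of the downloaded benchmark code
DST = Path("data/sorted")
EXTENSIONS = {".py"}      # assumed file-extension criteria

for path in SRC.rglob("*"):
    # keep only files with an accepted extension and non-empty contents
    if path.is_file() and path.suffix in EXTENSIONS and path.stat().st_size > 0:
        target = DST / path.relative_to(SRC)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
```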
tabby-connection.py defines the actual connection to the Tabby endpoint using the authentication token.
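A minimal sketch of such a connection, assuming Tabby’s public HTTP completion endpoint /v1/completions on a local server (the host, port, and payload shape should be verified against the script):

```python
import requests

TABBY_URL = "http://localhost:8080/v1/completions"  # assumed local Tabby server

def request_completion(prefix: str, token: str, language: str = "python") -> str:
    """Send one completion request for the given code prefix."""
    response = requests.post(
        TABBY_URL,
        headers={"Authorization": f"Bearer {token}"},
        json={"language": language, "segments": {"prefix": prefix}},
        timeout=30,
    )
    response.raise_for_status()
    # Tabby responds with a list of choices; take the first suggestion's text.
    return response.json()["choices"][0]["text"]
```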
prefix_generator.py creates prefixes out of code samples in an incremental manner, according to a predefined prefix scheme.
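One plausible reading of the incremental scheme, cutting each sample at fixed fractions of its line count (the fractions themselves are an assumption):

```python
def generate_prefixes(code: str, fractions=(0.25, 0.5, 0.75)) -> list[str]:
    """Cut a sample at increasing line counts to produce incremental prefixes."""
    lines = code.splitlines(keepends=True)
    prefixes = []
    for fraction in fractions:
        cut = max(1, int(len(lines) * fraction))
        prefixes.append("".join(lines[:cut]))
    return prefixes
```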
query_server-2.py issues requests containing the prefix prompts to the Tabby server, then saves the concatenated prefixes and responses in data/autocompletions.
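Combining the two sketches above, the querying step could look as follows (helper names and file layout carry over from those sketches and are equally hypothetical):

```python
from pathlib import Path

def autocomplete_file(sample: Path, token: str) -> None:
    """Query Tabby once per prefix and store prefix + completion as a new file."""
    code = sample.read_text(encoding="utf-8")
    for i, prefix in enumerate(generate_prefixes(code)):
        completion = request_completion(prefix, token)
        out = Path("data/autocompletions") / f"{sample.stem}_prefix{i}{sample.suffix}"
        out.parent.mkdir(parents=True, exist_ok=True)
        # the stored duplicate is the prefix concatenated with Tabby's response
        out.write_text(prefix + completion, encoding="utf-8")
```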
static_tester-3.py evaluates both the original code samples and the autocompleted ones according to the cyclomatic complexity, Halstead effort, and Halstead bugs metrics, implemented with Radon, a Python library for code metrics. Results are saved to data/static_metrics.
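A sketch of computing the three metrics with Radon (in Radon 4+, h_visit returns a report whose total field carries the Halstead effort and bugs estimates; summing per-block complexity into one file-level number is a simplification):

```python
from radon.complexity import cc_visit
from radon.metrics import h_visit

def static_metrics(code: str) -> dict:
    """Return cyclomatic complexity plus Halstead effort and bugs for one file."""
    blocks = cc_visit(code)          # one entry per function, method, or class
    halstead = h_visit(code).total   # file-level Halstead report
    return {
        "cyclomatic_complexity": sum(block.complexity for block in blocks),
        "halstead_effort": halstead.effort,
        "halstead_bugs": halstead.bugs,
    }
```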
similarity_tester-3.py implements the main part of the evaluation by employing string similarity algorithms:
- difflib’s SequenceMatcher
- Jaro-Winkler similarity
- Damerau-Levenshtein distance
- Hamming distance
The last three algorithms are implemented with the help of the Python jellyfish library. Similarity testing is performed in two ways (a combined sketch of all four measures follows the list below):
- Whole files
  - Each original sample from data/sorted is compared with the Tabby-completed duplicate for each prefix.
  - Additional data in the form of the ratio between the lengths of the original and duplicate files is captured.
  - Results are saved to data/similarity_logs_full.
- Overlap of the generated fragments in terms of location in the file
  - For each original file, the fragment that overlaps with the Tabby-generated fragment in terms of position is selected.
  - This way, only purely generated code is compared against the reference snippet.
  - Results are saved to data/similarity_logs_fragment.
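A sketch of the four measures applied to one pair of files (function names follow difflib and recent jellyfish releases; the first two yield similarities in [0, 1], the last two raw character-level distances):

```python
import difflib
import jellyfish

def similarity_report(original: str, duplicate: str) -> dict:
    """Compare a reference sample against its Tabby-completed duplicate."""
    return {
        # ratio in [0, 1] over the whole character sequence
        "sequence_matcher": difflib.SequenceMatcher(None, original, duplicate).ratio(),
        # similarity in [0, 1]
        "jaro_winkler": jellyfish.jaro_winkler_similarity(original, duplicate),
        # raw edit distances (lower means more similar)
        "damerau_levenshtein": jellyfish.damerau_levenshtein_distance(original, duplicate),
        "hamming": jellyfish.hamming_distance(original, duplicate),
        # extra datum captured in the whole-file mode
        "length_ratio": len(duplicate) / len(original) if original else 0.0,
    }
```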
The testing process’s outcomes are used for the subsequent creation of plots. make_plot-4.py creates the following plots (a sketch of one follows the list):
- Full-file similarity plots per similarity algorithm
- File-fragment similarity plots per similarity algorithm
- Averaged static metric values for original programs against averaged static metric values for duplicate programs, per static metric
- Length ratio between original and duplicate files
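A minimal matplotlib sketch of one such plot, the length-ratio histogram (the output path and binning are assumptions):

```python
from pathlib import Path

import matplotlib.pyplot as plt

def plot_length_ratio(ratios: list[float]) -> None:
    """Histogram of duplicate-to-original length ratios across all samples."""
    out_dir = Path("data/plots")  # assumed output location
    out_dir.mkdir(parents=True, exist_ok=True)
    plt.figure()
    plt.hist(ratios, bins=20)
    plt.xlabel("duplicate length / original length")
    plt.ylabel("number of files")
    plt.title("Length ratio between original and Tabby-completed files")
    plt.savefig(out_dir / "length_ratio.png")
    plt.close()
```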