Pipeline created as part of an engineering thesis at Warsaw University of Technology under the supervision of Professor Robert Nowak.
Full topic of the dissertation: Emacs text editor package for integration with Tabby coding assistant
- Link to Tabby’s official guide for installation and documentation
- A Tabby authorization token must be set in a .env file or as a local environment variable during execution
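A minimal sketch of reading the token at runtime, assuming the variable name TABBY_AUTH_TOKEN and the python-dotenv package (both are illustrative, not necessarily what the scripts use):

```python
import os

# Hypothetical: pull variables from a .env file when python-dotenv is available.
try:
    from dotenv import load_dotenv
    load_dotenv()  # reads .env in the working directory into os.environ
except ImportError:
    pass

# TABBY_AUTH_TOKEN is an assumed name; check the scripts for the actual one.
token = os.environ.get("TABBY_AUTH_TOKEN")
if token is None:
    raise RuntimeError("Tabby authorization token not found in .env or the environment")
```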
```sh
python -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
chmod +x get_dataset.sh
./get_dataset.sh
```
```sh
python src/sort_data-1.py
python src/query_server-2.py
python src/static_tester-3.py
python src/similarity_tester-3.py
python src/make_plot-4.py
```
This testing environment gathers data on Tabby’s performance in the task of generating code-completion suggestions. The testing outcomes serve as the groundwork for analysis in the engineering thesis titled “Quality evaluation of Tabby coding assistant and Tabby integration with Emacs text editor”. The project’s outcomes support the motivation for implementing a Tabby plugin for Emacs and demonstrate the potential of the applied methodology.
The data directory, which holds intermediate samples and final outcomes, is created with the help of the get_dataset.sh script located at the repository root; the script downloads a part of the Algorithms repository, which is used as the benchmark codebase.
The src directory contains all scripts constituting the actual pipeline:
sort_data-1.py discards files that do not match the file-extension criteria, as well as empty ones, and reconstructs the sorted structure in data/sorted.
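A sketch of the filtering step (the source directory and the extension list are assumptions; directory names otherwise follow this README):

```python
import shutil
from pathlib import Path

SRC = Path("data/raw")    # assumed location of the downloaded benchmark code
DST = Path("data/sorted")
EXTENSIONS = {".py"}      # assumed file-extension criteria

for path in SRC.rglob("*"):
    # keep only files with an accepted extension and non-empty contents
    if path.is_file() and path.suffix in EXTENSIONS and path.stat().st_size > 0:
        target = DST / path.relative_to(SRC)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
```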
tabby-connection.py defines the actual connection to the Tabby endpoint using the authentication token.
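A minimal sketch of such a connection, assuming Tabby’s public HTTP completion endpoint /v1/completions on a local server (the host, port, and payload shape should be verified against the script):

```python
import requests

TABBY_URL = "http://localhost:8080/v1/completions"  # assumed local Tabby server

def request_completion(prefix: str, token: str, language: str = "python") -> str:
    """Send one completion request for the given code prefix."""
    response = requests.post(
        TABBY_URL,
        headers={"Authorization": f"Bearer {token}"},
        json={"language": language, "segments": {"prefix": prefix}},
        timeout=30,
    )
    response.raise_for_status()
    # Tabby responds with a list of choices; take the first suggestion's text.
    return response.json()["choices"][0]["text"]
```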
prefix_generator.py creates prefixes out of code samples in an incremental manner, according to a predefined prefix scheme.
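One plausible reading of the incremental scheme, cutting each sample at fixed fractions of its line count (the fractions themselves are an assumption):

```python
def generate_prefixes(code: str, fractions=(0.25, 0.5, 0.75)) -> list[str]:
    """Cut a sample at increasing line counts to produce incremental prefixes."""
    lines = code.splitlines(keepends=True)
    prefixes = []
    for fraction in fractions:
        cut = max(1, int(len(lines) * fraction))
        prefixes.append("".join(lines[:cut]))
    return prefixes
```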
query_server-2.py issues requests containing the prefix prompts to the Tabby server, then saves the concatenated prefixes and responses in data/autocompletions.
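Combining the two sketches above, the querying step could look as follows (helper names and file layout carry over from those sketches and are equally hypothetical):

```python
from pathlib import Path

def autocomplete_file(sample: Path, token: str) -> None:
    """Query Tabby once per prefix and store prefix + completion as a new file."""
    code = sample.read_text(encoding="utf-8")
    for i, prefix in enumerate(generate_prefixes(code)):
        completion = request_completion(prefix, token)
        out = Path("data/autocompletions") / f"{sample.stem}_prefix{i}{sample.suffix}"
        out.parent.mkdir(parents=True, exist_ok=True)
        # the stored duplicate is the prefix concatenated with Tabby's response
        out.write_text(prefix + completion, encoding="utf-8")
```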
static_tester-3.py evaluates both the original code samples and the autocompleted ones according to the cyclomatic complexity, Halstead effort, and Halstead bugs metrics, implemented with Radon, a Python library for code metrics. Results are saved to data/static_metrics.
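A sketch of computing the three metrics with Radon (in Radon 4+, h_visit returns a report whose total field carries the Halstead effort and bugs estimates; summing per-block complexity into one file-level number is a simplification):

```python
from radon.complexity import cc_visit
from radon.metrics import h_visit

def static_metrics(code: str) -> dict:
    """Return cyclomatic complexity plus Halstead effort and bugs for one file."""
    blocks = cc_visit(code)          # one entry per function, method, or class
    halstead = h_visit(code).total   # file-level Halstead report
    return {
        "cyclomatic_complexity": sum(block.complexity for block in blocks),
        "halstead_effort": halstead.effort,
        "halstead_bugs": halstead.bugs,
    }
```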
similarity_tester-3.py implements the main part of the evaluation by employing string similarity algorithms:
- difflib’s SequenceMatcher
- Jaro-Winkler similarity
- Damerau-Levenshtein distance
- Hamming distance
The last three algorithms are implemented with the help of the Python jellyfish library. Similarity testing is performed in two ways (a combined sketch of all four measures follows the list below):
- Whole files
  - Each original sample from data/sorted is compared with the Tabby-completed duplicate for each prefix.
  - Additional data in the form of the ratio between the lengths of the original and duplicate files is captured.
  - Results are saved to data/similarity_logs_full.
- Overlap of the generated fragments in terms of location in the file
  - For each original file, the fragment that overlaps with the Tabby-generated fragment in terms of position is selected.
  - This way, only purely generated code is compared against the reference snippet.
  - Results are saved to data/similarity_logs_fragment.
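A sketch of the four measures applied to one pair of files (function names follow difflib and recent jellyfish releases; the first two yield similarities in [0, 1], the last two raw character-level distances):

```python
import difflib
import jellyfish

def similarity_report(original: str, duplicate: str) -> dict:
    """Compare a reference sample against its Tabby-completed duplicate."""
    return {
        # ratio in [0, 1] over the whole character sequence
        "sequence_matcher": difflib.SequenceMatcher(None, original, duplicate).ratio(),
        # similarity in [0, 1]
        "jaro_winkler": jellyfish.jaro_winkler_similarity(original, duplicate),
        # raw edit distances (lower means more similar)
        "damerau_levenshtein": jellyfish.damerau_levenshtein_distance(original, duplicate),
        "hamming": jellyfish.hamming_distance(original, duplicate),
        # extra datum captured in the whole-file mode
        "length_ratio": len(duplicate) / len(original) if original else 0.0,
    }
```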
The testing process’s outcomes are used for the subsequent creation of plots. make_plot-4.py creates the following plots (a sketch of one follows the list):
- Full-file similarity plots per similarity algorithm
- File-fragment similarity plots per similarity algorithm
- Averaged static metric values for original programs against averaged static metric values for duplicate programs, per static metric
- Length ratio between original and duplicate files
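A minimal matplotlib sketch of one such plot, the length-ratio histogram (the output path and binning are assumptions):

```python
from pathlib import Path

import matplotlib.pyplot as plt

def plot_length_ratio(ratios: list[float]) -> None:
    """Histogram of duplicate-to-original length ratios across all samples."""
    out_dir = Path("data/plots")  # assumed output location
    out_dir.mkdir(parents=True, exist_ok=True)
    plt.figure()
    plt.hist(ratios, bins=20)
    plt.xlabel("duplicate length / original length")
    plt.ylabel("number of files")
    plt.title("Length ratio between original and Tabby-completed files")
    plt.savefig(out_dir / "length_ratio.png")
    plt.close()
```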