The SEART Data Hub platform makes it easy to create large-scale datasets that can be used either to run empirical
MSR (Mining Software Repositories) studies or to train deep learning models that automate software engineering tasks.
Contents
This project contains several modules:
dl4se-model: A module containing domain model classes used for mapping the relational database structure to the
programming environment;
dl4se-analyzer: A module containing implementations of code analysis operations running on tree-sitter;
dl4se-transformer: A module containing implementations of code transformation operations running on tree-sitter;
dl4se-crawler: A standalone crawler application that we use to mine source code from GitHub repositories indexed by
GitHub Search;
dl4se-server: A Spring Boot server application that acts as our platform back-end;
dl4se-spring: Common Spring Boot configuration and utilities used in both the server and the crawler;
dl4se-website: A front-end web application written in Vue.
Installation and Usage
This section will detail the necessary actions for setting up and running the project locally on your machine.
How do you implement language-specific analysis heuristics?
Heuristics used to identify test code in Java and Python can be found here and here, respectively.
Heuristics used to identify boilerplate code can be found here and here, respectively.
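To give a rough idea of what a test-code heuristic can look like, here is a minimal, purely illustrative sketch in Python. It is not the platform's implementation: the function name `is_test_file` and the specific rules (a `src/test` directory or a `Test`/`Tests` class-name suffix for Java; a `test_` prefix, `_test.py` suffix, or `tests` directory for Python) are assumptions based on common project conventions. The linked sources remain the authoritative reference.

```python
from pathlib import PurePosixPath


def is_test_file(path: str, language: str) -> bool:
    """Guess whether a repository-relative path points to test code.

    Hypothetical rules based on common conventions, NOT the rules
    actually used by the platform.
    """
    p = PurePosixPath(path)
    # Lower-cased path components, e.g. ["src", "test", "java", "foo.java"]
    parts = [part.lower() for part in p.parts]

    if language == "java":
        # Assumed convention: Maven/Gradle layout keeps tests under src/test
        in_test_tree = any(
            a == "src" and b == "test" for a, b in zip(parts, parts[1:])
        )
        # Assumed convention: test classes are named FooTest or FooTests
        has_test_suffix = p.stem.endswith(("Test", "Tests"))
        return p.suffix == ".java" and (in_test_tree or has_test_suffix)

    if language == "python":
        # Assumed convention: pytest-style file names or a "tests" directory
        name = p.name
        in_tests_dir = "tests" in parts[:-1]
        return p.suffix == ".py" and (
            name.startswith("test_") or name.endswith("_test.py") or in_tests_dir
        )

    raise ValueError(f"Unsupported language: {language}")
```

A path-only check like this is cheap but coarse. A heuristic that inspects the parsed source (for example, looking for test annotations or test-framework imports) would likely be more precise, at the cost of having to parse each file.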
How can I request a feature or ask a question?
If you have ideas for a feature you would like to see implemented or if you have any questions, we encourage you to
create a new discussion. By initiating a discussion, you can engage
with the community and our team, and we'll respond promptly to address your queries or consider your feature requests.
How can I report a bug?
To report any issues or bugs you encounter, please create a new issue.
Providing detailed information about the problem you're facing will help us understand and address it more effectively.
Rest assured, we are committed to promptly reviewing and responding to the issues you raise, working collaboratively
to resolve any bugs and improve the overall user experience.
About
Building Training Datasets for Deep Learning Models in Software Engineering and Empirical Software Engineering Research