papers-without-code

A Python package (and website) to automatically attempt to find GitHub repositories that are similar to academic papers.

Installation

Stable Release: pip install papers-without-code
Development Head: pip install git+https://github.com/evamaxfield/papers-without-code.git

Usage

Provide a DOI, SemanticScholarID, CorpusID, ArXivID, ACL ID, or a URL from semanticscholar.org, arxiv.org, aclweb.org, acm.org, or biorxiv.org. DOIs can be provided as-is. All other IDs should be prefixed with their type, for example: doi:10.18653/v1/2020.acl-main.447 or CorpusID:202558505 or url:https://arxiv.org/abs/2004.07180.
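To illustrate the query convention described above, here is a minimal sketch of how a typed identifier could be split into its type and value. The function name `split_typed_id` and the `KNOWN_TYPES` set are hypothetical and written only for this example; they are not part of the papers-without-code API, and the package's own parsing may differ.

```python
from typing import Tuple

# Hypothetical set of type prefixes, taken from the identifier types
# named in the paragraph above. The real package may accept others.
KNOWN_TYPES = {"doi", "semanticscholarid", "corpusid", "arxivid", "acl", "url"}


def split_typed_id(query: str) -> Tuple[str, str]:
    """Split "type:value" into (type, value); treat anything else as a bare DOI."""
    # Split on the first colon only, so URLs keep their own "https:" part.
    prefix, sep, rest = query.partition(":")
    if sep and prefix.lower() in KNOWN_TYPES:
        return prefix.lower(), rest
    # No recognized prefix: assume the query is a DOI given as-is.
    return "doi", query
```

Because only the first colon is used as the separator, a query such as url:https://arxiv.org/abs/2004.07180 keeps its full URL as the value.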

CLI

pip install papers-without-code
pwoc query
# or pwoc path/to/file.pdf

Python

from papers_without_code import search_for_repos
search_for_repos("query")
# search_for_repos("path/to/file.pdf")

⚠️ Prior to using PWOC with a PDF, you must be logged in to the Docker CLI via docker login, because we automatically fetch, spin up, and tear down containers for processing. ⚠️

How it Works

In short, we pass the query on to the Semantic Scholar search API, which returns basic details about the paper. We use a prompted gpt-3.5-turbo with langchain to extract keywords from the title and abstract. We then make multiple threaded requests to GitHub's API for repositories that match the keywords. Once we have all the candidate repositories back, we rank them by the similarity between each repository's README and the paper's abstract (or, if the abstract is not available, its title).
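The last two steps (threaded keyword searches, then ranking by README similarity) can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the similarity measure used here is a plain bag-of-words cosine similarity chosen for simplicity, since the text above does not say which measure PWOC uses, and `search_fn` is a hypothetical stand-in for a real GitHub API call. The names `gather_candidates` and `rank_repos` are likewise invented for this example.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple


def _bag_of_words(text: str) -> Counter:
    # Lowercase and count alphanumeric tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    # Assumption: bag-of-words cosine similarity stands in for whatever
    # similarity measure the real package uses.
    va, vb = _bag_of_words(a), _bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gather_candidates(
    keywords: Iterable[str],
    search_fn: Callable[[str], Dict[str, str]],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run one search per keyword in a thread pool and merge the results.

    search_fn is hypothetical: it maps a keyword to {repo_name: readme_text}.
    """
    candidates: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in pool.map(search_fn, keywords):
            # Repositories found under several keywords are kept once.
            candidates.update(batch)
    return candidates


def rank_repos(paper_text: str, readmes: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (repo_name, score) pairs, most similar README first."""
    scored = [
        (name, cosine_similarity(paper_text, readme))
        for name, readme in readmes.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch, `paper_text` would be the paper's abstract, or its title when no abstract is available, mirroring the fallback described above. Passing the search function in as a parameter keeps the example runnable offline with an in-memory stub.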

When running Papers without Code locally with a filepath, the only change to this workflow is how paper details are gathered: we use GROBID to extract the title, abstract, and author list from the PDF.

Documentation

For full package documentation please visit evamaxfield.github.io/papers-without-code.

Exploratory data analysis of the dataset used for testing

Development

See CONTRIBUTING.md for information related to developing the code.

MIT License

