A tool that uses language models to help find R packages, by matching packages either to a text description, or to entire packages. Can find matching packages either from rOpenSci’s suite of packages, or from all packages currently on CRAN.
This package relies on a locally-running instance of ollama. Procedures for setting that up are described in a separate vignette (vignette("ollama", package = "pkgmatch")). Although some functionality of this package may be used without ollama, the main functions require ollama to be installed.
Once ollama is running, the easiest way to install this package is via the associated r-universe. As shown there, simply enable the universe with
options (repos = c (
ropenscireviewtools = "https://ropensci-review-tools.r-universe.dev",
CRAN = "https://cloud.r-project.org"
))
Then install in the usual way with:
install.packages ("pkgmatch")
Alternatively, the package can be installed by first installing either the remotes or pak packages and running one of the following lines:
remotes::install_github ("ropensci-review-tools/pkgmatch")
pak::pkg_install ("ropensci-review-tools/pkgmatch")
The package can then be loaded for use with
library (pkgmatch)
The ollama_check() function can then be used to confirm that ollama is up and running as expected.
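As a rough illustration of what "up and running" means, the following sketch tests whether anything answers at ollama's default local address. The helper name `ollama_is_reachable` is hypothetical and is not part of pkgmatch; it assumes ollama listens on its default port, 11434. The package's own ollama_check() function remains the recommended check and may test more than this.

```r
# Hypothetical helper, not part of pkgmatch: returns TRUE if a server
# responds at `base_url`, and FALSE otherwise. The default assumes
# ollama's standard local port (11434).
ollama_is_reachable <- function (base_url = "http://127.0.0.1:11434") {
    tryCatch (
        {
            con <- url (base_url, open = "r") # fails if nothing is listening
            close (con)
            TRUE
        },
        error = function (e) FALSE,
        warning = function (w) FALSE
    )
}

ollama_is_reachable ()
```

A result of FALSE only says that nothing responded at that address; it does not distinguish between ollama not being installed and ollama not having been started.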
The ‘pkgmatch’ package takes input either from a text description or local path to an R package, and finds matching packages based on both Language Model (LM) embeddings, and more traditional text and code matching algorithms.
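To give a feel for what matching on embeddings involves, here is a generic toy sketch that ranks three made-up packages by cosine similarity to a query vector. The vectors, package names, and the choice of cosine similarity are all assumptions for illustration only; the algorithms actually used by pkgmatch are described in its vignettes and may differ.

```r
# Toy embeddings: hypothetical 3-dimensional vectors, not real model output.
corpus <- rbind (
    pkg_a = c (0.9, 0.1, 0.0),
    pkg_b = c (0.1, 0.8, 0.3),
    pkg_c = c (0.7, 0.3, 0.1)
)
query <- c (1.0, 0.2, 0.0)

# Cosine similarity between two numeric vectors of equal length.
cosine_similarity <- function (x, y) {
    sum (x * y) / (sqrt (sum (x^2)) * sqrt (sum (y^2)))
}

# Score every row of the corpus against the query, then rank best-first.
scores <- apply (corpus, 1, cosine_similarity, y = query)
ranked <- names (sort (scores, decreasing = TRUE))
ranked
```

Real embeddings have hundreds or thousands of dimensions, but the ranking step follows the same pattern: compute one score per candidate and sort.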
The package has two main functions:
- pkgmatch_similar_pkgs() to find similar rOpenSci or CRAN packages based on input as either a local path to an entire package, the name of an installed package, or as a single descriptive text string; and
- pkgmatch_similar_fns() to find similar functions from rOpenSci packages based on descriptive text input. (Not available for functions from CRAN packages.)
The following code demonstrates how these functions work, first matching general text strings to packages from rOpenSci:
input <- "
Packages for analysing evolutionary trees, with a particular focus
on visualising inter-relationships among distinct trees.
"
pkgmatch_similar_pkgs (input, corpus = "ropensci")
## [1] "phylogram" "phruta" "rotl" "taxa" "lingtypology"
The corpus parameter must be specified as one of “ropensci” or “cran” (case-insensitive). The CRAN corpus is much larger than the rOpenSci corpus, and matching for corpus = "cran" will generally take notably longer.
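A case-insensitive check of this kind can be sketched in base R as follows. The helper `normalise_corpus` is hypothetical and is not how pkgmatch is known to implement it; it only shows which values would be accepted under the rule stated above.

```r
# Hypothetical helper, not part of pkgmatch: lower-case the input and
# check it against the two permitted corpus names.
normalise_corpus <- function (corpus) {
    match.arg (tolower (corpus), c ("ropensci", "cran"))
}

normalise_corpus ("CRAN") # accepted, returned as "cran"
normalise_corpus ("rOpenSci") # accepted, returned as "ropensci"
```

Any other value, such as "bioc", makes `match.arg()` raise an error.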
Websites of packages returned by the pkgmatch_similar_pkgs() function can be automatically opened, either by calling the function with browse = TRUE, or by storing the return value of the pkgmatch_similar_pkgs() function as an object and passing that to the pkgmatch_browse() function.
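The second of those two options might look like the sketch below. It assumes that pkgmatch is installed and that ollama is running; the guard simply skips the calls in non-interactive sessions or where the package is unavailable, since browsing opens web pages.

```r
input <- "Packages for analysing evolutionary trees"

# Only run where pkgmatch is installed and a browser can sensibly be opened.
if (requireNamespace ("pkgmatch", quietly = TRUE) && interactive ()) {
    # Store the matches first, inspect them, then open their websites.
    p <- pkgmatch::pkgmatch_similar_pkgs (input, corpus = "ropensci")
    print (p)
    pkgmatch::pkgmatch_browse (p)
}
```

Storing the result first is useful when you want to look at the matches before deciding whether to open them.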
The input parameter can also specify an entire package, either as a local path to a package directory, or the name of an installed package. To demonstrate that, the following code downloads a .tar.gz file of the httr2 package from CRAN:
pkg <- "httr2"
p <- available.packages () |>
data.frame () |>
dplyr::filter (Package == pkg)
url_base <- "https://cran.r-project.org/src/contrib/"
url <- paste0 (url_base, p$Package, "_", p$Version, ".tar.gz")
path <- fs::path (fs::path_temp (), basename (url))
download.file (url, destfile = path, quiet = TRUE)
The path to that package (in this case as a compressed tarball) can then be passed to the pkgmatch_similar_pkgs() function:
pkgmatch_similar_pkgs (path, corpus = "cran")
## $text
## [1] "luca" "httr" "tapLock" "scatterplot3d"
## [5] "AzureAuth"
##
## $code
## [1] "paperplanes" "httr" "prenoms" "tapLock" "AzureAuth"
The result includes the top five matches based on both the text and the code of the input package. In the output shown above, the input package itself, httr2, is not the top match in either list, while the closely related httr package is placed second in both. This happens because embeddings are “chunked” or randomly permuted, and because matches are statistical and not deterministic. Nevertheless, three packages appear in the top five matches on both lists: httr, tapLock, and AzureAuth. See the vignette on Why are the results not what I expect? for more detail on how matches are generated.
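One simple way to compare the two lists is to look at which packages occur in both. The sketch below copies the printed output above into a plain list by hand; the structure of the object actually returned by pkgmatch_similar_pkgs() is an assumption here and may be richer than this.

```r
# Values copied from the printed output above; the real return object
# may carry more information than these two character vectors.
res <- list (
    text = c ("luca", "httr", "tapLock", "scatterplot3d", "AzureAuth"),
    code = c ("paperplanes", "httr", "prenoms", "tapLock", "AzureAuth")
)

# Packages which appear in both the text and the code matches.
in_both <- intersect (res$text, res$code)
in_both

# Mean position of each shared package across the two lists.
mean_rank <- vapply (
    in_both,
    function (p) mean (c (match (p, res$text), match (p, res$code))),
    numeric (1)
)
sort (mean_rank)
```

Packages that rank well on both text and code are often the most useful candidates to inspect first.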
There is an additional function to find functions within packages which best match a text description.
input <- "A function to label a set of geographic coordinates"
pkgmatch_similar_fns (input)
## [1] "GSODR::nearest_stations" "refsplitr::plot_addresses_points"
## [3] "slopes::elevation_extract" "rnoaa::meteo_nearby_stations"
## [5] "charlatan::CoordinateProvider"
input <- "Identify genetic sequences matching a given input fragment"
pkgmatch_similar_fns (input)
## [1] "charlatan::SequenceProvider" "beastier::is_alignment"
## [3] "charlatan::ch_gene_sequence" "beautier::is_phylo"
## [5] "textreuse::align_local"
Setting browse = TRUE will then open the documentation pages corresponding to those best-matching functions.
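The matches are printed in the form "package::function", so they can be separated into package and function names with base R. The sketch below uses the first set of results shown above, copied by hand into a character vector; that the returned object can be treated as a plain character vector is an assumption.

```r
# Values copied from the printed output of the first example above.
fns <- c (
    "GSODR::nearest_stations",
    "refsplitr::plot_addresses_points",
    "slopes::elevation_extract",
    "rnoaa::meteo_nearby_stations",
    "charlatan::CoordinateProvider"
)

# Split each entry on the literal "::" separator.
parts <- strsplit (fns, "::", fixed = TRUE)
matches <- data.frame (
    package = vapply (parts, function (p) p [1], character (1)),
    fn = vapply (parts, function (p) p [2], character (1)),
    stringsAsFactors = FALSE
)
matches
```

A table like this makes it easy to see which packages contribute the matching functions, for example with `table (matches$package)`.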
The pkgmatch package includes the following vignettes:
- A main pkgmatch vignette which gives an overview of how to use the package.
- Example applications which describes several different example applications of pkgmatch, and illustrates the ways in which this package provides different kinds of results from those of search engines and general language model interfaces.
- Before you begin: ollama installation which describes how to install and set up the ollama software needed to download and run the language models.
- How does pkgmatch work? which provides detailed explanations of the matching algorithms implemented in the package.
- Data caching and updating which describes how pkgmatch caches and updates the language model results for the individual corpora.
- Why local language models (LMs)? which explains why pkgmatch uses locally-running language models, instead of relying on external APIs.
- Why are the results not what I expect? which explains in detail why matches generated by pkgmatch may sometimes differ from what you might expect, and includes advice for how to improve matches.
Related tools for searching R packages and documentation include:
- The utils::RSiteSearch() function.
- The sos package that queries the “RSiteSearch” database.
All contributions to this project are gratefully acknowledged using the allcontributors package, following the allcontributors specification. Contributions of any kind are welcome!
- mpadge
- Bisaloo
- MargaretSiple-NOAA
- maelle
- Selbosh
- nhejazi
- agricolamz