Julia port of SymSpell, an extremely fast spelling correction and fuzzy search algorithm.
TL;DR
using SymSpellChecker
d = SymSpell()
push!(d, "hello")
push!(d, "world")
d["wrold"] # returns ["world"]
Dictionary creation
Dictionaries can be created as follows
using SymSpellChecker
# Loading from file
d = SymSpell("assets/frequency_dictionary_en_30_000.txt")
# Manual update
d = SymSpell()
push!(d, "hello", 100)
push!(d, "world", 50)
The third argument of push! is the word frequency, which lookup later uses to sort results from the highest frequency to the lowest.
The SymSpell constructor takes the following arguments:
max_dictionary_edit_distance: maximum edit distance allowed in searches. Larger values require much more memory. The default is 2.
prefix_length: length of the word prefix used to generate candidates. Higher values require more memory but give shorter search times. The default is 5.
count_threshold: words with a frequency below this threshold do not appear in search results.
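To give a feel for why max_dictionary_edit_distance and prefix_length drive memory use, here is a minimal sketch of the symmetric-delete idea that SymSpell is known for: for every dictionary word, the strings obtained by deleting a few characters from its prefix are precomputed and stored. This is not the package's actual code. The function prefix_deletes and its keyword names are hypothetical and chosen only for illustration.

```julia
# Hypothetical helper, not part of SymSpellChecker.jl: collect every string
# obtainable from the first `prefix_length` characters of `word` by deleting
# at most `max_distance` characters.
function prefix_deletes(word::AbstractString; max_distance::Int = 2, prefix_length::Int = 5)
    prefix = String(first(word, prefix_length))
    found = Set{String}([prefix])
    frontier = String[prefix]
    for _ in 1:max_distance
        next_frontier = String[]
        for s in frontier
            chars = collect(s)
            for i in eachindex(chars)
                # Drop the i-th character of the current string
                candidate = String(vcat(chars[1:i-1], chars[i+1:end]))
                if !(candidate in found)
                    push!(found, candidate)
                    push!(next_frontier, candidate)
                end
            end
        end
        frontier = next_frontier
    end
    return found
end

# Strings that "world" and the misspelling "wrold" have in common after one deletion
shared = intersect(prefix_deletes("world"; max_distance = 1),
                   prefix_deletes("wrold"; max_distance = 1))
```

In this sketch, a misspelled query and a dictionary word become candidates for each other when their delete sets overlap. Each extra unit of distance multiplies the number of stored strings, which matches the memory warning above.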
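In the result of lookup(d, "wrold"), each suggestion carries the matched term, its Damerau-Levenshtein distance from the query, and its frequency. For world, the distance from wrold is 1 and the frequency in the current dictionary is 50.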
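The distance of 1 between world and wrold can be checked by hand with a small sketch. The function osa_distance below is a hypothetical helper, not the package's implementation, and it assumes the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance, since the variant the package uses is not stated here.

```julia
# Hypothetical helper: restricted Damerau-Levenshtein (optimal string
# alignment) distance between two strings.
function osa_distance(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)
    m, n = length(s), length(t)
    # d[i + 1, j + 1] is the distance between the first i chars of s
    # and the first j chars of t
    d = zeros(Int, m + 1, n + 1)
    d[:, 1] .= 0:m
    d[1, :] .= 0:n
    for i in 1:m, j in 1:n
        cost = s[i] == t[j] ? 0 : 1
        d[i + 1, j + 1] = min(d[i, j + 1] + 1,   # deletion
                              d[i + 1, j] + 1,   # insertion
                              d[i, j] + cost)    # substitution or match
        # Swap of two adjacent characters counts as a single edit
        if i > 1 && j > 1 && s[i] == t[j - 1] && s[i - 1] == t[j]
            d[i + 1, j + 1] = min(d[i + 1, j + 1], d[i - 1, j - 1] + 1)
        end
    end
    return d[m + 1, n + 1]
end

osa_distance("world", "wrold")  # 1: the adjacent letters "o" and "r" are swapped
```

Without the transposition rule, the same pair would cost 2 (two substitutions), which is why a Damerau-style distance suits typing errors.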
To extract only the words from a lookup result:
term.(lookup(d, "wrold")) # returns ["world"]
A more convenient form of lookup also exists:
d["wrold"] # returns ["world"]
Search arguments can either be passed to the lookup function or set globally with set_options!(d::SymSpell; kwargs...).
set_options!(d, include_unknown = true, verbosity = "closest")
d["wrold"] # returns ["wrold", "world"]
# this is equivalent to term.(lookup(d, "wrold", include_unknown = true, verbosity = "closest"))
The following arguments are supported:
include_unknown: whether to include the original word in the results when it satisfies the search criteria.
ignore_token: ignore words in the lookup that contain the given token, specified as a string or regular expression.
transfer_casing: when set to true, results try to mimic the casing of the original word, for example d["Wrold"] returns ["World"].
max_edit_distance: maximum allowed edit distance for the search. Defaults to max_dictionary_edit_distance.
verbosity: selects the type of search result. Three levels of verbosity exist:
"top": only a single suggestion is returned, the one with the lowest distance and the highest frequency
"closest": all words with the lowest distance are returned
"all": all words within the given max_edit_distance are returned
License
The SymSpellChecker.jl package is licensed under the MIT License. This package is based on SymSpell and its Python adaptation. Some parts of the code are based on StringDistances.jl.