Compile with:
g++ -std=c++11 -pthread -O3 fastBPE/main.cc -IfastBPE -o fast
Usage:
List commands
./fast
usage: fastbpe <command> <args>
The commands supported by fastBPE are:
getvocab input1 [input2] extract the vocabulary from one or two text files
learnbpe nCodes input1 [input2] learn BPE codes from one or two text files
applybpe output input codes [vocab] apply BPE codes to a text file
applybpe_stream codes [vocab] apply BPE codes to stdin and outputs to stdout
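To make getvocab and learnbpe concrete, here is a minimal pure-Python sketch of what they compute: word frequencies, then up to nCodes rounds of merging the most frequent adjacent symbol pair. It illustrates standard BPE learning and is not fastBPE's C++ implementation. The function names, the "</w>" end-of-word marker, the output sort order and the tie-breaking rule are assumptions of this sketch.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def get_vocab(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count whitespace-separated words, most frequent first (cf. getvocab)."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    # Descending count, then alphabetical, so the output order is stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def merge_pair(symbols: Tuple[str, ...], pair: Pair) -> Tuple[str, ...]:
    """Replace every non-overlapping occurrence of pair, left to right."""
    out: List[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(vocab: Iterable[Tuple[str, int]], n_codes: int) -> List[Pair]:
    """Learn up to n_codes merges from (word, count) pairs (cf. learnbpe)."""
    # Split each word into characters; "</w>" marks the end of the word
    # (an assumption of this sketch).
    words: Dict[Tuple[str, ...], int] = {}
    for word, count in vocab:
        if word:
            symbols = tuple(word[:-1]) + (word[-1] + "</w>",)
            words[symbols] = words.get(symbols, 0) + count

    merges: List[Pair] = []
    for _ in range(n_codes):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, count in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the lexicographically largest pair.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word with the new merge applied.
        merged: Dict[Tuple[str, ...], int] = {}
        for symbols, count in words.items():
            key = merge_pair(symbols, best)
            merged[key] = merged.get(key, 0) + count
        words = merged
    return merges
```

For example, learn_bpe(get_vocab(lines), 40000) would return at most 40000 merges, fewer if the corpus runs out of pairs to merge.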
fastBPE also supports stdin inputs. For instance, these two commands are equivalent:
./fast getvocab text > vocab
cat text | ./fast getvocab - > vocab
However, the first one memory-maps the input file to read it efficiently, which can be more than twice as fast as reading from stdin on very large files. Similarly, these two commands are equivalent:
./fast applybpe output input codes vocab
./fast applybpe_stream codes vocab < input > output
The first one, however, will be significantly faster on large datasets, as it uses multi-threading to pre-compute the BPE splits of all words in the input file.
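Pre-computing splits pays off because the merges are applied once per distinct word rather than once per token. Below is a minimal Python sketch of that idea, assuming merges is an ordered list of symbol pairs (earliest learned first) and a "</w>" end-of-word convention; split_word and apply_bpe are hypothetical helpers, not fastBPE functions. ThreadPoolExecutor only shows the structure: under CPython's GIL it gives little CPU speed-up, so ProcessPoolExecutor would be the usual choice for real parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def split_word(word: str, ranks: Dict[Pair, int]) -> str:
    """Apply merges to one word and render it with "@@" continuation markers."""
    symbols: List[str] = list(word[:-1]) + [word[-1] + "</w>"]
    while len(symbols) > 1:
        # Among adjacent pairs that are known merges, pick the earliest learned.
        candidates = [
            (ranks[p], i)
            for i, p in enumerate(zip(symbols, symbols[1:]))
            if p in ranks
        ]
        if not candidates:
            break
        _, i = min(candidates)
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    symbols[-1] = symbols[-1][: -len("</w>")]  # drop the end-of-word marker
    pieces = [s + "@@" for s in symbols[:-1]] + [symbols[-1]]
    return " ".join(pieces)


def apply_bpe(lines: List[str], merges: List[Pair], workers: int = 4) -> List[str]:
    """Split every distinct word once (in parallel), then rewrite lines by lookup."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    unique_words = sorted({w for line in lines for w in line.split()})
    with ThreadPoolExecutor(max_workers=workers) as pool:
        splits = dict(zip(unique_words, pool.map(lambda w: split_word(w, ranks), unique_words)))
    return [" ".join(splits[w] for w in line.split()) for line in lines]
```

A streaming variant cannot build the full table of distinct words up front, which is one reason it is slower on large inputs.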
Note: Mac OS X users may need to run export MACOSX_DEPLOYMENT_TARGET=10.x (x=9 or 10, depending on your version) before the setup.py install command, or add -stdlib=libc++ to the extra_compile_args of setup.py, as appropriate.
Call the API using:
import fastBPE
bpe = fastBPE.fastBPE(codes_path, vocab_path)
bpe.apply(["Roasted barramundi fish", "Centrally managed over a client-server architecture"])
>> ['Ro@@ asted barr@@ am@@ un@@ di fish', 'Centr@@ ally managed over a cli@@ ent-@@ server architecture']
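In the output, every piece of a word except the last ends with "@@", so the original text can be recovered by deleting the "@@ " markers. A minimal sketch, assuming the literal string "@@" never occurs in the untokenized text; remove_bpe is a hypothetical helper, not part of the fastBPE API.

```python
import re


def remove_bpe(text: str) -> str:
    """Undo the "@@ " continuation markers in BPE-split text."""
    # Delete "@@ " between pieces, and a dangling "@@" at the end of the text.
    return re.sub(r"@@( |$)", "", text)
```

Applied to the two outputs above, it returns the original input sentences.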