```python
import sister

sentence_embedding = sister.MeanEmbedding(lang="en")
sentence = "I am a dog."
vector = sentence_embedding(sentence)
```
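As the name suggests, `MeanEmbedding` builds a sentence vector by averaging word vectors. The following is a minimal sketch of that idea, not sister's actual implementation. It assumes tokens without a vector are simply skipped, and the toy 3-dimensional vectors are made up for illustration (real fastText vectors have 300 dimensions).

```python
from typing import Dict, List

# Toy word vectors (hypothetical values, for illustration only).
WORD_VECTORS: Dict[str, List[float]] = {
    "i": [1.0, 0.0, 0.0],
    "am": [0.0, 1.0, 0.0],
    "a": [0.0, 0.0, 1.0],
    "dog": [1.0, 1.0, 1.0],
}


def mean_embedding(tokens: List[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of known tokens; tokens without a vector are skipped (assumption)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token in the sentence has a vector")
    dim = len(known[0])
    # Component-wise mean over all known word vectors.
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


tokens = ["i", "am", "a", "dog", "."]  # "." has no vector here and is ignored
print(mean_embedding(tokens, WORD_VECTORS))  # [0.5, 0.5, 0.5]
```

Averaging keeps the output dimension equal to the word-vector dimension regardless of sentence length, at the cost of ignoring word order.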
If you have your own model file, you can load it too.
(For word2vec model files, the data format must be loadable as gensim.models.KeyedVectors.)
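```python
import sister
from sister.word_embedders import Word2VecEmbedding

# Pass the custom embedder by keyword: a positional argument
# after lang="ja" would be a SyntaxError.
sentence_embedding = sister.MeanEmbedding(
    lang="ja",
    word_embedder=Word2VecEmbedding(model_path="/path/to/model"),
)
sentence = "I am a dog."
vector = sentence_embedding(sentence)
```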
Supported languages:

- English
- Japanese
- French
To support a new language, implement a tokenizer (inheriting from sister.tokenizers.Tokenizer) and add the URL of a pre-trained fastText model to word_embedders.get_fasttext() (which holds the list of model URLs).
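A new tokenizer might look roughly like the sketch below. It is hypothetical in two ways: it assumes the base class exposes a `tokenize(sentence) -> List[str]` method to override, and, so that it runs without sister installed, it defines a stand-in `TokenizerBase` instead of importing `sister.tokenizers.Tokenizer`. The regex rule suits space-delimited languages such as Spanish; languages like Japanese need a morphological analyzer instead.

```python
import re
from typing import List


class TokenizerBase:
    """Stand-in for sister.tokenizers.Tokenizer (assumed interface)."""

    def tokenize(self, sentence: str) -> List[str]:
        raise NotImplementedError


class SimpleLatinTokenizer(TokenizerBase):
    """Hypothetical tokenizer for a space-delimited language."""

    # Runs of word characters, or single punctuation marks.
    _pattern = re.compile(r"\w+|[^\w\s]")

    def tokenize(self, sentence: str) -> List[str]:
        return self._pattern.findall(sentence)


print(SimpleLatinTokenizer().tokenize("Soy un perro."))  # ['Soy', 'un', 'perro', '.']
```

Case is preserved on purpose, since fastText vectors are case-sensitive.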
BERT models are supported for en, fr and ja (as of 2020-06-29).
Specifically, ALBERT is used for English, CamemBERT for French and BERT for Japanese.
To use BERT, install sister with the bert extra: pip install 'sister[bert]'.
```python
import sister

bert_embedding = sister.BertEmbedding(lang="en")
sentence = "I am a dog."
vector = bert_embedding(sentence)
```
You can also pass a list of sentences, which is more efficient than embedding them one at a time.
```python
import sister

bert_embedding = sister.BertEmbedding(lang="en")
sentences = ["I am a dog.", "I want to be a cat."]
vectors = bert_embedding(sentences)
```
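For a large corpus, passing every sentence in one call may use a lot of memory. A minimal sketch, assuming only that the embedder accepts a list and returns one vector per sentence, is to call it on fixed-size chunks. `embed_in_batches` is a hypothetical helper, not part of sister, and `fake_embed` stands in for `bert_embedding` so the example runs on its own.

```python
from typing import Callable, Iterable, List, Sequence


def embed_in_batches(
    embed: Callable[[List[str]], Iterable],
    sentences: Sequence[str],
    batch_size: int = 32,
) -> list:
    """Call `embed` on consecutive chunks and collect one vector per sentence."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    vectors = []
    for start in range(0, len(sentences), batch_size):
        batch = list(sentences[start:start + batch_size])
        vectors.extend(embed(batch))  # one row per sentence in this batch
    return vectors


def fake_embed(batch: List[str]) -> List[List[float]]:
    # Stand-in embedder: maps each sentence to a 1-d "vector" of its length.
    return [[float(len(s))] for s in batch]


print(embed_in_batches(fake_embed, ["I am a dog.", "I want to be a cat.", "Hi"], batch_size=2))
# [[11.0], [19.0], [2.0]]
```

With sister installed, the call would be `embed_in_batches(bert_embedding, sentences, batch_size=64)`; a larger batch size trades memory for fewer calls.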