A Clojure library for word embeddings that uses the deeplearning4j library under the hood.
The Word2Vec and Doc2Vec features are fully functional, and the project is currently in a stable state.
The API is subject to change in future versions.
Pull requests are welcome!
Usage
To install, add this to your dependencies:
[hswick/jutsu.nlp "0.1.0"]
To use jutsu.nlp:
(require '[jutsu.nlp.core :as nlp]
         '[jutsu.nlp.util :as util])
;; Configure your Word2Vec model
(def w2v (nlp/word-2-vec "path/to/text-file"
                         {:min-word-frequency 5 ;; You can also pass an option map
                          :iterations 1         ;; to set certain parameters
                          :layer-size 100
                          :seed 42
                          :window-size 5}))
;; Train the model on the given data
(nlp/fit! w2v)
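For intuition about the :window-size option: in Word2Vec, each word is trained against the neighboring words up to window-size positions away on either side. The plain-Clojure sketch below only enumerates those (center, context) pairs. context-pairs is a hypothetical helper, not part of jutsu.nlp, and it ignores details such as subsampling of frequent words and the randomly shrunk windows used in the original word2vec tool.

```clojure
(defn context-pairs
  "Hypothetical helper (not part of jutsu.nlp): returns [center context]
  pairs where context lies at most window-size positions from center."
  [window-size tokens]
  (let [v (vec tokens)
        n (count v)]
    (for [i (range n)
          ;; clamp the window to the bounds of the token vector
          j (range (max 0 (- i window-size))
                   (min n (+ i window-size 1)))
          ;; a word is never its own context
          :when (not= i j)]
      [(v i) (v j)])))

(context-pairs 1 ["the" "sky" "above"])
;; => (["the" "sky"] ["sky" "the"] ["sky" "above"] ["above" "sky"])
```

A larger window yields more pairs per word, which is one reason bigger :window-size values tend to make training slower.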
;; Write the word vectors to a file
(nlp/write-word-vectors w2v "word_vectors.csv")
;; Load the word vectors back from a file
(def w2v-2 (nlp/read-word-vectors (clojure.java.io/file "word_vectors.csv")))
;; For stop-word removal and stemming, initialize word2vec like this
(require '[jutsu.nlp.sentence-iterator :as iter]
'[jutsu.nlp.tokenization :as token])
(nlp/word-2-vec
  (iter/default-iterator (util/absolute-path "neuromancer.txt"))
  (token/default-tokenizer-factory (token/common-stemmer-preprocessor))
  {:min-word-frequency 6
   :stopwords (nlp/stop-words)
   :window-size 10
   :layer-size 150})
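Stop-word removal drops very common words (such as "the") that carry little meaning on their own. The sketch below illustrates the idea in plain Clojure only: example-stop-words, tokenize and remove-stop-words are hypothetical names, and this is not how jutsu.nlp or deeplearning4j implement it. Stemming, which reduces words to a common root, is not shown.

```clojure
(require '[clojure.string :as str])

;; Hypothetical, tiny stop-word set for illustration only;
;; (nlp/stop-words) supplies the library's own list.
(def example-stop-words #{"the" "a" "of" "and" "in"})

(defn tokenize
  "Naive tokenizer: lower-cases the text and splits on whitespace."
  [s]
  (remove str/blank? (str/split (str/lower-case s) #"\s+")))

(defn remove-stop-words
  "Drops every token contained in the stop-words set."
  [stop-words tokens]
  ;; a Clojure set is a function of its elements, so it works as a predicate
  (remove stop-words tokens))

(remove-stop-words example-stop-words (tokenize "The sky above the port"))
;; => ("sky" "above" "port")
```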
;; To read input from a directory, initialize like this
(def w2v4 (nlp/word-2-vec
            (iter/dir-iterator "path/to/dir")
            (token/default-tokenizer-factory)
            {:min-word-frequency 6
             :window-size 10
             :layer-size 150
             :stopwords (nlp/stop-words)}))
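The stemming and directory examples above pass the same option values. Since the option map is an ordinary Clojure map, one way to avoid repeating it is to keep a shared base and derive variants with merge or assoc. The names shared-opts and small-window-opts below are hypothetical; this is a sketch of plain map handling, not a jutsu.nlp feature.

```clojure
;; Hypothetical shared option map, matching the values used in the
;; stemming and directory examples (without :stopwords).
(def shared-opts {:min-word-frequency 6
                  :window-size 10
                  :layer-size 150})

;; merge returns a new map; later maps win on duplicate keys,
;; and shared-opts itself is left unchanged.
(def small-window-opts (merge shared-opts {:window-size 5}))

small-window-opts
;; => {:min-word-frequency 6, :window-size 5, :layer-size 150}
```

With jutsu.nlp loaded, a map such as (assoc shared-opts :stopwords (nlp/stop-words)) could then be passed as the last argument to nlp/word-2-vec in place of the literal maps.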
Dev
Run boot night to start up Nightlight and begin editing your project in a browser.