This repository contains usable code from the paper:
G. Marra, A. Zugarini, S. Melacci, and M. Maggini, “An unsupervised character-aware neural approach to word and context representation learning,” in Proceedings of the 27th International Conference on Artificial Neural Networks – ICANN 2018.
The project is structured as follows:
A data folder, containing the txt files from which to learn the embeddings.
A log folder, where the model is saved.
The char2word.py script, which is the main routine.
The encoder.py script, which contains some utility functions.
For a standard learning procedure, do the following.
Be sure both the data and log folders are present.
Put your training data into the data folder, with files named data(something).txt, e.g. data01.txt.
Simply run char2word.py.
The script will create a vocabulary.txt file inside the data folder to be used during training.
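As a quick-start illustration, the sketch below prepares the expected layout and launches training from the repository root; data01.txt here is just a toy corpus standing in for your real training text.

```python
# Minimal quick-start sketch (assumes it is run from the repository root and
# that char2word.py picks up every data*.txt file in the data folder).
import os
import subprocess

# Make sure both required folders exist.
for folder in ("data", "log"):
    if not os.path.isdir(folder):
        os.makedirs(folder)

# Toy corpus following the data(something).txt naming scheme; replace with real text.
with open(os.path.join("data", "data01.txt"), "w") as f:
    f.write("the quick brown fox jumps over the lazy dog\n")

# Run the standard training routine; checkpoints are written to the log folder
# and vocabulary.txt is created inside the data folder.
subprocess.call(["python", "char2word.py"])
```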
All configurations are set to the defaults (i.e. those used in the paper). The script does not yet provide command-line configuration (apart from the folder settings; run with --help for info).
Users who want a custom configuration should modify the Config class in the char2word.py script, as sketched below.
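As an illustration only, the kind of edit meant here might look like the following; the attribute names are hypothetical placeholders, so refer to the actual Config class in char2word.py for the real fields and their paper defaults.

```python
# Hypothetical sketch of a customized Config; these attribute names are
# illustrative guesses, NOT the actual fields defined in char2word.py.
class Config(object):
    char_embedding_size = 32    # size of the character embeddings
    word_embedding_size = 256   # size of the word/context embeddings
    batch_size = 64             # sentences per training batch
    learning_rate = 1e-3        # optimizer step size
    num_epochs = 5              # passes over the data*.txt files
```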
We will provide a more user-friendly command-line interface as soon as possible, together with more details about the training procedure and about how to incorporate the model into larger models.
For a quick way to use already-trained embeddings, look at the SentenceEncoder class together with the test function.
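Purely as a sketch of that usage pattern (the import location, constructor arguments, and method name below are assumptions rather than the documented API; the test function shows the real calls), encoding a sentence with a trained model might look like:

```python
# Hedged usage sketch: SentenceEncoder exists in this repository, but the
# import path, constructor arguments, and encode() signature are guesses.
from char2word import SentenceEncoder

encoder = SentenceEncoder(log_dir="log")          # restore the model saved in the log folder
vectors = encoder.encode("a sentence to embed")   # word/context representations for the sentence
print(vectors)
```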
This code has been tested with tensorflow==1.4 and Python 2.7. It also depends on the Python library nltk.