I have used a BERT token classification model to extract keywords from a sentence. Feel free to clone and use it. If you run into any problems, please report them in the issues section.
Special credits to the BERT authors, Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova (original repo), and to Hugging Face for the PyTorch version (original repo).
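A token classification model assigns one label to every WordPiece token, so the keywords are the runs of tokens labeled as part of a keyphrase. The sketch below assumes a B/I/O label scheme (B = beginning of a keyphrase, I = inside, O = outside) and BERT's "##" prefix for continuation pieces. Neither detail is confirmed as what keyword-extractor.py does, and decode_keyphrases is a hypothetical helper.

```python
from typing import List


def decode_keyphrases(tokens: List[str], tags: List[str]) -> List[str]:
    """Group B/I/O-tagged WordPiece tokens into keyphrase strings.

    Assumption: labels are "B", "I" or "O", and "##" marks a continuation piece.
    Labels predicted for continuation pieces are ignored; the piece simply
    joins whatever word came before it.
    """
    phrases: List[List[str]] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if token.startswith("##"):
            # Glue the piece onto the last word of the open phrase, if any.
            if current:
                current[-1] += token[2:]
            continue
        if tag == "B":
            # A new keyphrase starts; close the previous one first.
            if current:
                phrases.append(current)
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # "O" (or an "I" with no open phrase) ends any open phrase.
            if current:
                phrases.append(current)
            current = []
    if current:
        phrases.append(current)
    return [" ".join(words) for words in phrases]


if __name__ == "__main__":
    tokens = ["we", "study", "key", "##phrase", "extraction", "with", "bert", "."]
    tags = ["O", "O", "B", "I", "I", "O", "B", "O"]
    print(decode_keyphrases(tokens, tags))  # ['keyphrase extraction', 'bert']
```

Merging "##" pieces before joining with spaces keeps split words such as "keyphrase" intact in the output.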
The keyword-extractor.py script can be used to extract keywords from a sentence and accepts the following arguments:
optional arguments:
  -h, --help      show this help message and exit
  --sentence SEN  sentence to extract keywords
  --path LOAD     path to load model from
Example:
python keyword-extractor.py --sentence "BERT is a great model." --path "model.pt"
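The help text above is standard argparse output, so the command-line interface can be sketched as follows. This is a hypothetical reconstruction from the help text only. The real script's defaults, required flags and description are not documented here, and build_parser is an illustrative name.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of keyword-extractor.py's flags from its help text.
    parser = argparse.ArgumentParser(
        description="Extract keywords from a sentence with a fine-tuned BERT model."
    )
    parser.add_argument("--sentence", metavar="SEN", type=str,
                        help="sentence to extract keywords")
    parser.add_argument("--path", metavar="LOAD", type=str,
                        help="path to load model from")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv[1:], as a real script would.
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse(["--sentence", "BERT is a great model.", "--path", "model.pt"])
    print(args.sentence, args.path)
```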
Training
You can also train the keyword extractor yourself, starting from BERT's pre-trained model. The main.py script handles training and accepts the following arguments:
optional arguments:
  -h, --help      show this help message and exit
  --data DATA     location of the data corpus
  --lr LR         initial learning rate
  --epochs EPOCHS upper epoch limit
  --batch_size N  batch size
  --seq_len N     sequence length
  --save SAVE     path to save the final model
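--seq_len sets the fixed length that every example is padded or truncated to before batching. Here is a minimal sketch, assuming token IDs and label IDs are already aligned one-to-one. The names pad_or_truncate and PAD_LABEL are hypothetical. -100 is the default ignore_index of torch.nn.CrossEntropyLoss, so padded positions would not contribute to the loss. 0 is the [PAD] id in the standard BERT vocabularies.

```python
from typing import List, Tuple

# Hypothetical label for padded positions; -100 is CrossEntropyLoss's default ignore_index.
PAD_LABEL = -100


def pad_or_truncate(
    token_ids: List[int], label_ids: List[int], seq_len: int, pad_id: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Return (token_ids, label_ids, attention_mask), each exactly seq_len long.

    Simplification: truncation does not re-append a final [SEP] token.
    """
    token_ids = token_ids[:seq_len]
    label_ids = label_ids[:seq_len]
    n_pad = seq_len - len(token_ids)
    # 1 marks real tokens and 0 marks padding, so attention skips the padding.
    attention_mask = [1] * len(token_ids) + [0] * n_pad
    return (
        token_ids + [pad_id] * n_pad,
        label_ids + [PAD_LABEL] * n_pad,
        attention_mask,
    )


if __name__ == "__main__":
    print(pad_or_truncate([101, 7, 8, 102], [0, 1, 2, 0], seq_len=6))
```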
The model was trained on the SemEval 2010 dataset (scientific publications). You can replace it with your own dataset.
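To use a custom dataset, documents annotated with gold keyphrases have to be turned into per-word labels for token classification. The sketch below assumes a B/I/O scheme and exact, case-insensitive matching of each keyphrase against whitespace-split words. It does not describe the data format main.py actually reads, and bio_tags is a hypothetical helper. If your gold keyphrases are stemmed or otherwise normalized, apply the same normalization to the words before matching.

```python
from typing import List, Sequence


def bio_tags(words: Sequence[str], keyphrases: Sequence[str]) -> List[str]:
    """Label each word "B", "I" or "O" by matching gold keyphrases in the text."""
    tags = ["O"] * len(words)
    lowered = [w.lower() for w in words]
    # Match longer phrases first so "neural network" wins over "network".
    for phrase in sorted(keyphrases, key=lambda p: len(p.split()), reverse=True):
        parts = phrase.lower().split()
        n = len(parts)
        if n == 0:
            continue
        for start in range(len(words) - n + 1):
            span = range(start, start + n)
            # Only tag spans that are not already part of another keyphrase.
            if lowered[start:start + n] == parts and all(tags[i] == "O" for i in span):
                tags[start] = "B"
                for i in range(start + 1, start + n):
                    tags[i] = "I"
    return tags


if __name__ == "__main__":
    words = "We train a neural network for keyphrase extraction".split()
    print(bio_tags(words, ["network", "neural network", "keyphrase extraction"]))
```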
Code explanations
An explanation of the keyphrase extraction code is provided as a Python notebook, which you can view here.
Hyper-parameter Tuning
I ran ablation experiments following the fine-tuning recommendations in the BERT paper, and these are the results. I suggest using the parameters in line 4.
All training used a batch size of 32.
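The BERT paper suggests searching learning rates of 5e-5, 3e-5 and 2e-5 and 2, 3 or 4 training epochs (with batch sizes of 16 or 32) when fine-tuning. The sketch below fixes the batch size at 32 and prints one main.py command per remaining combination. The data path "data/" and the "model" save prefix are hypothetical placeholders. The sketch enumerates the search space and does not reproduce the results reported here.

```python
from itertools import product
from typing import List

# Fine-tuning ranges suggested in the BERT paper; batch size fixed to 32 as above.
LEARNING_RATES = [5e-5, 3e-5, 2e-5]
EPOCHS = [2, 3, 4]
BATCH_SIZE = 32


def grid_commands(data: str, save_prefix: str) -> List[str]:
    """Build one main.py invocation per (learning rate, epochs) pair."""
    commands = []
    for lr, epochs in product(LEARNING_RATES, EPOCHS):
        commands.append(
            f"python main.py --data {data} --lr {lr} --epochs {epochs} "
            f"--batch_size {BATCH_SIZE} --save {save_prefix}_lr{lr}_ep{epochs}.pt"
        )
    return commands


if __name__ == "__main__":
    for cmd in grid_commands("data/", "model"):
        print(cmd)
```

Each run saves to its own file, so the resulting models can be compared afterwards without overwriting one another.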