This project demonstrates the use of the YAKE (Yet Another Keyword Extractor) algorithm through an interactive Streamlit web application. YAKE is an unsupervised approach for automatic keyword extraction from text documents.
🔧 Installation
Make sure you are using Python 3.8 or higher.
Clone the repository:
git clone https://github.com/LIAAD/yake_demo.git
cd yake_demo
Create a virtual environment (optional but recommended):
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install the dependencies:
pip install -r requirements.txt
📋 Requirements
The application requires the following packages:
streamlit
metadata
yake
pandas
numpy
wordcloud
matplotlib
spacy
rematplotlib
You can install all dependencies using the requirements.txt file.
🚀 Running the Application
To run the Streamlit application:
streamlit run streamlit_app.py
The application will open in your default web browser.
🖥️ Application Features
The Streamlit application provides:
Interactive Parameter Selection:
Adjust max ngram size
Set deduplication threshold
Choose number of keywords to extract
Select deduplication algorithm
Multiple Visualization Options:
Text highlighting of extracted keywords
Word cloud generation
Tabular display of keywords with scores
Sample Texts:
Pre-loaded example texts for demonstration
Option to input custom text
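As a rough illustration of how the parameters above might reach YAKE, the sketch below maps them onto the keyword arguments of `yake.KeywordExtractor` (`lan`, `n`, `dedupLim`, `dedupFunc`, `top`) and marks extracted keywords in the text. The helper names `extractor_settings`, `extract_keywords`, and `highlight`, the default values, and the `<mark>` tags are assumptions made for this example; they are not taken from `streamlit_app.py`, and the argument names should be checked against the installed yake version.

```python
import re


def extractor_settings(max_ngram=3, dedup_threshold=0.9, num_keywords=20,
                       dedup_algorithm="seqm", language="en"):
    """Translate the UI choices into yake.KeywordExtractor keyword arguments."""
    return {
        "lan": language,               # language of the input text
        "n": max_ngram,                # maximum n-gram size of a keyword
        "dedupLim": dedup_threshold,   # similarity limit used for deduplication
        "dedupFunc": dedup_algorithm,  # e.g. "seqm", "jaro" or "leve"
        "top": num_keywords,           # number of keywords to return
    }


def extract_keywords(text, **ui_choices):
    """Return (keyword, score) pairs for the given text."""
    # Imported here so the other helpers can be used without yake installed.
    import yake

    extractor = yake.KeywordExtractor(**extractor_settings(**ui_choices))
    return extractor.extract_keywords(text)


def highlight(text, keywords, open_tag="<mark>", close_tag="</mark>"):
    """Wrap every case-insensitive keyword occurrence in the given tags."""
    # Try longer keywords first so a multi-word keyword wins over its parts.
    ordered = sorted(set(keywords), key=len, reverse=True)
    if not ordered:
        return text
    pattern = re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", text)
```

With yake installed, `pairs = extract_keywords(text, max_ngram=2, num_keywords=10)` would return up to ten `(keyword, score)` pairs, where a lower score indicates a more relevant keyword, and `highlight(text, [kw for kw, _ in pairs])` would mark those keywords in the text.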
🧠 About YAKE
YAKE (Yet Another Keyword Extractor) is an unsupervised, corpus-independent algorithm for extracting keywords from individual documents. It relies on statistical features such as:
Term casing
Term position
Word frequency
Word relatedness (contextual co-occurrence)
Word dispersion across sentences
YAKE does not rely on dictionaries, thesauri, or training corpora, making it applicable to documents in different languages without additional knowledge.
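To make these features more concrete, the sketch below gathers a few simple per-term statistics from a single document: how often a term occurs, how often it is capitalized, the first sentence it appears in, and the share of sentences that contain it. This is a simplified illustration only. The function name `term_statistics`, the naive sentence splitter, and the statistics themselves are assumptions made for this example; they are not YAKE's actual scoring formulas, and co-occurrence (relatedness) is left out.

```python
import re
from collections import defaultdict


def term_statistics(text):
    """Collect simple per-term statistics from one document."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    stats = defaultdict(
        lambda: {"frequency": 0, "capitalized": 0, "sentences": set()}
    )
    for index, sentence in enumerate(sentences):
        for token in re.findall(r"[A-Za-z]+", sentence):
            entry = stats[token.lower()]
            entry["frequency"] += 1           # word frequency
            if token[0].isupper():
                entry["capitalized"] += 1     # term casing
            entry["sentences"].add(index)     # used for position and dispersion
    total = len(sentences)
    return {
        term: {
            "frequency": entry["frequency"],
            "capitalized": entry["capitalized"],
            "first_sentence": min(entry["sentences"]),
            "sentence_spread": len(entry["sentences"]) / total,
        }
        for term, entry in stats.items()
    }
```

Because every statistic is computed from the document itself, nothing here needs a dictionary or a training corpus, which is the property the description above attributes to YAKE.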
Original paper:
Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., & Jatowt, A. (2018). YAKE! Collection-Independent Automatic Keyword Extractor. Proceedings of ECIR, pp. 806–810.
📂 File Structure
--demo
streamlit_app.py: The main Streamlit application file