Upload any files or enter any path or URL to create Knowledge Bases, which can contain multiple files of any type, format, and content, and to create Smart FAQs, which are lists of curated, numbered Q&As.
The data source or files are loaded and split into text document chunks
The text document chunks are embedded using OpenAI or Hugging Face embeddings
The embeddings are stored as a vector dataset in Activeloop's Deep Lake database hub
A LangChain chain is created from a custom selection of an LLM (gpt-3.5-turbo by default), multiple vector stores serving as knowledge bases, and a single special Smart FAQ vector store
When you ask the app a question, the chain embeds the input prompt, performs a similarity search in the provided vector stores, and uses the best results as context for the LLM to generate an appropriate response
Finally, the chat history is cached locally to enable a ChatGPT-like Q&A conversation (see the sketch below)
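A minimal sketch of this pipeline using classic LangChain and Deep Lake APIs; the file name, dataset path, chunking parameters, and model choice below are illustrative placeholders, not the app's actual configuration (which lives in datachad/backend):

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import DeepLake

# 1. Load the data source and split it into text document chunks
docs = TextLoader("my_data.txt").load()  # placeholder file
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# 2. + 3. Embed the chunks and store them as a Deep Lake vector dataset
vector_store = DeepLake.from_documents(
    chunks,
    OpenAIEmbeddings(),
    dataset_path="hub://<your-org>/my-knowledge-base",  # placeholder path
)

# 4. Build a conversational retrieval chain on top of the LLM and the store
chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo"),
    retriever=vector_store.as_retriever(),
)

# 5. + 6. Ask a question; the chat history enables ChatGPT-like follow-ups
chat_history = []
question = "What is this document about?"
result = chain({"question": question, "chat_history": chat_history})
chat_history.append((question, result["answer"]))
print(result["answer"])
```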
Good to know
The app requires Python >= 3.10!
To run locally or deploy somewhere, execute cp .env.template .env and set the credentials in the newly created .env file. Alternatively, set the system environment variables manually, or store them in .streamlit/secrets.toml when hosting via Streamlit (a sketch of loading these variables follows this list).
If you have set credentials as explained above, you can simply hit Submit in the authentication dialog without re-entering them in the app.
If you run the app, consider modifying the configuration in datachad/backend/constants.py, e.g. enabling advanced options
Your data won't load? Feel free to open an Issue or PR and contribute!
Use previous releases like V1 or V2 for original functionality and UI
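As a sketch of how the credentials set in .env might be read in Python, assuming the variable names OPENAI_API_KEY and ACTIVELOOP_TOKEN (check .env.template for the names the app actually expects):

```python
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

# Read the .env file created from .env.template into the process environment
load_dotenv()

# Variable names are assumptions for illustration; see .env.template
openai_api_key = os.environ.get("OPENAI_API_KEY")
activeloop_token = os.environ.get("ACTIVELOOP_TOKEN")
```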
What does it look like?
TODO LIST
If you'd like to contribute, feel free to grab any task
Refactor utils, especially the loaders
Add option to choose model and embeddings
Enable fully local / private mode
Add option to upload multiple files to a single dataset
Decouple datachad modules from streamlit
Remove all local mode and other V1 stuff
Load existing knowledge bases
Delete existing knowledge bases
Enable streaming responses
Show retrieved context
Refactor UI
Introduce smart FAQs
Exchange downloaded file storage with tempfile
Add user creation and login
Add chat history per user
Make all I/O asynchronous
Implement FastAPI routes and backend app
Implement a proper frontend (react or whatever)
Containerize the app
About
Ask questions about any data source by leveraging LangChain