GPT-Echo: Open-source research combining pretrained GPT models and Echo State Networks (ESNs) for memory in next token prediction tasks. Chatbot example included.
GPT-Echo is open-source research that uses pretrained GPT models to generate embeddings, which are then fed into an echo state network (ESN) for memory.
The ESN acts as a contextualizer, preserving semantic information from the GPT embeddings to aid downstream tasks. (thanks GPT4)
The only trainable layer is the readout layer, which makes training costs potentially comparable to a fine-tune.
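A minimal sketch of the idea, assuming GPT-2 hidden states as the embedding source; names like EchoStateReservoir, reservoir_size, and spectral_radius are illustrative and not the repository's actual API:

```python
# Sketch: frozen GPT embeddings -> fixed ESN reservoir -> trainable readout.
# All class and parameter names here are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import GPT2Model, GPT2Tokenizer

class EchoStateReservoir(nn.Module):
    """Fixed (untrained) reservoir that contextualizes a sequence of embeddings."""
    def __init__(self, input_dim, reservoir_size=1024, spectral_radius=0.9, leak=0.5):
        super().__init__()
        w_in = torch.randn(reservoir_size, input_dim) * 0.1
        w_res = torch.randn(reservoir_size, reservoir_size)
        # Rescale recurrent weights so the spectral radius is < 1 (echo state property).
        w_res *= spectral_radius / torch.linalg.eigvals(w_res).abs().max()
        self.register_buffer("w_in", w_in)    # buffers are not trained
        self.register_buffer("w_res", w_res)
        self.leak = leak

    def forward(self, embeddings):            # embeddings: (seq_len, input_dim)
        state = torch.zeros(self.w_res.shape[0])
        states = []
        for x in embeddings:
            pre = self.w_in @ x + self.w_res @ state
            state = (1 - self.leak) * state + self.leak * torch.tanh(pre)
            states.append(state)
        return torch.stack(states)            # (seq_len, reservoir_size)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt = GPT2Model.from_pretrained("gpt2").eval()    # frozen embedding source
reservoir = EchoStateReservoir(input_dim=gpt.config.hidden_size)
readout = nn.Linear(1024, tokenizer.vocab_size)   # the only trainable layer

with torch.no_grad():
    ids = tokenizer("echo state networks remember", return_tensors="pt").input_ids
    emb = gpt(ids).last_hidden_state[0]           # (seq_len, hidden_size)
    states = reservoir(emb)
logits = readout(states)                          # next-token logits per position
```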
This will train the echo network for 10 epochs with a reservoir size of 1024 and a context length of 128, using cross-entropy loss, and save emoji.pth after training.
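A hypothetical training loop matching those settings, reusing the names from the sketch above; dataset is assumed to yield token-id tensors of length 129 (128 inputs plus shifted targets) and is not the repository's actual data pipeline:

```python
# Hypothetical loop: 10 epochs, reservoir size 1024, context length 128,
# cross-entropy loss, only the readout layer is optimized.
import torch
import torch.nn as nn

optimizer = torch.optim.Adam(readout.parameters(), lr=1e-3)  # only the readout trains
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    for token_ids in dataset:                    # token_ids: (context_len + 1,)
        inputs, targets = token_ids[:-1], token_ids[1:]
        with torch.no_grad():                    # GPT and reservoir stay frozen
            emb = gpt(inputs.unsqueeze(0)).last_hidden_state[0]
            states = reservoir(emb)
        logits = readout(states)                 # (context_len, vocab_size)
        loss = loss_fn(logits, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

torch.save(readout.state_dict(), "emoji.pth")
```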
Scaling
This approach has not been scaled. In toy tasks, larger foundation models perform better.
If you do scale this, make sure to do a grid search; a lot of options have to be just right.
Grid search
Edit search.py and set your options.
Run it with the same arguments you'd use to train.
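For illustration, this is the kind of grid a script like search.py might sweep over; the actual option names and ranges in search.py may differ, and train_and_evaluate is a hypothetical hook:

```python
# Illustrative hyperparameter grid for the ESN and readout training.
import itertools

grid = {
    "reservoir_size": [512, 1024, 2048],
    "spectral_radius": [0.8, 0.9, 0.99],
    "leak": [0.25, 0.5, 1.0],
    "lr": [1e-3, 3e-4],
}

for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    print("training with", config)
    # train_and_evaluate(config)   # hypothetical hook into the training code
```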
Experimental features
These features are experimental: they work, but they do not guarantee better results and can slow down training.
--usecot - trains two separate ESNs, a mediator and a generator. The mediator can then potentially be used to direct generator sampling (needs more research).
--forwardforward - uses Hinton's forward-forward algorithm to train the readout layer. For negative samples it supports random uniform noise, a custom negative dataset, or sampling from the base model (see the sketch after this list).
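A minimal sketch of the forward-forward idea applied to a single layer, using squared-activation "goodness" with a threshold as in Hinton's paper; how the repository wires this into its readout and constructs negatives is an assumption, and the random-uniform negatives shown are just one of the three options listed above:

```python
# Forward-forward sketch: raise goodness (mean squared activation) above a
# threshold for positive states and push it below the threshold for negatives.
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(1024, 1024)                 # stand-in for the layer being trained
optimizer = torch.optim.Adam(layer.parameters(), lr=1e-3)
threshold = 2.0

def goodness(states):
    return layer(states).pow(2).mean(dim=-1)  # per-sample goodness

def forward_forward_step(pos_states, neg_states):
    g_pos, g_neg = goodness(pos_states), goodness(neg_states)
    # Logistic loss: high goodness for positives, low goodness for negatives.
    loss = F.softplus(torch.cat([threshold - g_pos, g_neg - threshold])).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

pos = torch.randn(8, 1024)                    # e.g. reservoir states from real text
neg = torch.rand(8, 1024)                     # random-uniform negative samples
forward_forward_step(pos, neg)
```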