Then this data is processed by train_trade_data_deepseek_sentiment.py and train_trade_data_deepseek_risk.py to generate agent-ready datasets.
For plain PPO and CPPO, train_trade_data.py is used.
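The exact output schema is defined by those scripts, but conceptually this step joins per-(date, ticker) LLM scores onto the FinRL-style price data. A minimal sketch, assuming hypothetical file and column names (llm_sentiment, llm_risk) and a neutral fill for days without news:

```python
# Minimal sketch (not the repo's code): join per-(date, ticker) LLM scores
# onto a FinRL-style price dataframe. File and column names such as
# "llm_sentiment" and "llm_risk" are assumptions for illustration only.
import pandas as pd

prices = pd.read_csv("trade_data.csv")                # date, tic, OHLCV, indicators, ...
sentiment = pd.read_csv("llm_sentiment_scores.csv")   # date, tic, llm_sentiment (e.g. 1-5)
risk = pd.read_csv("llm_risk_scores.csv")             # date, tic, llm_risk (e.g. 1-5)

df = (
    prices
    .merge(sentiment, on=["date", "tic"], how="left")
    .merge(risk, on=["date", "tic"], how="left")
)

# Days without news get a neutral score so the agent's state stays well-defined.
df[["llm_sentiment", "llm_risk"]] = df[["llm_sentiment", "llm_risk"]].fillna(3)
df.to_csv("trade_data_deepseek.csv", index=False)
```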
Training and Environments
For training PPO, run: nohup mpirun --allow-run-as-root -np 8 python train_ppo.py > output_ppo.log 2>&1 &
The other agents are trained with the corresponding scripts:
For CPPO: train_cppo.py
For PPO-DeepSeek: train_ppo_llm.py
For CPPO-DeepSeek: train_cppo_llm_risk.py
Environment files are:
env_stocktrading.py for PPO and CPPO, same as in the original FinRL
env_stocktrading_llm.py or env_stocktrading_llm_01.py for PPO-DeepSeek (the two variants apply different strengths of LLM influence; further tuning would be interesting, and a reward-adjustment sketch follows this list)
env_stocktrading_llm_risk.py or env_stocktrading_llm_risk_01.py for CPPO-DeepSeek
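How the LLM signal enters the environment is defined in those files; conceptually, the per-day sentiment (or risk) score tilts the reward toward the LLM's view. A minimal sketch of a reward-level adjustment, assuming a 1-5 score with 3 as neutral (the function name, score range, and weighting are illustrative, not the repo's code):

```python
# Minimal sketch (not the repo's env code) of one way an LLM sentiment score
# could modulate the trading reward. Score range and weighting are assumptions.
import numpy as np

def llm_adjusted_reward(raw_reward: float, sentiment_score: float,
                        influence: float = 0.1) -> float:
    """Scale the environment reward by an LLM sentiment signal.

    sentiment_score: aggregated LLM score in [1, 5], with 3 meaning neutral.
    influence: how strongly the LLM signal may modulate the reward; the
               *_01.py variants presumably use a different weighting.
    """
    tilt = influence * (sentiment_score - 3.0) / 2.0   # maps 1..5 to -influence..+influence
    return raw_reward * (1.0 + np.clip(tilt, -influence, influence))

# A neutral score leaves the reward unchanged.
assert llm_adjusted_reward(10.0, 3.0) == 10.0
```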
Log files are output_ppo.log, etc., and should be monitored during training. Watch in particular the following diagnostics (a small log-parsing sketch follows this list):
AverageEpRet
KL
ClipFrac
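A small helper like the one below can pull those diagnostics out of the log for quick checks or plotting. It assumes SpinningUp-style tabular epoch output (rows like "| AverageEpRet | 123.4 |"); adjust the regex if the actual log format differs:

```python
# Sketch: extract AverageEpRet, KL, and ClipFrac from a training log.
# Assumes SpinningUp-style "| Key | value |" rows; adapt the regex otherwise.
import re

KEYS = ("AverageEpRet", "KL", "ClipFrac")
PATTERN = re.compile(r"\|?\s*(%s)\s*\|?\s*(-?\d+\.?\d*(?:e[+-]?\d+)?)" % "|".join(KEYS))

def parse_log(path: str) -> dict:
    series = {k: [] for k in KEYS}
    with open(path) as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                series[m.group(1)].append(float(m.group(2)))
    return series

if __name__ == "__main__":
    stats = parse_log("output_ppo.log")
    for key, values in stats.items():
        if values:
            print(f"{key}: last={values[-1]:.4f} over {len(values)} epochs")
```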
Evaluation
Evaluation in the trading phase (2019-2023) happens in the FinRL_DeepSeek_backtest.ipynb Colab notebook.
Metrics used are the Information Ratio, CVaR, and the Rachev Ratio; adding others, such as outperformance frequency, would be a nice extension.
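For reference, the three metrics can be computed from daily returns as sketched below (this is not the notebook's code; the 5% tail level and 252-day annualization are common defaults, not values confirmed here):

```python
# Sketch of the backtest metrics from daily strategy/benchmark returns.
# Tail level (5%) and annualization factor (252) are assumed defaults.
import numpy as np

def information_ratio(returns: np.ndarray, benchmark: np.ndarray) -> float:
    """Annualized mean active return divided by its standard deviation."""
    active = returns - benchmark
    return np.sqrt(252) * active.mean() / active.std(ddof=1)

def cvar(returns: np.ndarray, alpha: float = 0.05) -> float:
    """Mean return over the worst alpha fraction of days (a negative number)."""
    cutoff = np.quantile(returns, alpha)
    return returns[returns <= cutoff].mean()

def rachev_ratio(returns: np.ndarray, alpha: float = 0.05) -> float:
    """Expected tail gain divided by expected tail loss at level alpha."""
    upper_tail = returns[returns >= np.quantile(returns, 1 - alpha)].mean()
    lower_tail = returns[returns <= np.quantile(returns, alpha)].mean()
    return upper_tail / abs(lower_tail)

# Toy usage with synthetic daily returns:
rng = np.random.default_rng(0)
strat = rng.normal(5e-4, 0.01, 1000)
bench = rng.normal(3e-4, 0.01, 1000)
print(information_ratio(strat, bench), cvar(strat), rachev_ratio(strat))
```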
About
Code for the paper "FinRL-DeepSeek: LLM-Infused Risk-Sensitive Reinforcement Learning for Trading Agents" (arXiv:2502.07393).