The AggMo class also has an "exponential form" constructor. In this case, the damping vector is specified by two hyperparameters: K, the number of beta values, and a, the exponential scale factor. For i = 0, ..., K-1, each beta_i = 1 - a^i.
For example, K = 3 and a = 0.1 are equivalent to using the beta values [0, 0.9, 0.99].
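As a minimal sketch of the arithmetic behind the exponential form, the snippet below computes beta_i = 1 - a^i for K = 3 and a = 0.1. The helper name `exp_form_betas` is hypothetical and is not part of the AggMo class; it only illustrates how the two hyperparameters map to a damping vector.

```python
def exp_form_betas(a, k):
    """Return the damping vector [1 - a**i for i in 0..k-1].

    Hypothetical helper for illustration only; not the repository's API.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [1.0 - a ** i for i in range(k)]


# K = 3 beta values with scale factor a = 0.1
betas = exp_form_betas(a=0.1, k=3)

# Round for display, since floating-point powers are not exact
print([round(b, 6) for b in betas])
```

The first beta is always 0, because a^0 = 1 for any a. Each further value moves closer to 1 by another factor of a.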
Code to run experiments can be found in the src directory. Each task and optimizer has its own config file, which can be easily overridden from the command line.
The first argument points to the task configuration. The optimizer is specified with --optim <optimizer_name>. Additional config overrides can be given after -o as dotted key=value pairs, e.g. -o optim.lr_schedule.lr_decay=0.5.
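To illustrate how a dotted override such as optim.lr_schedule.lr_decay=0.5 could map onto a nested configuration, here is a minimal sketch. It assumes the config is a nested dictionary. The function `apply_override` and the starting values are hypothetical, and this is not necessarily how the code in src parses its overrides.

```python
import ast


def apply_override(config, override):
    """Apply one 'dotted.key=value' override to a nested dict, in place.

    Hypothetical helper for illustration; the repository may parse
    overrides differently.
    """
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")

    # Walk down to the parent of the leaf, creating levels as needed
    node = config
    for name in parents:
        node = node.setdefault(name, {})

    # Interpret numbers, booleans, etc.; fall back to the raw string
    try:
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        value = raw_value
    node[leaf] = value
    return config


# Assumed starting config, for demonstration only
config = {"optim": {"lr_schedule": {"lr_decay": 0.1}}}
apply_override(config, "optim.lr_schedule.lr_decay=0.5")
print(config)
```

Each dot in the key selects one level of nesting, so the override changes only the lr_decay entry and leaves the rest of the optimizer config untouched.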
The optimizer configs do not provide optimal hyperparameters for every task.
The LSTM code is not directly included here. We made direct use of the official code from "Regularizing and Optimizing LSTM Language Models". You can run these experiments by using the AggMo optimizer within this repository. The model hyperparameters used are detailed in the appendix.
About
Code for "Aggregated Momentum: Stability Through Passive Damping", Lucas et al. 2018