# What happens if I replace sigmoid with tanh in RNN gates?

## Motivation
sigmoid(x) is typically used for gates, but it is not symmetric: sigmoid(+2) ~= 0.88 and the gate is open, while sigmoid(-2) ~= 0.12 and the gate is closed.
My input data is largely symmetric, so I wonder whether a more symmetric gating function would speed up learning. The Strongly Typed RNN paper mentioned above uses tanh as an output gating function in some cases. With tanh we have tanh(+2) ~= 0.96, tanh(-2) ~= -0.96, and tanh(0) = 0. Nice and symmetric.
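To make "gating with tanh" concrete, here is a minimal sketch of a simplified GRU-style step with the gate activation as a swappable parameter. The reset gate and biases are omitted for brevity, and the function and weight names are mine, not the experiment code's:

```python
import torch

def gru_step(x, h, W_z, U_z, W_c, U_c, gate=torch.sigmoid):
    # Update gate: normally sigmoid, in (0, 1). Passing gate=torch.tanh
    # makes it symmetric, in (-1, 1), so the gate can flip the sign of
    # the candidate state instead of merely attenuating it.
    z = gate(x @ W_z + h @ U_z)
    c = torch.tanh(x @ W_c + h @ U_c)  # candidate state
    return (1 - z) * h + z * c         # interpolate old and new state
```

Calling this with `gate=torch.tanh` gives the symmetric variant; since z can then be negative and (1 - z) can exceed 1, the dynamics genuinely change rather than just being rescaled.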
Another option would be the TernaryTanh activation function, which is like tanh but flat around 0: f(x) = 1.5 * tanh(x) + 0.5 * tanh(-3 * x).
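As a sanity check, the three candidate gate activations can be compared directly. A minimal sketch (assuming PyTorch, since the experiment script is Python), with TernaryTanh implemented from the formula above:

```python
import torch

def ternary_tanh(x):
    # Like tanh but with a flat region around 0:
    # f(x) = 1.5*tanh(x) + 0.5*tanh(-3x), saturating at +/-1.
    return 1.5 * torch.tanh(x) + 0.5 * torch.tanh(-3 * x)

x = torch.tensor([-2.0, 0.0, 2.0])
print(torch.sigmoid(x))  # ~[0.12, 0.50, 0.88] -- open/closed, not symmetric
print(torch.tanh(x))     # ~[-0.96, 0.00, 0.96] -- symmetric around 0
print(ternary_tanh(x))   # ~[-0.95, 0.00, 0.95] -- symmetric and flat near 0
```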
## Method

I tried it out on the Mackey-Glass series using a variety of RNN types. I opted for one layer of 50 units, keeping the number of units constant rather than the number of parameters.
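For reference, the Mackey-Glass series comes from the delay differential equation dx/dt = beta * x(t - tau) / (1 + x(t - tau)^n) - gamma * x(t). A crude Euler-discretized generator with the commonly used parameters (beta=0.2, gamma=0.1, n=10, tau=17) might look like the sketch below; this is an illustration, not the experiment script's actual data loader:

```python
import numpy as np

def mackey_glass(length=10000, tau=17, beta=0.2, gamma=0.1, n=10):
    # Euler integration (dt = 1) of
    # dx/dt = beta * x(t - tau) / (1 + x(t - tau)**n) - gamma * x(t)
    x = np.full(length + tau, 1.2)  # constant history as initial condition
    for t in range(tau, length + tau - 1):
        x_tau = x[t - tau]
        x[t + 1] = x[t] + beta * x_tau / (1 + x_tau ** n) - gamma * x[t]
    return x[tau:]
```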
The command line I used was:

```
python experiment.py --data mackey_glass --epochs 15 --layers ???_50 --sigmoid ???
```

where the first ??? is the RNN type and the second is the gating function under test.
15 epochs is sufficient in most cases for training to slow to a crawl. For better results I should let the tests run much longer and average over 5 runs. Nevertheless, it is interesting to note that some models learn really quickly from the get-go.