Some common distributed learning algorithms built in Torch
with the help of the ipc library.
AllReduceSGD
Spreads the computation of gradients for a mini-batch of items
across N processes. Uses AllReduce to quickly sum the gradients
and distribute the total back out to every process.
local allReduceSGD = require 'distlearn.AllReduceSGD'(tree)
-- Make sure all the nodes start with the same parameter values
allReduceSGD.synchronizeParameters(params)
for _ = 1,epochs do
   for _ = 1,steps do
      -- Compute your gradients as normal
      local grads = computeYourGrads(...)
      -- Sum and normalize them
      allReduceSGD.sumAndNormalizeGradients(grads)
      -- Do your SGD as normal
      SGD(params, grads)
   end
   -- Before validating we should make sure all nodes have
   -- the exact same parameter values
   allReduceSGD.synchronizeParameters(params)
   -- Validate...
end
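Conceptually, the sum-and-normalize step leaves every node holding the average
of all N nodes' gradients. Here is a minimal single-process sketch of that
reduction; the real library does it with AllReduce over the tree, so the table
of per-node gradients below is only an illustration:

-- Single-process sketch: average the gradients from N nodes.
-- 'gradsPerNode' is an illustrative table of gradient tensors; in the
-- distributed version the sum happens via AllReduce across processes.
local function sumAndNormalize(gradsPerNode)
   local numNodes = #gradsPerNode
   local total = gradsPerNode[1]:clone()
   for i = 2, numNodes do
      total:add(gradsPerNode[i])   -- sum the gradients
   end
   total:div(numNodes)             -- normalize by the node count
   return total                    -- every node ends up with this average
end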
When used in combination with Dataset, you can quickly parallelize
the processing of large datasets without a ton of effort. See the
MNIST example for a complete working setup.
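For reference, here is a rough sketch of how each node can load only its own
slice of the data. The file name and batch options are placeholders, and the
partition/partitions options and sampledBatcher call follow my reading of the
Dataset library as used in the MNIST example, so treat them as assumptions and
check that example for the exact setup:

local Dataset = require 'dataset.Dataset'
-- Each node loads only its partition of the data by passing its index
-- and the total node count (option names assumed from the MNIST example).
local trainingDataset = Dataset('train.t7', {
   partition = nodeIndex,    -- this node's 1-based index
   partitions = numNodes,    -- total number of nodes
})
-- Build a batcher over this node's partition (placeholder batch options)
local getBatch, numBatches = trainingDataset.sampledBatcher({
   batchSize = 32,
   inputDims = { 1024 },
   processor = function(res, processorOpt, input)
      input:copy(res:view(1, 1024))
      return true
   end,
})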
AllReduceEA
We also have an AllReduce-based implementation of the Elastic
Averaging algorithm as described in Deep learning with Elastic Averaging SGD.
It's just as easy to add this to your training script; there
are only two required parameters, tau and alpha. Tau is how
many steps to run before averaging the nodes, and alpha is
the weight used during the averaging step. You can read
more about our implementation of AllReduceEA.
-- Use a tau of 10 and an alpha of 0.2
local allReduceEA = require 'distlearn.AllReduceEA'(tree, 10, 0.2)
-- Make sure all the nodes start with the same parameter values
allReduceEA.synchronizeParameters(params)
for _ = 1,epochs do
   for _ = 1,steps do
      -- Compute your gradients as normal
      local grads = computeYourGrads(...)
      -- Do your SGD as normal
      SGD(params, grads)
      -- Average the params
      allReduceEA.averageParameters(params)
   end
   -- Make sure the centers haven't drifted too far due to
   -- floating point precision error build up
   allReduceEA.synchronizeCenter(params)
   -- Validate...
end
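For intuition, the averaging step every tau iterations pulls each node's
parameters toward a shared center by a factor of alpha, and moves the center
toward the node by the same factor. Below is a minimal sketch of that update
for a single node; it illustrates the elastic-averaging rule, not the
library's actual code, which aggregates the differences from all nodes
with AllReduce:

-- Illustrative elastic-averaging step for one node (not the library's code).
-- params and center are tables of tensors; alpha is the averaging weight.
local function elasticAverageStep(params, center, alpha)
   for i = 1, #params do
      local diff = params[i] - center[i]   -- how far this node has drifted
      params[i]:add(-alpha, diff)          -- pull the node toward the center
      center[i]:add(alpha, diff)           -- move the center toward the node
   end
end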