Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Requirement
Check the requirements.txt
TODO
replay buffer
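The replay buffer listed above refers to the image pool used in the CycleGAN paper, which feeds the discriminator a mix of current and previously generated images to stabilise training. A minimal sketch (the class and parameter names are illustrative, not from this repository):

```python
import random

class ImagePool:
    """Replay buffer (image pool) in the style of CycleGAN: keeps up to
    pool_size previously generated images and, half of the time, returns
    a stored image in place of the newly generated one."""

    def __init__(self, pool_size=50):
        self.pool_size = pool_size
        self.images = []

    def query(self, image):
        # Pool disabled: always pass the new image through.
        if self.pool_size == 0:
            return image
        # Pool not full yet: store the new image and return it.
        if len(self.images) < self.pool_size:
            self.images.append(image)
            return image
        # Pool full: with probability 0.5, swap the new image for an old one.
        if random.random() < 0.5:
            idx = random.randrange(self.pool_size)
            old = self.images[idx]
            self.images[idx] = image
            return old
        return image
```

In training, each fake image would be passed through `pool.query(...)` before reaching the discriminator.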
Run
The dataset is downloaded automatically by data.py.
python3 train.py
Distributed Training
GAN-like networks are particularly challenging given that they often use multiple optimizers.
In addition, GANs also consume a large amount of GPU memory and are usually batch-size sensitive.
To speed up training, we use the KungFu distributed training library.
KungFu is easy to install and run compared to today's Horovod library,
which depends on OpenMPI: you can install it in a few lines by following
its instructions. KungFu is also fast and scalable compared
to Horovod and parameter servers, making it an attractive option for GAN networks.
In the following, we assume that you have added kungfu-run into the $PATH.
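If kungfu-run is not yet on your $PATH, you can add its install directory. The path below is an assumption (pip user installs commonly land in ~/.local/bin); adjust it to wherever your installation placed the binary:

```shell
# Assumption: kungfu-run was installed into ~/.local/bin --
# replace with the actual install directory on your machine.
export PATH="$PATH:$HOME/.local/bin"
```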
The default KungFu optimizer is sma, which implements synchronous model averaging.
sma decouples the batch size from the number of GPUs, making it robust to hyper-parameter choices during scaling.
You can also use other KungFu optimizers: sync-sgd (equivalent to Horovod's DistributedOptimizer)
and async-sgd, which suits clusters with limited bandwidth and stragglers.
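For example, a single-machine launch with kungfu-run might look like the following (the GPU count of 4 is an assumption; adjust -np to your machine):

```shell
# Launch 4 workers on the local machine, one per GPU.
# kungfu-run must be installed and on $PATH; this command is not
# from the repository and is shown as a typical invocation.
kungfu-run -np 4 python3 train.py
```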
(ii) To run on 2 machines (each with NIC eth0 and IPs 192.168.0.1 and 192.168.0.2):
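A typical two-machine invocation is sketched below; the per-machine GPU count of 4 is an assumption, so adjust -np and the host slots to your cluster:

```shell
# Run the SAME command on both machines. -np is the total number of
# workers, -H lists host:slots pairs, and -nic selects the network
# interface used for communication.
kungfu-run -np 8 \
    -H 192.168.0.1:4,192.168.0.2:4 \
    -nic eth0 \
    python3 train.py
```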