Train machine learning models fast.
$ conda create -n ffcv python=3.9 cupy pkg-config libjpeg-turbo opencv pytorch torchvision cudatoolkit=11.6 numba -c conda-forge -c pytorch && conda activate ffcv && conda update ffmpeg && pip install ffcv
Keep your training code intact
Drop-in replacement for existing loaders
# Standard PyTorch data loading
import torch as ch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# MEAN and STDEV are per-channel normalization statistics defined elsewhere.
train_ds = datasets.ImageFolder('/pth/to/data',
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.Normalize(MEAN, STDEV)
    ]))

train_loader = DataLoader(train_ds,
                          shuffle=True,
                          batch_size=512,
                          num_workers=8)

for ims, labs in train_loader:
    ims = (ims.half()
              .cuda(non_blocking=True)
              .to(memory_format=ch.channels_last))
    # Model training...
# The same pipeline with FFCV
import torch as ch
import torchvision as tv
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import \
    RandomResizedCropRGBImageDecoder
from ffcv.transforms import ToTensor, ToDevice, ToTorchImage, Convert

train_loader = Loader('/pth/to/data.beton', batch_size=512,
    num_workers=8, order=OrderOption.RANDOM,
    pipelines={'image': [
        RandomResizedCropRGBImageDecoder((224, 224)),
        ToTensor(),
        # Move to GPU asynchronously as uint8:
        ToDevice(ch.device('cuda:0')),
        # Automatically channels-last:
        ToTorchImage(),
        Convert(ch.float16),
        # Standard torchvision transforms still work!
        tv.transforms.Normalize(MEAN, STDEV)
    ]})

# Prefetching, caching, move to GPU, all handled!
for ims, labs in train_loader:
    ...  # Model training (FAST!)
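The FFCV loader reads from a `.beton` file, which has to be written once before training. The following is a minimal sketch of that conversion step, assuming the `DatasetWriter`, `RGBImageField`, and `IntField` classes described in the FFCV documentation. The helper name `write_beton` and the `max_resolution=256` default are illustrative choices, not values taken from this page.

```python
def write_beton(dataset, write_path, max_resolution=256):
    """Convert an indexed (image, label) dataset into a .beton file.

    `write_beton` is a hypothetical helper name; `max_resolution=256`
    is an illustrative default, not a recommendation from this page.
    """
    # Imported inside the function so the sketch can be loaded
    # on a machine where FFCV is not installed yet.
    from ffcv.writer import DatasetWriter
    from ffcv.fields import RGBImageField, IntField

    # Field names must match the keys used in the Loader's `pipelines`.
    writer = DatasetWriter(write_path, {
        'image': RGBImageField(max_resolution=max_resolution),
        'label': IntField(),
    })
    # Expects a dataset that supports len() and integer indexing and
    # returns (image, label) tuples, e.g. a torchvision ImageFolder
    # created without a transform.
    writer.from_indexed_dataset(dataset)
```

For example, `write_beton(datasets.ImageFolder('/pth/to/data'), '/pth/to/data.beton')` would produce a file at the path the loader reads from.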
Train ImageNet in minutes (not days)
FFCV cuts training times and comes with simple, optimized code for standard datasets.
Optimized for speed and usability
Drop-in speed
FFCV doesn't require you to change any training code: make training faster by just replacing the data loading and augmentation pipeline.
More models per GPU
Because data loading is fully asynchronous and thread-based, you can interleave training of multiple models on the same GPU efficiently, without data-loading overhead.
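To make the idea of thread-based, asynchronous loading concrete, here is a generic sketch using only the Python standard library. It is not FFCV's implementation: `prefetching_loader`, the `depth` parameter, and the two stand-in "models" are hypothetical names used for illustration. A background thread fills a bounded queue while the consumer alternates training steps between two models.

```python
import queue
import threading

def prefetching_loader(batches, depth=2):
    """Yield items from `batches`, loading up to `depth` ahead in a background thread."""
    buf = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def worker():
        try:
            for batch in batches:
                buf.put(batch)  # blocks while the buffer is full
        finally:
            buf.put(done)  # always signal completion, even on error

    threading.Thread(target=worker, daemon=True).start()
    while True:
        batch = buf.get()
        if batch is done:
            return
        yield batch

# Two stand-in data streams, one per hypothetical model.
loader_a = prefetching_loader(range(0, 3))
loader_b = prefetching_loader(range(100, 103))

# Interleave "training steps": each model takes one batch in turn
# while both background threads keep their queues filled.
steps = []
for batch_a, batch_b in zip(loader_a, loader_b):
    steps.append(("model_a", batch_a))
    steps.append(("model_b", batch_b))
```

Because the workers are threads rather than processes, batches are handed over through the in-process queue without being copied between processes. In this sketch an error in the source iterable simply ends the stream early; a production loader would need to surface it to the consumer.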
Remove bottlenecks
FFCV allows you to shift compute load between GPU, CPU, disk, and memory to eliminate bottlenecks under almost any resource constraint.
Custom (fast) pipelines
This isn't just about fast data loading: FFCV automatically fuses and compiles the data processing pipeline into machine code. Users can build their own compiled data transformations through a simple Python API, or just continue using standard PyTorch data transformations.
Hyper-optimized
Everything about FFCV is optimized: it carefully handles the caching, preloading, threading, scheduling, compilation, etc. so that you don't have to. The numbers speak for themselves.
Docs and support
FFCV comes with continually updated documentation that includes a variety of example use cases. The project's maintainers can also be reached through an FFCV Slack workspace.