CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3. We found CLIP matches the performance of the original ResNet50 on ImageNet “zero-shot” without using any of the original 1.28M labeled examples, overcoming several major challenges in computer vision.
Details
The ViT model and checkpoints have been ported to Haiku, while preserving the same output. See tests/test_consistency.py for details.
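For illustration only, here is a minimal sketch of the kind of parity check that test performs, assuming the original PyTorch clip package is installed alongside clip_jax; the exact interfaces and tolerances in tests/test_consistency.py may differ.

import numpy as np
import torch
import clip  # original PyTorch CLIP (assumed installed)
from PIL import Image

import clip_jax

# Load the same checkpoint in both implementations.
torch_model, torch_preprocess = clip.load("ViT-B/32", device="cpu")
image_fn, text_fn, jax_params, jax_preprocess = clip_jax.load("ViT-B/32", "cpu")

image = Image.open("CLIP.png")

# Encode the same image with both models.
with torch.no_grad():
    torch_embed = torch_model.encode_image(torch_preprocess(image).unsqueeze(0)).numpy()
jax_embed = np.array(image_fn(jax_params, np.expand_dims(jax_preprocess(image), 0)))

# The ported weights should reproduce the original outputs up to numerical noise.
assert np.allclose(torch_embed, jax_embed, atol=1e-3)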
No JIT or pmap is applied by default; instead, pure inference functions for both the text and image encoders are returned by clip_jax.load(), so you can jit or parallelize them however you wish. See test/tpu_bench.py for an example of using pmap.
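As a rough sketch (not the benchmark itself), jitting or pmapping the returned image encoder could look like the following, assuming image_fn is a pure function of (params, images) and that the leading batch axis is sharded across the attached devices:

import jax
import numpy as np
from PIL import Image

import clip_jax

image_fn, text_fn, jax_params, jax_preprocess = clip_jax.load('ViT-B/32', "cpu")

# Single device: jit-compile the pure image encoder.
jit_image_fn = jax.jit(image_fn)

# Multiple devices: replicate the params and pmap the encoder.
devices = jax.local_devices()
replicated_params = jax.device_put_replicated(jax_params, devices)
p_image_fn = jax.pmap(image_fn)

# Build a per-device batch; the leading axis must equal the number of devices.
image = np.expand_dims(jax_preprocess(Image.open("CLIP.png")), 0)
images = np.stack([image] * len(devices))

image_embeds = p_image_fn(replicated_params, images)  # leading axis: devices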
Usage Example
import numpy as np
from PIL import Image
import clip_jax

image_fn, text_fn, jax_params, jax_preprocess = clip_jax.load('ViT-B/32', "cpu")

image = np.expand_dims(jax_preprocess(Image.open("CLIP.png")), 0)
text = clip_jax.tokenize(["a diagram", "a dog", "a cat"])

image_embed = image_fn(jax_params, image)
text_embed = text_fn(jax_params, text)
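To turn these embeddings into zero-shot label scores, one option is the usual CLIP recipe: normalize both embeddings, take the scaled dot product, and softmax over the text labels. The sketch below assumes the encoders return unnormalized features and borrows the 100x logit scale used in the original CLIP examples.

import jax
import jax.numpy as jnp

# Normalize the features, score each caption against the image, and
# softmax over the captions.
image_embed = image_embed / jnp.linalg.norm(image_embed, axis=-1, keepdims=True)
text_embed = text_embed / jnp.linalg.norm(text_embed, axis=-1, keepdims=True)

logits = 100.0 * image_embed @ text_embed.T  # assumed logit scale, as in original CLIP demos
probs = jax.nn.softmax(logits, axis=-1)
print(probs)  # for CLIP.png this should peak at "a diagram"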