This repo contains code examples that demonstrate how to use cleanlab with real-world models/datasets, how its underlying algorithms work, how to get better results from cleanlab via more advanced functionality than is demonstrated in the quickstart tutorials, and how to train certain models used in some tutorials.
To quickly learn the basics of running cleanlab on your own data, we recommend first starting here before diving into the examples below.
No. | Example | Description |
---|---|---|
1 | find_label_errors_iris | Find label errors introduced into the Iris classification dataset. |
2 | classifier_comparison | Use CleanLearning to train 10 different classifiers on 4 dataset distributions with label errors. |
3 | hyperparameter_optimization | Hyperparameter optimization to find the best settings of CleanLearning's optional parameters. |
4 | simplifying_confident_learning | Straightforward implementation of Confident Learning algorithm with raw numpy code. |
5 | visualizing_confident_learning | See how cleanlab estimates parameters of the label error distribution (noise matrix). |
6 | cnn_mnist | Find label errors in MNIST image data with a Convolutional Neural Network. |
7 | huggingface_keras_imdb | CleanLearning for text classification with a Keras model, pretrained BERT backbone, and TensorFlow Dataset. |
8 | fasttext_amazon_reviews | Find label errors in the Amazon Reviews text dataset using a cleanlab-compatible FastText model. |
9 | multiannotator_cifar10 | Iteratively improve consensus labels and a trained classifier using data labeled by multiple annotators. |
10 | outlier_detection_cifar10 | Train AutoML for image classification and use it to detect out-of-distribution images. |
11 | entity_recognition | Train Transformer model for Named Entity Recognition and produce out-of-sample pred_probs for cleanlab.token_classification. |
12 | cnn_coteaching_cifar10 | Train a Convolutional Neural Network on noisily labeled CIFAR-10 image data using cleanlab with co-teaching. |
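
Several of the examples above revolve around Confident Learning. As a rough orientation before opening the notebooks, the core idea can be sketched in a few lines of plain Python. This is a simplified illustration only, not cleanlab's implementation: the function names `class_thresholds` and `find_suspect_labels` are hypothetical, the toy data is made up, and it assumes `pred_probs` are out-of-sample predicted probabilities and that every class appears at least once in `labels`.

```python
# Simplified illustration of the Confident Learning idea.
# NOT cleanlab's implementation; names and data here are hypothetical.

def class_thresholds(labels, pred_probs, num_classes):
    """Per-class threshold: mean predicted probability of class k
    over the examples whose given label is k."""
    thresholds = []
    for k in range(num_classes):
        probs_k = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        if not probs_k:
            raise ValueError(f"class {k} has no labeled examples")
        thresholds.append(sum(probs_k) / len(probs_k))
    return thresholds


def find_suspect_labels(labels, pred_probs):
    """Return indices whose given label differs from the class the model
    predicts confidently (probability at or above that class's threshold)."""
    num_classes = len(pred_probs[0])
    thresholds = class_thresholds(labels, pred_probs, num_classes)
    suspects = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # no class is confidently predicted; leave the label alone
        likely = max(confident, key=lambda k: p[k])
        if likely != y:
            suspects.append(i)
    return suspects


# Toy data: six examples, two classes.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],  # labeled 0, but the model strongly predicts class 1
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
suspects = find_suspect_labels(labels, pred_probs)
print(suspects)  # [2]
```

The per-class thresholds let the check adapt to classes the model is systematically more or less confident about, rather than using one global cutoff. The notebooks in the table show how cleanlab itself handles this on real data.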
To run the latest example notebooks, execute the commands below, which install the required libraries in a virtual environment.
$ python -m pip install virtualenv
$ python -m venv cleanlab-examples # creates a new venv named cleanlab-examples
$ source cleanlab-examples/bin/activate
$ python -m pip install -r requirements.txt
It is recommended to run the examples with the latest stable cleanlab release (`pip install cleanlab`).
However, be aware that notebooks in the master branch of this repository are assumed to correspond to the master branch version of cleanlab, so some very recently added examples may require you to install the master branch of cleanlab instead (`pip install git+https://github.com/cleanlab/cleanlab.git`).
You may run the notebooks individually or run the bash script below, which executes and saves each notebook (examples 1-7 only). Before executing the script to run all notebooks for the first time, you will need to create a Jupyter kernel named `cleanlab-examples`. Be sure that you have already created and activated the virtual environment (steps provided above) before running the following command to create the Jupyter kernel.
$ python -m ipykernel install --user --name=cleanlab-examples
Bash script to run all notebooks:
$ bash ./run_all_notebooks.sh
Instead of installing the requirements for all examples at once via `pip install -r requirements.txt`, you can install only the requirements for one particular example by executing this same command inside the corresponding folder. This requires that you have installed cleanlab (`pip install cleanlab`), and some examples may require the latest developer version of cleanlab from GitHub (`pip install git+https://github.com/cleanlab/cleanlab.git`).
For running older versions of cleanlab, you can look at the Tagged Releases of this repository to see the corresponding older versions of the example notebooks.
See the `contrib` folder for examples from v1 of cleanlab, which may be helpful for reproducing results from the Confident Learning paper.
Copyright (c) 2017-2022 Cleanlab Inc.
All files listed above and contained in this folder (https://github.com/cleanlab/examples) are part of cleanlab.
cleanlab is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
cleanlab is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License in LICENSE.