Data is food for AI, and there is vast potential for model performance improvement by shifting from a model-centric to a data-centric approach. That is the motivation behind the recent Data-Centric AI Competition organized by Andrew Ng and DeepLearning.AI.
In this repo, I unveil the methods (and code) behind my Top 5% submission (~84% accuracy, ranked 24th), including the various techniques that worked and did not work for me. Do check out the Medium article for a more in-depth look at the thought process behind the submission.
A collaboration between DeepLearning.AI and Landing AI, the Data-Centric AI Competition aims to elevate data-centric approaches to improving the performance of machine learning models.
In most machine learning competitions, you are asked to build a high-performance model given a fixed dataset.
However, machine learning has matured to the point that high-performance model architectures are widely available, while approaches to engineering datasets have lagged.
The Data-Centric AI Competition inverts the traditional format and instead asks you to improve a dataset given a fixed model. We will provide you with a dataset to improve by applying data-centric techniques such as fixing incorrect labels, adding examples that represent edge cases, applying data augmentation, etc.
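To make the data-centric techniques above concrete, here is a minimal sketch of label-preserving image augmentation of the kind the competition describes. It assumes grayscale Roman-numeral images organized under a hypothetical data/train/&lt;label&gt;/ layout and uses torchvision; the paths, class folder, and transform parameters are illustrative, not the competition's fixed pipeline or my exact submission code.

```python
# Minimal sketch: generate label-preserving augmented copies of Roman-numeral images.
# Assumes grayscale images under data/train/<label>/ (hypothetical layout); the
# transforms and counts below are illustrative, not the competition's fixed pipeline.
from pathlib import Path
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(10),                      # small rotations keep numerals legible
    transforms.RandomAffine(0, translate=(0.1, 0.1)),   # slight shifts mimic off-center crops
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

src_dir = Path("data/train/i")            # hypothetical class folder for the numeral "i"
out_dir = Path("data/train_augmented/i")
out_dir.mkdir(parents=True, exist_ok=True)

for img_path in src_dir.glob("*.png"):
    img = Image.open(img_path).convert("L")
    for k in range(3):                    # a few augmented copies per original image
        augment(img).save(out_dir / f"{img_path.stem}_aug{k}.png")
```

Augmented copies like these would then be merged back with the label-cleaned originals before resubmitting the dataset.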
Contents
Full_Notebook_Best_Submission.ipynb (Complete walkthrough code for my best submission to the competition)
experiment_tracker.csv (Spreadsheet tracker I used to monitor my various experiments)
/data (Public Roman MNIST dataset released by the competition)
About
Code for a Top 5% finish in the Data-Centric AI Competition organized by Andrew Ng and DeepLearning.AI