Main requirement:
- PyTorch 0.4.1
If any package is missing, install it with: pip install missing_package
Change the following parameters:
- skiprows: number of rows you want to skip
  Ex: to skip the first 30k samples, skiprows=(1,30000)
- nrows: number of samples per class
  Default: 50000 samples / class
- root_csv: directory of your train_simplified folder
  Ex: /media/ngxbac/Bac/competition/kaggle/competition_data/quickdraw/data/csv/train_simplified/
- split_csv: directory where you want to save the split data
  Ex: /media/ngxbac/Bac/competition/kaggle/competition_data/quickdraw/data/50k/
Run:
python split_data_top.py
Output:
340 CSV files for train and valid are saved in your split_csv folder. Each CSV file contains nrows samples.
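The split step can be pictured with a small sketch. It makes three assumptions:
- each file in root_csv holds one class and has a header row;
- skipped rows are counted from the top of the file;
- a fraction of the kept rows goes to valid.

The function split_class_csv and its valid_frac parameter are hypothetical and are not taken from split_data_top.py. For reference, skipping the first 30k data rows while keeping the header corresponds to pandas' skiprows=range(1, 30001).

```python
import csv
import os


def split_class_csv(src_path, split_csv, skip=0, nrows=50000, valid_frac=0.1):
    """Hypothetical helper: split one class CSV into train/valid files.

    Skips the first `skip` data rows (the header is always kept), takes the
    next `nrows` rows and writes the last `valid_frac` of them to valid/.
    """
    name = os.path.basename(src_path)
    with open(src_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = []
        for i, row in enumerate(reader):
            if i < skip:
                continue  # same effect as pandas skiprows=range(1, skip + 1)
            if len(rows) >= nrows:
                break
            rows.append(row)

    n_valid = int(len(rows) * valid_frac)
    n_train = len(rows) - n_valid
    parts = {"train": rows[:n_train], "valid": rows[n_train:]}
    for split, part in parts.items():
        out_dir = os.path.join(split_csv, split)
        os.makedirs(out_dir, exist_ok=True)
        with open(os.path.join(out_dir, name), "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)
    return n_train, n_valid
```

Applying it to every file matched by glob.glob(os.path.join(root_csv, "*.csv")) would give one train file and one valid file per class.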
-
Configure
train.yml
In this file, change the main parameters as follows:
- train_split: path to the train folder, {split_csv}/train
- train_token: you can ignore it, but set it the same as train_split
- valid_split: path to the valid folder, {split_csv}/valid
- valid_token: you can ignore it, but set it the same as valid_split
You can also change other parameters (workers, batch_size, ...) to suit your environment.
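All four main parameters are derived from split_csv, so their values can be sketched as below. The helper main_split_params is hypothetical. This sketch does not show where these keys sit inside train.yml.

```python
import os


def main_split_params(split_csv):
    """Hypothetical helper: values for the main train.yml parameters.

    The *_token keys simply repeat the matching *_split paths.
    """
    train_dir = os.path.join(split_csv, "train")
    valid_dir = os.path.join(split_csv, "valid")
    return {
        "train_split": train_dir,
        "train_token": train_dir,
        "valid_split": valid_dir,
        "valid_token": valid_dir,
    }


if __name__ == "__main__":
    # Print "key: value" lines that can be copied into train.yml by hand.
    for key, value in main_split_params("/path/to/50k").items():
        print(f"{key}: {value}")
```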
-
Run
bash run_model.sh
Logs and checkpoints will be saved to ./logs/se_resnext101_50k. You can change this path as you like.
-
Configure
inference.yml
In this file, change:
- infer_csv: path to your test_simplified.csv file
-
Run
bash predict_5best.sh
We save multiple checkpoints (snapshots) during training. Ensembling the 5 best checkpoints gives a free 0.0005 boost in score.
The output logits will be saved in the log folder you defined above.
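Snapshot ensembling of this kind can be sketched as averaging the per-checkpoint logits. The sketch assumes each checkpoint's logits are stored as an array of shape (num_samples, 340) with rows in the same order. It also assumes predict_5best.sh combines them by a plain mean; averaging softmax probabilities is a common alternative.

Both ensemble_logits and map_at_3 are hypothetical helpers. map_at_3 is included for checking such gains on a labelled valid split. It scores a prediction 1/rank when the true class is among the top 3 and 0 otherwise, which is the MAP@3 metric of the Quick, Draw! competition.

```python
import numpy as np


def ensemble_logits(logit_arrays):
    """Average per-checkpoint logits, each of shape (num_samples, num_classes)."""
    stacked = np.stack([np.asarray(a, dtype=np.float64) for a in logit_arrays])
    return stacked.mean(axis=0)


def map_at_3(logits, labels):
    """MAP@3 with a single true label per sample."""
    logits = np.asarray(logits)
    top3 = np.argsort(-logits, axis=1)[:, :3]
    hits = top3 == np.asarray(labels)[:, None]
    # Each row has at most one hit; weight it by 1/rank (1, 1/2, 1/3).
    return float((hits / np.arange(1, 4)).sum(axis=1).mean())
```

With the five .npy files written by predict_5best.sh, the call would look like ensemble_logits([np.load(p) for p in paths]). The actual file names depend on your log folder.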
-
Configure
inference.yml
In this file, change:
- infer_csv: comment out this line
- data_clean_train: path to the train data you want to clean
- data_clean_valid: path to the valid data you want to clean
-
Run
bash predict_data_for_clean.sh
Set LOGDIR to the log folder that holds the best checkpoint of the model you use to clean the data.
Ex: LOGDIR=$(pwd)/logs/clean_model_2_resnet34/
In this file, change the following parameters to match your environment:
- data_clean_train: path to the train data you want to clean.
  Ex: /media/ngxbac/Bac/competition/kaggle/competition_data/quickdraw/data/30k/data_2/train/
- data_clean_valid: path to the valid data you want to clean.
  Ex: /media/ngxbac/Bac/competition/kaggle/competition_data/quickdraw/data/30k/data_2/valid/
- data_clean_train_out: output path for the cleaned train data.
  Ex: /media/ngxbac/Bac/competition/kaggle/competition_data/quickdraw/data/30k/data_2/train/
- data_clean_valid_out: output path for the cleaned valid data.
  Ex: /media/ngxbac/Bac/competition/kaggle/competition_data/quickdraw/data/30k/data_2_cleannn/valid/
- data_train_predict: logits predicted on data_clean_train by the model used for cleaning.
  Ex: ./logs/clean_model_1_resnet34/dataset.predictions.data_2_train.logits.satge1.5.npy
- data_valid_predict: logits predicted on data_clean_valid by the model used for cleaning.
  Ex: ./logs/clean_model_1_resnet34/dataset.predictions.data_2_valid.logits.satge1.5.npy
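The README does not state the rule used to decide which samples are noisy. One plausible rule is sketched here purely as an assumption: keep a sample only if its label is among the top-k classes predicted by the cleaning model. The predictions come from the logits in data_train_predict / data_valid_predict. The functions keep_mask and clean_rows and the default k=3 are all hypothetical.

```python
import numpy as np


def keep_mask(logits, labels, k=3):
    """Hypothetical cleaning rule: True where the label is in the top-k predictions."""
    logits = np.asarray(logits)
    topk = np.argsort(-logits, axis=1)[:, :k]
    return (topk == np.asarray(labels)[:, None]).any(axis=1)


def clean_rows(rows, logits, labels, k=3):
    """Return only the rows that pass keep_mask (rows must align with logits)."""
    mask = keep_mask(logits, labels, k)
    return [row for row, keep in zip(rows, mask) if keep]
```

In practice the logits would come from np.load on the data_train_predict file. The rows and labels would come from the CSV files in data_clean_train, in the same order.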
Run
python make_submission.py
Make sure you set the correct log_dir in make_submission.py.
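The Quick, Draw! submission file has two columns. The key_id column holds the sample id. The word column holds the three predicted class names separated by spaces, with spaces inside class names replaced by underscores.

The hypothetical helper write_submission below builds such a file from (ensembled) logits. It assumes class_names is ordered exactly like the class indices used in training; this README does not show that ordering.

```python
import csv

import numpy as np


def write_submission(path, key_ids, logits, class_names):
    """Hypothetical helper: write key_id,word rows with the top-3 classes."""
    top3 = np.argsort(-np.asarray(logits), axis=1)[:, :3]
    # Multi-word class names use underscores in the submission format.
    names = [c.replace(" ", "_") for c in class_names]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["key_id", "word"])
        for key_id, idx in zip(key_ids, top3):
            writer.writerow([key_id, " ".join(names[i] for i in idx)])
```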
To resume training, define resume in train.yml and run the model again. Usually, we resume from checkpoint.best.pth.tar in the logs folder.
From torchvision package:
- ResNet (resnet18, resnet34, resnet50, resnet101, resnet152)
- DenseNet (densenet121, densenet169, densenet201, densenet161)
- Inception v3 (inception_v3)
- VGG (vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, vgg19_bn)
- SqueezeNet (squeezenet1_0, squeezenet1_1)
- AlexNet (alexnet)
From Pretrained models for PyTorch package:
- ResNeXt (resnext101_32x4d, resnext101_64x4d)
- NASNet-A Large (nasnetalarge)
- NASNet-A Mobile (nasnetamobile)
- Inception-ResNet v2 (inceptionresnetv2)
- Dual Path Networks (dpn68, dpn68b, dpn92, dpn98, dpn131, dpn107)
- Inception v4 (inception_v4)
- Xception (xception)
- Squeeze-and-Excitation Networks (senet154, se_resnet50, se_resnet101, se_resnet152, se_resnext50_32x4d, se_resnext101_32x4d)
- PNASNet-5-Large (pnasnet5large)
- PolyNet (polynet)
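The architecture names above are the keys each package uses for its model constructors. The hypothetical helper model_source below tells which package a name comes from. Its docstring shows how each package typically builds an ImageNet-pretrained model by name. The pretrained=True form is the one used by older torchvision releases, such as those paired with PyTorch 0.4.1. How this repo replaces the classifier head for 340 classes is not covered here.

```python
TORCHVISION_MODELS = {
    "resnet18", "resnet34", "resnet50", "resnet101", "resnet152",
    "densenet121", "densenet169", "densenet201", "densenet161",
    "inception_v3",
    "vgg11", "vgg11_bn", "vgg13", "vgg13_bn",
    "vgg16", "vgg16_bn", "vgg19", "vgg19_bn",
    "squeezenet1_0", "squeezenet1_1",
    "alexnet",
}

PRETRAINEDMODELS_MODELS = {
    "resnext101_32x4d", "resnext101_64x4d",
    "nasnetalarge", "nasnetamobile",
    "inceptionresnetv2",
    "dpn68", "dpn68b", "dpn92", "dpn98", "dpn131", "dpn107",
    "inception_v4", "xception",
    "senet154", "se_resnet50", "se_resnet101", "se_resnet152",
    "se_resnext50_32x4d", "se_resnext101_32x4d",
    "pnasnet5large", "polynet",
}


def model_source(name):
    """Return the package that provides the architecture `name`.

    torchvision:      torchvision.models.__dict__[name](pretrained=True)
    pretrainedmodels: pretrainedmodels.__dict__[name](num_classes=1000,
                                                      pretrained="imagenet")
    """
    if name in TORCHVISION_MODELS:
        return "torchvision"
    if name in PRETRAINEDMODELS_MODELS:
        return "pretrainedmodels"
    raise ValueError(f"unknown architecture: {name}")
```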