The source code has been used for our papers at the ICCV 2023 workshop and BMVC 2023. If you use the source code in your research, please consider citing our papers:
@InProceedings{Le_2023_ICCV,
author = {L\^e, Ho\`ang-\^An and Pham, Minh-Tan},
title = {Self-Training and Multi-Task Learning for Limited Data: Evaluation Study on Object Detection},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {October},
year = {2023},
pages = {1003-1009}
}
@inproceedings{Le_2023_BMVC,
author = {L\^e, Ho\`ang-\^An and Pham, Minh-Tan},
title = {Data exploitation: multi-task learning of object detection and semantic segmentation on partially annotated data},
booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
publisher = {BMVA},
year = {2023}
}
We use Anaconda to manage the environment. All required packages are listed in environments.yml and can be installed by running the following command.
conda env create --name envname --file=environments.yml
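Then activate the environment before running any of the commands below:
conda activate envname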
Download the Pascal VOC2007 and VOC2012 datasets and place them in the datasets directory following the structure below.
datasets/VOCdevkit
|-- VOC2007
| |-- Annotations
| |-- ImageSets
| |-- JPEGImages
| |-- SegmentationClass
| `-- SegmentationObject
|-- VOC2012
| |-- Annotations
| |-- ImageSets
| |-- JPEGImages
| |-- SegmentationClass
| `-- SegmentationObject
The splits tailored for the experiments in the paper for the VOC dataset are provided at datasets/imgsetVOC. They replace the original ImageSets directories in VOC2007 and VOC2012. The following script backs up the original directories and creates symlinks to the provided ones.
cd multas/datasets/
mv VOCdevkit/VOC2007/ImageSets VOCdevkit/VOC2007/ImageSets_org # backing up
mv VOCdevkit/VOC2012/ImageSets VOCdevkit/VOC2012/ImageSets_org # backing up
ln -s $(pwd)/imgsetVOC/VOC2007/ImageSets VOCdevkit/VOC2007/
ln -s $(pwd)/imgsetVOC/VOC2012/ImageSets VOCdevkit/VOC2012/
Scripts are provided in data/scripts to automate the process and can be run with the following command.
./data/scripts/VOC2007.sh datasets/
Download the SBD dataset and read the mat files using scipy.io.loadmat in Python. The segmentation map can be accessed via mat["GTcls"][0]["Segmentation"][0].
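For example, a minimal Python sketch for inspecting one SBD annotation file (the file path is a placeholder):
import numpy as np
from scipy.io import loadmat

mat = loadmat("datasets/SBD/cls/2008_000002.mat")  # placeholder path to one SBD .mat file
seg = mat["GTcls"][0]["Segmentation"][0]           # 2D array of per-pixel class indices
print(seg.shape, np.unique(seg))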
Download the COCO 2017 dataset and place it in the datasets directory following the structure below.
datasets/coco2017
|-- annotations
|-- subsets
|-- train2017
| |-- 00000000009.jpg
| |-- 00000000025.jpg
| |-- 00000000030.jpg
| |-- 00000000034.jpg
| |-- ...
|-- val2017
| |-- 00000000139.jpg
| |-- 00000000285.jpg
| |-- 00000000632.jpg
| |-- 00000000724.jpg
| |-- ...
The subsets directory is provided in datasets/subsetsCOCO. You can create a symlink using the following commands.
cd multas/datasets/
ln -s $(pwd)/subsetsCOCO coco2017/subsets
Training teacher network
python train.py --seed 0 --size 320 --batch_size 5 --lr 0.01 --eval_epoch 1\
--double_aug --match mg --conf_loss gfc \
--backbone resnet50 --neck pafpn \
--dataset VOC --imgset Half
where `imgset` is one of `Half`, `Quarter`, or `Eighth`.
Training student network
python distil.py --seed 0 --size 320 --batch_size 10 --lr 0.01 --eval_epoch 1\
--double_aug --match iou --conf_loss gfc \
--backbone resnet18 --neck fpn \
--dataset VOC --imgset Half \
--teacher_backbone resnet50 --teacher_neck pafpn \
--kd hard+pdf --tdet_weights [path/to/teacher/weights.pth]
where
- `kd` can be `hard` for supervised training, or `soft`, `soft+mse`, `soft+pdf`, `soft+defeat` for self-training.
- `imgset` can be `Main`, `Half`, `Quarter`, `Eighth` for the overlapping training sets, or `Half2`, `3Quarter`, `7Eighth` for the complementary sets. To simulate the scenario of a complete lack of training annotations, the `Main` image set should only be used with `soft`-based distillation (see the example below).
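For instance, a self-training run on the `Main` image set with a `soft`-based option could look like the following (same flags as the command above, only `--imgset` and `--kd` changed):
python distil.py --seed 0 --size 320 --batch_size 10 --lr 0.01 --eval_epoch 1\
    --double_aug --match iou --conf_loss gfc \
    --backbone resnet18 --neck fpn \
    --dataset VOC --imgset Main \
    --teacher_backbone resnet50 --teacher_neck pafpn \
    --kd soft+pdf --tdet_weights [path/to/teacher/weights.pth]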
python train.py --seed 0 --size 320 --batch_size 7 --lr .001 --nepoch 100 \
--backbone resnet18 --neck fpn --dataset MXE --imgset det+seg \
--task det+seg --eval_epoch 1
- For `dataset` and `imgset` parameters:
  - `MXE` and `det+seg`: the mutually-exclusive detection and segmentation subsets of Pascal VOC. Replace `MXE` by `MXS` or `MXT` for the same images but a modified label space.
  - `COCO` and `Eighth+Ei2ght`: two mutually-exclusive subsets accounting for 1/8 of the original COCO dataset (14655 images).
- Use `task_weights` to systematically scale the loss of each task, e.g. `1.0+2.0` means the losses for semantic segmentation are doubled while the losses for detection stay the same; defaults to `1.0` (= `1.0+1.0`). See the example command below.
- `eval_epoch`: per-epoch evaluation during training. `0` means none (default), `1` means every epoch starting after 3/4 of `nepoch`, or an arbitrary non-zero integer to start after that epoch number.
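For example, to double the segmentation losses (assuming the option is exposed as a `--task_weights` flag):
python train.py --seed 0 --size 320 --batch_size 7 --lr .001 --nepoch 100 \
    --backbone resnet18 --neck fpn --dataset MXE --imgset det+seg \
    --task det+seg --task_weights 1.0+2.0 --eval_epoch 1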
We provide different weak losses to benefit one task using the ground truths provided for the other. This part has been published at BMVC 2024. If you use the source code in your research, please consider citing our paper:
@inproceedings{Le_2024_BMVC,
author = {L\^e, Ho\`ang-\^An and Berg, Paul and Pham, Minh-Tan},
title = {Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year = {2024},
url = {https://papers.bmvc2024.org/0753.pdf}
}
To activate the Mask-for-Box module, use the `--M4B` flag with the argument `L+C` to optimize both localization and classification losses.
- Add a zero `0` in front of each letter to disable the respective loss, e.g. `0L+C` to optimize only the classification loss and `L+0C` to optimize only the localization loss.
Full training commands for M4B refined
# enforce only C loss
python train.py --seed 0 --size 320 --batch_size 5 --lr .001 --nepoch 100 \
--backbone resnet18 --neck fpn --dataset MXE --imgset det+seg \
--task det+seg --eval_epoch 1 --M4B 0L+C
# enforce only L loss
python train.py --seed 0 --size 320 --batch_size 5 --lr .001 --nepoch 100 \
--backbone resnet18 --neck fpn --dataset MXE --imgset det+seg \
--task det+seg --eval_epoch 1 --M4B L+0C
# enforce both losses
python train.py --seed 0 --size 320 --batch_size 5 --lr .001 --nepoch 100 \
--backbone resnet18 --neck fpn --dataset MXE --imgset det+seg \
--task det+seg --eval_epoch 1 --M4B L+C
To activate the Box-for-Mask module, use the `--B4M` flag. There are two relevant arguments:
- `--queue` (defaults to 1) sets the length of the MoCo-style feature queue
- `--alpha` (defaults to 0.1) sets the margin for the triplet loss
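For instance, to change the triplet-loss margin (assuming `--alpha` can be passed alongside `--B4M` as described above; the value is illustrative):
python train.py --seed 0 --size 320 --batch_size 5 --lr .001 --nepoch 100 \
    --backbone resnet18 --neck fpn --dataset MXE --imgset det+seg \
    --task det+seg --eval_epoch 1 --B4M --alpha 0.2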
Before training: The B4M module uses pseudo semantic masks generated by cv2.grabCut; see utils/generate_semseg_grabcut.py for more information.
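As a rough illustration of the idea (not the actual script), a single cv2.grabCut call seeded with a ground-truth box might look like the sketch below; the image path and box coordinates are placeholders:
import cv2
import numpy as np

img = cv2.imread("example.jpg")                  # placeholder image path
rect = (50, 40, 200, 180)                        # (x, y, w, h) from a ground-truth box (placeholder values)
mask = np.zeros(img.shape[:2], dtype=np.uint8)
bgd_model = np.zeros((1, 65), dtype=np.float64)  # scratch arrays required by grabCut
fgd_model = np.zeros((1, 65), dtype=np.float64)
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)
# keep sure/probable foreground pixels as the pseudo mask
pseudo_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)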
The generated masks for the MXE detection split (originally PascalVOC)
used in our experiments are given for your convenience at
SegmentationClassAug_MGGC.zip. Extract it into
datasets/VOCdevkit/VOC2012/SegmentationClass_MGGC/
mkdir datasets/VOCdevkit/VOC2012/SegmentationClass_MGGC
unzip SegmentationClassAug_MGGC.zip -d datasets/VOCdevkit/VOC2012/SegmentationClass_MGGC/
Full training commands
# Activate B4M with default queue and alpha parameters
python train.py --seed 0 --size 320 --batch_size 5 --lr .001 --nepoch 100 \
--backbone resnet18 --neck fpn --dataset MXE --imgset det+seg \
--task det+seg --eval_epoch 1 --B4M
# Activate B4M and set the queue length to 5
python train.py --seed 0 --size 320 --batch_size 5 --lr .001 --nepoch 100 \
--backbone resnet18 --neck fpn --dataset MXE --imgset det+seg \
--task det+seg --eval_epoch 1 --B4M --queue 5
The repo is based on this repo by @zhanghengdev.