SPEGNet: Synergistic Perception-Guided Network for Camouflaged Object Detection

Baber Jan¹,², Saeed Anwar³, Aiman H. El-Maleh¹, Abdul Jabbar Siddiqui¹, Abdul Bais⁴

¹King Fahd University of Petroleum and Minerals · ²SDAIA-KFUPM Joint Research Center for Artificial Intelligence · ³University of Western Australia · ⁴University of Regina
Under Review at IEEE Transactions on Image Processing
Camouflaged object detection segments objects that blend into their surroundings through intrinsic similarity and edge disruption. Current methods rely on accumulating complex components: each approach independently adds boundary modules, attention mechanisms, and multi-scale processors. This accumulation creates computational burden without proportional gains. To manage the complexity, methods process at reduced resolutions, eliminating the fine details essential for camouflage. SPEGNet addresses this fragmentation with a synergistic design in which complementary components work in concert. The architecture integrates multi-scale features via channel calibration and spatial enhancement. Boundaries emerge directly from context-rich representations, maintaining semantic-spatial alignment. Progressive refinement implements scale-adaptive edge modulation, with peak influence at intermediate resolutions.
- Synergistic architecture achieving state-of-the-art COD performance without modular accumulation. Excels at intricate details, small and large pattern-similar objects, multiple instances, occlusions, and ambiguous boundaries.
- Channel recalibration with spatial pooling in a single module that amplifies object features while suppressing similar background patterns.
- Direct boundary extraction from contextual features, preserving semantic meaning. Edges know which object they belong to, preventing false boundaries in textured regions.
- Non-monotonic edge influence (20% → 33% → 0%) across decoder stages. Peak influence at the middle resolution captures camouflage boundaries most effectively.
Figure: SPEGNet architecture with data flow through four components: Feature Encoding, Contextual Feature Integration, Edge Feature Extraction, and Progressive Edge-guided Decoder with scale-adaptive modulation.
SPEGNet processes camouflaged images through synergistic modules. A hierarchical vision transformer encoder transforms input into multi-scale features. Three complementary modules then process these features:
- Contextual Feature Integration (CFI): Addresses intrinsic similarity through integrated channel-spatial processing. Performs channel recalibration to identify discriminative features, then applies spatial enhancement to capture multi-scale patterns.
- Edge Feature Extraction (EFE): Derives boundary information directly from context-enhanced features. Maintains semantic understanding throughout edge extraction, ensuring boundaries align with actual objects rather than texture patterns.
- Progressive Edge-guided Decoder (PED): Implements three-stage refinement with scale-adaptive edge modulation. Stage 1 establishes object localization with 20% edge influence, stage 2 maximizes boundary precision with 33% edge influence, and stage 3 ensures region consistency with 0% edge influence (see the sketch below).
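To make the scale-adaptive modulation concrete, here is a minimal PyTorch sketch of one edge-guided decoder stage. The module and tensor names are hypothetical (the actual implementation is in models/object_detection.py); only the 20% → 33% → 0% weight schedule comes from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeModulatedStage(nn.Module):
    """One decoder stage that blends an edge map into region features
    at a stage-specific strength (illustrative sketch, not the repo code)."""

    def __init__(self, channels: int, edge_weight: float):
        super().__init__()
        self.edge_weight = edge_weight
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat: torch.Tensor, edge: torch.Tensor) -> torch.Tensor:
        # Resize the single-channel edge map to this stage's resolution.
        edge = F.interpolate(edge, size=feat.shape[-2:],
                             mode="bilinear", align_corners=False)
        # Add edge-gated features at the stage-specific strength;
        # with edge_weight = 0 the stage reduces to pure region refinement.
        feat = feat + self.edge_weight * (feat * torch.sigmoid(edge))
        return self.refine(feat)

# Non-monotonic schedule: 20% -> 33% -> 0%, peaking at the middle resolution.
stages = nn.ModuleList(EdgeModulatedStage(64, w) for w in (0.20, 0.33, 0.0))
```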
| Dataset | Sα ↑ | Fβw ↑ | Fβm ↑ | Eφ ↑ | MAE ↓ |
|---|---|---|---|---|---|
| CAMO | 0.887 | 0.870 | 0.882 | 0.943 | 0.037 |
| COD10K | 0.890 | 0.839 | 0.847 | 0.949 | 0.020 |
| NC4K | 0.895 | 0.860 | 0.870 | 0.947 | 0.025 |
Inference Speed: 16.5ms per image (60+ FPS) with effective scaling to high resolutions
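For a rough reproduction of such latency numbers, the forward pass can be timed with CUDA events. A minimal sketch, assuming a CUDA device and a 512×512 input (the actual evaluation resolution may differ; the 16.5 ms figure was measured on an H100):

```python
import torch

@torch.no_grad()
def measure_latency(model: torch.nn.Module, size: int = 512,
                    warmup: int = 10, iters: int = 100) -> float:
    """Average per-image GPU latency in milliseconds."""
    model.eval().cuda()
    x = torch.randn(1, 3, size, size, device="cuda")
    for _ in range(warmup):
        model(x)                      # warm up kernels / cuDNN autotuning
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        model(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

if __name__ == "__main__":
    # Placeholder model; substitute a loaded SPEGNet instance here.
    print(f"{measure_latency(torch.nn.Conv2d(3, 3, 3, padding=1)):.1f} ms/image")
```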
Figure: Visual comparison on challenging scenarios. Columns: (a) input, (b) ground truth, (c–h) competing methods, (i) SPEGNet. Rows demonstrate: (i) small object detection, (ii) large object with pattern similarity, (iii) multiple instances, (iv) occlusion handling, (v) ambiguous boundaries. SPEGNet consistently outperforms across all scenarios.
Figure: SPEGNet generalizes to medical imaging and agricultural applications.
SPEGNet's synergistic design generalizes to related domains without modifications:
- Polyp Detection: Kvasir, CVC-Clinic, ColonDB, CVC300, ETIS datasets
- Skin Lesion Segmentation: ISIC, PH2, HAM10K datasets
- Breast Lesion Detection: BCSD, DMID datasets
- Pest Detection: Locust-mini dataset for insects with natural camouflage
SPEGNet requires the following environment:
- Python 3.10
- CUDA 11.8 with compatible GPU (Tested on NVIDIA H100)
- Anaconda or Miniconda for environment management
- Clone the repository:
```bash
git clone https://github.com/Baber-Jan/SPEGNet.git
cd SPEGNet
```
- Run the setup script, which handles all environment setup and validation:
```bash
chmod +x setup/setup.sh
./setup/setup.sh
```
The setup script performs several key tasks:
- Creates the conda environment with required dependencies
- Validates dataset organization if present
- Generates edge maps for the CAMO dataset if needed (an illustrative recipe is sketched after this list)
- Verifies system requirements
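For illustration, edge maps can be derived from the ground-truth masks with a morphological gradient. A minimal sketch assuming OpenCV; the setup script's exact recipe may differ:

```python
import os
import cv2
import numpy as np

def generate_edge_maps(gt_dir: str, edge_dir: str, thickness: int = 3) -> None:
    """Derive thin boundary maps from binary GT masks (illustrative only)."""
    os.makedirs(edge_dir, exist_ok=True)
    kernel = np.ones((thickness, thickness), np.uint8)
    for name in os.listdir(gt_dir):
        mask = cv2.imread(os.path.join(gt_dir, name), cv2.IMREAD_GRAYSCALE)
        if mask is None:          # skip non-image files
            continue
        mask = (mask > 127).astype(np.uint8) * 255
        # Morphological gradient (dilation minus erosion) = object boundary.
        edge = cv2.morphologyEx(mask, cv2.MORPH_GRADIENT, kernel)
        cv2.imwrite(os.path.join(edge_dir, name), edge)

generate_edge_maps("datasets/CAMO/train/GT", "datasets/CAMO/train/Edges")
```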
Alternatively, you can manually create the environment:
```bash
conda env create -f setup/environment.yml
conda activate spegnet
```

Project structure:

```
SPEGNet/
├── configs/
│   └── default.yaml            # Training configuration
├── datasets/                   # Dataset directory
├── checkpoints/                # Model weights
├── models/
│   ├── spegnet.py              # Main SPEGNet model
│   ├── feature_encoding.py     # Hiera encoder wrapper
│   ├── feature_integration.py  # CFI module
│   └── object_detection.py     # EFE and PED modules
├── utils/
│   ├── data_loader.py          # Dataset loading
│   ├── loss_functions.py       # Multi-scale loss
│   ├── metrics.py              # Evaluation metrics
│   └── visualization.py        # Result visualization
├── engine/
│   ├── trainer.py              # Training engine
│   ├── evaluator.py            # Evaluation engine
│   └── predictor.py            # Prediction engine
├── setup/
│   ├── environment.yml         # Conda environment
│   └── setup.sh                # Setup script
└── main.py                     # Entry point
```
Download our COD datasets package containing CAMO, COD10K, and NC4K:
- Download Datasets (Google Drive) (~2.19GB)
After downloading, extract the datasets to the datasets/ folder in your SPEGNet root directory (as shown in Project Structure above):
```bash
unzip datasets.zip -d SPEGNet/
```

Expected directory structure:

```
SPEGNet/
├── configs/
├── datasets/                   # Extract datasets here
│   ├── CAMO/
│   │   ├── train/
│   │   │   ├── Imgs/           # Training images
│   │   │   ├── GT/             # Ground truth masks
│   │   │   └── Edges/          # Edge maps (auto-generated)
│   │   └── test/
│   │       ├── Imgs/
│   │       └── GT/
│   ├── COD10K/
│   │   ├── train/
│   │   │   ├── Imgs/
│   │   │   ├── GT/
│   │   │   └── Edges/
│   │   └── test/
│   │       ├── Imgs/
│   │       └── GT/
│   └── NC4K/
│       └── test/
│           ├── Imgs/
│           └── GT/
├── checkpoints/
└── ...
```
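Given this layout, a loader only needs to pair files across Imgs/ and GT/. A minimal sketch (the repo's actual loader lives in utils/data_loader.py and will differ in transforms and edge-map handling):

```python
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class CODSplit(Dataset):
    """Pairs images with ground-truth masks for one dataset split
    (illustrative sketch, not the repo's loader)."""

    def __init__(self, root: str, split: str = "test"):
        base = Path(root) / split
        self.images = sorted((base / "Imgs").iterdir())
        self.masks = sorted((base / "GT").iterdir())
        assert len(self.images) == len(self.masks), "Imgs/GT mismatch"

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, i):
        image = Image.open(self.images[i]).convert("RGB")
        mask = Image.open(self.masks[i]).convert("L")
        return image, mask

# e.g. CODSplit("datasets/COD10K", "test")
```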
- SAM2.1 Hiera-Large Encoder (required for training only): Download from Meta AI (~897MB). After downloading, place it in the checkpoints/ folder:

  ```
  SPEGNet/
  ├── configs/
  ├── checkpoints/                # Place encoder weights here
  │   └── sam2.1_hiera_large.pt
  ├── datasets/
  └── ...
  ```

- SPEGNet Model Checkpoint: Google Drive (~2.6GB). The trained SPEGNet model for evaluation/inference; place it at checkpoints/model_best.pth.
- Test Predictions: segmentation masks generated by SPEGNet (as reported in the paper) for each test dataset:
- CAMO: Google Drive
- COD10K: Google Drive
- NC4K: Google Drive
All training configurations are specified in configs/default.yaml; review and modify hyperparameters, loss weights, and model settings as needed.
Train SPEGNet on COD10K + CAMO:
```bash
python main.py train --config configs/default.yaml
```
Training features:
- Automatic mixed precision (AMP) for efficient GPU utilization
- Multi-scale supervision with edge guidance
- Regular checkpointing and validation
- Early stopping with configurable patience
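The AMP part follows the standard PyTorch pattern. A minimal sketch of one training step, with hypothetical model outputs and loss terms (the actual loop is in engine/trainer.py):

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 underflow

def train_step(model, batch, optimizer, criterion):
    images, masks, edges = batch      # hypothetical batch layout
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():   # mixed-precision forward pass
        pred_masks, pred_edges = model(images)   # hypothetical output signature
        loss = criterion(pred_masks, masks) + criterion(pred_edges, edges)
    scaler.scale(loss).backward()
    scaler.step(optimizer)            # unscales gradients, then optimizer step
    scaler.update()
    return loss.item()
```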
Evaluate on test datasets:
```bash
python main.py evaluate --model checkpoints/model_best.pth
```
Evaluation provides:
- Standard COD metrics (Sα, Fβw, MAE, Eφ, Fβm)
- Quality-based result categorization
- Per-dataset performance analysis
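The metrics build on PySODMetrics (acknowledged below), so they can also be computed standalone. A minimal sketch, assuming predictions were saved as grayscale PNGs to a results/COD10K folder (hypothetical path):

```python
import cv2
from pathlib import Path
from py_sod_metrics import MAE, Emeasure, Smeasure, WeightedFmeasure

preds = sorted(Path("results/COD10K").glob("*.png"))          # hypothetical
gts = sorted(Path("datasets/COD10K/test/GT").glob("*.png"))

sm, wfm, em, mae = Smeasure(), WeightedFmeasure(), Emeasure(), MAE()
for pred_path, gt_path in zip(preds, gts):
    pred = cv2.imread(str(pred_path), cv2.IMREAD_GRAYSCALE)
    gt = cv2.imread(str(gt_path), cv2.IMREAD_GRAYSCALE)
    for metric in (sm, wfm, em, mae):
        metric.step(pred=pred, gt=gt)

print("S-alpha:        ", sm.get_results()["sm"])
print("weighted F-beta:", wfm.get_results()["wfm"])
print("mean E-phi:     ", em.get_results()["em"]["curve"].mean())
print("MAE:            ", mae.get_results()["mae"])
```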
For inference on new images:
```bash
python main.py predict \
    --model checkpoints/model_best.pth \
    --input path/to/image
```
The prediction outputs include:
- Binary segmentation mask
- Confidence heatmap
- Edge map
- Original image overlay
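The overlay can also be reproduced offline. A minimal OpenCV sketch (hypothetical helper; the repo's own version is in utils/visualization.py):

```python
import cv2
import numpy as np

def overlay_mask(image_path: str, mask_path: str, alpha: float = 0.5) -> np.ndarray:
    """Blend a predicted mask onto the original image (illustrative only)."""
    image = cv2.imread(image_path)
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    mask = cv2.resize(mask, (image.shape[1], image.shape[0]))
    color = np.zeros_like(image)
    color[..., 2] = mask              # paint the detection in red (BGR order)
    return cv2.addWeighted(image, 1.0, color, alpha, 0)

# e.g. cv2.imwrite("overlay.png", overlay_mask("input.jpg", "mask.png"))
```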
SPEGNet has been tested on:
- NVIDIA H100 GPU with 93.6GB VRAM (optimal setup)
- CUDA 11.8 with PyTorch 2.1.0
- 64 CPU cores for efficient preprocessing
- 193GB system RAM for optimal batch processing
The model supports various GPU configurations through adjustable batch sizes and memory optimization settings.
If you find SPEGNet useful in your research, please cite:
```bibtex
@article{jan2025spegnet,
  title={SPEGNet: Synergistic Perception-Guided Network for Camouflaged Object Detection},
  author={Jan, Baber and Anwar, Saeed and El-Maleh, Aiman H. and Siddiqui, Abdul Jabbar and Bais, Abdul},
  journal={arXiv preprint arXiv:2510.04472},
  year={2025},
  note={Under review at IEEE Transactions on Image Processing}
}
```

This research was conducted at the SDAIA-KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum and Minerals. We gratefully acknowledge:
- King Fahd University of Petroleum and Minerals (KFUPM) for institutional support
- SDAIA-KFUPM Joint Research Center for Artificial Intelligence (JRCAI) for computational resources
- Meta AI for providing the SAM2.1 foundation model
- PySODMetrics for evaluation framework
- Authors of COD benchmark datasets (CAMO, COD10K, NC4K)
This project is licensed under the MIT License - see LICENSE for details.
- Open source for research and non-commercial use
- Commercial use requires explicit permission
- Attribution required when using or adapting the code
The benchmark datasets (CAMO, COD10K, NC4K) maintain their original licenses. Please refer to their respective papers for terms of use.
For questions or issues:
- GitHub Issues: Create an issue
- Email: baberjan008@gmail.com
- Research Collaborations: Contact Baber Jan at baberjan008@gmail.com
- 2025-01: Initial release with IEEE TIP submission