C3Net: Context-Contrast Network for Camouflaged Object Detection

Baber Jan¹,², Aiman H. El-Maleh¹, Abdul Jabbar Siddiqui¹, Abdul Bais³, Saeed Anwar⁴

¹King Fahd University of Petroleum and Minerals, ²SDAIA-KFUPM Joint Research Center for Artificial Intelligence, ³University of Regina, ⁴University of Western Australia
Submitted to IEEE Transactions on Artificial Intelligence (TAI)
Camouflaged objects blend into their surroundings through similar colors, textures, and patterns. Detecting them is challenging: both traditional methods and foundation models fail on such objects. We identify six challenges: Intrinsic Similarity (IS), Edge Disruption (ED), Extreme Scale Variation (ESV), Environmental Complexities (EC), Contextual Dependencies (CD), and Salient-Camouflaged Object Disambiguation (SCOD). C3Net addresses all six with a dual-pathway architecture. The Edge Refinement Pathway (ERP) processes early features with gradient-initialized modules to recover boundaries. The Contextual Localization Pathway (CLP) processes deep features with Image-based Context Guidance (ICG), suppressing intrinsic saliency without external models. The Attentive Fusion Module (AFM) combines both pathways through spatial attention gating. C3Net achieves an S-measure of 0.898 on COD10K, 0.904 on CAMO, and 0.913 on NC4K while maintaining efficient real-time processing.
- Dual-pathway decoder separating edge refinement from contextual localization prevents signal dilution
- Image-based Context Guidance (ICG) for intrinsic saliency suppression without external models
- Gradient-initialized Edge Enhancement Modules preserving classical edge detection principles while enabling learning
- Attentive Fusion Module for synergistic pathway integration through spatial attention gating
C3Net architecture with dual-pathway decoder: Edge Refinement Pathway (ERP) processes early features for boundaries, Contextual Localization Pathway (CLP) with ICG mechanism handles semantic understanding, and Attentive Fusion Module (AFM) combines both pathways
C3Net employs a dual-pathway decoder with three main components:
Edge Refinement Pathway (ERP): Processes early encoder features (Stages 1 & 2) to recover precise object boundaries through cascaded Edge Enhancement Modules (EEMs). Each EEM employs multi-path convolutions initialized from Sobel and Laplacian operators, maintaining classical edge detection principles while adapting to camouflage patterns through learning.
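As a rough illustration of the gradient-initialization idea only (module and layer names below are ours, not the released EEM in `models/edge_branch.py`), a learnable depthwise 3×3 convolution whose weights start from Sobel and Laplacian kernels can be sketched as follows:

```python
import torch
import torch.nn as nn

class GradientInitConv(nn.Module):
    """Illustrative multi-path 3x3 convolution initialized from classical
    edge operators (Sobel-x, Sobel-y, Laplacian). The weights stay learnable,
    so each path can adapt to camouflage-specific boundary patterns."""

    def __init__(self, in_channels: int):
        super().__init__()
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        sobel_y = sobel_x.t()
        laplacian = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])

        self.paths = nn.ModuleList()
        for kernel in (sobel_x, sobel_y, laplacian):
            conv = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                             padding=1, groups=in_channels, bias=False)
            # Copy the classical operator into every channel of the depthwise conv.
            conv.weight.data.copy_(kernel.expand(in_channels, 1, 3, 3))
            self.paths.append(conv)
        self.fuse = nn.Conv2d(3 * in_channels, in_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([p(x) for p in self.paths], dim=1))

# Example: edge_feat = GradientInitConv(64)(torch.randn(1, 64, 98, 98))
```

Because the classical kernels are used only for initialization, training is free to deform them while keeping the edge-detection prior as a starting point.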
Contextual Localization Pathway (CLP): Processes deep encoder features (Stages 23 & 24) through Semantic Enhancement Units (SEUs) and our novel Image-based Context Guidance (ICG) mechanism. The ICG performs the following (see the sketch after this list):
- Appearance Analysis: Direct extraction from input image
- Guided Contrast Module (GCM): Foreground-background differentiation using spatial aggregation and global context
- Iterative Attention Gating: Progressive saliency suppression through two-stage refinement
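The exact ICG formulation lives in `models/localization_branch.py`; purely as an illustration of the two-stage attention-gating pattern listed above (names, channel counts, and operations below are assumptions, not C3Net's released code), such a gate could look like:

```python
import torch
import torch.nn as nn

class IterativeAttentionGate(nn.Module):
    """Illustrative two-stage gating: an image-derived guidance signal is used
    to progressively down-weight salient-but-irrelevant regions in deep features."""

    def __init__(self, channels: int, stages: int = 2):
        super().__init__()
        self.gates = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels + 3, channels, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, 1, kernel_size=1),
                nn.Sigmoid(),            # per-pixel suppression weight in [0, 1]
            )
            for _ in range(stages)
        ])

    def forward(self, feat: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # Resize the raw RGB image to the feature resolution so it can guide gating.
        guide = nn.functional.interpolate(image, size=feat.shape[-2:],
                                          mode="bilinear", align_corners=False)
        for gate in self.gates:
            attn = gate(torch.cat([feat, guide], dim=1))
            feat = feat * attn           # progressively suppress distracting saliency
        return feat
```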
Attentive Fusion Module (AFM): Synergistically combines ERP and CLP outputs through spatial attention gating, where contextual features spatially modulate edge information, emphasizing relevant boundaries while suppressing distractors.
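Again only as a hedged sketch (layer choices and names are illustrative, not the released AFM in `models/fusion_head.py`), spatial attention gating in which contextual features modulate edge features can be written as:

```python
import torch
import torch.nn as nn

class SpatialGatedFusion(nn.Module):
    """Illustrative fusion: a spatial gate predicted from contextual features
    modulates edge features, keeping boundaries the context supports.
    Assumes both inputs share the same channel count and spatial size."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.out = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, edge_feat: torch.Tensor, ctx_feat: torch.Tensor) -> torch.Tensor:
        gated_edges = edge_feat * self.gate(ctx_feat)   # suppress off-object edges
        return self.out(torch.cat([gated_edges, ctx_feat], dim=1))
```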
| Dataset | Sα ↑ | Fβw ↑ | Fβm ↑ | Eφ ↑ | MAE ↓ |
|---|---|---|---|---|---|
| COD10K | 0.898 | 0.851 | 0.859 | 0.961 | 0.0162 |
| CAMO | 0.904 | 0.889 | 0.896 | 0.951 | 0.0311 |
| NC4K | 0.913 | 0.895 | 0.903 | 0.958 | 0.0220 |
C3Net achieves state-of-the-art performance on COD10K, outperforming the previous best method by 2.6% in Sα and 2.3% in Eφ. It also leads on CAMO with the best Sα (0.904) and MAE (0.0311), and maintains top performance on NC4K with an Sα of 0.913. Training uses 200 epochs with a batch size of 128 on NVIDIA H100 GPUs, and inference runs in real time at 392×392 resolution.
Visual comparison of C3Net with state-of-the-art methods on challenging COD cases. Each row exemplifies a specific challenge: (i) Intrinsic Similarity (IS), (ii) Edge Disruption (ED), (iii) Contextual Dependencies (CD), (iv) Multiple Instances, (v) Environmental Complexities (EC), (vi) Small Objects (ESV aspect), (vii) Large Objects (ESV aspect), and (viii) Salient-Camouflaged Object Disambiguation (SCOD). For each row, columns are: (a) Input Image, (b) OCENet, (c) BGNet, (d) ZoomNet, (e) SINetV2, (f) FSPNet, (g) FEDER, (h) C3Net (Ours), and (i) Ground Truth.
- Python 3.8+
- PyTorch 2.0+
- CUDA 11.8+
- Clone the repository:
git clone https://github.com/Baber-Jan/C3Net.git
cd C3Net

- Run the setup script, which handles environment creation and dataset validation:
chmod +x setup/setup.sh
./setup/setup.sh

The setup script:
- Creates conda environment with required dependencies
- Validates dataset organization if present
- Generates edge maps for training datasets if needed (one possible approach is sketched below)
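The repository's own generator lives in `utils/edge_generator.py`; the snippet below is only one plausible way to derive edge supervision from the ground-truth masks, using a morphological gradient (the directory paths are examples):

```python
import cv2
import numpy as np
from pathlib import Path

def make_edge_map(gt_path: str, out_path: str, thickness: int = 3) -> None:
    """Derive a thin edge map from a binary ground-truth mask via a
    morphological gradient (dilation minus erosion)."""
    mask = cv2.imread(gt_path, cv2.IMREAD_GRAYSCALE)
    mask = (mask > 127).astype(np.uint8) * 255
    kernel = np.ones((thickness, thickness), np.uint8)
    edge = cv2.dilate(mask, kernel) - cv2.erode(mask, kernel)
    cv2.imwrite(out_path, edge)

# Example: populate datasets/COD10K/train/Edges from the GT masks.
gt_dir = Path("datasets/COD10K/train/GT")
edge_dir = Path("datasets/COD10K/train/Edges")
edge_dir.mkdir(parents=True, exist_ok=True)
for gt_file in sorted(gt_dir.glob("*.png")):
    make_edge_map(str(gt_file), str(edge_dir / gt_file.name))
```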
Alternatively, manually create the environment:
conda env create -f setup/environment.yml
conda activate c3net

Repository structure:

C3Net/
├── configs/
│ └── default.yaml # Training configuration
├── checkpoints/ # Model weights
│ └── model_best.pth # Pre-trained C3Net model
├── datasets/ # Dataset directory
│ ├── COD10K/
│ ├── CAMO/
│ └── NC4K/
├── models/
│ ├── c3net.py # Main C3Net architecture
│ ├── backbone.py # DINOv2 encoder
│ ├── edge_branch.py # Edge Refinement Pathway (ERP)
│ ├── localization_branch.py # Contextual Localization Pathway (CLP)
│ ├── fusion_head.py # Attentive Fusion Module (AFM)
│ └── utils.py # Model utilities
├── utils/
│ ├── data_loader.py # Dataset loading
│ ├── distributed_data_loader.py # Multi-GPU data loading
│ ├── distributed_utils.py # Distributed training utilities
│ ├── edge_generator.py # Edge map generation
│ ├── image_processor.py # Image preprocessing
│ ├── loss_functions.py # Multi-scale loss functions
│ ├── metrics.py # Evaluation metrics
│ ├── run_manager.py # Result directory management
│ └── visualization.py # Result visualization
├── engine/
│ ├── trainer.py # Training engine
│ ├── evaluator.py # Evaluation engine
│ ├── predictor.py # Prediction engine
│ └── distributed_trainer.py # Multi-GPU support
├── setup/
│ ├── environment.yml # Conda environment
│ └── setup.sh # Setup script
├── slurm_scripts/
│ └── train_multi_gpu.slurm # SLURM cluster training script
└── main.py # Entry point
Download our COD datasets package containing CAMO, COD10K, and NC4K:
- Download Datasets (Google Drive) (~2.19GB)
After downloading, extract the datasets to the datasets/ folder in your C3Net root directory:
unzip datasets.zip -d C3Net/

Expected directory structure:
C3Net/
├── configs/
├── datasets/ # Extract datasets here
│ ├── CAMO/
│ │ ├── train/
│ │ │ ├── Imgs/ # Training images
│ │ │ ├── GT/ # Ground truth masks
│ │ │ └── Edges/ # Edge maps (auto-generated)
│ │ └── test/
│ │ ├── Imgs/
│ │ └── GT/
│ ├── COD10K/
│ │ ├── train/
│ │ │ ├── Imgs/
│ │ │ ├── GT/
│ │ │ └── Edges/
│ │ └── test/
│ │ ├── Imgs/
│ │ └── GT/
│ └── NC4K/
│ └── test/
│ ├── Imgs/
│ └── GT/
├── checkpoints/
└── ...
- C3Net Model Checkpoint: Download (Google Drive)
  - Trained C3Net model for evaluation/inference
  - Place in: checkpoints/model_best.pth
- Test Predictions: segmentation masks generated by C3Net (as reported in the paper) for each test dataset:
  - CAMO (250 predictions): Download (Google Drive)
  - COD10K (2,026 predictions): Download (Google Drive)
  - NC4K (4,121 predictions): Download (Google Drive)
All training configurations are specified in configs/default.yaml - review and modify hyperparameters, loss weights, and model settings as needed.
Train C3Net on COD10K + CAMO:
python main.py train --config configs/default.yaml

Training features:
- Automatic mixed precision (AMP) for efficient GPU utilization
- Multi-scale supervision with pathway-specific objectives (see the sketch after this list)
- Regular checkpointing and validation
- Early stopping with configurable patience
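The actual objectives are implemented in `utils/loss_functions.py`; the sketch below shows a weighted BCE + weighted IoU "structure loss" applied at multiple scales, a pattern common across COD methods. It is given only as an indication of what multi-scale supervision looks like, not as C3Net's exact pathway-specific losses:

```python
import torch
import torch.nn.functional as F

def structure_loss(pred: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Weighted BCE + weighted IoU loss (a common COD objective).
    pred: raw logits (B, 1, H, W); mask: binary GT in [0, 1] of the same shape."""
    weit = 1 + 5 * torch.abs(F.avg_pool2d(mask, 31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction="none")
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    prob = torch.sigmoid(pred)
    inter = (prob * mask * weit).sum(dim=(2, 3))
    union = ((prob + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

def multi_scale_loss(preds, mask: torch.Tensor) -> torch.Tensor:
    """Sum the loss over predictions emitted at several decoder scales."""
    total = 0.0
    for pred in preds:
        m = F.interpolate(mask, size=pred.shape[-2:], mode="bilinear",
                          align_corners=False)
        total = total + structure_loss(pred, m)
    return total
```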
Evaluate on test datasets:
python main.py evaluate --model checkpoints/model_best.pth

Evaluation provides:
- Standard COD metrics (Sα, Fβw, Fβm, Eφ, MAE); see the scoring sketch after this list
- Quality-based result categorization
- Per-dataset performance analysis
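To score saved prediction maps against ground truth independently of this repository, the PySODMetrics package (acknowledged below) can be used roughly as follows. The calls mirror that project's documentation as we understand it, and the directory layout is an assumption, so verify both against your installation:

```python
# pip install pysodmetrics -- result-dictionary keys below follow the PySODMetrics
# README as we recall it; double-check them against your installed version.
from pathlib import Path

import cv2
from py_sod_metrics import Emeasure, Fmeasure, MAE, Smeasure, WeightedFmeasure

pred_dir = Path("results/COD10K")            # saved prediction maps (assumed layout)
gt_dir = Path("datasets/COD10K/test/GT")

FM, WFM, SM, EM, M = Fmeasure(), WeightedFmeasure(), Smeasure(), Emeasure(), MAE()
for gt_path in sorted(gt_dir.glob("*.png")):
    gt = cv2.imread(str(gt_path), cv2.IMREAD_GRAYSCALE)
    pred = cv2.imread(str(pred_dir / gt_path.name), cv2.IMREAD_GRAYSCALE)
    pred = cv2.resize(pred, gt.shape[::-1])  # cv2.resize expects (width, height)
    for metric in (FM, WFM, SM, EM, M):
        metric.step(pred=pred, gt=gt)

print("S-measure :", SM.get_results()["sm"])
print("weighted F:", WFM.get_results()["wfm"])
print("MAE       :", M.get_results()["mae"])
print("mean E    :", EM.get_results()["em"]["curve"].mean())
print("mean F    :", FM.get_results()["fm"]["curve"].mean())
```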
For inference on new images:
python main.py predict \
--model checkpoints/model_best.pth \
--input path/to/image

The prediction outputs include:
- Binary segmentation mask
- Confidence heatmap
- Edge map
- Original image overlay
For multi-GPU training using PyTorch DDP:
# Single-node multi-GPU (e.g., 8 GPUs)
python -m torch.distributed.run \
--nproc_per_node=8 \
--master_port=29500 \
engine/distributed_trainer.py \
--config configs/default.yaml

SLURM Cluster:
sbatch slurm_scripts/train_multi_gpu.slurm

Customize the SLURM script for your cluster (partition name, GPU type, paths).
C3Net has been tested on:
- NVIDIA H100 GPU (optimal setup)
- CUDA 11.8 with PyTorch 2.1.0
- Recommended: 16GB+ GPU memory for training
The model supports various GPU configurations through adjustable batch sizes and memory optimization settings.
If you find C3Net useful in your research, please cite:
@misc{jan2025c3net,
  title={C3Net: Context-Contrast Network for Camouflaged Object Detection},
  author={Jan, Baber and El-Maleh, Aiman H. and Siddiqui, Abdul Jabbar and Bais, Abdul and Anwar, Saeed},
  year={2025},
  eprint={2511.12627},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.12627},
}

This research was conducted at the SDAIA-KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum and Minerals. We gratefully acknowledge:
- King Fahd University of Petroleum and Minerals (KFUPM) for institutional support
- SDAIA-KFUPM Joint Research Center for Artificial Intelligence (JRCAI) for computational resources
- DINOv2 for robust vision transformer features
- DySample for content-adaptive upsampling
- PySODMetrics for evaluation framework
- Authors of COD benchmark datasets (COD10K, CAMO, NC4K)
This project is licensed under the MIT License - see LICENSE for details.
- Open source for research and non-commercial use
- Commercial use requires explicit permission
- Attribution required when using or adapting the code
The benchmark datasets (COD10K, CAMO, NC4K) maintain their original licenses. Please refer to their respective papers for terms of use.
For questions or issues:
- GitHub Issues: Create an issue
- Email: baberjan008@gmail.com
- Research Collaborations: Contact Baber Jan at baberjan008@gmail.com
Note: This repository contains the official implementation of C3Net submitted to IEEE Transactions on Artificial Intelligence (TAI).