This repository is the code implementation of the paper AgriFM: A Multi-source Temporal Remote Sensing Foundation Model for Crop Mapping.
AgriFM is a multi-source temporal remote sensing foundation model specifically designed for agricultural crop mapping. Our approach begins by establishing the necessity of simultaneous hierarchical spatiotemporal feature extraction, leading to a modified Video Swin Transformer architecture in which temporal down-sampling is synchronized with spatial down-sampling operations. This modified backbone enables efficient unified processing of long time-series satellite inputs while preserving critical multi-scale spatial patterns and phenological dynamics. AgriFM leverages temporally rich data streams from three satellite sources (MODIS, Landsat-8/9, and Sentinel-2) and is pre-trained on a globally representative dataset of over 25 million image samples supervised by land cover products. The resulting framework incorporates a versatile decoder architecture that dynamically fuses the learned spatiotemporal representations, supporting diverse downstream tasks including cropland mapping, field-boundary delineation, early-season crop mapping, and specific crop mapping (e.g., winter wheat and paddy rice) with different data sources.
- Introduction
- Table of Contents
- Installation
- Dataset Preparation
- Model Usage
- Citation
- License
- Contact
- Linux or Windows
- Python 3.9+, recommended 3.9.18
- PyTorch 2.0 or higher, recommended 2.1
- CUDA 11.7 or higher, recommended 12.1
- MMCV 2.0 or higher, recommended 2.1.0
We recommend using Miniconda for installation. The following command will create a virtual environment named AgriFM and install PyTorch, GDAL, and other required libraries.
Note: If you have experience with Conda and PyTorch and have already installed them, you can skip to the next section. Otherwise, follow the steps below to prepare the environment.
Step 0: Install Miniconda.
Then you can quickly create the environment by running the following command in your terminal:
conda env create -f environment.yml

If this does not work, you can manually install the dependencies by following the steps below.
Step 1: Create a virtual environment named AgriFM and activate it.
conda create -n AgriFM python=3.9 -y
conda activate AgriFM
Step 2: Install PyTorch.
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=12.1 -c pytorch -c nvidia
Step 3: Install MMCV.
pip install -U openmim
mim install "mmcv==2.1.0"
Step 4: Install other dependencies.
pip install ftfy tqdm regex h5py prettytable timm scipy einops numpy==1.26.2
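As an optional sanity check (a minimal sketch, not part of the official setup), you can confirm that PyTorch sees your GPU and that MMCV matches the recommended version:

```python
# Optional sanity check: verify the PyTorch/CUDA setup and the MMCV version.
import torch
import mmcv

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("MMCV:", mmcv.__version__)  # expected: 2.1.0
```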
You can download the dataset from OneDrive or GLASS Website. The dataset includes example data for quick start, formatted as H5 files that follow a unified multi-source remote sensing data structure.
The example dataset is organized as follows:
example_dataset/
├── data_lists/                      # Text files listing sample filenames (one per line)
│   ├── train.txt                    # Training set samples (e.g., sample_001, sample_002)
│   ├── val.txt                      # Validation set samples
│   └── test.txt                     # Test set samples
└── h5_samples/                      # Directory storing H5-format samples
    ├── 2018_T31TDK_2560_2304.h5     # Single sample file (contains multi-source data and labels)
    ├── 2018_T31TDK_2816_0768.h5
    └── ...
Each H5 file contains multi-source remote sensing data and labels, structured as follows:
h5_file.h5
├── [Source1]       # Customizable data source name (e.g., S2, Landsat, MODIS, etc.)
│     shape: (T, C, H, W)
│     - T: Number of time steps (T=1 for single-temporal data)
│     - C: Number of bands
│     - H/W: Image height/width
├── [Source2]       # Customizable data source name (e.g., S2, Landsat, MODIS, etc.)
└── label           # Mandatory label
      shape: (H, W)
      - Type: Integer (pixel-level classification labels)
For example, a Sentinel-2 sample with 10 bands and 256x256 pixels is provided. It can be loaded and processed using the provided MappingDataset class, which handles multi-source data and labels.
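If you want to inspect a sample directly, a minimal sketch with h5py is shown below; the dataset key "S2" and the file name are taken from the example dataset described above and may differ in your own files.

```python
# Minimal sketch for inspecting one example H5 sample with h5py.
import h5py

with h5py.File("example_dataset/h5_samples/2018_T31TDK_2560_2304.h5", "r") as f:
    print(list(f.keys()))        # e.g. ['S2', 'label']
    s2 = f["S2"][:]              # (T, C, H, W) Sentinel-2 time series
    label = f["label"][:]        # (H, W) integer class map
    print(s2.shape, label.shape)
```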
If you want to use your own dataset, you can implement a custom dataset class by inheriting from the MappingDataset class.
You need to ensure that the outputs of the dataset class follow the same structure as the example dataset:
- A dictionary containing multi-source data (each value is a tensor of shape [T, C, H, W])
- A label tensor
# Output example
{
    "data": {
        "source1": torch.Tensor(T, C, H, W),  # Multi-source data dictionary
        "source2": torch.Tensor(T, C, H, W)
    },
    "label": torch.LongTensor(H, W),          # Label (long integer type)
    "file_name": str                          # Optional: sample filename for debugging
}
Note: Source names in the data dictionary must be consistent with the names used in the model configuration.
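The following is a hypothetical sketch of such a custom dataset. MappingDataset's actual constructor and attribute names are not documented here, so the import path and `self.file_list` are placeholders; only the `__getitem__` output structure follows the contract described above.

```python
# Hypothetical custom dataset sketch (placeholder import path and attributes).
import h5py
import torch

from mapping_dataset import MappingDataset  # placeholder import path


class MyCropDataset(MappingDataset):
    def __getitem__(self, idx):
        file_path = self.file_list[idx]  # placeholder: list of H5 sample paths
        with h5py.File(file_path, "r") as f:
            data = {
                "S2": torch.from_numpy(f["S2"][:]).float(),  # (T, C, H, W)
            }
            label = torch.from_numpy(f["label"][:]).long()   # (H, W)
        return {"data": data, "label": label, "file_name": file_path}
```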
The pretrained AgriFM weights can be downloaded from OneDrive or GLASS Website.
The AgriFM model consists of three main components:
- Multi-modal Encoder: Processes different remote sensing data sources
- Fusion Neck: Combines features from different modalities
- Prediction Head: Generates final crop classification maps
The model configuration is defined as follows (in cropland_config).
Multi-modal Encoder:

| Parameter | Description | Example Value |
|---|---|---|
| `patch_size` | Spatiotemporal patch size | (4, 2, 2) |
| `in_chans` | Input channels per timestep | 10 (for Sentinel-2) |
| `depths` | Number of transformer blocks per stage | [2, 2, 18, 2] |
| `num_heads` | Attention heads per stage | [4, 8, 16, 32] |
| `window_size` | Local attention window size | (8, 7, 7) |

Fusion Neck:

| Parameter | Description | Example Value |
|---|---|---|
| `embed_dim` | Feature dimension | 1024 |
| `in_feature_key` | Input modality keys | ('S2',) |
| `feature_size` | Input feature map size | (img_size//16, img_size//16) |
| `out_size` | Output map size | (img_size, img_size) |

Prediction Head:

| Parameter | Description | Example Value |
|---|---|---|
| `num_classes` | Number of crop classes | Varies by dataset |
| `loss_model` | Loss function type | "CropCEloss" |
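As a hypothetical sketch of how these parameters could be wired together in an OpenMMLab-style Python config: the actual field names in cropland_config / configs/cropland_mapping.py may differ, and the values below simply mirror the example values in the tables above.

```python
# Hypothetical config sketch; field names are assumptions, values mirror the tables.
img_size = 256

model = dict(
    encoder=dict(
        S2=dict(                       # source name must match the dataset's data dictionary
            patch_size=(4, 2, 2),      # spatiotemporal patch size
            in_chans=10,               # Sentinel-2 bands per timestep
            depths=[2, 2, 18, 2],      # transformer blocks per stage
            num_heads=[4, 8, 16, 32],  # attention heads per stage
            window_size=(8, 7, 7),     # local attention window size
        ),
    ),
    neck=dict(
        embed_dim=1024,
        in_feature_key=('S2',),
        feature_size=(img_size // 16, img_size // 16),
        out_size=(img_size, img_size),
    ),
    head=dict(
        num_classes=2,                 # varies by dataset
        loss_model="CropCEloss",
    ),
)
```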
To train the model using the provided configuration:
python train.py configs/cropland_mapping.py --work_dir ./work_dirs/cropland_mapping

This command will start training the model with the specified configuration file and save the results in the work_dirs/cropland_mapping directory.
- The default configuration uses Sentinel-2 data (the 'S2' key); add additional encoders for other data sources.
To evaluate the model, you can use the following command:
python test.py configs/cropland_mapping.py work_dirs/cropland_mapping/best_mFscores_xx.pth

To get the visualization results, you can use the following command:
python inference.py configs/cropland_mapping.py work_dirs/cropland_mapping/best_mFscores_xx.pth path/to/your/visualization_output

If you use the code or performance benchmarks of this project in your research, please refer to the BibTeX below to cite.
@misc{li2025agrifmmultisourcetemporalremote,
      title={AgriFM: A Multi-source Temporal Remote Sensing Foundation Model for Crop Mapping},
      author={Wenyuan Li and Shunlin Liang and Keyan Chen and Yongzhe Chen and Han Ma and Jianglei Xu and Yichuan Ma and Shikang Guan and Husheng Fang and Zhenwei Shi},
      year={2025},
      eprint={2505.21357},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.21357},
}
This project is licensed under the Apache 2.0 license.
This project is built upon OpenMMLab. We thank the OpenMMLab developers.
Our model is built upon Video Swin Transformer.
If you have any other questions or suggestions, please contact Wenyuan Li (liwayne@hku.hk).
