Multiview Aerial Visual Recognition (MAVREC):
Can Multi-view Improve Aerial Visual Perception?
Aritra Dutta1, Srijan Das2, Jacob Nielsen3, Rajatsubhra Chakraborty2, and Mubarak Shah1
1 University of Central Florida, 2 University of North Carolina at Charlotte, 3 University of Southern Denmark
TL;DR: MAVREC (Multiview Aerial Visual RECognition) is a dataset of synchronized videos recorded from different perspectives, covering rural and urban pastures in European geographies.
🎉 MAVREC was accepted to CVPR 2024 🎉
12 Multi-modal Scenes
0.5M Frames
1.1M Bounding Boxes
10 Object Classes
2.5 hours of 2.7K resolution video
Abstract
Despite the commercial abundance of UAVs, aerial data acquisition remains challenging, and the existing Asia and North America-centric open-source UAV datasets are small-scale or low-resolution and lack diversity in scene contextuality. Additionally, the color content of the scenes, solar-zenith angle, and population density of different geographies influence the data diversity. These two factors conjointly render suboptimal aerial-visual perception of the deep neural network (DNN) models trained primarily on the ground-view data, including the open-world foundational models.
To pave the way for a transformative era of aerial detection, we present Multiview Aerial Visual RECognition, or MAVREC, a video dataset in which we record synchronized scenes from different perspectives: a ground camera and a drone-mounted camera. MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes. This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets across all modalities and tasks. Through our extensive benchmarking on MAVREC, we find that augmenting object detectors with ground-view images from the corresponding geographical location is a superior pre-training strategy for aerial detection. Building on this strategy, we benchmark MAVREC with a curriculum-based semi-supervised object detection approach that leverages labeled (ground and aerial) and unlabeled (only aerial) images to enhance aerial detection.
Sample Frames
Different sample scenes (with annotations) from our dataset. The first row shows the aerial view, and the second row shows the same scenes from a ground camera. Likewise, the third row shows the aerial view, and the fourth row shows the same scenes from a ground camera. Some scenes have dense object annotations, while others have very few. This high variance in object distribution across scenes makes MAVREC complementary to datasets like VisDrone, where object detection is relatively straightforward because of their dense, biased object distribution, which reflects their demographic characteristics.
10 Object Classes
Dominant colors in MAVREC and other datasets
Dominant colors in sample frames of MAVREC
Dominant colors in sample frames of other state-of-the-art drone datasets
MAVREC Toy Dataset
We provide a small low-resolution toy dataset of MAVREC consisting of 100 images from each view.
Download Toy Dataset
Annotation Format
We adopt the MS COCO annotation format. We extend each image entry with a scene identifier and a frame identifier. We provide aligned annotation files for the corresponding ground and aerial views.
{
  "id": 1,
  "file_name": "scene_12_sdu_30Sec_droneView_6_000826.PNG",
  "height": 337,
  "width": 600.0,
  "scene": 12,
  "frameID": 826
}
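The extended image entries can be used to match frames across the two views. The following is a minimal Python sketch, not the authors' tooling. It assumes each annotation file is a standard COCO dictionary with "images" and "annotations" lists, and that a ground frame and an aerial frame correspond when they share the same "scene" and "frameID" values. The helper names, the ground-view file name, and the sample bounding box are hypothetical and only serve as illustration.

```python
import json
from collections import defaultdict


def load_coco(path):
    """Read a COCO-style annotation file from disk (path is supplied by the user)."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def index_by_scene_frame(coco):
    """Map (scene, frameID) to the image entry, using the two extra image fields."""
    return {(img["scene"], img["frameID"]): img for img in coco["images"]}


def boxes_by_image(coco):
    """Group bounding boxes by image_id, as in the standard COCO 'annotations' list."""
    grouped = defaultdict(list)
    for ann in coco.get("annotations", []):
        grouped[ann["image_id"]].append(ann["bbox"])
    return grouped


def pair_views(aerial, ground):
    """Return (key, aerial_image, ground_image) for frames present in both views.

    Assumption: matching (scene, frameID) keys identify corresponding frames.
    """
    aerial_index = index_by_scene_frame(aerial)
    ground_index = index_by_scene_frame(ground)
    shared = sorted(aerial_index.keys() & ground_index.keys())
    return [(key, aerial_index[key], ground_index[key]) for key in shared]


# Tiny in-memory example; the ground file name and the bbox are made up.
aerial = {
    "images": [
        {"id": 1, "file_name": "scene_12_sdu_30Sec_droneView_6_000826.PNG",
         "height": 337, "width": 600.0, "scene": 12, "frameID": 826},
        {"id": 2, "file_name": "scene_12_sdu_30Sec_droneView_6_000827.PNG",
         "height": 337, "width": 600.0, "scene": 12, "frameID": 827},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
    ],
}
ground = {
    "images": [
        {"id": 7, "file_name": "hypothetical_ground_view_000826.PNG",
         "height": 337, "width": 600.0, "scene": 12, "frameID": 826},
    ],
    "annotations": [],
}

pairs = pair_views(aerial, ground)
aerial_boxes = boxes_by_image(aerial)
for key, aerial_img, ground_img in pairs:
    print(key, aerial_img["file_name"], ground_img["file_name"],
          len(aerial_boxes[aerial_img["id"]]))
```

With real files, the two dictionaries would come from load_coco applied to the ground and aerial annotation files. Keying on (scene, frameID) rather than on "id" avoids assuming that image ids are shared across the two files.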
Qualitative Results
Citation
@InProceedings{Dutta_2024_CVPR,
author = {Dutta, Aritra and Das, Srijan and Nielsen, Jacob and Chakraborty, Rajatsubhra and Shah, Mubarak},
title = {Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {22678-22690}
}
Usage Licence
The dataset is released under the Creative Commons CC-BY license, which allows users to distribute, remix, adapt, and build upon the material in any medium or format, as long as the creator is attributed. The license permits commercial use of MAVREC. As the authors of this manuscript and collectors of this dataset, we reserve the right to distribute the data.