[2304.11619] SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models
[v1] Sun, 23 Apr 2023 11:23:05 UTC (8,314 KB)
Computer Science > Computer Vision and Pattern Recognition
arXiv:2304.11619 (cs)
[Submitted on 23 Apr 2023]
Title: SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models
View a PDF of the paper titled SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models, by Jonathan Roberts and 2 other authors
Abstract: Interpreting remote sensing imagery enables numerous downstream applications ranging from land-use planning to deforestation monitoring. Robustly classifying this data is challenging due to the Earth's geographic diversity. While many distinct satellite and aerial image classification datasets exist, no curated benchmark yet suitably covers this diversity. In this work, we introduce SATellite ImageNet (SATIN), a metadataset curated from 27 existing remotely sensed datasets, and comprehensively evaluate the zero-shot transfer classification capabilities of a broad range of vision-language (VL) models on SATIN. We find SATIN to be a challenging benchmark: the strongest method we evaluate achieves a classification accuracy of 52.0%. We provide a public leaderboard to guide and track the progress of VL models in this important domain.
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG) |
| Cite as: | arXiv:2304.11619 [cs.CV] |
| | (or arXiv:2304.11619v1 [cs.CV] for this version) |
| | https://doi.org/10.48550/arXiv.2304.11619 (arXiv-issued DOI via DataCite) |
Submission history
From: Jonathan Roberts
[v1] Sun, 23 Apr 2023 11:23:05 UTC (8,314 KB)