Vim and VMamba adopt flattening strategies similar to (a) and (b), converting two-dimensional images into one-dimensional sequences by row or column, while LocalMamba (c) performs the corresponding flattening within a local window. Notably, none of these sequences possess the inherent causal ordering of tokens that Mamba's causal architecture assumes. In contrast, GlobalMamba (d) constructs a causal token sequence ordered by frequency, ensuring that each token acquires global feature information.
The above code trains GlobalMamba based on Vim. The frequency-based reorganization of the original token sequence is implemented in the models_mamba.py file, so you only need to transfer this part to other vision Mamba frameworks for comparison.
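To make the idea concrete, below is a minimal, illustrative sketch of frequency-ordered serialization. It is *not* the code from models_mamba.py: it simply ranks each grid position by its radial distance from the frequency origin (a rough proxy for frequency bands) and emits tokens from low to high frequency, so early tokens in the causal sequence carry global, low-frequency information. The function name `frequency_serialize` and the radial ranking are assumptions for illustration.

```python
import numpy as np

def frequency_serialize(tokens, h, w):
    """Reorder a flattened h*w token grid from low to high spatial frequency.

    Illustrative sketch only -- the actual GlobalMamba reordering lives in
    models_mamba.py. Here, each grid position is ranked by its radial
    frequency, and tokens are emitted in that order so that the causal
    sequence starts with globally informative (low-frequency) tokens.
    """
    # Frequency coordinate of each grid position, with DC at index 0.
    fy = np.fft.fftfreq(h)[:, None]          # shape (h, 1)
    fx = np.fft.fftfreq(w)[None, :]          # shape (1, w)
    radius = np.sqrt(fy**2 + fx**2).ravel()  # radial frequency per position
    order = np.argsort(radius, kind="stable")
    return tokens[order], order

# Example: a 4x4 grid of 16 tokens with embedding dim 8.
tokens = np.arange(16 * 8, dtype=np.float32).reshape(16, 8)
reordered, order = frequency_serialize(tokens, 4, 4)
# The DC (lowest-frequency) position is emitted first.
assert order[0] == 0
```

The same permutation (and its inverse) can be applied before and after the Mamba blocks in any vision Mamba framework, which is what makes the reordering portable.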
Results
Acknowledgement
This project is based on Vision Mamba (code), Mamba (code), Causal-Conv1d (code), and DeiT (code). Thanks for their wonderful work.
Citation
If you find this project helpful, please consider citing the following paper:
@article{wang2024globalmamba,
  title={GlobalMamba: Global Image Serialization for Vision Mamba},
  author={Chengkun Wang and Wenzhao Zheng and Jie Zhou and Jiwen Lu},
  journal={arXiv preprint arXiv:2410.10316},
  year={2024}
}