VideoMamba

Model Details

VideoMamba is a purely SSM-based model for video understanding.

Developed by: OpenGVLab
Model type: An efficient backbone based on a bidirectional state space model (SSM).
License: Non-commercial license
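The toy sketch below illustrates what "bidirectional" means for a sequence model. It is not VideoMamba's implementation. It scans a scalar linear recurrence forward and backward over a flattened token sequence and sums the two outputs. Assumptions: fixed scalar parameters `a`, `b`, `c` shared by both directions. Mamba's selective scan instead makes these parameters input-dependent, and the real block adds projections, convolutions, and gating. The function names are hypothetical.

```python
from typing import List


def linear_ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Causal scan with a scalar linear recurrence (hypothetical toy):
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t,  with h_{-1} = 0."""
    h = 0.0
    ys = []
    for xt in x:
        h = a * h + b * xt  # state carries a decaying summary of past tokens
        ys.append(c * h)
    return ys


def bidirectional_ssm(x: List[float], a: float = 0.5, b: float = 1.0,
                      c: float = 1.0) -> List[float]:
    """Run the same scan left-to-right and right-to-left, then sum the outputs.
    Assumption: both directions share parameters here for simplicity."""
    fwd = linear_ssm_scan(x, a, b, c)
    bwd = linear_ssm_scan(x[::-1], a, b, c)[::-1]  # scan reversed input, restore order
    return [f + g for f, g in zip(fwd, bwd)]


if __name__ == "__main__":
    print(bidirectional_ssm([1.0, 0.0, 0.0, 0.0]))  # [2.0, 0.5, 0.25, 0.125]
```

Because the backward scan runs over the reversed sequence, each position receives information from tokens after it as well as before it, which a single causal scan cannot provide.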

Model Sources

Repository: https://github.com/OpenGVLab/VideoMamba
Paper: https://arxiv.org/abs/2403.06977

Uses

VideoMamba is intended primarily for research on image and video tasks with an SSM-based backbone, e.g., image classification, action recognition, long-term video understanding, and video-text retrieval. The primary intended users are researchers and hobbyists in computer vision, machine learning, and artificial intelligence.

How to Get Started with the Model

To get started, replace the backbone of your video model with the VideoMamba implementation: https://github.com/OpenGVLab/VideoMamba/blob/main/videomamba/video_sm/models/videomamba.py
Then load this checkpoint and start training.
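When loading this checkpoint into the replaced backbone, parameter names may not match exactly. The sketch below assumes a common PyTorch layout: the weights may be wrapped under a `"model"`, `"module"`, or `"state_dict"` key, and their names may carry the `module.` prefix added by DistributedDataParallel. Check the actual keys in the downloaded file; the helper name `normalize_state_dict` is hypothetical.

```python
from typing import Any, Dict, Mapping


def normalize_state_dict(ckpt: Mapping[str, Any]) -> Dict[str, Any]:
    """Unwrap and clean a loaded checkpoint dict (hypothetical helper)."""
    # Assumption: weights may be nested under one of these wrapper keys.
    for key in ("model", "module", "state_dict"):
        if key in ckpt and isinstance(ckpt[key], Mapping):
            ckpt = ckpt[key]
            break
    # Strip the "module." prefix that DistributedDataParallel adds to names.
    prefix = "module."
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in ckpt.items()
    }
```

Read the file with `torch.load(path, map_location="cpu")`, pass the result through the helper, and call `model.load_state_dict(state, strict=False)`, then inspect the returned `missing_keys` and `unexpected_keys`. If you change the number of classes, drop the classification-head entries first, since `strict=False` tolerates missing or unexpected keys but not shape mismatches.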

Citation Information

@misc{li2024videomamba,
      title={VideoMamba: State Space Model for Efficient Video Understanding}, 
      author={Kunchang Li and Xinhao Li and Yi Wang and Yinan He and Yali Wang and Limin Wang and Yu Qiao},
      year={2024},
      eprint={2403.06977},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
