Collections
Collections including paper arxiv:2409.01704
- DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
  Paper • 2409.02095 • Published • 32
- General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
  Paper • 2409.01704 • Published • 72
- CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation
  Paper • 2409.03643 • Published • 18
- UniDet3D: Multi-dataset Indoor 3D Object Detection
  Paper • 2409.04234 • Published • 7

- Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
  Paper • 2406.16860 • Published • 55
- Understanding Alignment in Multimodal LLMs: A Comprehensive Study
  Paper • 2407.02477 • Published • 21
- LongVILA: Scaling Long-Context Visual Language Models for Long Videos
  Paper • 2408.10188 • Published • 51
- Building and better understanding vision-language models: insights and future directions
  Paper • 2408.12637 • Published • 109