Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published 1 day ago • 45
WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild Paper • 2409.03753 • Published 14 days ago • 17
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published 16 days ago • 84
GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI Paper • 2409.01392 • Published 17 days ago • 9
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published 22 days ago • 55
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search Paper • 2408.08152 • Published Aug 15 • 51
ControlNeXt: Powerful and Efficient Control for Image and Video Generation Paper • 2408.06070 • Published Aug 12 • 52
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model Paper • 2407.16982 • Published Jul 24 • 40
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence Paper • 2407.16655 • Published Jul 23 • 28
Artist: Aesthetically Controllable Text-Driven Stylization without Training Paper • 2407.15842 • Published Jul 22 • 13
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18 • 52
MUSCLE: A Model Update Strategy for Compatible LLM Evolution Paper • 2407.09435 • Published Jul 12 • 20
VEnhancer: Generative Space-Time Enhancement for Video Generation Paper • 2407.07667 • Published Jul 10 • 11
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models Paper • 2407.07895 • Published Jul 10 • 40
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs Paper • 2407.04051 • Published Jul 4 • 35
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3 • 92
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds Paper • 2407.01494 • Published Jul 1 • 13