Collections
Discover the best community collections!
Collections including paper arxiv:2409.01704
-
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 54 -
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 24 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 66 -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 69
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 7 -
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper • 2403.09394 • Published • 25 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 32 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 26
-
Speculative Streaming: Fast LLM Inference without Auxiliary Models
Paper • 2402.11131 • Published • 41 -
Generative Representational Instruction Tuning
Paper • 2402.09906 • Published • 51 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 94 -
BitDelta: Your Fine-Tune May Only Be Worth One Bit
Paper • 2402.10193 • Published • 17
-
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 45 -
Attention Heads of Large Language Models: A Survey
Paper • 2409.03752 • Published • 83 -
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Paper • 2409.01704 • Published • 72 -
jina-embeddings-v3: Multilingual Embeddings With Task LoRA
Paper • 2409.10173 • Published • 15
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 16 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 9 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 11 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 47
-
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Paper • 2305.06500 • Published • 4 -
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Paper • 2310.09199 • Published • 24 -
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
Paper • 2306.05424 • Published • 7 -
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Paper • 2409.01704 • Published • 72