ExceedZhang (zhangwenbin)

upvoted a paper 20 days ago

CogVLM2: Visual Language Models for Image and Video Understanding

Paper • 2408.16500 • Published 22 days ago • 55

upvoted 4 papers 24 days ago

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published about 1 month ago • 54

upvoted a paper about 1 month ago

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Paper • 2408.08152 • Published Aug 15 • 51

upvoted an article about 1 month ago

Article

Welcome FalconMamba: The first strong attention-free 7B model

Aug 12

• 96

upvoted 2 papers about 1 month ago

A decoder-only foundation model for time-series forecasting

Paper • 2310.10688 • Published Oct 14, 2023 • 5

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge

Paper • 2407.00088 • Published Jun 25 • 9

upvoted an article about 2 months ago

Article

SmolLM - blazingly fast and remarkably powerful

Jul 16

• 242

upvoted 2 papers about 2 months ago

Improving Retrieval Augmented Language Model with Self-Reasoning

Paper • 2407.19813 • Published Jul 29 • 6

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

Paper • 2407.14057 • Published Jul 19 • 41

upvoted 2 papers 2 months ago

Qwen2 Technical Report

Paper • 2407.10671 • Published Jul 15 • 153

SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

Paper • 2407.09025 • Published Jul 12 • 122

upvoted 3 articles 3 months ago

Article

From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate

Jun 13

• 41

Article

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

Jun 24

• 166

Article

🧨 Diffusers welcomes Stable Diffusion 3

Jun 12

• 84

upvoted a paper 4 months ago

Your Transformer is Secretly Linear

Paper • 2405.12250 • Published May 19 • 149

upvoted an article 4 months ago

Article

Unlocking Longer Generation with Key-Value Cache Quantization

May 16

• 28

upvoted 6 papers 4 months ago

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published May 20 • 45

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Paper • 2405.11473 • Published May 19 • 53

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 125

LoRA Learns Less and Forgets Less

Paper • 2405.09673 • Published May 15 • 86

Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training

Paper • 2405.06932 • Published May 11 • 16

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Paper • 2405.08748 • Published May 14 • 19

upvoted an article 4 months ago

Article

Hugging Face x LangChain : A new partner package in LangChain

May 14

• 103

upvoted a paper 5 months ago

LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29 • 118

upvoted 2 articles 5 months ago

Article

StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation

Apr 29

• 71

Article

Fine-tune Llama 3 with ORPO

Apr 22

• 221

upvoted 2 papers 5 months ago

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Paper • 2404.06512 • Published Apr 9 • 29

Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity

Paper • 2403.14403 • Published Mar 21 • 6

upvoted 6 papers 6 months ago

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 103

Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28 • 103

sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28 • 38

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

Paper • 2403.13372 • Published Mar 20 • 58

RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15 • 66

Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7 • 61

upvoted 8 papers 7 months ago

Design2Code: How Far Are We From Automating Front-End Engineering?

Paper • 2403.03163 • Published Mar 5 • 93

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27 • 185

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29 • 52

Speculative Streaming: Fast LLM Inference without Auxiliary Models

Paper • 2402.11131 • Published Feb 16 • 41

Generative Representational Instruction Tuning

Paper • 2402.09906 • Published Feb 15 • 51

Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 94

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Paper • 2402.04291 • Published Feb 6 • 48

Weaver: Foundation Models for Creative Writing

Paper • 2401.17268 • Published Jan 30 • 41

upvoted 7 papers 8 months ago

LongAlign: A Recipe for Long Context Alignment of Large Language Models

Paper • 2401.18058 • Published Jan 31 • 21

OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1 • 78

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5 • 67

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Paper • 2401.15947 • Published Jan 29 • 48

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

Paper • 2401.16420 • Published Jan 29 • 54

FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design

Paper • 2401.14112 • Published Jan 25 • 17

DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Paper • 2401.14196 • Published Jan 25 • 46

upvoted a paper 9 months ago

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper • 2312.11514 • Published Dec 12, 2023 • 256

upvoted 2 papers 11 months ago

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Paper • 2310.11511 • Published Oct 17, 2023 • 74

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 96

upvoted 5 papers about 1 year ago

Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models

Paper • 2308.00304 • Published Aug 1, 2023 • 23

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Paper • 2307.15217 • Published Jul 27, 2023 • 36

PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback

Paper • 2307.14936 • Published Jul 27, 2023 • 42

On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models

Paper • 2307.09793 • Published Jul 19, 2023 • 46

Challenges and Applications of Large Language Models

Paper • 2307.10169 • Published Jul 19, 2023 • 47
