iofu728 committed on
Commit 962818f
1 Parent(s): c3766a1

Feature(MInference): update information

Files changed (1)
  1. app.py +0 -3
app.py CHANGED
@@ -26,9 +26,6 @@ _Huiqiang Jiang†, Yucheng Li†, Chengruidong Zhang†, Qianhui Wu, Xufang Luo
  - 🪗 [24/07/07] Thanks @AK for sponsoring. You can now use MInference online in the [HF Demo](https://huggingface.co/spaces/microsoft/MInference) with ZeroGPU.
  - 🧩 [24/07/03] We will present **MInference 1.0** at the _**Microsoft Booth**_ and _**ES-FoMo**_ at ICML'24. See you in Vienna!
 
- ## TL;DR
- **MInference 1.0** leverages the dynamic sparse nature of LLMs' attention, which exhibits some static patterns, to speed up the pre-filling for long-context LLMs. It first determines offline which sparse pattern each head belongs to, then approximates the sparse index online and dynamically computes attention with the optimal custom kernels. This approach achieves up to a **10x speedup** for pre-filling on an A100 while maintaining accuracy.
-
  <font color="brown"><b>This is only a deployment demo. You can follow the code below to try MInference locally.</b></font>
 
  ```bash
 