Commit 4aba632 by zhiqings (parent: 7709134)

Update README.md

Files changed (1): README.md (+1 -1)
@@ -17,7 +17,7 @@ The base language model is LLaMA-70b, based on the transformer architecture.
 
 **NOTE: *Dromedary-2* is trained with [QLoRA](https://github.com/artidoro/qlora) and the bfloat16 data type.** While it is [possible](https://gist.github.com/ChrisHayduk/1a53463331f52dca205e55982baf9930) to merge the QLoRA weights with the quantized model and thus enable inference with libraries such as [TGI](https://github.com/huggingface/text-generation-inference) and [vLLM](https://github.com/vllm-project/vllm), we found that the merged weights can lead to degraded performance. Therefore, we recommend loading the QLoRA weights directly with the [PEFT-LoRA](https://github.com/huggingface/peft) framework.
 
-Please check the [inference section](https://github.com/IBM/SALMON/inference) of our repo for the complete inference code.
+Please check the [inference section](https://github.com/IBM/SALMON/tree/main/inference) of our repo for the complete inference code.
 
 ```python
 system_prompt = (
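The README note above recommends attaching the QLoRA adapter with PEFT rather than merging it into the base weights. A minimal sketch of that loading path, wrapped in a helper function — the base model ID and adapter path are illustrative assumptions, not taken from the repo:

```python
def load_dromedary2(base_id: str, adapter_path: str):
    """Load a bfloat16 base model and attach a QLoRA adapter via PEFT.

    base_id: Hugging Face ID of the base checkpoint (assumed, e.g. a LLaMA-70b hub ID).
    adapter_path: local path or hub ID of the QLoRA adapter weights (assumed).
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Load the base model in bfloat16, matching the data type used in training.
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    # Attach the QLoRA adapter without merging, per the README's recommendation.
    model = PeftModel.from_pretrained(base, adapter_path)
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    return model, tokenizer
```

This keeps the adapter separate at inference time, avoiding the performance degradation the note associates with merged weights.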