[LLM]update ipex part in mistral example readme (#10239)
* update ipex part in mistral example readme
This commit is contained in:
parent
7c236e4c6d
commit
ea23afc8ec
1 changed file with 21 additions and 1 deletion
@@ -84,4 +84,24 @@ First token latency xx.xxxxs
### 4. Accelerate with BIGDL_OPT_IPEX
`BIGDL_OPT_IPEX` can accelerate speculative decoding on Mistral; see [the Llama2 example](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/CPU/Speculative-Decoding/llama2#4-accelerate-with-bigdl_opt_ipex) to try it out.
To accelerate speculative decoding on CPU, install our validated version of [IPEX 2.2.0+cpu](https://github.com/intel/intel-extension-for-pytorch/tree/v2.2.0%2Bcpu) by following [IPEX's installation guide](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=cpu&version=v2.2.0%2Bcpu), or with the commands below. (Other IPEX versions may have conflicts and cannot accelerate speculative decoding correctly.)
```bash
# Install IPEX 2.2.0+cpu
python -m pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cpu
python -m pip install intel-extension-for-pytorch==2.2.0
python -m pip install oneccl_bind_pt==2.2.0 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
# If oneccl_bind_pt fails to install, alternative index URLs are available at
# "https://pytorch-extension.intel.com/release-whl/stable/cpu/cn/" or
# "https://developer.intel.com/ipex-whl-stable-cpu", depending on your environment.
# Update transformers
pip install transformers==4.35.2
```
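Before enabling the optimization, it can help to confirm that the pinned versions actually landed in your environment. The sketch below is illustrative only (not part of the example); it queries installed distribution versions with the standard library:

```python
# Sanity-check sketch (illustrative, not part of the example): confirm the
# pinned package versions from the install commands above are present.
import importlib.metadata as md

PINNED = {
    "torch": "2.2.0",
    "intel-extension-for-pytorch": "2.2.0",
    "transformers": "4.35.2",
}

def check(pkg: str, expected: str) -> str:
    """Return a status line for one pinned distribution."""
    try:
        version = md.version(pkg)
    except md.PackageNotFoundError:
        return f"{pkg}: not installed"
    status = "OK" if version.startswith(expected) else f"expected {expected}"
    return f"{pkg} {version} ({status})"

for pkg, expected in PINNED.items():
    print(check(pkg, expected))
```

If any line reports a mismatch, reinstall the exact versions above before setting `BIGDL_OPT_IPEX=true`.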
After installing IPEX, set `BIGDL_OPT_IPEX=true` to enable target model acceleration. Currently `Mistral-7B-Instruct-v0.1` and `Mistral-7B-v0.1` are supported.
```bash
source bigdl-llm-init -t
export BIGDL_OPT_IPEX=true
export OMP_NUM_THREADS=48 # change 48 to the number of cores on one processor socket
numactl -C 0-47 -m 0 python ./speculative.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
```
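The core list passed to `numactl -C` should cover exactly `OMP_NUM_THREADS` cores on one socket. As a small illustration (a hypothetical helper, not part of the example), the range can be derived from the thread count:

```python
# Illustrative helper (not part of the example): build the numactl -C core
# range matching a given OMP_NUM_THREADS value.
def core_range(n_threads: int, start: int = 0) -> str:
    """Return a `numactl -C` style range covering n_threads cores from `start`."""
    if n_threads < 1:
        raise ValueError("need at least one thread")
    return f"{start}-{start + n_threads - 1}"

# 48 threads pinned to cores 0-47 on socket 0, as in the command above
print(f"numactl -C {core_range(48)} -m 0 python ./speculative.py ...")
```

With `OMP_NUM_THREADS=48` this reproduces the `-C 0-47` range used above; a second socket's cores would start at a different offset, e.g. `core_range(48, start=48)`.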