From cf0f5c432224812c7487a0fa9986030a100b5b0d Mon Sep 17 00:00:00 2001
From: Yishuo Wang
Date: Thu, 27 Jun 2024 13:59:59 +0800
Subject: [PATCH] change npu document (#11446)

---
 .../HF-Transformers-AutoModels/Model/llama2/README.md | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/python/llm/example/NPU/HF-Transformers-AutoModels/Model/llama2/README.md b/python/llm/example/NPU/HF-Transformers-AutoModels/Model/llama2/README.md
index 4a6b41ab..dc289c8c 100644
--- a/python/llm/example/NPU/HF-Transformers-AutoModels/Model/llama2/README.md
+++ b/python/llm/example/NPU/HF-Transformers-AutoModels/Model/llama2/README.md
@@ -20,12 +20,7 @@ conda activate llm
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

 # the command below installs intel_npu_acceleration_library
-conda install cmake
-git clone https://github.com/intel/intel-npu-acceleration-library npu-library
-cd npu-library
-git checkout bcb1315
-python setup.py bdist_wheel
-pip install dist\intel_npu_acceleration_library-1.2.0-cp310-cp310-win_amd64.whl
+pip install intel-npu-acceleration-library==1.3
 ```

 ### 2. Runtime Configurations
@@ -48,7 +43,7 @@ Arguments info:
 - `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the Llama2 model (e.g. `meta-llama/Llama-2-7b-chat-hf` and `meta-llama/Llama-2-13b-chat-hf`) to be downloaded, or the path to the huggingface checkpoint folder. It defaults to `'meta-llama/Llama-2-7b-chat-hf'`.
 - `--prompt PROMPT`: argument defining the prompt to be inferred (with integrated prompt format for chat). It defaults to `'Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun'`.
 - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It defaults to `32`.
-- `--load_in_low_bit`: argument defining the load_in_low_bit format used. It defaults to `sym_int8`; `sym_int4` can also be used.
+- `--load_in_low_bit`: argument defining the `load_in_low_bit` format used. It defaults to `sym_int8`; `sym_int4` can also be used.

 #### Sample Output
 #### [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
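
For reference, below is a minimal sketch of how the arguments documented in the second hunk might be combined on the command line. The driver script name `generate.py` is an assumption (the patch only shows the argument descriptions, not the script); the flags themselves come from the patched README.

```bash
# Hypothetical invocation of the Llama2 NPU example after installing
# intel-npu-acceleration-library==1.3. The script name generate.py is
# assumed; all flags below are taken from the README's "Arguments info".
python ./generate.py \
  --repo-id-or-model-path meta-llama/Llama-2-7b-chat-hf \
  --prompt "Once upon a time, there existed a little girl who liked to have adventures" \
  --n-predict 32 \
  --load_in_low_bit sym_int4
```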