Co-authored-by: rnwang04 <ruonan1.wang@intel.com>
Guoqiong Song 2024-07-26 12:39:09 -07:00 committed by GitHub
parent ba01b85c13
commit 336dfc04b1
4 changed files with 53 additions and 6 deletions


@@ -18,6 +18,8 @@ conda activate llm
# install the latest ipex-llm nightly build with 'all' option
pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
pip install transformers==4.36.2
pip install huggingface_hub
```
On Windows:
@@ -27,9 +29,17 @@ conda create -n llm python=3.11
conda activate llm
pip install --pre --upgrade ipex-llm[all]
pip install transformers==4.36.2
pip install huggingface_hub
```
### 2. Run
Set up a local `MODEL_PATH` and run the following Python code to download the required revision (`v1.1.0`) of the model from Hugging Face.
```python
from huggingface_hub import snapshot_download
repo_id = "internlm/internlm2-chat-7b"    # Hugging Face repo id of the model
MODEL_PATH = "/path/to/local/model/dir"   # choose where to save the model locally
snapshot_download(repo_id=repo_id, local_dir=MODEL_PATH, local_dir_use_symlinks=False, revision="v1.1.0")
```
Then run the example with the downloaded model:
```
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
```
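For context, below is a minimal sketch of roughly what `generate.py` does with the downloaded model on CPU, assuming the standard ipex-llm Hugging Face-style API (`AutoModelForCausalLM` from `ipex_llm.transformers` with `load_in_4bit=True`). It is an illustration only; the actual script also applies the InternLM2 chat prompt format and exposes the command-line arguments described below.
```python
# A minimal sketch (not the actual generate.py): load the downloaded model with
# ipex-llm INT4 optimizations on CPU and generate a short completion.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

MODEL_PATH = "/path/to/local/model/dir"   # the directory used in snapshot_download above

model = AutoModelForCausalLM.from_pretrained(MODEL_PATH,
                                             load_in_4bit=True,       # INT4 optimization
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

input_ids = tokenizer.encode("What is AI?", return_tensors="pt")  # example prompt
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```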
@@ -46,7 +56,7 @@ Arguments info:
#### 2.1 Client
On client Windows machines, it is recommended to run directly with full utilization of all cores:
```cmd
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH
```
#### 2.2 Server
@@ -59,7 +69,7 @@ source ipex-llm-init
# e.g. for a server with 48 cores per socket
export OMP_NUM_THREADS=48
numactl -C 0-47 -m 0 python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH
```
#### 2.3 Sample Output


@@ -19,6 +19,8 @@ conda activate llm
# install the latest ipex-llm nightly build with 'all' option
pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
pip install transformers==4.36.2
pip install huggingface_hub
```
On Windows:
@@ -28,15 +30,30 @@ conda create -n llm python=3.11
conda activate llm
pip install --pre --upgrade ipex-llm[all]
pip install transformers==4.36.2
pip install huggingface_hub
```
### 2. Run
After setting up the Python environment, you can run the example with the following steps.
Set up a local `MODEL_PATH` and run the following Python code to download the required revision (`v1.1.0`) of the model from Hugging Face.
```python
from huggingface_hub import snapshot_download
repo_id = "internlm/internlm2-chat-7b"    # Hugging Face repo id of the model
MODEL_PATH = "/path/to/local/model/dir"   # choose where to save the model locally
snapshot_download(repo_id=repo_id, local_dir=MODEL_PATH, local_dir_use_symlinks=False, revision="v1.1.0")
```
Then run the example with the downloaded model:
```
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
```
Arguments info:
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the InternLM2 model (e.g. `internlm/internlm2-chat-7b`) to be downloaded, or the path to the huggingface checkpoint folder. It defaults to `'internlm/internlm2-chat-7b'`.
- `--prompt PROMPT`: argument defining the prompt used for inference (with the integrated prompt format for chat; see the sketch after this list). It defaults to `'AI是什么'`.
- `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It defaults to `32`.
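The "integrated prompt format for chat" mentioned above means the raw prompt is wrapped in InternLM2's chat template before generation. As a hedged illustration (not the exact code used by `generate.py`), one way to build such a prompt with the Hugging Face API, assuming the downloaded tokenizer ships a chat template:
```python
from transformers import AutoTokenizer

MODEL_PATH = "/path/to/local/model/dir"   # the directory used in snapshot_download above
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# Wrap the raw prompt with the model's chat template and append the
# generation prompt, so the model answers as the assistant.
messages = [{"role": "user", "content": "AI是什么"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```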
#### 2.1 Client
On client Windows machines, it is recommended to run directly with full utilization of all cores:
```cmd
python ./generate.py --prompt 'What is AI?' --repo-id-or-model-path REPO_ID_OR_MODEL_PATH
```
More information about arguments can be found in the [Arguments Info](#23-arguments-info) section. The expected output can be found in the [Sample Output](#24-sample-output) section.
@@ -50,7 +67,7 @@ source ipex-llm-init
# e.g. for a server with 48 cores per socket
export OMP_NUM_THREADS=48
numactl -C 0-47 -m 0 python ./generate.py --prompt 'What is AI?' --repo-id-or-model-path REPO_ID_OR_MODEL_PATH
```
More information about arguments can be found in the [Arguments Info](#23-arguments-info) section. The expected output can be found in the [Sample Output](#24-sample-output) section.


@@ -14,6 +14,8 @@ conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.36.2
pip install huggingface_hub
```
#### 1.2 Installation on Windows
@@ -24,6 +26,8 @@ conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.36.2
pip install huggingface_hub
```
### 2. Configure OneAPI environment variables for Linux
@@ -100,8 +104,14 @@ set SYCL_CACHE_PERSISTENT=1
> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
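Before running the examples, you can optionally sanity-check that PyTorch can see the Intel GPU. This is a small hedged snippet, assuming the oneAPI environment variables above have been sourced and `intel_extension_for_pytorch==2.1.10+xpu` was installed as shown in section 1:
```python
# Optional sanity check: confirm that the 'xpu' backend is registered and an
# Intel GPU device is visible before running generate.py.
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' backend

print("XPU available:", torch.xpu.is_available())
print("XPU device count:", torch.xpu.device_count())
```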
### 4. Running examples
Set up a local `MODEL_PATH` and run the following Python code to download the required revision (`v1.1.0`) of the model from Hugging Face.
```python
from huggingface_hub import snapshot_download
repo_id = "internlm/internlm2-chat-7b"    # Hugging Face repo id of the model
MODEL_PATH = "/path/to/local/model/dir"   # choose where to save the model locally
snapshot_download(repo_id=repo_id, local_dir=MODEL_PATH, local_dir_use_symlinks=False, revision="v1.1.0")
```
Then run the example with the downloaded model:
```
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
```
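For context, below is a minimal sketch of roughly what the GPU `generate.py` does with the downloaded model, assuming the standard ipex-llm XPU flow (`load_in_4bit=True`, then moving the model and inputs to the `'xpu'` device). It is an illustration only; the actual script also applies the InternLM2 chat prompt format and handles the command-line arguments:
```python
# A minimal sketch (not the actual generate.py): load the downloaded model with
# ipex-llm INT4 optimizations and run generation on an Intel GPU ('xpu').
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

MODEL_PATH = "/path/to/local/model/dir"   # the directory used in snapshot_download above

model = AutoModelForCausalLM.from_pretrained(MODEL_PATH,
                                             load_in_4bit=True,       # INT4 optimization
                                             trust_remote_code=True)
model = model.to("xpu")                   # move the optimized model to the Intel GPU
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

input_ids = tokenizer.encode("What is AI?", return_tensors="pt").to("xpu")  # example prompt
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0].cpu(), skip_special_tokens=True))
```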


@@ -14,6 +14,8 @@ conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.36.2
pip install huggingface_hub
```
#### 1.2 Installation on Windows
@@ -24,6 +26,8 @@ conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.36.2
pip install huggingface_hub
```
### 2. Configure OneAPI environment variables for Linux
@@ -100,8 +104,14 @@ set SYCL_CACHE_PERSISTENT=1
> [!NOTE]
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples
Set up a local `MODEL_PATH` and run the following Python code to download the required revision (`v1.1.0`) of the model from Hugging Face.
```python
from huggingface_hub import snapshot_download
repo_id = "internlm/internlm2-chat-7b"    # Hugging Face repo id of the model
MODEL_PATH = "/path/to/local/model/dir"   # choose where to save the model locally
snapshot_download(repo_id=repo_id, local_dir=MODEL_PATH, local_dir_use_symlinks=False, revision="v1.1.0")
```
Then run the example with the downloaded model:
```
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
```