update cpp quickstart (#11031)
Parent: d9f71f1f53
Commit: 1d73fc8106

2 changed files with 12 additions and 6 deletions
@@ -102,6 +102,12 @@ Then you can use following command to initialize `llama.cpp` with IPEX-LLM:
    ``init-llama-cpp`` will create soft links of llama.cpp's executable files in the current directory; if you want to use these executable files in other places, don't forget to run the above commands again.
 ```
+
+```eval_rst
+.. note::
+
+   If you have installed a higher version of ``ipex-llm[cpp]`` and want to upgrade your binary files, don't forget to remove the old binary files first and initialize again with ``init-llama-cpp`` or ``init-llama-cpp.bat``.
+```
 
 **Now you can use these executable files with standard llama.cpp usage.**
 
 #### Runtime Configuration
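To make the new note concrete, here is a minimal sketch of the upgrade-and-reinitialize flow it describes, assuming a Linux shell, a pip-based install of `ipex-llm[cpp]`, a hypothetical working directory, and placeholder names for the soft links created by `init-llama-cpp` (the actual link names depend on your llama.cpp version):

```bash
# Sketch only: re-create the llama.cpp soft links after upgrading ipex-llm[cpp].
pip install --pre --upgrade "ipex-llm[cpp]"   # assumed pip-based upgrade path

cd ~/llama-cpp                                # hypothetical directory holding the old soft links
rm -f ./main ./server                         # placeholder link names; remove the stale links first
init-llama-cpp                                # re-create the links against the upgraded binaries
```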
@@ -314,12 +320,6 @@ If your program hang after `llm_load_tensors:  SYCL_Host buffer size =    xx.xx
 
 If `-ngl` is set to 0, the entire model will run on the CPU. If `-ngl` is set to a value greater than 0 but less than the number of model layers, it is a mixed GPU + CPU scenario.
 
-```eval_rst
-.. note::
-
-  Now Q4_0 / Q4_1 / Q8_0 precisions are not allowed to run on CPU or with mixed CPU and GPU.
-```
-
 #### How to specify GPU
 If your machine has multiple GPUs, `llama.cpp` will use all of them by default, which may slow down inference for a model that can run on a single GPU. You can add `-sm none` to your command to use only one GPU.
 
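As a quick illustration of the `-ngl` behaviour described above, and of `-sm none` for single-GPU use, here is a hedged sketch; the executable name (`main`), the model path, and the layer count are placeholders rather than values from the quickstart:

```bash
# Illustrative sketch of -ngl and -sm none on a llama.cpp command line.
MODEL=./models/llama-2-7b-chat.Q4_0.gguf      # hypothetical GGUF model file

./main -m "$MODEL" -p "Once upon a time" -n 64 -ngl 0            # -ngl 0: entire model runs on CPU
./main -m "$MODEL" -p "Once upon a time" -n 64 -ngl 20           # 0 < -ngl < layer count: mixed CPU + GPU
./main -m "$MODEL" -p "Once upon a time" -n 64 -ngl 99 -sm none  # full offload, restricted to one GPU
```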
@@ -46,6 +46,12 @@ Activate the `llm-cpp` conda environment and initialize Ollama by executing the
 
 ```
+
+```eval_rst
+.. note::
+
+   If you have installed a higher version of ``ipex-llm[cpp]`` and want to upgrade your Ollama binary file, don't forget to remove the old binary file first and initialize again with ``init-ollama`` or ``init-ollama.bat``.
+```
 
 **Now you can use this executable file with standard Ollama usage.**
 
 ### 3 Run Ollama Serve
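Similarly, a minimal sketch of the Ollama re-initialization the new note describes, assuming a Linux shell, a pip-based upgrade of `ipex-llm[cpp]`, and that the soft link created by `init-ollama` is named `ollama`:

```bash
# Sketch only: refresh the Ollama soft link after upgrading ipex-llm[cpp].
pip install --pre --upgrade "ipex-llm[cpp]"   # assumed pip-based upgrade path

rm -f ./ollama      # remove the stale link first, as the note advises
init-ollama         # re-create the link against the upgraded binary
./ollama serve      # then continue with "3 Run Ollama Serve" as usual
```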