Small update of quickstart (#10772)
parent 0a62933d36
commit ea5e46c8cb

2 changed files with 8 additions and 3 deletions
@@ -262,3 +262,8 @@ Log end
#### Fail to quantize model
If you encounter `main: failed to quantize model from xxx`, please make sure you have created the related output directory.

#### Program hangs during model loading
If your program hangs after `llm_load_tensors:  SYCL_Host buffer size =    xx.xx MiB`, you can add `--no-mmap` to your command.

#### How to set the `-ngl` parameter
`-ngl` sets the number of layers to store in VRAM. If you have enough VRAM, we recommend putting all layers on the GPU; just set `-ngl` to a large number like 999 to achieve this.
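For the quantize failure above, the fix is usually just creating the output directory before invoking the tool, since llama.cpp's quantize binary does not create it for you. A minimal sketch; the `llama-quantize` binary name and all paths are assumptions (older builds ship the tool simply as `quantize`):

```shell
# The output directory must exist beforehand; otherwise quantization
# fails with "main: failed to quantize model from ...".
mkdir -p ./models/output

# Hypothetical build and model paths; adjust to your setup.
if [ -x ./build/bin/llama-quantize ]; then
    ./build/bin/llama-quantize \
        ./models/ggml-model-f16.gguf \
        ./models/output/ggml-model-Q4_K_M.gguf \
        Q4_K_M
fi
```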
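The last two entries combine naturally in one command. A sketch only; the `llama-cli` binary name, model path, and prompt are assumptions, not taken from the diff:

```shell
# -ngl 999  : offload (effectively) all layers to the GPU, VRAM permitting.
# --no-mmap : load tensors without mmap, avoiding the hang some users see
#             after the "SYCL_Host buffer size" log line.
if [ -x ./build/bin/llama-cli ]; then
    ./build/bin/llama-cli \
        -m ./models/llama-2-7b.Q4_K_M.gguf \
        -ngl 999 --no-mmap \
        -p "Building a website can be done in 10 simple steps:"
fi
```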
@@ -69,7 +69,7 @@ Launch the Ollama service:
         set ZES_ENABLE_SYSMAN=1
         call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"

-        ollama.exe serve
+        ollama serve

```
@@ -174,8 +174,8 @@ Then you can create the model in Ollama by `ollama create example -f Modelfile`
      .. code-block:: bash

         set no_proxy=localhost,127.0.0.1
-        ollama.exe create example -f Modelfile
-        ollama.exe run example
+        ollama create example -f Modelfile
+        ollama run example

```
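The hunk above only drops the Windows `.exe` suffix; the create/run flow itself is unchanged. A sketch of the whole sequence, where the model name `example` comes from the diff but the GGUF path in the Modelfile is hypothetical:

```shell
# A Modelfile tells Ollama which local GGUF file to wrap (path assumed).
cat > Modelfile <<'EOF'
FROM ./mistral-7b-v0.1.Q4_K_M.gguf
EOF

# Registering and running require the ollama binary and a running
# `ollama serve` instance; guard so the sketch degrades gracefully.
if command -v ollama >/dev/null 2>&1; then
    ollama create example -f Modelfile
    ollama run example "Why is the sky blue?"
fi
```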