From ea5e46c8cbe2bac558d71e416ff4b2ba7d3f5a20 Mon Sep 17 00:00:00 2001
From: Ruonan Wang
Date: Tue, 16 Apr 2024 10:46:58 +0800
Subject: [PATCH] Small update of quickstart (#10772)

---
 .../source/doc/LLM/Quickstart/llama_cpp_quickstart.md | 5 +++++
 .../source/doc/LLM/Quickstart/ollama_quickstart.md    | 6 +++---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/llama_cpp_quickstart.md b/docs/readthedocs/source/doc/LLM/Quickstart/llama_cpp_quickstart.md
index e3e9840b..f3d064a5 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/llama_cpp_quickstart.md
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/llama_cpp_quickstart.md
@@ -262,3 +262,8 @@ Log end
 
 #### Fail to quantize model
 If you encounter `main: failed to quantize model from xxx`, please make sure you have created related output directory.
+#### Program hangs during model loading
+If your program hangs after `llm_load_tensors: SYCL_Host buffer size = xx.xx MiB`, add `--no-mmap` to your command.
+
+#### How to set the `-ngl` parameter
+`-ngl` sets the number of layers to store in VRAM. If you have enough VRAM, we recommend putting all layers on the GPU; to do so, simply set `-ngl` to a large number such as 999.
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/ollama_quickstart.md b/docs/readthedocs/source/doc/LLM/Quickstart/ollama_quickstart.md
index f2bdbca2..4c893a93 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/ollama_quickstart.md
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/ollama_quickstart.md
@@ -69,7 +69,7 @@ Launch the Ollama service:
 
    set ZES_ENABLE_SYSMAN=1
    call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
 
-   ollama.exe serve
+   ollama serve
 
   ```
@@ -174,8 +174,8 @@ Then you can create the model in Ollama by `ollama create example -f Modelfile`
 
   .. code-block:: bash
 
     set no_proxy=localhost,127.0.0.1
-    ollama.exe create example -f Modelfile
-    ollama.exe run example
+    ollama create example -f Modelfile
+    ollama run example
 
  ```
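
The two llama.cpp FAQ entries added above combine into a single invocation; as an illustrative sketch only (the `./main` binary name, model path, and prompt below are placeholders, not part of the patch):

```bash
# Sketch: "./main" and the GGUF path are placeholders for your own
# llama.cpp build and model file.
# -ngl 999 offloads all layers to the GPU (any value >= the model's
# layer count has the same effect); --no-mmap works around the hang
# some users see after "llm_load_tensors: SYCL_Host buffer size = ...".
./main -m ./mistral-7b-v0.1.Q4_K_M.gguf -p "Once upon a time" -n 32 -ngl 999 --no-mmap
```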