[Doc] Add note about avoiding sourcing oneAPI for flashmoe and llama.cpp portable zip (#13274)

* Add note about avoiding sourcing oneAPI
* Move note ahead of cli
Qiyuan Gong 2025-07-30 13:58:52 +08:00 committed by GitHub
parent 951c23739d
commit 891e1f511b
3 changed files with 19 additions and 0 deletions

@@ -24,6 +24,10 @@ Check your GPU driver version, and update it if needed; we recommend following [
> - 500GB Disk space
## Run
> [!NOTE]
> Do not source oneAPI when using flash-moe.
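If oneAPI has already been sourced in your current shell (typically via `setvars.sh`), open a fresh shell before running. A minimal check is sketched below; it assumes your `setvars.sh` exports `ONEAPI_ROOT`, which is its default behavior:
```bash
# Prints the oneAPI install root if setvars.sh was sourced in this shell;
# if a path is printed, start a fresh shell before running flash-moe.
echo "${ONEAPI_ROOT:-oneAPI not sourced in this shell}"
```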
Before running, you should download or copy a community GGUF model to your local directory. For instance, `DeepSeek-R1-Q4_K_M.gguf` from [unsloth/DeepSeek-R1-GGUF](https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-Q4_K_M).
Run `DeepSeek-R1-Q4_K_M.gguf` as shown below (change `/PATH/TO/DeepSeek-R1-Q4_K_M-00001-of-00009.gguf` to your model path).
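For reference, the `flash-moe` invocation shown later in this commit looks like the following (change the path to your model):
```bash
./flash-moe -m /PATH/TO/DeepSeek-R1-Q4_K_M-00001-of-00009.gguf --prompt "What's AI?" -no-cnv
```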

@@ -139,6 +139,10 @@ Here we provide a simple example to show how to run a community GGUF model with
Before running, you should download or copy a community GGUF model to your local directory. For instance, `DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf` from [bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF](https://huggingface.co/bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF/blob/main/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf).
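One way to fetch the file is via the `huggingface_hub` CLI, assuming it is installed (`pip install huggingface_hub`); this is a sketch, not the only supported method:
```bash
# Download only the Q4_K_M GGUF file into the current directory.
huggingface-cli download bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF \
  DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf --local-dir .
```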
#### Run GGUF model
> [!NOTE]
> Do not source oneAPI when using llama.cpp portable zip.
Please change `/PATH/TO/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf` to your model path before running the command below.
```bash
./llama-cli -m /PATH/TO/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf -p "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>. User: Question:The product of the ages of three teenagers is 4590. How old is the oldest? a. 18 b. 19 c. 15 d. 17 Assistant: <think>" -n 2048 -t 8 -e -ngl 99 --color -c 2500 --temp 0 -no-cnv
```
@@ -215,7 +219,11 @@ Before running, you should download or copy community GGUF model to your local d
Change `/PATH/TO/DeepSeek-R1-Q4_K_M-00001-of-00009.gguf` to your model path, then run `DeepSeek-R1-Q4_K_M.gguf`
> [!NOTE]
> Do not source oneAPI when using flash-moe.
##### cli
```bash
./flash-moe -m /PATH/TO/DeepSeek-R1-Q4_K_M-00001-of-00009.gguf --prompt "What's AI?" -no-cnv
```

@@ -142,6 +142,9 @@ llama_perf_context_print: total time = xxxxx.xx ms / 1385 tokens
#### Run GGUF model
> [!NOTE]
> Do not source oneAPI when using llama.cpp portable zip.
Before running the command below, change `PATH\TO\DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf` to your model path.
```bash
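# The command body is truncated in this diff; mirrored from the English
# quickstart for illustration (an assumption). Change the path to your model.
./llama-cli -m /PATH/TO/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf -p "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>. User: Question:The product of the ages of three teenagers is 4590. How old is the oldest? a. 18 b. 19 c. 15 d. 17 Assistant: <think>" -n 2048 -t 8 -e -ngl 99 --color -c 2500 --temp 0 -no-cnv
```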
@@ -217,7 +220,11 @@ FlashMoE is a command-line tool built on `llama.cpp`, targeting DeepSeek
Change `/PATH/TO/DeepSeek-R1-Q4_K_M-00001-of-00009.gguf` to your model path, then run `DeepSeek-R1-Q4_K_M.gguf`
> [!NOTE]
> Do not source oneAPI when using flash-moe.
##### cli
```bash
./flash-moe -m /PATH/TO/DeepSeek-R1-Q4_K_M-00001-of-00009.gguf --prompt "What's AI?" -no-cnv
```