Add SNC to llama.cpp portable zip quick start (#12972)

* Add SNC to quick start
2025-03-17 10:58:06 +08:00
commit dd026db50b
parent b0d56273a8
2 changed files with 2 additions and 0 deletions
--- a/docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md
+++ b/docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md
@@ -204,6 +204,7 @@ Requirements:
 Note: 
 - Larger models and other precisions may require more resources.
 - For 1 ARC A770 platform, please reduce context length (e.g., 1024) to avoid OOM. Add this option `-c 1024` at the end of below command.
+- For dual-sockets platform, please enable `SNC (Sub-NUMA Clustering)` in BIOS and add `numactl --interleave=all` before launch command to gain *better decoding performance*.

 Before running, you should download or copy community GGUF model to your local directory. For instance,  `DeepSeek-R1-Q4_K_M.gguf` of [DeepSeek-R1-Q4_K_M.gguf](https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-Q4_K_M).

--- a/docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.zh-CN.md
+++ b/docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.zh-CN.md
@@ -206,6 +206,7 @@ FlashMoE 是一款基于 `llama.cpp` 构建的命令行工具，针对 DeepSeek
 提示： 
 - 更大的模型和其他精度可能需要更多的资源。
 - 对于 1 块 ARC A770 的平台，请减少上下文长度（例如 1024），以避免 OOM（内存溢出）。请在以下命令的末尾添加选项 `-c 1024`。
+- 对于拥有 2 块 CPU 的平台，请在 BIOS 上开启`子NUMA 集群`, 并在启动命令前增加 `numactl --interleave=all`, 以获得*更高的性能*。

 运行之前，你需要下载或复制社区的 GGUF 模型到你的当前目录。例如，[DeepSeek-R1-Q4_K_M.gguf](https://huggingface.co/unsloth/DeepSeek-R1-GGUF/tree/main/DeepSeek-R1-Q4_K_M) 的 `DeepSeek-R1-Q4_K_M.gguf`。
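
The note added in this commit asks dual-socket users to put `numactl --interleave=all` in front of the launch command. A minimal sketch of that prefixing follows. The binary name `./flash-moe` and its `-m` argument are assumptions for illustration only, since the diff does not show the actual launch command, and `prefix_interleave` is a hypothetical helper, not part of the quickstart.

```shell
#!/bin/sh
# Hypothetical helper: prints a launch command prefixed with the NUMA
# interleave policy named in the note. It only builds the command string;
# it does not run anything.
prefix_interleave() {
    printf 'numactl --interleave=all %s\n' "$*"
}

# Assumed launch command (binary name and -m flag are placeholders);
# substitute the command given in the quickstart itself.
LAUNCH_CMD=$(prefix_interleave ./flash-moe -m DeepSeek-R1-Q4_K_M.gguf)
echo "$LAUNCH_CMD"

# Optional check before launching: list the NUMA nodes the OS sees.
# With SNC enabled in BIOS, more nodes than physical sockets are
# typically reported.
# numactl --hardware
```

Printing the command first makes it easy to confirm the prefix before running it. `--interleave=all` spreads memory allocations round-robin across all NUMA nodes, which is the policy the note associates with better decoding performance.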