Jason Dai
2a8f624f4b
Update README ( #12956 )
2025-03-09 09:04:13 +08:00
binbin Deng
5ee09b4b28
[NPU] Small update about zip doc ( #12951 )
2025-03-07 15:22:14 +08:00
Shaojun Liu
015a4c8c43
Add CPU and GPU Frequency Locking Instructions to Documentation ( #12947 )
2025-03-07 09:20:40 +08:00
Jason Dai
cb3c4b26ad
Update llamacpp_portable_zip_gpu_quickstart.md ( #12945 )
2025-03-06 11:58:11 +08:00
Jason Dai
1432c5d9a0
Update llamacpp_portable_zip_gpu_quickstart ( #12941 )
2025-03-06 10:01:56 +08:00
Jason Dai
32480cc8ed
Update llamacpp_portable_zip_gpu_quickstart ( #12940 )
2025-03-06 08:42:18 +08:00
Jason Dai
975cf5f21f
Update README.md ( #12939 )
2025-03-06 08:04:27 +08:00
joan726
eccb5b817e
Add llamacpp_portable_zip_gpu_quickstart.zh-CN.md ( #12930 )
...
* Add llamacpp_portable_zip_gpu_quickstart.zh-CN.md
Add llamacpp_portable_zip_gpu_quickstart.zh-CN.md
* Update README.zh-CN.md
Changed and Linked to llamacpp portable zip.zh-CN.md.
* Update llamacpp_portable_zip_gpu_quickstart.md
Added CN version link
* Update README.zh-CN.md
Update all links to "llamacpp_portable_zip_gpu_quickstart.zh-CN.md"
* Update llama_cpp_quickstart.zh-CN.md
* Update llamacpp_portable_zip_gpu_quickstart.zh-CN.md
Modify based on comments.
* Update llamacpp_portable_zip_gpu_quickstart.zh-CN.md
Modify based on comments.
* Update llamacpp_portable_zip_gpu_quickstart.zh-CN.md
Update the doc based on #12928
* Update llamacpp_portable_zip_gpu_quickstart.zh-CN.md
Add “More Details” on Table of Contents
* Update README.zh-CN.md
Update llamacpp_portable_zip_gpu_quickstart CN link
* Update README.zh-CN.md
Change llama.cpp link
* Update README.zh-CN.md
* Update README.md
2025-03-05 14:55:44 +08:00
Yuwen Hu
7c0c77cce3
Tiny fixes ( #12936 )
2025-03-05 14:55:26 +08:00
Yuwen Hu
68a770745b
Add moonlight GPU example ( #12929 )
...
* Add moonlight GPU example and update table
* Small fix
* Fix based on comments
* Small fix
2025-03-05 11:31:14 +08:00
Xin Qiu
33da3a3cb7
Update llama cpp portable zip quickstart ( #12928 )
...
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
2025-03-05 09:22:10 +08:00
Jason Dai
de09590ca3
Update llamacpp_portable_zip_gpu_quickstart.md ( #12932 )
2025-03-05 07:59:32 +08:00
Jason Dai
69edc8b6f6
Update quickstart ( #12927 )
2025-03-04 15:34:52 +08:00
Qiyuan Gong
0b5079833c
llama.cpp portable Zip for Linux quickstart ( #12923 )
...
* llamacpp Linux portable doc & flashmoe
2025-03-04 14:50:21 +08:00
binbin Deng
091ab2bd59
[NPU] Add troubleshooting in portable zip doc ( #12924 )
2025-03-04 10:41:39 +08:00
Yuwen Hu
b2d676f1c6
Further update Ollama portable zip quickstart ( #12921 )
...
* Update Chinese doc for ollama quickstart tips and troubleshooting
* Update for recommended Windows OS
* Small fix
* Small fix
2025-03-03 18:07:57 +08:00
Shaojun Liu
f81d89d908
Remove Unnecessary --privileged Flag While Keeping It for WSL Users ( #12920 )
2025-03-03 11:11:42 +08:00
Shaojun Liu
7810b8fb49
OSPDT: update dockerfile header ( #12908 )
...
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
2025-03-03 09:59:11 +08:00
Yishuo Wang
b6f33d5c4d
optimize moonlight again ( #12909 )
2025-03-03 09:21:15 +08:00
Jason Dai
35e5fa851c
Update README.md ( #12911 )
2025-02-28 17:55:45 +08:00
binbin Deng
8351f6c455
[NPU] Add QuickStart for llama.cpp NPU portable zip ( #12899 )
2025-02-28 17:19:18 +08:00
Xin Qiu
029480f4a8
llama cpp portable zip Quickstart ( #12894 )
...
* llamacpp_quickstart
* update
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
* Update llamacpp_portable_zip_gpu_quickstart.md
2025-02-28 15:45:11 +08:00
Yuwen Hu
443cb5d4e0
Update Janus-Pro GPU example ( #12906 )
2025-02-28 15:39:03 +08:00
Yuwen Hu
8d94752c4b
Ollama portable zip QuickStart updates regarding more tips ( #12905 )
...
* Update for select multiple GPUs
* Update Ollama portable zip quickstarts regarding more tips
* Small fix
2025-02-28 15:10:56 +08:00
Yishuo Wang
39e360fe9d
add grouped topk optimization for moonlight ( #12903 )
2025-02-28 13:25:56 +08:00
Xin Qiu
e946127613
glm 4v 1st sdp for vision ( #12904 )
...
* glm4v 1st sdp
* update glm4v example
* meet code review
* fix style
2025-02-28 13:23:27 +08:00
Shaojun Liu
5c100ac105
Add ENTRYPOINT to Dockerfile to auto-start vllm service on container launch (for CVTE customer) ( #12901 )
...
* Add ENTRYPOINT to Dockerfile to auto-start service on container launch (for CVTE client)
* Update start-vllm-service.sh
* Update README.md
* Update README.md
* Update start-vllm-service.sh
* Update README.md
2025-02-27 17:33:58 +08:00
Yishuo Wang
be1f073866
add fuse moe optimization for moonlight ( #12898 )
2025-02-27 09:15:24 +08:00
Jason Dai
ad65e2b03a
Update README.md ( #12900 )
2025-02-27 08:30:06 +08:00
Yishuo Wang
5faba06409
simple optimization for moonlight moe decoding forward ( #12891 )
2025-02-25 16:18:27 +08:00
Xiangyu Tian
ae9f5320da
vLLM CPU: Fix Triton Version to Resolve Related Error ( #12893 )
2025-02-25 15:00:41 +08:00
Yishuo Wang
ab3fc66eb7
optimize attention part of moonlight-14B-A3B ( #12886 )
2025-02-25 09:38:13 +08:00
Shaojun Liu
dd30d12cb6
Fix serving-cpu image: setuptools-scm requires setuptools>=61 ( #12876 )
...
* setuptools-scm requires setuptools>=61
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
2025-02-25 09:10:14 +08:00
Yuwen Hu
06694ba61a
Further fix portable zip file link ( #12885 )
2025-02-24 18:06:57 +08:00
Yuwen Hu
671ddfd847
Update wrong file name for portable zip quickstart ( #12883 )
2025-02-24 17:52:09 +08:00
Yuwen Hu
a9c8e73a77
Update llama.cpp Prerequisites guide regarding oneAPI 2025.0 ( #12881 )
...
* Update llama.cpp Prerequisites guide regarding oneAPI 2025.0
* Update based on comments
* Small fix
* Small fix
2025-02-24 16:32:23 +08:00
Wang, Jian4
4f2f92afa3
Update inference-cpp docker ( #12882 )
...
* remove unused run.py
* add WORKDIR /llm
2025-02-24 14:32:44 +08:00
Yishuo Wang
3f6ecce508
support using xgrammar to get json output ( #12870 )
2025-02-24 14:10:58 +08:00
Shaojun Liu
afad979168
Add Apache 2.0 License Information in Dockerfile to Comply with OSPDT Requirements ( #12878 )
...
* ospdt: add Header for Dockerfile
* OSPDT: add Header for Dockerfile
* OSPDT: add Header for Dockerfile
* OSPDT: add Header for Dockerfile
2025-02-24 14:00:46 +08:00
Guancheng Fu
02ec313eab
Update README.md ( #12877 )
2025-02-24 09:59:17 +08:00
Shaojun Liu
10400abfb7
Fix CodeQL workflow ( #12875 )
...
* Update codeql.yml
* Update codeql.yml
2025-02-24 09:16:54 +08:00
Xu, Shuo
1e00bed001
Add GPU example for Janus-Pro ( #12869 )
...
* Add example for Janus-Pro
* Update model link
* Fixes
* Fixes
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2025-02-21 18:36:50 +08:00
Yuwen Hu
21d6a78be0
Update Ollama portable zip QuickStart to fit new version ( #12871 )
...
* Update ollama portable zip quickstart
* Update demo images
2025-02-21 17:54:14 +08:00
Wang, Jian4
3ea5389a99
Fix vllm api_server v1/models error ( #12867 )
2025-02-21 11:08:29 +08:00
binbin Deng
8077850452
[NPU GGUF] Add simple example ( #12853 )
2025-02-21 09:58:00 +08:00
Wang, Jian4
348dc8056d
Fix vllm gptq awq error ( #12863 )
...
* fix gptq awq error
* fix python style
2025-02-20 16:27:23 +08:00
Yuwen Hu
a488981f3f
Ollama portable zip QuickStart tiny fix ( #12862 )
...
* Tiny fix to ollama portable zip quickstart
* Tiny fix
2025-02-20 14:11:12 +08:00
Yuwen Hu
0f2706be42
Update CN Ollama portable zip QuickStart for troubleshooting & tips ( #12860 )
...
* Small fix for english version
* Update CN ollama portable zip quickstart for troubleshooting & tips
* Small fix
2025-02-20 11:32:06 +08:00
Jason Dai
38a682adb1
Update Readme ( #12855 )
2025-02-19 19:55:29 +08:00
Guancheng Fu
4eed0c7d99
initial implementation for low_bit_loader vLLM ( #12838 )
...
* initial
* add logic for handling tensor parallel models
* fix
* Add some comments
* add doc
* fix done
2025-02-19 19:45:34 +08:00