Commit graph

4001 commits

Author SHA1 Message Date
Shaojun Liu
015a4c8c43
Add CPU and GPU Frequency Locking Instructions to Documentation (#12947) 2025-03-07 09:20:40 +08:00
Jason Dai
cb3c4b26ad
Update llamacpp_portable_zip_gpu_quickstart.md (#12945) 2025-03-06 11:58:11 +08:00
Jason Dai
1432c5d9a0
Update llamacpp_portable_zip_gpu_quickstart (#12941) 2025-03-06 10:01:56 +08:00
Jason Dai
32480cc8ed
Update llamacpp_portable_zip_gpu_quickstart (#12940) 2025-03-06 08:42:18 +08:00
Jason Dai
975cf5f21f
Update README.md (#12939) 2025-03-06 08:04:27 +08:00
joan726
eccb5b817e
Add llamacpp_portable_zip_gpu_quickstart.zh-CN.md (#12930)
* Add llamacpp_portable_zip_gpu_quickstart.zh-CN.md

Add llamacpp_portable_zip_gpu_quickstart.zh-CN.md

* Update README.zh-CN.md

Changed and linked to llamacpp portable zip.zh-CN.md.

* Update llamacpp_portable_zip_gpu_quickstart.md

Added CN version link

* Update README.zh-CN.md

Update all links to "llamacpp_portable_zip_gpu_quickstart.zh-CN.md"

* Update llama_cpp_quickstart.zh-CN.md

* Update llamacpp_portable_zip_gpu_quickstart.zh-CN.md

Modify based on comments.

* Update llamacpp_portable_zip_gpu_quickstart.zh-CN.md

Modify based on comments.

* Update llamacpp_portable_zip_gpu_quickstart.zh-CN.md

Update the doc based on #12928

* Update llamacpp_portable_zip_gpu_quickstart.zh-CN.md

Add “More Details” on Table of Contents

* Update README.zh-CN.md

Update llamacpp_portable_zip_gpu_quickstart CN link

* Update README.zh-CN.md

Change llama.cpp link

* Update README.zh-CN.md

* Update README.md
2025-03-05 14:55:44 +08:00
Yuwen Hu
7c0c77cce3
Tiny fixes (#12936) 2025-03-05 14:55:26 +08:00
Yuwen Hu
68a770745b
Add moonlight GPU example (#12929)
* Add moonlight GPU example and update table

* Small fix

* Fix based on comments

* Small fix
2025-03-05 11:31:14 +08:00
Xin Qiu
33da3a3cb7
Update llama cpp portable zip quickstart (#12928)
* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md
2025-03-05 09:22:10 +08:00
Jason Dai
de09590ca3
Update llamacpp_portable_zip_gpu_quickstart.md (#12932) 2025-03-05 07:59:32 +08:00
Jason Dai
69edc8b6f6
Update quickstart (#12927) 2025-03-04 15:34:52 +08:00
Qiyuan Gong
0b5079833c
llama.cpp portable Zip for Linux quickstart (#12923)
* llamacpp Linux portable doc & flashmoe
2025-03-04 14:50:21 +08:00
binbin Deng
091ab2bd59
[NPU] Add troubleshooting in portable zip doc (#12924) 2025-03-04 10:41:39 +08:00
Yuwen Hu
b2d676f1c6
Further update Ollama portable zip quickstart (#12921)
* Update Chinese doc for ollama quickstart tips and troubleshooting

* Update for recommended Windows OS

* Small fix

* Small fix
2025-03-03 18:07:57 +08:00
Shaojun Liu
f81d89d908
Remove Unnecessary --privileged Flag While Keeping It for WSL Users (#12920) 2025-03-03 11:11:42 +08:00
Shaojun Liu
7810b8fb49
OSPDT: update dockerfile header (#12908)
* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile
2025-03-03 09:59:11 +08:00
Yishuo Wang
b6f33d5c4d
optimize moonlight again (#12909) 2025-03-03 09:21:15 +08:00
Jason Dai
35e5fa851c
Update README.md (#12911) 2025-02-28 17:55:45 +08:00
binbin Deng
8351f6c455
[NPU] Add QuickStart for llama.cpp NPU portable zip (#12899) 2025-02-28 17:19:18 +08:00
Xin Qiu
029480f4a8
llama cpp portable zip Quickstart (#12894)
* llamacpp_quickstart

* update

* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md

* Update llamacpp_portable_zip_gpu_quickstart.md
2025-02-28 15:45:11 +08:00
Yuwen Hu
443cb5d4e0
Update Janus-Pro GPU example (#12906) 2025-02-28 15:39:03 +08:00
Yuwen Hu
8d94752c4b
Ollama portable zip QuickStart updates regarding more tips (#12905)
* Update for select multiple GPUs

* Update Ollama portable zip quickstarts regarding more tips

* Small fix
2025-02-28 15:10:56 +08:00
Yishuo Wang
39e360fe9d
add grouped topk optimization for moonlight (#12903) 2025-02-28 13:25:56 +08:00
Xin Qiu
e946127613
glm 4v 1st sdp for vision (#12904)
* glm4v 1st sdp

* update glm4v example

* meet code review

* fix style
2025-02-28 13:23:27 +08:00
Shaojun Liu
5c100ac105
Add ENTRYPOINT to Dockerfile to auto-start vllm service on container launch (for CVTE customer) (#12901)
* Add ENTRYPOINT to Dockerfile to auto-start service on container launch (for CVTE client)

* Update start-vllm-service.sh

* Update README.md

* Update README.md

* Update start-vllm-service.sh

* Update README.md
2025-02-27 17:33:58 +08:00
Yishuo Wang
be1f073866
add fuse moe optimization for moonlight (#12898) 2025-02-27 09:15:24 +08:00
Jason Dai
ad65e2b03a
Update README.md (#12900) 2025-02-27 08:30:06 +08:00
Yishuo Wang
5faba06409
simple optimization for moonlight moe decoding forward (#12891) 2025-02-25 16:18:27 +08:00
Xiangyu Tian
ae9f5320da
vLLM CPU: Fix Triton Version to Resolve Related Error (#12893) 2025-02-25 15:00:41 +08:00
Yishuo Wang
ab3fc66eb7
optimize attention part of moonlight-14B-A3B (#12886) 2025-02-25 09:38:13 +08:00
Shaojun Liu
dd30d12cb6
Fix serving-cpu image: setuptools-scm requires setuptools>=61 (#12876)
* setuptools-scm requires setuptools>=61

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile
2025-02-25 09:10:14 +08:00
Yuwen Hu
06694ba61a
Further fix portable zip file link (#12885) 2025-02-24 18:06:57 +08:00
Yuwen Hu
671ddfd847
Update wrong file name for portable zip quickstart (#12883) 2025-02-24 17:52:09 +08:00
Yuwen Hu
a9c8e73a77
Update llama.cpp Prerequisites guide regarding oneAPI 2025.0 (#12881)
* Update llama.cpp Prerequisites guide regarding oneAPI 2025.0

* Update based on comments

* Small fix

* Small fix
2025-02-24 16:32:23 +08:00
Wang, Jian4
4f2f92afa3
Update inference-cpp docker (#12882)
* remove unused run.py

* add WORKDIR /llm
2025-02-24 14:32:44 +08:00
Yishuo Wang
3f6ecce508
support using xgrammar to get json output (#12870) 2025-02-24 14:10:58 +08:00
Shaojun Liu
afad979168
Add Apache 2.0 License Information in Dockerfile to Comply with OSPDT Requirements (#12878)
* ospdt: add Header for Dockerfile

* OSPDT: add Header for Dockerfile

* OSPDT: add Header for Dockerfile

* OSPDT: add Header for Dockerfile
2025-02-24 14:00:46 +08:00
Guancheng Fu
02ec313eab
Update README.md (#12877) 2025-02-24 09:59:17 +08:00
Shaojun Liu
10400abfb7
Fix CodeQL workflow (#12875)
* Update codeql.yml

* Update codeql.yml
2025-02-24 09:16:54 +08:00
Xu, Shuo
1e00bed001
Add GPU example for Janus-Pro (#12869)
* Add example for Janus-Pro

* Update model link

* Fixes

* Fixes

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2025-02-21 18:36:50 +08:00
Yuwen Hu
21d6a78be0
Update Ollama portable zip QuickStart to fit new version (#12871)
* Update ollama portable zip quickstart

* Update demo images
2025-02-21 17:54:14 +08:00
Wang, Jian4
3ea5389a99
Fix vllm api_server v1/models error (#12867) 2025-02-21 11:08:29 +08:00
binbin Deng
8077850452
[NPU GGUF] Add simple example (#12853) 2025-02-21 09:58:00 +08:00
Wang, Jian4
348dc8056d
Fix vllm gptq awq error (#12863)
* fix gptq awq error

* fix python style
2025-02-20 16:27:23 +08:00
Yuwen Hu
a488981f3f
Ollama portable zip QuickStart tiny fix (#12862)
* Tiny fix to ollama portable zip quickstart

* Tiny fix
2025-02-20 14:11:12 +08:00
Yuwen Hu
0f2706be42
Update CN Ollama portable zip QuickStart for troubleshooting & tips (#12860)
* Small fix for english version

* Update CN ollama portable zip quickstart for troubleshooting & tips

* Small fix
2025-02-20 11:32:06 +08:00
Jason Dai
38a682adb1
Update Readme (#12855) 2025-02-19 19:55:29 +08:00
Guancheng Fu
4eed0c7d99
initial implementation for low_bit_loader vLLM (#12838)
* initial

* add logic for handling tensor parallel models

* fix

* Add some comments

* add doc

* fix done
2025-02-19 19:45:34 +08:00
Xin Qiu
c81b7fc003
Add Portable zip Linux QuickStart (#12849)
* linux doc

* update

* Update ollama_portablze_zip_quickstart.md

* Update ollama_portablze_zip_quickstart.md

* Update ollama_portablze_zip_quickstart.zh-CN.md

* Update ollama_portablze_zip_quickstart.md

* meet code review

* update

* Add tips & troubleshooting sections for both Linux & Windows

* Rebase

* Fix based on comments

* Small fix

* Fix img

* Update table for linux

* Small fix

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2025-02-19 19:13:55 +08:00
Xiangyu Tian
b26409d53f
R1 Hybrid: Add Benchmark for DeepSeek R1 transformers example (#12854)
* init

* fix

* update

* update

* fix

* fix
2025-02-19 18:33:21 +08:00