Commit graph

4030 commits

Author SHA1 Message Date
Yuwen Hu
8d94752c4b
Ollama portable zip QuickStart updates regarding more tips (#12905)
* Update for select multiple GPUs

* Update Ollama portable zip quickstarts regarding more tips

* Small fix
2025-02-28 15:10:56 +08:00
Yishuo Wang
39e360fe9d
add grouped topk optimization for moonlight (#12903) 2025-02-28 13:25:56 +08:00
Xin Qiu
e946127613
glm 4v 1st sdp for vision (#12904)
* glm4v 1st sdp

* update glm4v example

* meet code review

* fix style
2025-02-28 13:23:27 +08:00
Shaojun Liu
5c100ac105
Add ENTRYPOINT to Dockerfile to auto-start vllm service on container launch (for CVTE customer) (#12901)
* Add ENTRYPOINT to Dockerfile to auto-start service on container launch (for CVTE client)

* Update start-vllm-service.sh

* Update README.md

* Update README.md

* Update start-vllm-service.sh

* Update README.md
2025-02-27 17:33:58 +08:00
Yishuo Wang
be1f073866
add fuse moe optimization for moonlight (#12898) 2025-02-27 09:15:24 +08:00
Jason Dai
ad65e2b03a
Update README.md (#12900) 2025-02-27 08:30:06 +08:00
Yishuo Wang
5faba06409
simple optimization for moonlight moe decoding forward (#12891) 2025-02-25 16:18:27 +08:00
Xiangyu Tian
ae9f5320da
vLLM CPU: Fix Triton Version to Resolve Related Error (#12893) 2025-02-25 15:00:41 +08:00
Yishuo Wang
ab3fc66eb7
optimize attention part of moonlight-14B-A3B (#12886) 2025-02-25 09:38:13 +08:00
Shaojun Liu
dd30d12cb6
Fix serving-cpu image: setuptools-scm requires setuptools>=61 (#12876)
* setuptools-scm requires setuptools>=61

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile
2025-02-25 09:10:14 +08:00
Yuwen Hu
06694ba61a
Further fix portable zip file link (#12885) 2025-02-24 18:06:57 +08:00
Yuwen Hu
671ddfd847
Update wrong file name for portable zip quickstart (#12883) 2025-02-24 17:52:09 +08:00
Yuwen Hu
a9c8e73a77
Update llama.cpp Prerequisites guide regarding oneAPI 2025.0 (#12881)
* Update llama.cpp Prerequisites guide regarding oneAPI 2025.0

* Update based on comments

* Small fix

* Small fix
2025-02-24 16:32:23 +08:00
Wang, Jian4
4f2f92afa3
Update inference-cpp docker (#12882)
* remove unused run.py

* add WORKDIR /llm
2025-02-24 14:32:44 +08:00
Yishuo Wang
3f6ecce508
support using xgrammar to get json output (#12870) 2025-02-24 14:10:58 +08:00
Shaojun Liu
afad979168
Add Apache 2.0 License Information in Dockerfile to Comply with OSPDT Requirements (#12878)
* OSPDT: add Header for Dockerfile

* OSPDT: add Header for Dockerfile

* OSPDT: add Header for Dockerfile

* OSPDT: add Header for Dockerfile
2025-02-24 14:00:46 +08:00
Guancheng Fu
02ec313eab
Update README.md (#12877) 2025-02-24 09:59:17 +08:00
Shaojun Liu
10400abfb7
Fix CodeQL workflow (#12875)
* Update codeql.yml

* Update codeql.yml
2025-02-24 09:16:54 +08:00
Xu, Shuo
1e00bed001
Add GPU example for Janus-Pro (#12869)
* Add example for Janus-Pro

* Update model link

* Fixes

* Fixes

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2025-02-21 18:36:50 +08:00
Yuwen Hu
21d6a78be0
Update Ollama portable zip QuickStart to fit new version (#12871)
* Update ollama portable zip quickstart

* Update demo images
2025-02-21 17:54:14 +08:00
Wang, Jian4
3ea5389a99
Fix vllm api_server v1/models error (#12867) 2025-02-21 11:08:29 +08:00
binbin Deng
8077850452
[NPU GGUF] Add simple example (#12853) 2025-02-21 09:58:00 +08:00
Wang, Jian4
348dc8056d
Fix vllm gptq awq error (#12863)
* fix gptq awq error

* fix python style
2025-02-20 16:27:23 +08:00
Yuwen Hu
a488981f3f
Ollama portable zip QuickStart tiny fix (#12862)
* Tiny fix to ollama portable zip quickstart

* Tiny fix
2025-02-20 14:11:12 +08:00
Yuwen Hu
0f2706be42
Update CN Ollama portable zip QuickStart for troubleshooting & tips (#12860)
* Small fix for English version

* Update CN ollama portable zip quickstart for troubleshooting & tips

* Small fix
2025-02-20 11:32:06 +08:00
Jason Dai
38a682adb1
Update Readme (#12855) 2025-02-19 19:55:29 +08:00
Guancheng Fu
4eed0c7d99
initial implementation for low_bit_loader vLLM (#12838)
* initial

* add logic for handling tensor parallel models

* fix

* Add some comments

* add doc

* fix done
2025-02-19 19:45:34 +08:00
Xin Qiu
c81b7fc003
Add Portable zip Linux QuickStart (#12849)
* linux doc

* update

* Update ollama_portablze_zip_quickstart.md

* Update ollama_portablze_zip_quickstart.md

* Update ollama_portablze_zip_quickstart.zh-CN.md

* Update ollama_portablze_zip_quickstart.md

* meet code review

* update

* Add tips & troubleshooting sections for both Linux & Windows

* Rebase

* Fix based on comments

* Small fix

* Fix img

* Update table for linux

* Small fix

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2025-02-19 19:13:55 +08:00
Xiangyu Tian
b26409d53f
R1 Hybrid: Add Benchmark for DeepSeek R1 transformers example (#12854)
* init

* fix

* update

* update

* fix

* fix
2025-02-19 18:33:21 +08:00
SONG Ge
5d041f9ebf
Add latest models list in ollama quickstart (#12850)
* Add latest models list on ollama quickstart

* update oneapi version description

* move models list to ollama_portable_zip doc

* update CN readme
2025-02-19 18:29:43 +08:00
Yishuo Wang
aee2db30f9
update sdp support (#12847) 2025-02-19 12:07:00 +08:00
Xiangyu Tian
93c10be762
LLM: Support hybrid convert for DeepSeek V3/R1 (#12834)
LLM: Support hybrid convert for DeepSeek V3/R1
2025-02-19 11:31:19 +08:00
Yuwen Hu
637543e135
Update Ollama portable zip QuickStart with troubleshooting (#12846)
* Update ollama portable zip quickstart with runtime configurations

* Small fix

* Update based on comments

* Small fix

* Small fix
2025-02-19 11:04:03 +08:00
binbin Deng
bde8acc303
[NPU] Update doc of gguf support (#12837) 2025-02-19 10:46:35 +08:00
Wang, Jian4
e1809a6295
Update multimodal on vllm 0.6.6 (#12816)
* add glm4v and minicpmv example

* fix
2025-02-19 10:04:42 +08:00
Xiangyu Tian
09150b6058
Initiate CPU-XPU Hybrid Inference for DeepSeek-R1 (#12832)
Initiate CPU-XPU Hybrid Inference for DeepSeek-R1 with DeepseekV3Attention
and DeepseekV3MLP to XPU
2025-02-18 13:34:14 +08:00
Xiangyu Tian
09ed96082b
Add DeepSeek V3/R1 CPU example (#12836)
Add DeepSeek V3/R1 CPU example for bf16 model
2025-02-18 12:45:49 +08:00
Yishuo Wang
8418450300
optimize minicpm-o's tts part (#12833) 2025-02-17 14:53:37 +08:00
Shaojun Liu
f7b5a093a7
Merge CPU & XPU Dockerfiles with Serving Images and Refactor (#12815)
* Update Dockerfile

* Update Dockerfile

* Ensure scripts are executable

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* update

* Update Dockerfile

* remove inference-cpu and inference-xpu

* update README
2025-02-17 14:23:22 +08:00
Jason Dai
eaec64baca
Update README.md (#12826) 2025-02-14 21:20:57 +08:00
joan726
59e8e1e91e
Added ollama_portablze_zip_quickstart.zh-CN.md (#12822) 2025-02-14 18:54:12 +08:00
Jason Dai
a09552e59a
Update ollama quickstart (#12823) 2025-02-14 09:55:48 +08:00
Yuwen Hu
f67986021c
Update download link for Ollama portable zip QuickStart (#12821)
* Update download link for Ollama portable zip quickstart

* Update based on comments
2025-02-13 17:48:02 +08:00
Jason Dai
16e63cbc18
Update readme (#12820) 2025-02-13 14:26:04 +08:00
Yuwen Hu
68414afcb9
Add initial QuickStart for Ollama portable zip (#12817)
* Add initial quickstart for Ollama portable zip

* Small fix

* Fixed based on comments

* Small fix

* Add demo image for run ollama

* Update download link
2025-02-13 13:18:14 +08:00
Wang, Jian4
1083fe5508
Reenable pp and lightweight-serving on 0.6.6 (#12814)
* reenable pp and lightweight serving on 0.6.6

* update readme

* update

* update tag
2025-02-13 10:16:00 +08:00
Guancheng Fu
af693425f1
Upgrade to vLLM 0.6.6 (#12796)
* init

* update engine init

* fix serving load_in_low_bit problem

* temp

* temp

* temp

* temp

* temp

* fix

* fixed

* done

* fix

* fix all arguments

* fix

* fix throughput script

* fix

* fix

* use official ipex-llm

* Fix readme

* fix

---------

Co-authored-by: hzjane <a1015616934@qq.com>
2025-02-12 16:47:51 +08:00
Yishuo Wang
f8ab833f74
support and optimize janus pro (#12813) 2025-02-12 15:07:24 +08:00
Shaojun Liu
bd815a4d96
Update the base image of inference-cpp image to oneapi 2025.0.2 (#12802)
* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile
2025-02-12 14:15:08 +08:00
Yishuo Wang
73cfe293fa
add basic support for Baichuan-M1-14B-Instruct (#12808) 2025-02-11 17:27:42 +08:00