Commit graph

3942 commits

Author · SHA1 · Message · Date
Shaojun Liu
f7b5a093a7
Merge CPU & XPU Dockerfiles with Serving Images and Refactor (#12815)
* Update Dockerfile
* Update Dockerfile
* Ensure scripts are executable
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* update
* Update Dockerfile
* remove inference-cpu and inference-xpu
* update README
2025-02-17 14:23:22 +08:00
Jason Dai
eaec64baca
Update README.md (#12826) 2025-02-14 21:20:57 +08:00
joan726
59e8e1e91e
Added ollama_portablze_zip_quickstart.zh-CN.md (#12822) 2025-02-14 18:54:12 +08:00
Jason Dai
a09552e59a
Update ollama quickstart (#12823) 2025-02-14 09:55:48 +08:00
Yuwen Hu
f67986021c
Update download link for Ollama portable zip QuickStart (#12821)
* Update download link for Ollama portable zip quickstart
* Update based on comments
2025-02-13 17:48:02 +08:00
Jason Dai
16e63cbc18
Update readme (#12820) 2025-02-13 14:26:04 +08:00
Yuwen Hu
68414afcb9
Add initial QuickStart for Ollama portable zip (#12817)
* Add initial quickstart for Ollama portable zip
* Small fix
* Fixed based on comments
* Small fix
* Add demo image for running ollama
* Update download link
2025-02-13 13:18:14 +08:00
Wang, Jian4
1083fe5508
Reenable pp and lightweight-serving serving on 0.6.6 (#12814)
* reenable pp and lightweight serving on 0.6.6
* update readme
* update
* update tag
2025-02-13 10:16:00 +08:00
Guancheng Fu
af693425f1
Upgrade to vLLM 0.6.6 (#12796)
* init
* update engine init
* fix serving load_in_low_bit problem
* temp
* temp
* temp
* temp
* temp
* fix
* fixed
* done
* fix
* fix all arguments
* fix
* fix throughput script
* fix
* fix
* use official ipex-llm
* Fix readme
* fix

---------

Co-authored-by: hzjane <a1015616934@qq.com>
2025-02-12 16:47:51 +08:00
Yishuo Wang
f8ab833f74
support and optimize janus pro (#12813) 2025-02-12 15:07:24 +08:00
Shaojun Liu
bd815a4d96
Update the base image of inference-cpp image to oneapi 2025.0.2 (#12802)
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
2025-02-12 14:15:08 +08:00
Yishuo Wang
73cfe293fa
add basic support for Baichuan-M1-14B-Instruct (#12808) 2025-02-11 17:27:42 +08:00
binbin Deng
d093b75aa0
[NPU] Update driver installation in QuickStart (#12807) 2025-02-11 15:49:21 +08:00
Xiangyu Tian
b70ad902b4
Fix ipex-llm CPU linear dtype not match (#12805) 2025-02-11 10:34:44 +08:00
Shaojun Liu
2701a9d1e3
Remove Migrated Workflows to Avoid Duplication and Confusion (#12801)
* Delete .github/actions/llm directory
* Delete .github/workflows/release-ipex-llm.yaml
* Delete .github/workflows/llm-nightly-test.yml
* Delete .github/workflows/llm_unit_tests.yml
* Delete .github/workflows/llm-binary-build.yml
* Delete .github/workflows/llm_example_tests.yml
* Delete .github/workflows/llm_performance_tests.yml
* Delete .github/workflows/manually_build.yml
* Delete .github/workflows/manually_build_for_testing.yml
* Delete .github/workflows/release-pypi.yml
2025-02-10 14:58:08 +08:00
Yina Chen
eb2df5ed70
common.h -> npu/npu_common.h (#12800) 2025-02-10 14:38:22 +08:00
Yishuo Wang
e4ceb722b6
fix qwen2 vl (#12798) 2025-02-10 13:25:53 +08:00
binbin Deng
3fee838b14
[NPU] Fix of c++ convert example (#12797) 2025-02-10 11:17:58 +08:00
Kai Huang
468d3f22fc
Rename NPU public example to llm-cli (#12790)
* rename to llm-cli
* update readme
2025-02-08 10:19:59 +08:00
Ruonan Wang
e90a9ad196
[NPU] Support non-const parameter for decoder layers when keep_ir=True (#12789)
* support layernorm=False for decoder layers
* rename to meet review
* fix style
* rename to const_parameter
* fix rebase error
* fix rebase error
2025-02-08 09:58:42 +08:00
Yishuo Wang
8aea5319bb
update more lora example (#12785) 2025-02-08 09:46:48 +08:00
Yuwen Hu
fd28cf1672
Upgrade ipex-llm[cpp] to oneAPI 2025.0 on Windows (#12778)
* Upgrade ipex-llm[cpp] to oneAPI 2025.0
* Fit oneapi pypi dependency on Windows for now
2025-02-07 18:29:34 +08:00
binbin Deng
ca1d7b7c2c
[NPU] Support qwen models with cos_sin_input=True (#12788) 2025-02-07 16:41:13 +08:00
binbin Deng
6ff7faa781
[NPU] Update deepseek support in python examples and quickstart (#12786) 2025-02-07 11:25:16 +08:00
Ruonan Wang
b4f2be2b09
[NPU] Update C++ example to add DeepSeek-R1 (#12787) 2025-02-07 11:23:34 +08:00
Yishuo Wang
d0d9c9d636
remove load_in_8bit usage as it has not been supported for a long time (#12779) 2025-02-07 11:21:29 +08:00
Xiangyu Tian
9e9b6c9f2b
Fix cpu serving docker image (#12783) 2025-02-07 11:12:42 +08:00
Yishuo Wang
b4c9e23f73
fix galore and peft finetune example (#12776) 2025-02-06 16:36:13 +08:00
Yishuo Wang
c0d6b282b8
fix lisa finetune example (#12775) 2025-02-06 16:35:43 +08:00
Yishuo Wang
2e5f2e5dda
fix dpo finetune (#12774) 2025-02-06 16:35:21 +08:00
Yishuo Wang
9697197f3e
fix qlora finetune example (#12769) 2025-02-06 11:18:28 +08:00
Ruonan Wang
094a25b740
[NPU] Expose parameter to control blob / IR save logic (#12767)
* update api
* fix convert.py
* fix style
* remove unnecessary bin file
* fix style
2025-02-06 10:07:45 +08:00
Jason Dai
9c0daf6396
Fix readme links (#12771) 2025-02-05 19:24:25 +08:00
Jason Dai
a1e7bfc638
Update Readme (#12770) 2025-02-05 19:19:57 +08:00
Yishuo Wang
0237ffb302
refactor xpu linear forward (#12768) 2025-02-05 17:40:38 +08:00
Danciu Georgian
413d6c2b66
Update check.py removing a twice defined function (#12760)
Remove duplicate function
2025-02-05 11:37:59 +08:00
Yuwen Hu
184adb2653
Small fix to MiniCPM-o-2_6 GPU example (#12766) 2025-02-05 11:32:26 +08:00
Shaojun Liu
ee809e71df
add troubleshooting section (#12755) 2025-01-26 11:03:58 +08:00
Shaojun Liu
5fb87d7486
remove ${HF_TOKEN} (#12742) 2025-01-26 10:31:42 +08:00
Xiangyu Tian
f924880694
vLLM: Fix vLLM-CPU docker image (#12741) 2025-01-24 10:00:29 +08:00
Yuwen Hu
69f13c78b8
[NPU] Update layernorm node on MTL/ARL (#12738)
* Update layernorm node on MTL/ARL
* Fix on style
2025-01-23 17:25:19 +08:00
Yuwen Hu
d11f257ee7
Add GPU example for MiniCPM-o-2_6 (#12735)
* Add init example for omni mode
* Small fix
* Small fix
* Add chat example
* Remove legacy link
* Further update link
* Add readme
* Small fix
* Update main readme link
* Update based on comments
* Small fix
* Small fix
* Small fix
2025-01-23 16:10:19 +08:00
Yuwen Hu
dcca522618
Remove sdpa available patch (#12734) 2025-01-22 17:22:28 +08:00
Xiangyu Tian
c9b6c94a59
vLLM: Update vLLM-cpu to v0.6.6-post1 (#12728)
Update vLLM-cpu to v0.6.6-post1
2025-01-22 15:03:01 +08:00
Ruonan Wang
78cca0a68c
[NPU] update llm-npu-cli example (#12729)
* update cli example
* add license
* rename
* update readme sample output
2025-01-22 09:59:27 +08:00
Jason Dai
7e29edcc4b
Update Readme (#12730) 2025-01-22 08:43:32 +08:00
Yishuo Wang
6789e5d92f
small fix (#12727) 2025-01-21 17:27:18 +08:00
Jason Dai
412bfd6644
Update readme (#12724) 2025-01-21 10:59:14 +08:00
Wang, Jian4
716d4fe563
Add vllm 0.6.2 vision offline example (#12721)
* add vision offline example
* add to docker
2025-01-21 09:58:01 +08:00
Yishuo Wang
085974e307
fix nf4 to cpu (#12722) 2025-01-21 09:23:22 +08:00