Commit graph

2256 commits

Author SHA1 Message Date
Ruonan Wang
f5d9c49a2a
add rotary_half_with_cache_inplaced to ipex_llm.transformers.models.common (#13143)
* update

* small fix
2025-05-09 09:20:44 +08:00
Wang, Jian4
f2598b119e
update for bge-m3 (#13138) 2025-05-07 16:59:52 +08:00
Yishuo Wang
3a28b69202
Add qwen3 support (#13137) 2025-05-07 14:03:16 +08:00
Wang, Jian4
01bc7e9eb9
Fix 083 lm_head error (#13132)
* fix no quantize error

* update

* update style
2025-05-06 15:47:20 +08:00
Xiangyu Tian
51b41faad7
vLLM: update vLLM XPU to 0.8.3 version (#13118)
2025-04-30 14:40:53 +08:00
Guancheng Fu
d222eaffd7
Update README.md (#13113) 2025-04-27 17:13:18 +08:00
Wang, Jian4
16fa778e65
enable glm4v and gemma-3 on vllm 083 (#13114)
* enable glm4v and gemma-3

* update

* add qwen2.5-vl
2025-04-27 17:10:56 +08:00
Guancheng Fu
0cfdd399e7
Update README.md (#13104) 2025-04-24 10:21:17 +08:00
Yishuo Wang
908fdb982e
small refactor and fix (#13101) 2025-04-22 14:45:31 +08:00
Guancheng Fu
14cd613fe1
Update vLLM docs with some new features (#13092)
* done

* fix

* done

* Update README.md
2025-04-22 14:39:28 +08:00
Yuwen Hu
0801d27a6f
Remove PyTorch 2.3 support for Intel GPU (#13097)
* Remove PyTorch 2.3 installation option for GPU

* Remove xpu_lnl option in installation guides for docs

* Update BMG quickstart

* Remove PyTorch 2.3 dependencies for GPU examples

* Update the graphmode example to use stable version 2.2.0

* Fix based on comments
2025-04-22 10:26:16 +08:00
Ruonan Wang
2f78afcd2a
Refactor some functions to ipex_llm.transformers.models.common (#13091)
* add quantize_linear & linear_forward

* add moe_group_topk

* rotary_two_with_cache_inplaced

* fix code style

* update related models
2025-04-18 11:15:43 +08:00
Ruonan Wang
e08c6bd018
Fix several models based on sdp api change (#13075)
* fix baichuan based on sdp api change

* fix several models based on api change

* fix style
2025-04-15 11:13:12 +08:00
Yishuo Wang
10c30cdba9
set woq_int4 as default int4 (#13021) 2025-04-14 14:10:59 +08:00
Ruonan Wang
6693e8ab04
Deepseek kv / sdp support (#13068)
* update kv

* fix

* fix style
2025-04-11 11:26:15 +08:00
Yuwen Hu
cd0d4857b8
ipex-llm 2.2.0 post-release update (#13053)
* Update ollama/llama.cpp release link to 2.2.0 (#13052)

* Post-update for releasing ipex-llm 2.2.0
2025-04-07 17:41:22 +08:00
Yishuo Wang
ef852dcb4a
add audio optimization for qwen2.5-omni (#13037) 2025-04-07 17:20:26 +08:00
Yishuo Wang
300eb01d98
Add basic optimization for Qwen2.5 omni (#13022) 2025-03-28 17:21:52 +08:00
Wang, Jian4
7809ca9864
Reuse --privileged (#13015)
* fix

* add
2025-03-27 10:00:50 +08:00
Guancheng Fu
f437b36678
Fix vllm glm edge model (#13007)
* fix done

* fix
2025-03-26 09:25:32 +08:00
Yuwen Hu
374747b492
Update bert optimization to fit higher transformers/torch version (#13006) 2025-03-25 16:12:03 +08:00
Ruonan Wang
27d669210f
remove fschat in EAGLE example (#13005)
* update fschat version

* fix
2025-03-25 15:48:48 +08:00
Shaojun Liu
08f96a5139
Rename LICENSE-Intel®-OpenMP*-Runtime-Library.txt to LICENSE-Intel®-OpenMP-Runtime-Library.txt (#13002) 2025-03-25 10:07:55 +08:00
Shaojun Liu
46a4f53967
OSPDT: add tpp licenses for release 2.2.0 (#12840)
* Create LICENSE-zstd.txt

* Create LICENSE-libcxx.txt

* Create LICENSE-libcxxabi.txt

* Create LICENSE-safestring.txt

* Create LICENSE-stb-image.txt

* Create LICENSE-cluster-agent.txt

* Create LICENSE-hd-agent.txt

* Create LICENSE-platform-telemetry-agent.txt

* Create LICENSE-platform-update-agent.txt

* Create LICENSE-OpenCL-ICD-Loader.txt

* Create LICENSE-xptifw.txt

* Create LICENSE-intel-openmp.txt

* Create LICENSE-Intel®-OpenMP*-Runtime-Library.txt

* Create LICENSE-Intel®-C-C++-Fortran-Compiler-Mainline.txt

* add TPP files

* Add TPP files

* add tpp

* add tpp

* update

* update
2025-03-21 15:52:22 +08:00
Yuwen Hu
5bdf57327d
Remove ipex import in fastchat loader (#12984) 2025-03-20 18:29:00 +08:00
Wang, Jian4
c9ecb7a113
Fix qwen nan value issue on vllm (#12971)
* add to fix qwen nan value issue

* update
2025-03-14 14:43:54 +08:00
Heyang Sun
cd109bb061
Gemma QLoRA example (#12969)
* Gemma QLoRA example

* Update README.md

* Update README.md

---------

Co-authored-by: sgwhat <ge.song@intel.com>
2025-03-14 14:27:51 +08:00
Yuwen Hu
8bc41c13ab
Support PyTorch 2.6 with Arrow Lake-H AOT on Windows (#12967) 2025-03-13 15:29:47 +08:00
Wang, Jian4
c8a0462507
Add vllm api_server input output log (#12962) 2025-03-12 20:58:04 +08:00
Shaojun Liu
6a2d87e40f
add --entrypoint /bin/bash (#12957)
Co-authored-by: gc-fu <guancheng.fu@intel.com>
2025-03-10 10:10:27 +08:00
Yuwen Hu
7c0c77cce3
Tiny fixes (#12936) 2025-03-05 14:55:26 +08:00
Yuwen Hu
68a770745b
Add moonlight GPU example (#12929)
* Add moonlight GPU example and update table

* Small fix

* Fix based on comments

* Small fix
2025-03-05 11:31:14 +08:00
Shaojun Liu
f81d89d908
Remove Unnecessary --privileged Flag While Keeping It for WSL Users (#12920) 2025-03-03 11:11:42 +08:00
Yishuo Wang
b6f33d5c4d
optimize moonlight again (#12909) 2025-03-03 09:21:15 +08:00
Yuwen Hu
443cb5d4e0
Update Janus-Pro GPU example (#12906) 2025-02-28 15:39:03 +08:00
Yishuo Wang
39e360fe9d
add grouped topk optimization for moonlight (#12903) 2025-02-28 13:25:56 +08:00
Xin Qiu
e946127613
glm 4v 1st sdp for vision (#12904)
* glm4v 1st sdp

* update glm4v example

* meet code review

* fix style
2025-02-28 13:23:27 +08:00
Yishuo Wang
be1f073866
add fuse moe optimization for moonlight (#12898) 2025-02-27 09:15:24 +08:00
Yishuo Wang
5faba06409
simple optimization for moonlight moe decoding forward (#12891) 2025-02-25 16:18:27 +08:00
Yishuo Wang
ab3fc66eb7
optimize attention part of moonlight-14B-A3B (#12886) 2025-02-25 09:38:13 +08:00
Yishuo Wang
3f6ecce508
support using xgrammar to get json output (#12870) 2025-02-24 14:10:58 +08:00
Guancheng Fu
02ec313eab
Update README.md (#12877) 2025-02-24 09:59:17 +08:00
Xu, Shuo
1e00bed001
Add GPU example for Janus-Pro (#12869)
* Add example for Janus-Pro

* Update model link

* Fixes

* Fixes

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2025-02-21 18:36:50 +08:00
Wang, Jian4
3ea5389a99
Fix vllm api_server v1/models error (#12867) 2025-02-21 11:08:29 +08:00
binbin Deng
8077850452
[NPU GGUF] Add simple example (#12853) 2025-02-21 09:58:00 +08:00
Wang, Jian4
348dc8056d
Fix vllm gptq awq error (#12863)
* fix gptq awq error

* fix python style
2025-02-20 16:27:23 +08:00
Guancheng Fu
4eed0c7d99
initial implementation for low_bit_loader vLLM (#12838)
* initial

* add logic for handling tensor parallel models

* fix

* Add some comments

* add doc

* fix done
2025-02-19 19:45:34 +08:00
Xiangyu Tian
b26409d53f
R1 Hybrid: Add Benchmark for DeepSeek R1 transformers example (#12854)
* init

* fix

* update

* update

* fix

* fix
2025-02-19 18:33:21 +08:00
Yishuo Wang
aee2db30f9
update sdp support (#12847) 2025-02-19 12:07:00 +08:00
Xiangyu Tian
93c10be762
LLM: Support hybrid convert for DeepSeek V3/R1 (#12834)
2025-02-19 11:31:19 +08:00