Yina Chen
9cfdf143a2
Delete the deprecated LLM Windows test ( #13275 )
2025-08-01 11:27:46 +08:00
Emmanuel Ferdman
68c5103a0a
[NPU] Update quickstart reference ( #13262 )
Fix the wrong QuickStart URLs in NPU `Save-Load/README.md`.
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-07-21 09:55:40 +08:00
zxue2
6ba3138d7c
Fix ambiguous boolean evaluation in bert.py ( #13236 )
Signed-off-by: Xue, Zhan <zhan.xue@intel.com>
2025-06-30 14:14:01 +08:00
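The bert.py fix above addresses a classic PyTorch pitfall: using a multi-element tensor in a truthiness test raises "Boolean value of Tensor with more than one element is ambiguous". A minimal sketch of the pattern, assuming the fix replaces a truthiness check with an explicit `is not None` test (names here are illustrative, not taken from the commit; a tiny stand-in class mimics `torch.Tensor.__bool__` so the example runs without torch):

```python
class FakeTensor:
    """Minimal stand-in mimicking torch.Tensor's ambiguous __bool__."""
    def __init__(self, values):
        self.values = list(values)

    def __bool__(self):
        # PyTorch raises exactly this way for multi-element tensors
        if len(self.values) != 1:
            raise RuntimeError(
                "Boolean value of Tensor with more than one element is ambiguous"
            )
        return bool(self.values[0])

def apply_mask_buggy(mask):
    return "masked" if mask else "unmasked"  # raises for multi-element tensors

def apply_mask_fixed(mask):
    return "masked" if mask is not None else "unmasked"  # explicit None check

mask = FakeTensor([1, 0, 1])
try:
    apply_mask_buggy(mask)
    outcome = "no error"
except RuntimeError:
    outcome = "RuntimeError"

print(outcome)                 # RuntimeError
print(apply_mask_fixed(mask))  # masked
print(apply_mask_fixed(None))  # unmasked
```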
Guancheng Fu
3f6d407be4
Fix engine.py ( #13215 )
2025-06-09 09:03:17 +08:00
Guancheng Fu
ac04992278
Update engine.py ( #13209 )
2025-06-06 15:47:33 +08:00
Ruonan Wang
dd49368e0c
Only install oneDNN for Windows when using torch 2.6 ( #13207 )
2025-06-05 17:28:21 +08:00
Wang, Jian4
5a1c1297e1
Fix internvl fp16 error ( #13205 )
2025-06-05 11:17:44 +08:00
Wang, Jian4
45864790f7
Enable phi-4 with vision and audio ( #13203 )
* add phi4
* update
* enable audio
* update and add readme
2025-06-05 10:15:20 +08:00
Yina Chen
e032156518
Support torch_fp8 ( #13196 )
* support torch_fp8
2025-06-04 20:08:01 +08:00
Guancheng Fu
bb50cd0881
Update api_server.py ( #13198 )
2025-05-30 09:26:53 +08:00
Ruonan Wang
9df610f80d
Fix trl import when not running speculative decoding ( #13187 )
* fix trl import when not running speculative
* fix style
2025-05-26 13:21:54 +08:00
Xiangyu Tian
531bef2810
vLLM: Fix convert_to_half condition ( #13177 )
* fix
* format
2025-05-22 15:44:10 +08:00
Wang, Jian4
e3130a06ed
Fix multimodal errors ( #13178 )
* fix glm4v int4 output error
* fix glm-4v qwen2.5-vl fp16 error
* update
2025-05-22 15:39:27 +08:00
Xiangyu Tian
154af7d7f7
vLLM: set convert_to_half to False by default ( #13172 )
* init
* remove
* fix
2025-05-21 18:41:28 +08:00
Wang, Jian4
d83e5068d2
Enable whisper ( #13162 )
* fix error
* update dockerfile
2025-05-19 14:07:51 +08:00
Yina Chen
8ba57b41cd
Add merge quantized qkv ( #13160 )
* add merge quantized qkv
* fix style & device
* add check
2025-05-16 15:46:47 +08:00
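The "merge quantized qkv" change above follows a common attention optimization: concatenating the Q/K/V projection weights so one matmul replaces three. A hedged sketch of the idea in plain floats (the actual commit operates on quantized weights, which this sketch omits; all names and shapes are illustrative):

```python
def matmul(x, w):
    """x: length-n vector; w: list of length-n rows -> one output per row."""
    return [sum(a * b for a, b in zip(x, row)) for row in w]

w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[2.0, 0.0], [0.0, 2.0]]
w_v = [[0.5, 0.5], [1.0, -1.0]]
x = [3.0, 4.0]

# separate projections: three matmuls
q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)

# merged: one matmul over the concatenated weight, then split the output
w_qkv = w_q + w_k + w_v
qkv = matmul(x, w_qkv)
q2, k2, v2 = qkv[0:2], qkv[2:4], qkv[4:6]

assert (q, k, v) == (q2, k2, v2)  # same result, one kernel launch instead of three
print(q2, k2, v2)
```

The payoff on real hardware is fewer, larger matmuls, which utilize the device better than three small ones.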
Emmanuel Ferdman
1e4e1353a0
Resolve messages formatting issues ( #13095 )
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-05-15 16:46:52 +08:00
Kai Huang
35b49e4d91
Add trl version in error message ( #13049 )
* add version in error msg
* fix style
2025-05-15 09:16:27 +08:00
Yina Chen
f6441b4e3d
Add moe_softmax_topk ( #13157 )
* add moe_softmax_topk
* address comments
* update
2025-05-13 14:50:59 +08:00
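The `moe_softmax_topk` op above is presumably a fused kernel for the standard mixture-of-experts routing step: softmax over the router logits, select the top-k experts, renormalize their weights. A reference (unfused) sketch of that computation, under the assumption that this is what the op computes:

```python
import math

def moe_softmax_topk(logits, k):
    """Reference version: stable softmax -> top-k experts -> renormalize."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # indices of the k largest routing probabilities
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in topk)
    # renormalize so the selected experts' weights sum to 1
    return {i: probs[i] / norm for i in topk}

weights = moe_softmax_topk([2.0, 0.5, 1.0, -1.0], k=2)
print(sorted(weights))                   # [0, 2] -> the two largest logits win
print(round(sum(weights.values()), 6))   # 1.0
```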
Shaojun Liu
45f7bf6688
Refactor vLLM Documentation: Centralize Benchmarking and Improve Readability ( #13141 )
* update vllm doc
* update image name
* update
* update
* update
* update
2025-05-09 10:19:42 +08:00
Ruonan Wang
f5d9c49a2a
Add rotary_half_with_cache_inplaced to ipex_llm.transformers.models.common ( #13143 )
* update
* small fix
2025-05-09 09:20:44 +08:00
Wang, Jian4
f2598b119e
Update for bge-m3 ( #13138 )
2025-05-07 16:59:52 +08:00
Yishuo Wang
3a28b69202
Add qwen3 support ( #13137 )
2025-05-07 14:03:16 +08:00
Wang, Jian4
01bc7e9eb9
Fix vLLM 0.8.3 lm_head error ( #13132 )
* fix no quantize error
* update
* update style
2025-05-06 15:47:20 +08:00
Xiangyu Tian
51b41faad7
vLLM: update vLLM XPU to 0.8.3 version ( #13118 )
2025-04-30 14:40:53 +08:00
Guancheng Fu
d222eaffd7
Update README.md ( #13113 )
2025-04-27 17:13:18 +08:00
Wang, Jian4
16fa778e65
Enable glm4v and gemma-3 on vLLM 0.8.3 ( #13114 )
* enable glm4v and gemma-3
* update
* add qwen2.5-vl
2025-04-27 17:10:56 +08:00
Guancheng Fu
0cfdd399e7
Update README.md ( #13104 )
2025-04-24 10:21:17 +08:00
Yishuo Wang
908fdb982e
Small refactor and fix ( #13101 )
2025-04-22 14:45:31 +08:00
Guancheng Fu
14cd613fe1
Update vLLM docs with some new features ( #13092 )
* done
* fix
* done
* Update README.md
2025-04-22 14:39:28 +08:00
Yuwen Hu
0801d27a6f
Remove PyTorch 2.3 support for Intel GPU ( #13097 )
* Remove PyTorch 2.3 installation option for GPU
* Remove xpu_lnl option in installation guides for docs
* Update BMG quickstart
* Remove PyTorch 2.3 dependencies for GPU examples
* Update the graphmode example to use stable version 2.2.0
* Fix based on comments
2025-04-22 10:26:16 +08:00
Ruonan Wang
2f78afcd2a
Refactor some functions to ipex_llm.transformers.models.common ( #13091 )
* add quantize_linear & linear_forward
* add moe_group_topk
* rotary_two_with_cache_inplaced
* fix code style
* update related models
2025-04-18 11:15:43 +08:00
Ruonan Wang
e08c6bd018
Fix several models based on sdp api change ( #13075 )
* fix baichuan based on sdp api change
* fix several models based on api change
* fix style
2025-04-15 11:13:12 +08:00
Yishuo Wang
10c30cdba9
Set woq_int4 as default int4 ( #13021 )
2025-04-14 14:10:59 +08:00
Ruonan Wang
6693e8ab04
Deepseek kv / sdp support ( #13068 )
* update kv
* fix
* fix style
2025-04-11 11:26:15 +08:00
Yuwen Hu
cd0d4857b8
ipex-llm 2.2.0 post-release update ( #13053 )
* Update ollama/llama.cpp release link to 2.2.0 ( #13052 )
* Post-update for releasing ipex-llm 2.2.0
2025-04-07 17:41:22 +08:00
Yishuo Wang
ef852dcb4a
Add audio optimization for qwen2.5-omni ( #13037 )
2025-04-07 17:20:26 +08:00
Yishuo Wang
300eb01d98
Add basic optimization for Qwen2.5 omni ( #13022 )
2025-03-28 17:21:52 +08:00
Wang, Jian4
7809ca9864
Reuse --privileged ( #13015 )
* fix
* add
2025-03-27 10:00:50 +08:00
Guancheng Fu
f437b36678
Fix vllm glm edge model ( #13007 )
* fix done
* fix
2025-03-26 09:25:32 +08:00
Yuwen Hu
374747b492
Update bert optimization to fit higher transformers/torch version ( #13006 )
2025-03-25 16:12:03 +08:00
Ruonan Wang
27d669210f
Remove fschat in EAGLE example ( #13005 )
* update fschat version
* fix
2025-03-25 15:48:48 +08:00
Shaojun Liu
08f96a5139
Rename LICENSE-Intel®-OpenMP*-Runtime-Library.txt to LICENSE-Intel®-OpenMP-Runtime-Library.txt ( #13002 )
2025-03-25 10:07:55 +08:00
Shaojun Liu
46a4f53967
OSPDT: add tpp licenses for release 2.2.0 ( #12840 )
* Create LICENSE-zstd.txt
* Create LICENSE-libcxx.txt
* Create LICENSE-libcxxabi.txt
* Create LICENSE-safestring.txt
* Create LICENSE-stb-image.txt
* Create LICENSE-cluster-agent.txt
* Create LICENSE-hd-agent.txt
* Create LICENSE-platform-telemetry-agent.txt
* Create LICENSE-platform-update-agent.txt
* Create LICENSE-OpenCL-ICD-Loader.txt
* Create LICENSE-xptifw.txt
* Create LICENSE-intel-openmp.txt
* Create LICENSE-Intel®-OpenMP*-Runtime-Library.txt
* Create LICENSE-Intel®-C-C++-Fortran-Compiler-Mainline.txt
* add TPP files
* Add TPP files
* add tpp
* add tpp
* update
* update
2025-03-21 15:52:22 +08:00
Yuwen Hu
5bdf57327d
Remove ipex import in fastchat loader ( #12984 )
2025-03-20 18:29:00 +08:00
Wang, Jian4
c9ecb7a113
Fix qwen nan value issue on vllm ( #12971 )
* add to fix qwen nan value issue
* update
2025-03-14 14:43:54 +08:00
Heyang Sun
cd109bb061
Gemma QLoRA example ( #12969 )
* Gemma QLoRA example
* Update README.md
* Update README.md
---------
Co-authored-by: sgwhat <ge.song@intel.com>
2025-03-14 14:27:51 +08:00
Yuwen Hu
8bc41c13ab
Support PyTorch 2.6 with Arrow Lake-H AOT on Windows ( #12967 )
2025-03-13 15:29:47 +08:00
Wang, Jian4
c8a0462507
Add vLLM api_server input/output log ( #12962 )
2025-03-12 20:58:04 +08:00
Shaojun Liu
6a2d87e40f
Add --entrypoint /bin/bash ( #12957 )
Co-authored-by: gc-fu <guancheng.fu@intel.com>
2025-03-10 10:10:27 +08:00