Xu, Shuo
1e00bed001
Add GPU example for Janus-Pro ( #12869 )
...
* Add example for Janus-Pro
* Update model link
* Fixes
* Fixes
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2025-02-21 18:36:50 +08:00
Wang, Jian4
3ea5389a99
Fix vllm api_server v1/models error ( #12867 )
2025-02-21 11:08:29 +08:00
binbin Deng
8077850452
[NPU GGUF] Add simple example ( #12853 )
2025-02-21 09:58:00 +08:00
Wang, Jian4
348dc8056d
Fix vllm gptq awq error ( #12863 )
...
* fix gptq awq error
* fix python style
2025-02-20 16:27:23 +08:00
Guancheng Fu
4eed0c7d99
initial implementation for low_bit_loader vLLM ( #12838 )
...
* initial
* add logic for handling tensor parallel models
* fix
* Add some comments
* add doc
* fix done
2025-02-19 19:45:34 +08:00
Xiangyu Tian
b26409d53f
R1 Hybrid: Add Benchmark for DeepSeek R1 transformers example ( #12854 )
...
* init
* fix
* update
* update
* fix
* fix
2025-02-19 18:33:21 +08:00
Yishuo Wang
aee2db30f9
update sdp support ( #12847 )
2025-02-19 12:07:00 +08:00
Xiangyu Tian
93c10be762
LLM: Support hybrid convert for DeepSeek V3/R1 ( #12834 )
...
LLM: Support hybrid convert for DeepSeek V3/R1
2025-02-19 11:31:19 +08:00
Wang, Jian4
e1809a6295
Update multimodal on vllm 0.6.6 ( #12816 )
...
* add glm4v and minicpmv example
* fix
2025-02-19 10:04:42 +08:00
Xiangyu Tian
09150b6058
Initiate CPU-XPU Hybrid Inference for DeepSeek-R1 ( #12832 )
...
Initiate CPU-XPU Hybrid Inference for DeepSeek-R1 with DeepseekV3Attention
and DeepseekV3MLP to XPU
2025-02-18 13:34:14 +08:00
Xiangyu Tian
09ed96082b
Add DeepSeek V3/R1 CPU example ( #12836 )
...
Add DeepSeek V3/R1 CPU example for bf16 model
2025-02-18 12:45:49 +08:00
Yishuo Wang
8418450300
optimize minicpm-o's tts part ( #12833 )
2025-02-17 14:53:37 +08:00
Wang, Jian4
1083fe5508
Reenable pp and lightweight-serving on 0.6.6 ( #12814 )
...
* reenable pp and lightweight serving on 0.6.6
* update readme
* update
* update tag
2025-02-13 10:16:00 +08:00
Guancheng Fu
af693425f1
Upgrade to vLLM 0.6.6 ( #12796 )
...
* init
* update engine init
* fix serving load_in_low_bit problem
* temp
* temp
* temp
* temp
* temp
* fix
* fixed
* done
* fix
* fix all arguments
* fix
* fix throughput script
* fix
* fix
* use official ipex-llm
* Fix readme
* fix
---------
Co-authored-by: hzjane <a1015616934@qq.com>
2025-02-12 16:47:51 +08:00
Yishuo Wang
f8ab833f74
support and optimize janus pro ( #12813 )
2025-02-12 15:07:24 +08:00
Yishuo Wang
73cfe293fa
add basic support for Baichuan-M1-14B-Instruct ( #12808 )
2025-02-11 17:27:42 +08:00
Xiangyu Tian
b70ad902b4
Fix ipex-llm CPU linear dtype not match ( #12805 )
2025-02-11 10:34:44 +08:00
Yina Chen
eb2df5ed70
common.h -> npu/npu_common.h ( #12800 )
2025-02-10 14:38:22 +08:00
Yishuo Wang
e4ceb722b6
fix qwen2 vl ( #12798 )
2025-02-10 13:25:53 +08:00
binbin Deng
3fee838b14
[NPU] Fix of c++ convert example ( #12797 )
2025-02-10 11:17:58 +08:00
Kai Huang
468d3f22fc
Rename NPU public example to llm-cli ( #12790 )
...
* rename to llm-cli
* update readme
2025-02-08 10:19:59 +08:00
Ruonan Wang
e90a9ad196
[NPU] Support non-const parameter for decoder layers when keep_ir=True ( #12789 )
...
* support layernorm=False for decoder layers
* rename to meet review
* fix style
* rename to const_parameter
* fix rebase error
* fix rebase error
2025-02-08 09:58:42 +08:00
Yishuo Wang
8aea5319bb
update more lora example ( #12785 )
2025-02-08 09:46:48 +08:00
Yuwen Hu
fd28cf1672
Upgrade ipex-llm[cpp] to oneAPI 2025.0 on Windows ( #12778 )
...
* Upgrade ipex-llm[cpp] to oneAPI 2025.0
* Fit oneapi pypi dependency on Windows for now
2025-02-07 18:29:34 +08:00
binbin Deng
ca1d7b7c2c
[NPU] Support qwen models with cos_sin_input=True ( #12788 )
2025-02-07 16:41:13 +08:00
binbin Deng
6ff7faa781
[NPU] Update deepseek support in python examples and quickstart ( #12786 )
2025-02-07 11:25:16 +08:00
Ruonan Wang
b4f2be2b09
[NPU] Update C++ example to add DeepSeek-R1 ( #12787 )
2025-02-07 11:23:34 +08:00
Yishuo Wang
d0d9c9d636
remove load_in_8bit usage as it has not been supported for a long time ( #12779 )
2025-02-07 11:21:29 +08:00
Yishuo Wang
b4c9e23f73
fix galore and peft finetune example ( #12776 )
2025-02-06 16:36:13 +08:00
Yishuo Wang
c0d6b282b8
fix lisa finetune example ( #12775 )
2025-02-06 16:35:43 +08:00
Yishuo Wang
2e5f2e5dda
fix dpo finetune ( #12774 )
2025-02-06 16:35:21 +08:00
Yishuo Wang
9697197f3e
fix qlora finetune example ( #12769 )
2025-02-06 11:18:28 +08:00
Ruonan Wang
094a25b740
[NPU] Expose parameter to control blob / IR save logic ( #12767 )
...
* update api
* fix convert.py
* fix style
* remove unnecessary bin file
* fix style
2025-02-06 10:07:45 +08:00
Yishuo Wang
0237ffb302
refactor xpu linear forward ( #12768 )
2025-02-05 17:40:38 +08:00
Danciu Georgian
413d6c2b66
Update check.py removing a twice defined function ( #12760 )
...
Remove duplicate function
2025-02-05 11:37:59 +08:00
Yuwen Hu
184adb2653
Small fix to MiniCPM-o-2_6 GPU example ( #12766 )
2025-02-05 11:32:26 +08:00
Shaojun Liu
5fb87d7486
remove ${HF_TOKEN} ( #12742 )
2025-01-26 10:31:42 +08:00
Yuwen Hu
69f13c78b8
[NPU] Update layernorm node on MTL/ARL ( #12738 )
...
* Update layernorm node on MTL/ARL
* Fix on style
2025-01-23 17:25:19 +08:00
Yuwen Hu
d11f257ee7
Add GPU example for MiniCPM-o-2_6 ( #12735 )
...
* Add init example for omni mode
* Small fix
* Small fix
* Add chat example
* Remove legacy link
* Further update link
* Add readme
* Small fix
* Update main readme link
* Update based on comments
* Small fix
* Small fix
* Small fix
2025-01-23 16:10:19 +08:00
Yuwen Hu
dcca522618
Remove sdpa available patch ( #12734 )
2025-01-22 17:22:28 +08:00
Xiangyu Tian
c9b6c94a59
vLLM: Update vLLM-cpu to v0.6.6-post1 ( #12728 )
...
Update vLLM-cpu to v0.6.6-post1
2025-01-22 15:03:01 +08:00
Ruonan Wang
78cca0a68c
[NPU] update llm-npu-cli example ( #12729 )
...
* update cli example
* add license
* rename
* update readme sample output
2025-01-22 09:59:27 +08:00
Yishuo Wang
6789e5d92f
small fix ( #12727 )
2025-01-21 17:27:18 +08:00
Yishuo Wang
085974e307
fix nf4 to cpu ( #12722 )
2025-01-21 09:23:22 +08:00
Yuwen Hu
9aa4be8ced
Update runtime configuration on MTL ( #12720 )
2025-01-20 11:06:37 +08:00
Yishuo Wang
bda87c21eb
add support and optimization for minicpmo audio part ( #12716 )
2025-01-16 16:39:00 +08:00
Yuwen Hu
534e0e6774
Update dependency for PyTorch 2.6 RC support for woq int4 ( #12714 )
2025-01-16 15:51:57 +08:00
Zhao Changmin
54d6328b3c
woq int4 fwd ( #12711 )
2025-01-16 15:48:05 +08:00
Yishuo Wang
b62734748f
add support and optimization for minicpmo vision part ( #12713 )
2025-01-16 14:51:00 +08:00
Yuwen Hu
c52bdff76b
Update Deepseek coder GPU example ( #12712 )
...
* Update Deepseek coder GPU example
* Fix based on comment
2025-01-16 14:05:31 +08:00