Ruonan Wang
e90a9ad196
[NPU] Support non-const parameter for decoder layers when keep_ir=True ( #12789 )
...
* support layernorm=False for decoder layers
* rename to meet review
* fix style
* rename to const_parameter
* fix rebase error
* fix rebase error
2025-02-08 09:58:42 +08:00
Yishuo Wang
8aea5319bb
update more lora example ( #12785 )
2025-02-08 09:46:48 +08:00
Yuwen Hu
fd28cf1672
Upgrade ipex-llm[cpp] to oneAPI 2025.0 on Windows ( #12778 )
...
* Upgrade ipex-llm[cpp] to oneAPI 2025.0
* Fit oneapi pypi dependency on Windows for now
2025-02-07 18:29:34 +08:00
binbin Deng
ca1d7b7c2c
[NPU] Support qwen models with cos_sin_input=True ( #12788 )
2025-02-07 16:41:13 +08:00
binbin Deng
6ff7faa781
[NPU] Update deepseek support in python examples and quickstart ( #12786 )
2025-02-07 11:25:16 +08:00
Ruonan Wang
b4f2be2b09
[NPU] Update C++ example to add DeepSeek-R1 ( #12787 )
2025-02-07 11:23:34 +08:00
Yishuo Wang
d0d9c9d636
remove load_in_8bit usage as it has not been supported for a long time ( #12779 )
2025-02-07 11:21:29 +08:00
Yishuo Wang
b4c9e23f73
fix galore and peft finetune example ( #12776 )
2025-02-06 16:36:13 +08:00
Yishuo Wang
c0d6b282b8
fix lisa finetune example ( #12775 )
2025-02-06 16:35:43 +08:00
Yishuo Wang
2e5f2e5dda
fix dpo finetune ( #12774 )
2025-02-06 16:35:21 +08:00
Yishuo Wang
9697197f3e
fix qlora finetune example ( #12769 )
2025-02-06 11:18:28 +08:00
Ruonan Wang
094a25b740
[NPU] Expose parameter to control blob / IR save logic ( #12767 )
...
* update api
* fix convert.py
* fix style
* remove unnecessary bin file
* fix style
2025-02-06 10:07:45 +08:00
Yishuo Wang
0237ffb302
refactor xpu linear forward ( #12768 )
2025-02-05 17:40:38 +08:00
Danciu Georgian
413d6c2b66
Update check.py removing a twice defined function ( #12760 )
...
Remove duplicate function
2025-02-05 11:37:59 +08:00
Yuwen Hu
184adb2653
Small fix to MiniCPM-o-2_6 GPU example ( #12766 )
2025-02-05 11:32:26 +08:00
Shaojun Liu
5fb87d7486
remove ${HF_TOKEN} ( #12742 )
2025-01-26 10:31:42 +08:00
Yuwen Hu
69f13c78b8
[NPU] Update layernorm node on MTL/ARL ( #12738 )
...
* Update layernorm node on MTL/ARL
* Fix on style
2025-01-23 17:25:19 +08:00
Yuwen Hu
d11f257ee7
Add GPU example for MiniCPM-o-2_6 ( #12735 )
...
* Add init example for omni mode
* Small fix
* Small fix
* Add chat example
* Remove legacy link
* Further update link
* Add readme
* Small fix
* Update main readme link
* Update based on comments
* Small fix
* Small fix
* Small fix
2025-01-23 16:10:19 +08:00
Yuwen Hu
dcca522618
Remove sdpa available patch ( #12734 )
2025-01-22 17:22:28 +08:00
Xiangyu Tian
c9b6c94a59
vLLM: Update vLLM-cpu to v0.6.6-post1 ( #12728 )
...
Update vLLM-cpu to v0.6.6-post1
2025-01-22 15:03:01 +08:00
Ruonan Wang
78cca0a68c
[NPU] update llm-npu-cli example ( #12729 )
...
* update cli example
* add license
* rename
* update readme sample output
2025-01-22 09:59:27 +08:00
Yishuo Wang
6789e5d92f
small fix ( #12727 )
2025-01-21 17:27:18 +08:00
Yishuo Wang
085974e307
fix nf4 to cpu ( #12722 )
2025-01-21 09:23:22 +08:00
Yuwen Hu
9aa4be8ced
Update runtime configuration on MTL ( #12720 )
2025-01-20 11:06:37 +08:00
Yishuo Wang
bda87c21eb
add support and optimization for minicpmo audio part ( #12716 )
2025-01-16 16:39:00 +08:00
Yuwen Hu
534e0e6774
Update dependency for PyTorch 2.6 RC support for woq int4 ( #12714 )
2025-01-16 15:51:57 +08:00
Zhao Changmin
54d6328b3c
woq int4 fwd ( #12711 )
2025-01-16 15:48:05 +08:00
Yishuo Wang
b62734748f
add support and optimization for minicpmo vision part ( #12713 )
2025-01-16 14:51:00 +08:00
Yuwen Hu
c52bdff76b
Update Deepseek coder GPU example ( #12712 )
...
* Update Deepseek coder GPU example
* Fix based on comment
2025-01-16 14:05:31 +08:00
Yuwen Hu
9d65dcd7ef
Fix deepseek coder with linear rope type support on GPU ( #12709 )
...
* Fix deepseek coder with linear rope type
* Style fix
* Move to optimize_pre
* Small fix
* Small fix
* Small fix to not affect other cases
* Style fixes
* Update function name
* Small fix
* Small fix
* Small fix
* Fix for low transformers version first
* Style fix
* Small fix
2025-01-15 21:12:34 +08:00
Cengguang Zhang
9930351112
LLM: add new qtype woq_int4 to support gemm int4 temporarily. ( #12706 )
...
This PR adds a temporary qtype woq_int4 to avoid affecting other qtypes and models.
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2025-01-15 14:41:33 +08:00
Xu, Shuo
350fae285d
Add Qwen2-VL HF GPU example with ModelScope Support ( #12606 )
...
* Add qwen2-vl example
* complete generate.py & readme
* improve lint style
* update 1-6
* update main readme
* Format and other small fixes
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2025-01-13 15:42:04 +08:00
Yuwen Hu
a1da7908b9
Fix name device is not found bug ( #12703 )
2025-01-13 10:11:02 +08:00
Yishuo Wang
db9db51e2c
fix lnl perf ( #12700 )
2025-01-10 18:00:58 +08:00
binbin Deng
da8bcb7db1
[NPU] fix load logic of glm-edge models ( #12698 )
2025-01-10 16:08:37 +08:00
Yishuo Wang
f8dc408888
fix user issue ( #12692 )
2025-01-10 10:18:47 +08:00
Yishuo Wang
68857494a5
refactor to simplify following upgrade 2 ( #12685 )
2025-01-10 09:29:03 +08:00
Yishuo Wang
7234c9b27b
update quantize kv cache condition ( #12681 )
2025-01-09 15:23:04 +08:00
Yuwen Hu
5d8081afbc
Remove dummy model from performance tests ( #12682 )
2025-01-09 14:50:17 +08:00
Yishuo Wang
1ec40cd09e
refactor to simplify following upgrade ( #12680 )
2025-01-09 13:34:30 +08:00
Yishuo Wang
5c24276fc4
fix custom kernel registration ( #12674 )
2025-01-08 17:39:17 +08:00
Yishuo Wang
a22a8c21bb
small fix and remove unused code about ipex ( #12671 )
2025-01-08 17:39:04 +08:00
Yishuo Wang
c11f5f0fcd
also convert SdpaAttention in optimize_model ( #12673 )
2025-01-08 16:48:03 +08:00
Yishuo Wang
7dd156d292
small fix and add comment ( #12670 )
2025-01-08 10:56:50 +08:00
Yishuo Wang
ccf618ff4a
Remove all ipex usage ( #12666 )
2025-01-08 10:31:18 +08:00
Yuwen Hu
5db6f9dcde
Add option with PyTorch 2.6 RC version for testing purposes ( #12668 )
...
* Add option with PyTorch 2.6 RC version for testing purposes
* Small update
2025-01-07 18:28:55 +08:00
Yishuo Wang
f9ee7898c8
fix onednn dependency bug ( #12665 )
2025-01-07 16:26:56 +08:00
Yishuo Wang
29ad5c449e
refactor codegeex to remove ipex kernel usage ( #12664 )
2025-01-07 16:17:40 +08:00
Yuwen Hu
525b0ee991
[NPU] Tiny fixes on examples ( #12661 )
2025-01-07 14:30:38 +08:00
Yuwen Hu
ebdf19fa7e
[NPU] Further fix saving of generation config ( #12657 )
...
* Further fix saving of generation config
* Fix based on comments
* Small fix
2025-01-07 13:53:54 +08:00
Yuwen Hu
381d448ee2
[NPU] Example & Quickstart updates ( #12650 )
...
* Remove model with optimize_model=False in NPU verified models tables, and remove related example
* Remove experimental in run optimized model section title
* Unify model table order & example cmd
* Move embedding example to separate folder & update quickstart example link
* Add Quickstart reference in main NPU readme
* Small fix
* Small fix
* Move save/load examples under NPU/HF-Transformers-AutoModels
* Add low-bit and polish arguments for LLM Python examples
* Small fix
* Add low-bit and polish arguments for Multi-Model examples
* Polish argument for Embedding models
* Polish argument for LLM CPP examples
* Add low-bit and polish argument for Save-Load examples
* Add accuracy tuning tips for examples
* Update NPU quickstart accuracy tuning with low-bit optimizations
* Add save/load section to quickstart
* Update CPP example sample output to EN
* Add installation regarding cmake for CPP examples
* Small fix
* Small fix
* Small fix
* Small fix
* Small fix
* Small fix
* Unify max prompt length to 512
* Change recommended low-bit for Qwen2.5-3B-Instruct to asym_int4
* Update based on comments
* Small fix
2025-01-07 13:52:41 +08:00
Yishuo Wang
ddc0ef3993
refactor device check and remove cohere/mixtral support ( #12659 )
2025-01-07 11:15:51 +08:00
Yishuo Wang
ea65e4fecc
remove falcon support and related UT ( #12656 )
2025-01-07 09:26:00 +08:00
Yina Chen
fae73eee79
[NPU] Support save npu quantized model without npu dependency ( #12647 )
...
* support save awq
* load quantized model & save npu compiled model
* fix style
* update
* fix dll load issue
* update error message
* fix style
2025-01-06 18:06:22 +08:00
Yishuo Wang
502461d836
remove unnecessary ipex kernel usage ( #12649 )
2025-01-03 16:45:24 +08:00
Yishuo Wang
9f8b134889
add ipex-llm custom kernel registration ( #12648 )
2025-01-03 16:45:04 +08:00
binbin Deng
0b377100c5
Add guide for save-load usage ( #12498 )
2025-01-03 16:30:15 +08:00
Wang, Jian4
6711a48a36
Enable internvl2-8b on vllm ( #12645 )
2025-01-03 14:49:36 +08:00
Zijie Li
8fd2dcba86
Add benchmark_util for transformers >= 4.47.0 ( #12644 )
2025-01-03 10:48:29 +08:00
Yina Chen
8e5328e9b4
add disable opts for awq ( #12641 )
2025-01-02 15:45:22 +08:00
Xu, Shuo
62318964fa
Update llama example information ( #12640 )
...
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2025-01-02 13:48:39 +08:00
Yishuo Wang
81211fd010
remove unused code ( #12635 )
2025-01-02 13:31:09 +08:00
binbin Deng
534566e290
[NPU] Support minicpm-v with python cpp backend ( #12637 )
2025-01-02 11:13:15 +08:00
Yishuo Wang
f289f68d57
small fix ( #12634 )
2024-12-30 17:14:25 +08:00
Yishuo Wang
2d08155513
remove bmm, which is only required in ipex 2.0 ( #12630 )
2024-12-27 17:28:57 +08:00
binbin Deng
f17ccfa61a
[NPU] Fix save-load usage of minicpm models ( #12628 )
2024-12-27 15:56:46 +08:00
Yishuo Wang
c72a5db757
remove unused code again ( #12624 )
2024-12-27 14:17:11 +08:00
binbin Deng
46eeab4479
[NPU] Fix regression caused by layer_norm change ( #12627 )
2024-12-27 14:08:49 +08:00
Ruonan Wang
90f6709486
remove pipeline examples ( #12626 )
2024-12-27 13:42:28 +08:00
Zijie Li
5f04ed7254
[NPU] Update prompt format for baichuan2-pipeline ( #12625 )
2024-12-27 11:30:54 +08:00
Yishuo Wang
34dbdb8ee3
small fix ( #12623 )
2024-12-27 10:19:27 +08:00
Xu, Shuo
55ce091242
Add GLM4-Edge-V GPU example ( #12596 )
...
* Add GLM4-Edge-V examples
* polish readme
* revert wrong changes
* polish readme
* polish readme
* little polish in reference info and indent
* Small fix and sample output updates
* Update main readme
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-12-27 09:40:29 +08:00
binbin Deng
796ee571a5
[NPU doc] Update verified platforms ( #12621 )
2024-12-26 17:39:13 +08:00
Ruonan Wang
bbdbbb0d88
[NPU] Compatible with other third-party models like auto-round ( #12620 )
...
* support third party model
* simplify code
* fix style
* fix sym int4 GW
* code refactor
* fix
2024-12-26 17:25:18 +08:00
Yishuo Wang
a9abde0b5d
support passing attn_scale to sdpa ( #12619 )
2024-12-26 16:58:09 +08:00
Shaojun Liu
40a7d2b4f0
Consolidated C-Eval Benchmark Guide for Single-GPU and Multi-GPU Environments ( #12618 )
...
* run c-eval on multi-GPUs
* Update README.md
2024-12-26 15:23:32 +08:00
Zijie Li
ccc4055058
[NPU] Update prompt format for baichuan2 ( #12615 )
...
* Update baichuan2.py
* style fix
2024-12-26 11:41:37 +08:00
Yishuo Wang
1604b4ead8
small fix ( #12616 )
2024-12-26 11:35:12 +08:00
Ruonan Wang
d841e1dc0d
[NPU] update convert script based on latest usage ( #12617 )
2024-12-26 11:23:04 +08:00
Xu, Shuo
ef585d3360
Polish Readme for ModelScope-related examples ( #12603 )
2024-12-26 10:52:47 +08:00
Yishuo Wang
a596f1ae5f
remove bigdl-llm test to fix langchain UT ( #12613 )
2024-12-26 10:17:25 +08:00
Ruonan Wang
9e895f04ec
[NPU] fix npu save ( #12614 )
...
* fix npu save
* update
2024-12-26 09:21:16 +08:00
Yishuo Wang
6249c1e373
rewrite llama optimization ( #12609 )
2024-12-25 17:04:32 +08:00
Yishuo Wang
5f5ac8a856
fix llama related import ( #12611 )
2024-12-25 16:23:52 +08:00
Yishuo Wang
4e6b9d804f
add compresskv back for mistral ( #12607 )
...
* add compresskv back for mistral
* fix
* fix
2024-12-25 11:06:08 +08:00
Yishuo Wang
4135b895b3
refactor chatglm2, internlm, stablelm and qwen ( #12604 )
2024-12-24 18:18:00 +08:00
Yishuo Wang
073f936c37
refactor mistral and phi3 ( #12605 )
2024-12-24 17:52:32 +08:00
binbin Deng
45f8f72a28
[NPU] Fix minicpm on MTL ( #12599 )
2024-12-24 15:37:56 +08:00
Yishuo Wang
ad2dc965c5
refactor mllama, gpt2 and internvl ( #12602 )
2024-12-24 14:18:31 +08:00
Yishuo Wang
7aaf02f602
refactor baichuan, glm4 and minicpm3 ( #12600 )
2024-12-24 14:16:30 +08:00
Zijie Li
c410d9cf73
[NPU] support asym_int4 for baichuan ( #12576 )
...
* add npu support for baichuan
* Update baichuan_mp.py
* Update baichuan_mp.py
2024-12-24 09:17:50 +08:00
Yishuo Wang
098eb335b2
refactor sd 1.5 and qwen2-vl and fix ( #12590 )
2024-12-20 17:34:55 +08:00
Yishuo Wang
b050368efc
refactor yuan2 and starcoder2 and fix ( #12589 )
2024-12-20 16:41:50 +08:00
Yishuo Wang
6ea8033635
refactor glm edge ( #12588 )
2024-12-20 15:36:57 +08:00
Xu, Shuo
b0338c5529
Add --modelscope option for glm-v4 MiniCPM-V-2_6 glm-edge and internvl2 ( #12583 )
...
* Add --modelscope option for glm-v4 and MiniCPM-V-2_6
* glm-edge
* minicpm-v-2_6: don't use model_hub=modelscope when using lowbit; internvl2
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-12-20 13:54:17 +08:00
Yishuo Wang
f3b5fad3be
refactor qwen2 and llama3 ( #12587 )
2024-12-20 13:25:25 +08:00
Xu, Shuo
47da3c999f
Add --modelscope in GPU examples for minicpm, minicpm3, baichuan2 ( #12564 )
...
* Add --modelscope for more models
* minicpm
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-12-19 17:25:46 +08:00
Yishuo Wang
3eeb02f1be
support Megrez-3B-Omni ( #12582 )
2024-12-19 17:23:01 +08:00
binbin Deng
4e7e988f70
[NPU] Fix MTL and ARL support ( #12580 )
2024-12-19 16:55:30 +08:00
Yishuo Wang
80f2fdc37b
optimize new minicpm model ( #12579 )
2024-12-19 14:22:47 +08:00