Commit graph

1207 commits

Author SHA1 Message Date
Yishuo Wang
68857494a5
refactor to simplify following upgrade 2 (#12685) 2025-01-10 09:29:03 +08:00
Yishuo Wang
7234c9b27b
update quantize kv cache condition (#12681) 2025-01-09 15:23:04 +08:00
Yishuo Wang
1ec40cd09e
refactor to simplify following upgrade (#12680) 2025-01-09 13:34:30 +08:00
Yishuo Wang
5c24276fc4
fix custom kernel registration (#12674) 2025-01-08 17:39:17 +08:00
Yishuo Wang
a22a8c21bb
small fix and remove unused code about ipex (#12671) 2025-01-08 17:39:04 +08:00
Yishuo Wang
c11f5f0fcd
also convert SdpaAttention in optimize_model (#12673) 2025-01-08 16:48:03 +08:00
Yishuo Wang
7dd156d292
small fix and add comment (#12670) 2025-01-08 10:56:50 +08:00
Yishuo Wang
ccf618ff4a
Remove all ipex usage (#12666) 2025-01-08 10:31:18 +08:00
Yishuo Wang
29ad5c449e
refactor codegeex to remove ipex kernel usage (#12664) 2025-01-07 16:17:40 +08:00
Yuwen Hu
ebdf19fa7e
[NPU] Further fix saving of generation config (#12657)
* Further fix saving of generation config

* Fix based on comments

* Small fix
2025-01-07 13:53:54 +08:00
Yishuo Wang
ddc0ef3993
refactor device check and remove cohere/mixtral support (#12659) 2025-01-07 11:15:51 +08:00
Yishuo Wang
ea65e4fecc
remove falcon support and related UT (#12656) 2025-01-07 09:26:00 +08:00
Yina Chen
fae73eee79
[NPU] Support save npu quantized model without npu dependency (#12647)
* support save awq

* load quantized model & save npu compiled model

* fix style

* update

* fix dll load issue

* update error message

* fix style
2025-01-06 18:06:22 +08:00
Yishuo Wang
502461d836
remove unnecessary ipex kernel usage (#12649) 2025-01-03 16:45:24 +08:00
Yishuo Wang
9f8b134889
add ipex-llm custom kernel registration (#12648) 2025-01-03 16:45:04 +08:00
Wang, Jian4
6711a48a36
Enable internvl2-8b on vllm(#12645) 2025-01-03 14:49:36 +08:00
Zijie Li
8fd2dcba86
Add benchmark_util for transformers >= 4.47.0 (#12644) 2025-01-03 10:48:29 +08:00
Yina Chen
8e5328e9b4
add disable opts for awq (#12641) 2025-01-02 15:45:22 +08:00
Yishuo Wang
81211fd010
remove unused code (#12635) 2025-01-02 13:31:09 +08:00
binbin Deng
534566e290
[NPU] Support minicpm-v with python cpp backend (#12637) 2025-01-02 11:13:15 +08:00
Yishuo Wang
f289f68d57
small fix (#12634) 2024-12-30 17:14:25 +08:00
Yishuo Wang
2d08155513
remove bmm, which is only required in ipex 2.0 (#12630) 2024-12-27 17:28:57 +08:00
binbin Deng
f17ccfa61a
[NPU] Fix save-load usage of minicpm models (#12628) 2024-12-27 15:56:46 +08:00
Yishuo Wang
c72a5db757
remove unused code again (#12624) 2024-12-27 14:17:11 +08:00
binbin Deng
46eeab4479
[NPU] Fix regression caused by layer_norm change (#12627) 2024-12-27 14:08:49 +08:00
Yishuo Wang
34dbdb8ee3
small fix (#12623) 2024-12-27 10:19:27 +08:00
Ruonan Wang
bbdbbb0d88
[NPU] Compatible with other third-party models like auto-round (#12620)
* support third party model

* simplify code

* fix style

* fix sym int4 GW

* code refactor

* fix
2024-12-26 17:25:18 +08:00
Yishuo Wang
a9abde0b5d
support passing attn_scale to sdpa (#12619) 2024-12-26 16:58:09 +08:00
Yishuo Wang
1604b4ead8
small fix (#12616) 2024-12-26 11:35:12 +08:00
Ruonan Wang
9e895f04ec
[NPU] fix npu save (#12614)
* fix npu save

* update
2024-12-26 09:21:16 +08:00
Yishuo Wang
6249c1e373
rewrite llama optimization (#12609) 2024-12-25 17:04:32 +08:00
Yishuo Wang
5f5ac8a856
fix llama related import (#12611) 2024-12-25 16:23:52 +08:00
Yishuo Wang
4e6b9d804f
add compresskv back for mistral (#12607)
* add compresskv back for mistral

* fix

* fix
2024-12-25 11:06:08 +08:00
Yishuo Wang
4135b895b3
refactor chatglm2, internlm, stablelm and qwen (#12604) 2024-12-24 18:18:00 +08:00
Yishuo Wang
073f936c37
refactor mistral and phi3 (#12605) 2024-12-24 17:52:32 +08:00
binbin Deng
45f8f72a28
[NPU] Fix minicpm on MTL (#12599) 2024-12-24 15:37:56 +08:00
Yishuo Wang
ad2dc965c5
refactor mllama, gpt2 and internvl (#12602) 2024-12-24 14:18:31 +08:00
Yishuo Wang
7aaf02f602
refactor baichuan, glm4 and minicpm3 (#12600) 2024-12-24 14:16:30 +08:00
Zijie Li
c410d9cf73
[NPU] support asym_int4 for baichuan (#12576)
* add npu support for baichuan

* Update baichuan_mp.py

* Update baichuan_mp.py
2024-12-24 09:17:50 +08:00
Yishuo Wang
098eb335b2
refactor sd 1.5 and qwen2-vl and fix (#12590) 2024-12-20 17:34:55 +08:00
Yishuo Wang
b050368efc
refactor yuan2 and starcoder2 and fix (#12589) 2024-12-20 16:41:50 +08:00
Yishuo Wang
6ea8033635
refactor glm edge (#12588) 2024-12-20 15:36:57 +08:00
Yishuo Wang
f3b5fad3be
refactor qwen2 and llama3 (#12587) 2024-12-20 13:25:25 +08:00
Yishuo Wang
3eeb02f1be
support Megrez-3B-Omni (#12582) 2024-12-19 17:23:01 +08:00
binbin Deng
4e7e988f70
[NPU] Fix MTL and ARL support (#12580) 2024-12-19 16:55:30 +08:00
Yishuo Wang
80f2fdc37b
optimize new minicpm model (#12579) 2024-12-19 14:22:47 +08:00
Yishuo Wang
4540424271
optimize siglip attention again (#12578) 2024-12-19 13:40:48 +08:00
Yishuo Wang
e0921f80c1
padding mask on torch side (#12577) 2024-12-19 10:53:02 +08:00
Yishuo Wang
e2ae42929a
small fix (#12573) 2024-12-18 15:48:22 +08:00
Yishuo Wang
a4eb561f36
optimize siglip attention on arc (#12569) 2024-12-18 14:19:43 +08:00