Yishuo Wang
|
29ad5c449e
|
refactor codegeex to remove ipex kernel usage (#12664)
|
2025-01-07 16:17:40 +08:00 |
|
Yuwen Hu
|
ebdf19fa7e
|
[NPU] Further fix saving of generation config (#12657)
* Further fix saving of generation config
* Fix based on comments
* Small fix
|
2025-01-07 13:53:54 +08:00 |
|
Yishuo Wang
|
ddc0ef3993
|
refactor device check and remove cohere/mixtral support (#12659)
|
2025-01-07 11:15:51 +08:00 |
|
Yishuo Wang
|
ea65e4fecc
|
remove falcon support and related UT (#12656)
|
2025-01-07 09:26:00 +08:00 |
|
Yina Chen
|
fae73eee79
|
[NPU] Support save npu quantized model without npu dependency (#12647)
* support save awq
* load quantized model & save npu compiled model
* fix style
* update
* fix dll load issue
* update error message
* fix style
|
2025-01-06 18:06:22 +08:00 |
|
Yishuo Wang
|
502461d836
|
remove unnecessary ipex kernel usage (#12649)
|
2025-01-03 16:45:24 +08:00 |
|
Yishuo Wang
|
9f8b134889
|
add ipex-llm custom kernel registration (#12648)
|
2025-01-03 16:45:04 +08:00 |
|
Wang, Jian4
|
6711a48a36
|
Enable internvl2-8b on vllm (#12645)
|
2025-01-03 14:49:36 +08:00 |
|
Zijie Li
|
8fd2dcba86
|
Add benchmark_util for transformers >= 4.47.0 (#12644)
|
2025-01-03 10:48:29 +08:00 |
|
Yina Chen
|
8e5328e9b4
|
add disable opts for awq (#12641)
|
2025-01-02 15:45:22 +08:00 |
|
Yishuo Wang
|
81211fd010
|
remove unused code (#12635)
|
2025-01-02 13:31:09 +08:00 |
|
binbin Deng
|
534566e290
|
[NPU] Support minicpm-v with python cpp backend (#12637)
|
2025-01-02 11:13:15 +08:00 |
|
Yishuo Wang
|
f289f68d57
|
small fix (#12634)
|
2024-12-30 17:14:25 +08:00 |
|
Yishuo Wang
|
2d08155513
|
remove bmm, which is only required in ipex 2.0 (#12630)
|
2024-12-27 17:28:57 +08:00 |
|
binbin Deng
|
f17ccfa61a
|
[NPU] Fix save-load usage of minicpm models (#12628)
|
2024-12-27 15:56:46 +08:00 |
|
Yishuo Wang
|
c72a5db757
|
remove unused code again (#12624)
|
2024-12-27 14:17:11 +08:00 |
|
binbin Deng
|
46eeab4479
|
[NPU] Fix regression caused by layer_norm change (#12627)
|
2024-12-27 14:08:49 +08:00 |
|
Yishuo Wang
|
34dbdb8ee3
|
small fix (#12623)
|
2024-12-27 10:19:27 +08:00 |
|
Ruonan Wang
|
bbdbbb0d88
|
[NPU] Compatible with other third-party models like auto-round (#12620)
* support third party model
* simplify code
* fix style
* fix sym int4 GW
* code refactor
* fix
|
2024-12-26 17:25:18 +08:00 |
|
Yishuo Wang
|
a9abde0b5d
|
support passing attn_scale to sdpa (#12619)
|
2024-12-26 16:58:09 +08:00 |
|
Yishuo Wang
|
1604b4ead8
|
small fix (#12616)
|
2024-12-26 11:35:12 +08:00 |
|
Ruonan Wang
|
9e895f04ec
|
[NPU] fix npu save (#12614)
* fix npu save
* update
|
2024-12-26 09:21:16 +08:00 |
|
Yishuo Wang
|
6249c1e373
|
rewrite llama optimization (#12609)
|
2024-12-25 17:04:32 +08:00 |
|
Yishuo Wang
|
5f5ac8a856
|
fix llama related import (#12611)
|
2024-12-25 16:23:52 +08:00 |
|
Yishuo Wang
|
4e6b9d804f
|
add compresskv back for mistral (#12607)
* add compresskv back for mistral
* fix
* fix
|
2024-12-25 11:06:08 +08:00 |
|
Yishuo Wang
|
4135b895b3
|
refactor chatglm2, internlm, stablelm and qwen (#12604)
|
2024-12-24 18:18:00 +08:00 |
|
Yishuo Wang
|
073f936c37
|
refactor mistral and phi3 (#12605)
|
2024-12-24 17:52:32 +08:00 |
|
binbin Deng
|
45f8f72a28
|
[NPU] Fix minicpm on MTL (#12599)
|
2024-12-24 15:37:56 +08:00 |
|
Yishuo Wang
|
ad2dc965c5
|
refactor mllama, gpt2 and internvl (#12602)
|
2024-12-24 14:18:31 +08:00 |
|
Yishuo Wang
|
7aaf02f602
|
refactor baichuan, glm4 and minicpm3 (#12600)
|
2024-12-24 14:16:30 +08:00 |
|
Zijie Li
|
c410d9cf73
|
[NPU] support asym_int4 for baichuan (#12576)
* add npu support for baichuan
* Update baichuan_mp.py
* Update baichuan_mp.py
|
2024-12-24 09:17:50 +08:00 |
|
Yishuo Wang
|
098eb335b2
|
refactor sd 1.5 and qwen2-vl and fix (#12590)
|
2024-12-20 17:34:55 +08:00 |
|
Yishuo Wang
|
b050368efc
|
refactor yuan2 and starcoder2 and fix (#12589)
|
2024-12-20 16:41:50 +08:00 |
|
Yishuo Wang
|
6ea8033635
|
refactor glm edge (#12588)
|
2024-12-20 15:36:57 +08:00 |
|
Yishuo Wang
|
f3b5fad3be
|
refactor qwen2 and llama3 (#12587)
|
2024-12-20 13:25:25 +08:00 |
|
Yishuo Wang
|
3eeb02f1be
|
support Megrez-3B-Omni (#12582)
|
2024-12-19 17:23:01 +08:00 |
|
binbin Deng
|
4e7e988f70
|
[NPU] Fix MTL and ARL support (#12580)
|
2024-12-19 16:55:30 +08:00 |
|
Yishuo Wang
|
80f2fdc37b
|
optimize new minicpm model (#12579)
|
2024-12-19 14:22:47 +08:00 |
|
Yishuo Wang
|
4540424271
|
optimize siglip attention again (#12578)
|
2024-12-19 13:40:48 +08:00 |
|
Yishuo Wang
|
e0921f80c1
|
padding mask on torch side (#12577)
|
2024-12-19 10:53:02 +08:00 |
|
Yishuo Wang
|
e2ae42929a
|
small fix (#12573)
|
2024-12-18 15:48:22 +08:00 |
|
Yishuo Wang
|
a4eb561f36
|
optimize siglip attention on arc (#12569)
|
2024-12-18 14:19:43 +08:00 |
|
Zijie Li
|
1a2ab12876
|
[NPU] support asym_int4 for minicpm (#12567)
|
2024-12-18 10:55:35 +08:00 |
|
Zijie Li
|
fcb474820d
|
[NPU] support asym_int4 for llama (#12556)
* add llama-imatrix
* fix bugs in llama.py
* style fix
|
2024-12-17 14:01:17 +08:00 |
|
Yishuo Wang
|
a608f26cc8
|
use new fused layer norm (#12553)
|
2024-12-17 13:52:35 +08:00 |
|
Yishuo Wang
|
5ae0006103
|
remove old rope usage (#12552)
|
2024-12-16 15:59:36 +08:00 |
|
binbin Deng
|
caf15cc5ef
|
[NPU] Add IPEX_LLM_NPU_MTL to enable support on MTL (#12543)
|
2024-12-13 17:01:13 +08:00 |
|
Yishuo Wang
|
c090d167dc
|
remove old rope usage (#12544)
|
2024-12-13 16:54:58 +08:00 |
|
Yishuo Wang
|
15219944b8
|
optimize glm edge again (#12539)
|
2024-12-13 13:52:39 +08:00 |
|
binbin Deng
|
6596c18489
|
[NPU] Modify IPEX_LLM_NPU_DISABLE_COMPILE_OPT setting for long input (#12537)
|
2024-12-13 13:49:56 +08:00 |
|