Commit graph

658 commits

Author SHA1 Message Date
Yishuo Wang
a9abde0b5d
support passing attn_scale to sdpa (#12619) 2024-12-26 16:58:09 +08:00
Yishuo Wang
1604b4ead8
small fix (#12616) 2024-12-26 11:35:12 +08:00
Ruonan Wang
9e895f04ec
[NPU] fix npu save (#12614)
* fix npu save

* update
2024-12-26 09:21:16 +08:00
Yishuo Wang
6249c1e373
rewrite llama optimization (#12609) 2024-12-25 17:04:32 +08:00
Yishuo Wang
5f5ac8a856
fix llama related import (#12611) 2024-12-25 16:23:52 +08:00
Yishuo Wang
4e6b9d804f
add compresskv back for mistral (#12607)
* add compresskv back for mistral

* fix

* fix
2024-12-25 11:06:08 +08:00
Yishuo Wang
4135b895b3
refactor chatglm2, internlm, stablelm and qwen (#12604) 2024-12-24 18:18:00 +08:00
Yishuo Wang
073f936c37
refactor mistral and phi3 (#12605) 2024-12-24 17:52:32 +08:00
binbin Deng
45f8f72a28
[NPU] Fix minicpm on MTL (#12599) 2024-12-24 15:37:56 +08:00
Yishuo Wang
ad2dc965c5
refactor mllama, gpt2 and internvl (#12602) 2024-12-24 14:18:31 +08:00
Yishuo Wang
7aaf02f602
refactor baichuan, glm4 and minicpm3 (#12600) 2024-12-24 14:16:30 +08:00
Zijie Li
c410d9cf73
[NPU] support asym_int4 for baichuan (#12576)
* add npu support for baichuan

* Update baichuan_mp.py

* Update baichuan_mp.py
2024-12-24 09:17:50 +08:00
Yishuo Wang
098eb335b2
refactor sd 1.5 and qwen2-vl and fix (#12590) 2024-12-20 17:34:55 +08:00
Yishuo Wang
b050368efc
refactor yuan2 and starcoder2 and fix (#12589) 2024-12-20 16:41:50 +08:00
Yishuo Wang
6ea8033635
refactor glm edge (#12588) 2024-12-20 15:36:57 +08:00
Yishuo Wang
f3b5fad3be
refactor qwen2 and llama3 (#12587) 2024-12-20 13:25:25 +08:00
Yishuo Wang
3eeb02f1be
support Megrez-3B-Omni (#12582) 2024-12-19 17:23:01 +08:00
binbin Deng
4e7e988f70
[NPU] Fix MTL and ARL support (#12580) 2024-12-19 16:55:30 +08:00
Yishuo Wang
80f2fdc37b
optimize new minicpm model (#12579) 2024-12-19 14:22:47 +08:00
Yishuo Wang
4540424271
optimize siglip attention again (#12578) 2024-12-19 13:40:48 +08:00
Yishuo Wang
e0921f80c1
padding mask on torch side (#12577) 2024-12-19 10:53:02 +08:00
Yishuo Wang
e2ae42929a
small fix (#12573) 2024-12-18 15:48:22 +08:00
Yishuo Wang
a4eb561f36
optimize siglip attention on arc (#12569) 2024-12-18 14:19:43 +08:00
Zijie Li
1a2ab12876
[NPU] support asym_int4 for minicpm (#12567) 2024-12-18 10:55:35 +08:00
Zijie Li
fcb474820d
[NPU] support asym_int4 for llama (#12556)
* add llama-imatrix

* fix bugs in llama.py

* style fix
2024-12-17 14:01:17 +08:00
Yishuo Wang
a608f26cc8
use new fused layer norm (#12553) 2024-12-17 13:52:35 +08:00
Yishuo Wang
5ae0006103
remove old rope usage (#12552) 2024-12-16 15:59:36 +08:00
binbin Deng
caf15cc5ef
[NPU] Add IPEX_LLM_NPU_MTL to enable support on mtl (#12543) 2024-12-13 17:01:13 +08:00
Yishuo Wang
c090d167dc
remove old rope usage (#12544) 2024-12-13 16:54:58 +08:00
Yishuo Wang
15219944b8
optimize glm edge again (#12539) 2024-12-13 13:52:39 +08:00
binbin Deng
6596c18489
[NPU] Modify IPEX_LLM_NPU_DISABLE_COMPILE_OPT setting for long input (#12537) 2024-12-13 13:49:56 +08:00
Ruonan Wang
7cc01fdc86
[NPU] further fix of new_value_states (#12538) 2024-12-13 13:42:00 +08:00
binbin Deng
f36c23664f
[NPU] Fix abnormal output with latest driver (#12530) 2024-12-12 17:56:30 +08:00
Yishuo Wang
ffce86d69f
add basic glm-edge-v support (#12533) 2024-12-12 17:25:48 +08:00
Yishuo Wang
3e0823d2ae
add basic glm-edge support (#12531) 2024-12-12 16:02:22 +08:00
Yuwen Hu
dbaf4abcb3
[NPU] Update C++ example with repetition_penalty & update Python code accordingly (#12528)
* Update c++ npu examples with repetition penalty

* Fit python with updated C++ API

* Style fix

* Small fix

* Small fix
2024-12-12 13:42:55 +08:00
Shaojun Liu
2cce89691a
Enable use_batch_forward Optimization on Battlemage GPU (#12516)
* Update get_xpu_device_type() to support bmg

* enable use_batch_forward for bmg

* Update low_bit_linear.py

* Update utils.py

* use batch kernel for fp8e5
2024-12-12 12:44:36 +08:00
binbin Deng
509bdb4661
[NPU] Fix minicpm-2B error (#12527) 2024-12-11 16:49:32 +08:00
Ruonan Wang
41ef4974ab
[NPU] fix transpose_value = False for NPU optimize_model=True (#12525) 2024-12-11 15:51:39 +08:00
Ruonan Wang
588bfa24dc
support hqq (#12518)
* support

* fix
2024-12-11 15:43:02 +08:00
Yuwen Hu
68f2873bd3
[NPU] Support repetition penalty for simple generate, Python (cpp backend) (#12522)
* Initial support of repetition penalty on NPU (cpp backend) for simple generate

* Bug fix for generation config and others

* Remove unnecessary print and style fix

* Remove unnecessary print

* Fix based on comments
2024-12-11 14:55:25 +08:00
Yishuo Wang
77404d2a63
support new model (#12523) 2024-12-11 13:41:15 +08:00
binbin Deng
ea55235cbd
[NPU] Support glm-edge models (#12511) 2024-12-09 14:06:27 +08:00
Yuwen Hu
0918d3baca
[NPU] Fix hf generate with save/load generation config for Python (cpp backend) (#12509)
* Fix hf generate with save/load generation config

* Small fix

* Fix based on comments
2024-12-05 19:19:58 +08:00
Ruonan Wang
49ab8974fa
[NPU] initial support of asym_int4_rtn (#12484)
* initial support of q4_1

* fix

* fix

* update

* update min to Z1

* update

* fix

* update

* fix style

* fix

* support qwen2 optimize_model=True mp version

* temp save

* fix

* fix style

* replace min with zero

* support split linear for q4_1

* fix lm_head with mixed_precision=True

* fix style

* revert test code

* add down proj back for q4_0

* remove print
2024-12-05 17:40:36 +08:00
Yuwen Hu
84f1c4ad57
Small fix for NPU Python cpp simple generate regarding eos tokens (#12501) 2024-12-04 18:54:06 +08:00
Kai Huang
d8b14a6305
Update save/load comments (#12500) 2024-12-04 18:51:38 +08:00
Kai Huang
b89ea1b0cf
Support save/load model for hf generate (#12499)
* change dummy model

* style

* meet review
2024-12-04 18:26:39 +08:00
Kai Huang
7d27f134dd
Fix hf generate for llama3.2 (#12497)
* fix kv condition

* meet review
2024-12-04 17:54:40 +08:00
Yishuo Wang
a9e3f7f14c
optimize minicpm (#12496) 2024-12-04 17:14:16 +08:00