binbin Deng
d409d9d0eb
[NPU L0] Update streaming mode of example ( #12312 )
2024-11-01 15:38:10 +08:00
Jin, Qiao
126f95be80
Fix DPO finetuning example ( #12313 )
2024-11-01 13:29:44 +08:00
Yina Chen
05c5d0267a
[NPU] Llama2 prefill use ov sdp ( #12310 )
...
* prefill use sdp
* add param
* update
* fix style
* fix style
* meet comments
2024-11-01 11:05:20 +08:00
binbin Deng
eda764909c
Add minicpm-2b in L0 pipeline ( #12308 )
2024-11-01 09:30:01 +08:00
Yishuo Wang
b9853f98b3
fix qwen2 attention_mask slice ( #12307 )
2024-10-31 17:00:05 +08:00
Jin, Qiao
3df6195cb0
Fix application quickstart ( #12305 )
...
* fix graphrag quickstart
* fix axolotl quickstart
* fix ragflow quickstart
* fix ragflow quickstart
* fix graphrag toc
* fix comments
* fix comment
* fix comments
2024-10-31 16:57:35 +08:00
binbin Deng
4892df61c9
Add qwen2-1.5b in l0 pipeline example ( #12306 )
2024-10-31 16:44:25 +08:00
Jinhe
30f668c206
updated transformers & accelerate requirements ( #12301 )
2024-10-31 15:59:40 +08:00
Xin Qiu
97a0f7fd35
Codegeex support ( #12303 )
...
* new codegeex attn
* use kv cache
* add compress/quantize kv
* remove compress/quantize kv
* fix style check
* fix style
* fix codegeex
2024-10-31 15:28:56 +08:00
Yishuo Wang
72605c7016
fix llama3.1/3.2 quantize kv check ( #12302 )
2024-10-31 11:55:07 +08:00
Kai Huang
416c19165c
Add Qwen pipeline and example ( #12292 )
...
* support qwen pipeline
* update error msg
* style
* meet review
* minor
2024-10-31 11:25:25 +08:00
Rahul Nair
4cf1ccc43a
Update DPO README.md ( #12162 )
...
bitsandbytes multi-backend is now available and is required; otherwise it errors out saying that no CUDA is available
2024-10-31 10:56:46 +08:00
Chu,Youcheng
29400e2e75
feat: change oneccl to internal ( #12296 )
...
* feat: change oneccl
* fix: restore llama-70b
* fix: remove tab
* fix: remove extra blank
* small fix
* add comments
* fix: add a blank space
2024-10-31 09:51:43 +08:00
Zijie Li
6f22133efc
Update AWQ and GPTQ GPU example ( #12300 )
2024-10-31 09:35:31 +08:00
Yina Chen
0763268e4c
[NPU] Qwen2 groupwise performance opt ( #12299 )
...
* qwen2 gw performance opt
* remove debug
2024-10-30 17:40:21 +08:00
binbin Deng
41b8064554
Support minicpm-1B in level0 pipeline ( #12297 )
2024-10-30 17:21:47 +08:00
Jinhe
46d8300f6b
bugfix for qlora finetuning on GPU ( #12298 )
...
* bugfix for qlora 100 step error
* indent fix
* annotation fix
2024-10-30 16:54:10 +08:00
Yina Chen
70037ad55f
Groupwise prefill optimization ( #12291 )
...
* except lm_head
* remove
* support gw lm_head
* update
* fix
* remove run.bat
* fix style
* support llama3
* slice -> split
* remove debug
* fix style
* add dpu
2024-10-30 14:59:45 +08:00
Yishuo Wang
540eaeb12c
refactor attention_softmax ( #12295 )
2024-10-30 13:20:50 +08:00
Ruonan Wang
2b2cb9c693
[NPU pipeline] Support save & load and update examples ( #12293 )
...
* support save & load, update llama examples
* update baichuan2 example
* update readme
2024-10-30 10:02:00 +08:00
Yuwen Hu
5a15098835
Initial support for quantized forward on CPU when quantization_group_size=0 ( #12282 )
...
* Initial support for quantized forward on CPU when quantization_group_size=0
* Style fix
* Style fix
* Small fix
* Small fix
2024-10-29 19:40:17 +08:00
binbin Deng
3feb58d1e4
Support baichuan2 for level0 pipeline ( #12289 )
2024-10-29 19:24:16 +08:00
Zhao Changmin
546f455e8e
Patch sdpa check function in specific module attributes table ( #12285 )
2024-10-29 18:41:09 +08:00
Ruonan Wang
821b0033ed
[NPU L0] update layernorm & code refactor ( #12287 )
...
* update layernorm & code refactor
* fix style
* add common utils
* change to Pool()
* remove print
2024-10-29 15:01:45 +08:00
Yina Chen
4467645088
[NPU] Support l0 Llama groupwise ( #12276 )
...
* except lm_head
* remove
* support gw lm_head
* update
* fix
* remove run.bat
* fix style
* support llama3
2024-10-28 17:06:55 +08:00
Ruonan Wang
3fe2ea3081
[NPU] Reuse prefill of acc lib for pipeline ( #12279 )
...
* first commit
* update example
* fix style
* update example
* embedding as const
* fix generate
* code refactor
* meet code review
* fix style
* change max_output_len to max_context_len
* fix all-in-one
* fix example
* add check for new tokens
2024-10-28 16:05:49 +08:00
binbin Deng
ec362e6133
Add llama3 level0 example ( #12275 )
2024-10-28 09:24:51 +08:00
SONG Ge
08cb065370
hot-fix redundant import funasr ( #12277 )
2024-10-25 19:40:39 +08:00
SONG Ge
a0c6432899
[NPU] Add support for loading a FunASR model ( #12073 )
...
* add support for loading funasr model
* add initial support for paraformer-encoder
* add npu ops impl
* add encoder-decoder npu pipeline
* move the first 30 layers of the paraformer encoder to npu and keep the remaining layers on cpu
2024-10-25 17:22:01 +08:00
Ruonan Wang
854398f6e0
update example to reduce peak memory usage ( #12274 )
2024-10-25 17:09:26 +08:00
Yuwen Hu
e713296090
Update all-in-one benchmark ( #12272 )
...
* Update all-in-one benchmark
* Small fix
* Small fix
* Small fix
2024-10-25 16:52:59 +08:00
Yuwen Hu
43b25a2fe7
Fix llama 3.2 vision on LNL ( #12264 )
...
* Fix llama 3.2 vision on LNL
* Small fix
2024-10-25 16:23:31 +08:00
Yuwen Hu
93895b2ac2
Openvino all in one benchmark small fix ( #12269 )
...
* Small update for all-in-one benchmark readme to support OpenVINO tests
* Small fix
2024-10-25 14:13:52 +08:00
Zijie Li
f7f62a3fef
Add OpenVINO performance tests to all-in-one benchmark ( #12238 )
...
* add-openvino-to-all-in-one
* update on openvino API
* Update save_openvino.py
* Update save_openvino.py
* Update save_openvino.py
* update on run.py and save_openvino
* update references
* Create openvino-requirements.txt
* fix on comments
* Small updates
* Small fix
* Fix
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-10-25 13:53:53 +08:00
Ruonan Wang
ae57e23e4f
fix incompatibility between llama GW & llama pipeline ( #12267 )
...
* fix
* fix
2024-10-25 10:31:44 +08:00
Yina Chen
b5e663854b
[NPU] Support llama groupwise ( #12260 )
...
* support llama gw
* support llama gw lm_head
* fix style
* remove unused code
2024-10-24 18:06:45 +08:00
Xin Qiu
39c9d1de52
fix code geex ( #12261 )
2024-10-24 14:34:01 +08:00
Yishuo Wang
f3a2b20e6b
Optimize gpt2 ( #12259 )
2024-10-24 13:44:24 +08:00
Ruonan Wang
821fd96367
Initial integration of our L0 Llama impl into ipex-llm ( #12255 )
...
* temp save
* initial support
* fix
* simplify code
* fix style
* fix example
* make default value of pipeline as False
2024-10-24 09:49:27 +08:00
Yishuo Wang
cacc891962
Fix PR validation ( #12253 )
2024-10-23 18:10:47 +08:00
binbin Deng
b685cf4349
Fix npu group size setting of optimize_model=False ( #12256 )
2024-10-23 17:53:54 +08:00
binbin Deng
567b77a76b
Support IR and blob format for llama level0 pipeline ( #12251 )
2024-10-23 16:02:35 +08:00
Yishuo Wang
578aef245d
Fix models auto choose SdpaAttention with ipex 2.3 ( #12252 )
2024-10-23 15:33:45 +08:00
Yishuo Wang
88dc120a4c
fix fp16 linear ( #12250 )
2024-10-23 14:35:19 +08:00
Yina Chen
e8cf7f32f5
npu gw small fix ( #12249 )
2024-10-23 14:26:01 +08:00
Shaojun Liu
aae2490cb8
fix UT ( #12247 )
...
* fix ut
* Update test_transformers_api_attention.py
* Update test_transformers_api_mlp.py
2024-10-23 14:13:06 +08:00
Yina Chen
e37f951cce
[NPU] Groupwise ( #12241 )
...
* dq divide
* fix
* support attn divide
* update qwen2 7b
* divide down_proj & other linear
* use concat & reduce sum
* support scale after
* support qwen2
* w/ mm
* update reshape
* sdpa
* split
* split 2+
* update
* lm head-> 28
* no scale
* update
* update
* update
* fix style
* fix style
* to split linear
* update
* update code
* address comments
* fix style & remove redundant code & revert benchmark scripts
* fix style & remove code
* update save & load
---------
Co-authored-by: Yang Wang <yang3.wang@intel.com>
2024-10-23 14:10:58 +08:00
Jin, Qiao
8fa98e2742
Remove Qwen2-7b from NPU example for "Run Optimized Models (Experimental)" ( #12245 )
...
* Remove qwen2-7b from npu example readme
* fix
2024-10-22 17:07:51 +08:00
Yina Chen
ec465fbcd7
Add lookup generate in load_low_bit ( #12243 )
...
* add lookup generate in load_low_bit
* update comment
2024-10-22 15:51:52 +08:00
Yuwen Hu
b3df47486d
Fix Gemma 2 on LNL ( #12240 )
...
* Fix gemma 2 on LNL
* Python style fix
2024-10-21 18:25:53 +08:00