Commit graph

17 commits

Author SHA1 Message Date
binbin Deng
812d5cc32e
[NPU L0] Support llama3.2 in L0 pipeline (#12361) 2024-11-08 10:01:23 +08:00
Yina Chen
d872639395
[NPU] Llama3, Qwen2 1.5b, MiniCPM 1/2B groupwise support (#12327)
* support minicpm 1b & qwen 1.5b gw

* support minicpm 1b

* support minicpm 2b

* fix style & error

* fix style & update

* remove print
2024-11-05 15:51:31 +08:00
Kai Huang
c8679ad592
Qwen layernorm as input (#12309)
* qwen layernorm as input

* add group size
2024-11-04 09:51:15 +08:00
binbin Deng
d409d9d0eb
[NPU L0] Update streaming mode of example (#12312) 2024-11-01 15:38:10 +08:00
binbin Deng
eda764909c
Add minicpm-2b in L0 pipeline (#12308) 2024-11-01 09:30:01 +08:00
binbin Deng
4892df61c9
Add qwen2-1.5b in l0 pipeline example (#12306) 2024-10-31 16:44:25 +08:00
Kai Huang
416c19165c
Add Qwen pipeline and example (#12292)
* support qwen pipeline

* update error msg

* style

* meet review

* minor
2024-10-31 11:25:25 +08:00
binbin Deng
41b8064554
Support minicpm-1B in level0 pipeline (#12297) 2024-10-30 17:21:47 +08:00
Ruonan Wang
2b2cb9c693
[NPU pipeline] Support save & load and update examples (#12293)
* support save & load, update llama examples

* update baichuan2 example

* update readme
2024-10-30 10:02:00 +08:00
binbin Deng
3feb58d1e4
Support baichuan2 for level0 pipeline (#12289) 2024-10-29 19:24:16 +08:00
Yina Chen
4467645088
[NPU] Support l0 Llama groupwise (#12276)
* except lm_head

* remove

* support gw lm_head

* update

* fix

* remove run.bat

* fix style

* support llama3
2024-10-28 17:06:55 +08:00
Ruonan Wang
3fe2ea3081
[NPU] Reuse prefill of acc lib for pipeline (#12279)
* first commit

* update example

* fix style

* update example

* embedding as const

* fix generate

* code  refactor

* meet code review

* fix style

* change max_output_len to max_context_len

* fix all-in-one

* fix example

* add check for new tokens
2024-10-28 16:05:49 +08:00
binbin Deng
ec362e6133
Add llama3 level0 example (#12275) 2024-10-28 09:24:51 +08:00
Ruonan Wang
854398f6e0
update example to reduce peak memory usage (#12274) 2024-10-25 17:09:26 +08:00
Ruonan Wang
ae57e23e4f
fix incompatibility between llama GW & llama pipeline (#12267)
* fix

* fix
2024-10-25 10:31:44 +08:00
Ruonan Wang
821fd96367
Initial integrate our L0 Llama impl into ipex-llm (#12255)
* temp save

* initial support

* fix

* simplify code

* fix style

* fix example

* make default value of pipeline as False
2024-10-24 09:49:27 +08:00
Ruonan Wang
4d93bb81fe
Initial support of NPU level0 Model (#12177)
* first commit to support load dll and init llm pipeline

* add init generate

* fix style

* small updates

* fix style and check tokens number
2024-10-11 09:45:53 +08:00