Ruonan Wang | 0e23bd779f | 2024-11-26 09:26:55 +08:00
Add support for llama3.2 in NPU C++ (#12442)
* initial support of llama3.2
* update
* update
* fix style
* fix style
* fix
* small fix

Ruonan Wang | b9abb8a285 | 2024-11-25 16:38:31 +08:00
Support qwen2.5 3B for NPU & update related examples (#12438)
* update qwen2.5-3B
* update convert
* small fix
* replace load_in_low_bit with low_bit (see the sketch below)
* small fix

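The load_in_low_bit → low_bit rename above is the user-visible part of this change in how the NPU examples load a model. A minimal sketch of the resulting call, assuming the keyword is passed to from_pretrained as in the repo's Python NPU examples; the model id, prompt, and low-bit value are placeholders, not taken from the commit:

```python
# Hedged sketch only: assumes ipex_llm.transformers.npu_model exposes the
# HuggingFace-style AutoModelForCausalLM used by the NPU examples, and that
# `low_bit` (formerly `load_in_low_bit`) is its quantization keyword.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model_path = "Qwen/Qwen2.5-3B-Instruct"  # placeholder model id

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    optimize_model=True,
    pipeline=True,        # run through the NPU level-0 (L0) pipeline
    low_bit="sym_int4",   # renamed from load_in_low_bit in #12438
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("What is AI?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
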
Yina Chen | b2e69a896c | 2024-11-08 11:42:42 +08:00
[NPU] Support Baichuan groupwise & gw code refactor (#12337)
* support minicpm 1b & qwen 1.5b gw
* support minicpm 1b
* baichuan part
* update
* update
* update
* baichuan support
* code refactor
* remove code
* fix style
* address comments
* revert

binbin Deng | 812d5cc32e | 2024-11-08 10:01:23 +08:00
[NPU L0] Support llama3.2 in L0 pipeline (#12361)

Yina Chen | d872639395 | 2024-11-05 15:51:31 +08:00
[NPU] Llama3, Qwen2 1.5b, MiniCPM 1/2B groupwise support (#12327)
* support minicpm 1b & qwen 1.5b gw
* support minicpm 1b
* support minicpm 2b
* fix style & error
* fix style & update
* remove print

Kai Huang | c8679ad592 | 2024-11-04 09:51:15 +08:00
Qwen layernorm as input (#12309)
* qwen layernorm as input
* add group size

binbin Deng | d409d9d0eb | 2024-11-01 15:38:10 +08:00
[NPU L0] Update streaming mode of example (#12312)

binbin Deng | eda764909c | 2024-11-01 09:30:01 +08:00
Add minicpm-2b in L0 pipeline (#12308)

binbin Deng | 4892df61c9 | 2024-10-31 16:44:25 +08:00
Add qwen2-1.5b in L0 pipeline example (#12306)

Kai Huang | 416c19165c | 2024-10-31 11:25:25 +08:00
Add Qwen pipeline and example (#12292)
* support qwen pipeline
* update error msg
* style
* meet review
* minor

binbin Deng | 41b8064554 | 2024-10-30 17:21:47 +08:00
Support minicpm-1B in level0 pipeline (#12297)

Ruonan Wang | 2b2cb9c693 | 2024-10-30 10:02:00 +08:00
[NPU pipeline] Support save & load and update examples (#12293)
* support save & load, update llama examples (see the sketch below)
* update baichuan2 example
* update readme

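Save & load lets the examples convert a model once and reload the converted weights directly on later runs. A minimal sketch of that flow, assuming the save_low_bit/load_low_bit pair that ipex-llm's other AutoModel front ends expose is also what the NPU pipeline model uses here; the model id and directory are placeholders:

```python
# Hedged sketch only: assumes save_low_bit()/load_low_bit() on
# ipex_llm.transformers.npu_model.AutoModelForCausalLM, mirroring the
# save & load API of ipex-llm's other AutoModel classes.
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

save_dir = "./llama2-npu-pipeline"  # placeholder output directory

# First run: convert the HF checkpoint, then persist the converted weights.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # placeholder model id
    optimize_model=True,
    pipeline=True,
    load_in_low_bit="sym_int4",  # keyword name as of this commit (pre-#12438)
)
model.save_low_bit(save_dir)

# Later runs: skip conversion and load the saved low-bit checkpoint directly.
model = AutoModelForCausalLM.load_low_bit(
    save_dir,
    optimize_model=True,
    pipeline=True,
)
```
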
binbin Deng | 3feb58d1e4 | 2024-10-29 19:24:16 +08:00
Support baichuan2 for level0 pipeline (#12289)

Yina Chen | 4467645088 | 2024-10-28 17:06:55 +08:00
[NPU] Support L0 Llama groupwise (#12276)
* except lm_head
* remove
* support gw lm_head
* update
* fix
* remove run.bat
* fix style
* support llama3

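Group-wise (gw) quantization splits each weight tensor into fixed-size groups with per-group scales, rather than one scale per output channel. A minimal sketch of how an example might select it, assuming quantization_group_size is the relevant from_pretrained knob in the NPU examples (0 for channel-wise, a positive group size such as 128 for group-wise); the model id and values are placeholders:

```python
# Hedged sketch only: assumes `quantization_group_size` controls group-wise
# low-bit quantization in the NPU examples (0 = channel-wise, >0 = group-wise).
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model id
    optimize_model=True,
    pipeline=True,
    load_in_low_bit="sym_int4",
    quantization_group_size=128,  # per-group scales; see #12276 for the
                                  # lm_head handling in group-wise mode
)
```
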
Ruonan Wang | 3fe2ea3081 | 2024-10-28 16:05:49 +08:00
[NPU] Reuse prefill of acc lib for pipeline (#12279)
* first commit
* update example
* fix style
* update example
* embedding as const
* fix generate
* code refactor
* meet code review
* fix style
* change max_output_len to max_context_len (see the sketch below)
* fix all-in-one
* fix example
* add check for new tokens

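The max_output_len → max_context_len rename is the user-visible part of this change: the value bounds the whole context (prompt plus generated tokens), not just the output. A minimal sketch, assuming both length keywords are from_pretrained arguments as in the repo's NPU examples; the model id and limits are placeholders:

```python
# Hedged sketch only: assumes max_context_len / max_prompt_len are
# from_pretrained keywords in the NPU pipeline examples.
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model id
    optimize_model=True,
    pipeline=True,
    load_in_low_bit="sym_int4",
    max_context_len=1024,  # renamed from max_output_len in #12279; caps
                           # prompt tokens + newly generated tokens together
    max_prompt_len=512,    # assumption: companion cap on the prompt alone
)
```
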
binbin Deng | ec362e6133 | 2024-10-28 09:24:51 +08:00
Add llama3 level0 example (#12275)

Ruonan Wang | 854398f6e0 | 2024-10-25 17:09:26 +08:00
Update example to reduce peak memory usage (#12274)

Ruonan Wang | ae57e23e4f | 2024-10-25 10:31:44 +08:00
Fix incompatibility between llama GW & llama pipeline (#12267)
* fix
* fix

Ruonan Wang | 821fd96367 | 2024-10-24 09:49:27 +08:00
Initial integration of our L0 Llama impl into ipex-llm (#12255)
* temp save
* initial support
* fix
* simplify code
* fix style
* fix example
* set the default value of pipeline to False

Ruonan Wang | 4d93bb81fe | 2024-10-11 09:45:53 +08:00
Initial support of NPU level0 Model (#12177)
* first commit to support load dll and init llm pipeline
* add init generate
* fix style
* small updates
* fix style and check tokens number