Commit graph

14 commits

Author SHA1 Message Date
Yuwen Hu
dbaf4abcb3
[NPU] Update C++ example with repetition_penalty & update Python code accordingly (#12528)
* Update c++ npu examples with repetition penalty

* Fit python with updated C++ API

* Style fix

* Small fix

* Small fix
2024-12-12 13:42:55 +08:00
binbin Deng
12c78978dd
[NPU C++] Update example with conversation mode support (#12510) 2024-12-06 12:46:37 +08:00
Jinhe
5e1416c9aa
fix readme for npu cpp examples and llama.cpp (#12505)
* fix cpp readme

* fix cpp readme

* fix cpp readme
2024-12-05 12:32:42 +08:00
Yuwen Hu
ef4028ac2d
[NPU] Support split lm_head for Qwen2 with CPP (#12491)
* Use split for Qwen2 lm_head instead of slice in optimize_pre

* Support split lm_head in Qwen2 python cpp backend

* Fit with Python acc lib pipeline

* Removed default mixed_precision=True in all-in-one and related examples

* Small fix

* Style fix

* Fix based on comments

* Fix based on comments

* Stype fix
2024-12-04 14:41:08 +08:00
binbin Deng
14d8d3d8af
Integrate NPU C++ imple into ipex-llm (#12461) 2024-11-29 09:25:37 +08:00
Ruonan Wang
f8c2bb2943
[NPU] optimize qwen2 prefill performance for C++ (#12451) 2024-11-27 10:46:18 +08:00
Ruonan Wang
0e23bd779f
Add support of llama3.2 for NPU C++ (#12442)
* initial support of  llama3.2

* update

* update

* fix style

* fix style

* fix

* small fix
2024-11-26 09:26:55 +08:00
Ruonan Wang
b9abb8a285
Support qwen2.5 3B for NPU & update related examples (#12438)
* update qwen2.5-3B

* update convert

* small fix

* replace load_in_low_bit with low_bit

* small fix
2024-11-25 16:38:31 +08:00
Jinhe
b633fbf26c
add chinese prompt troubleshooting for npu cpp examples (#12437)
* add chinese prompt troubleshooting

* add chinese prompt troubleshooting
2024-11-25 15:28:47 +08:00
Ruonan Wang
f41405368a
Support minicpm for NPU C++ (#12434)
* support minicpm-1b

* update

* tune fused_layers

* update readme.md
2024-11-25 10:42:02 +08:00
Ruonan Wang
0819fad34e
support Llama2-7B / Llama3-8B for NPU C++ (#12431)
* support llama2

* update

* support fused_layers=4 for Llama2-7B
2024-11-22 18:47:19 +08:00
Ruonan Wang
4ffa6c752c
New convert support for C++ NPU (#12430)
* initial commit

* fix

* fix style

* fix style

* fix

* fix
2024-11-22 14:28:30 +08:00
Ruonan Wang
2935e97610
small fix of cpp readme(#12425) 2024-11-21 18:21:34 +08:00
Ruonan Wang
7288c759ce
Initial NPU C++ Example (#12417)
* temp save

* meet review, update

* update

* meet review, add license

* typo
2024-11-21 10:09:26 +08:00