Commit graph

10 commits

Author SHA1 Message Date
binbin Deng
14d8d3d8af
Integrate NPU C++ imple into ipex-llm (#12461)
2024-11-29 09:25:37 +08:00
Ruonan Wang
f8c2bb2943
[NPU] optimize qwen2 prefill performance for C++ (#12451)
2024-11-27 10:46:18 +08:00
Ruonan Wang
0e23bd779f
Add support of llama3.2 for NPU C++ (#12442)
* initial support of llama3.2

* update

* update

* fix style

* fix style

* fix

* small fix
2024-11-26 09:26:55 +08:00
Ruonan Wang
b9abb8a285
Support qwen2.5 3B for NPU & update related examples (#12438)
* update qwen2.5-3B

* update convert

* small fix

* replace load_in_low_bit with low_bit

* small fix
2024-11-25 16:38:31 +08:00
Jinhe
b633fbf26c
add chinese prompt troubleshooting for npu cpp examples (#12437)
* add chinese prompt troubleshooting

* add chinese prompt troubleshooting
2024-11-25 15:28:47 +08:00
Ruonan Wang
f41405368a
Support minicpm for NPU C++ (#12434)
* support minicpm-1b

* update

* tune fused_layers

* update readme.md
2024-11-25 10:42:02 +08:00
Ruonan Wang
0819fad34e
support Llama2-7B / Llama3-8B for NPU C++ (#12431)
* support llama2

* update

* support fused_layers=4 for Llama2-7B
2024-11-22 18:47:19 +08:00
Ruonan Wang
4ffa6c752c
New convert support for C++ NPU (#12430)
* initial commit

* fix

* fix style

* fix style

* fix

* fix
2024-11-22 14:28:30 +08:00
Ruonan Wang
2935e97610
small fix of cpp readme (#12425)
2024-11-21 18:21:34 +08:00
Ruonan Wang
7288c759ce
Initial NPU C++ Example (#12417)
* temp save

* meet review, update

* update

* meet review, add license

* typo
2024-11-21 10:09:26 +08:00