Commit graph

94 commits

Author SHA1 Message Date
Ruonan Wang
e90a9ad196
[NPU] Support non-const parameter for decoder layers when keep_ir=True (#12789)
* support layernorm=False for decoder layers

* rename to meet review

* fix style

* rename to const_parameter

* fix rebase error

* fix rebase error
2025-02-08 09:58:42 +08:00
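For reference, a minimal sketch of how the keep_ir switch named in this commit might be used when converting a model for the NPU. The module path and the save_directory / load_in_low_bit arguments follow the ipex-llm NPU examples; passing keep_ir through from_pretrained is an assumption for illustration, not confirmed API.

```python
import torch
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

# Hypothetical sketch: convert a model for the NPU while keeping the
# intermediate IR (keep_ir=True, as named in this commit) next to the
# compiled artifacts written under save_directory.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # illustrative model id
    torch_dtype=torch.float16,
    optimize_model=True,
    load_in_low_bit="sym_int4",
    save_directory="./llama2-npu",    # illustrative output path
    keep_ir=True,                     # assumption: exposed as a kwarg here
)
```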
binbin Deng
6ff7faa781
[NPU] Update deepseek support in python examples and quickstart (#12786) 2025-02-07 11:25:16 +08:00
Ruonan Wang
b4f2be2b09
[NPU] Update C++ example to add DeepSeek-R1 (#12787) 2025-02-07 11:23:34 +08:00
Ruonan Wang
094a25b740
[NPU] Expose parameter to control blob / IR save logic (#12767)
* update api

* fix convert.py

* fix style

* remove unnecessary bin file

* fix style
2025-02-06 10:07:45 +08:00
Ruonan Wang
78cca0a68c
[NPU] update llm-npu-cli example (#12729)
* update cli example

* add license

* rename

* update readme sample output
2025-01-22 09:59:27 +08:00
Yuwen Hu
525b0ee991
[NPU] Tiny fixes on examples (#12661) 2025-01-07 14:30:38 +08:00
Yuwen Hu
381d448ee2
[NPU] Example & Quickstart updates (#12650)
* Remove model with optimize_model=False in NPU verified models tables, and remove related example

* Remove experimental in run optimized model section title

* Unify model table order & example cmd

* Move embedding example to separate folder & update quickstart example link

* Add Quickstart reference in main NPU readme

* Small fix

* Small fix

* Move save/load examples under NPU/HF-Transformers-AutoModels

* Add low-bit and polish arguments for LLM Python examples

* Small fix

* Add low-bit and polish arguments for Multi-Model examples

* Polish argument for Embedding models

* Polish argument for LLM CPP examples

* Add low-bit and polish argument for Save-Load examples

* Add accuracy tuning tips for examples

* Update NPU quickstart accuracy tuning with low-bit optimizations

* Add save/load section to quickstart

* Update CPP example sample output to EN

* Add installation regarding cmake for CPP examples

* Small fix

* Small fix

* Small fix

* Small fix

* Small fix

* Small fix

* Unify max prompt length to 512

* Change recommended low-bit for Qwen2.5-3B-Instruct to asym_int4

* Update based on comments

* Small fix
2025-01-07 13:52:41 +08:00
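To make the unified arguments above concrete (asym_int4 recommended for Qwen2.5-3B-Instruct, max prompt length unified to 512), a hedged sketch in the style of the ipex-llm NPU Python examples; the kwargs shown are taken from those examples, but exact defaults may differ by release.

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

MODEL_ID = "Qwen/Qwen2.5-3B-Instruct"

# Sketch of the unified example arguments: asym_int4 low-bit format and a
# 512-token max prompt length, per the bullets in this commit.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    attn_implementation="eager",
    optimize_model=True,
    load_in_low_bit="asym_int4",  # recommended low-bit for this model
    max_context_len=1024,
    max_prompt_len=512,           # unified max prompt length
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
```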
binbin Deng
0b377100c5
Add guide for save-load usage (#12498) 2025-01-03 16:30:15 +08:00
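As a companion to the save-load guide, a minimal sketch of the two-step flow used in the ipex-llm NPU Save-Load examples: convert once with save_directory, then reload with load_low_bit on later runs. Argument names follow those examples; paths are illustrative.

```python
import torch
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

SAVE_DIR = "./qwen2-npu-sym-int4"  # illustrative path

# First run: convert to low-bit and save the NPU-ready model.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-1.5B-Instruct",
    torch_dtype=torch.float16,
    optimize_model=True,
    load_in_low_bit="sym_int4",
    save_directory=SAVE_DIR,
)

# Later runs: load the converted model directly, skipping conversion.
model = AutoModelForCausalLM.load_low_bit(
    SAVE_DIR,
    torch_dtype=torch.float16,
    optimize_model=True,
)
```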
Ruonan Wang
90f6709486
Remove pipeline examples (#12626) 2024-12-27 13:42:28 +08:00
Zijie Li
5f04ed7254
[NPU] Update prompt format for baichuan2-pipeline (#12625) 2024-12-27 11:30:54 +08:00
binbin Deng
796ee571a5
[NPU doc] Update verified platforms (#12621) 2024-12-26 17:39:13 +08:00
Zijie Li
ccc4055058
[NPU] Update prompt format for baichuan2 (#12615)
* Update baichuan2.py

* style fix
2024-12-26 11:41:37 +08:00
Ruonan Wang
d841e1dc0d
[NPU] update convert script based on latest usage (#12617) 2024-12-26 11:23:04 +08:00
binbin Deng
680ea7e4a8
[NPU doc] Update configuration for different platforms (#12554) 2024-12-17 10:15:09 +08:00
binbin Deng
caf15cc5ef
[NPU] Add IPEX_LLM_NPU_MTL to enable support on mtl (#12543) 2024-12-13 17:01:13 +08:00
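IPEX_LLM_NPU_MTL is an environment switch, so it has to be set before ipex-llm initializes; a short sketch follows. That "1" enables it is an assumption based on common on/off flags, not stated in the commit.

```python
import os

# Assumption: "1" turns the Meteor Lake (MTL) NPU path on. Set the flag
# before importing ipex_llm so it is visible at initialization time.
os.environ["IPEX_LLM_NPU_MTL"] = "1"

from ipex_llm.transformers.npu_model import AutoModelForCausalLM
```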
binbin Deng
d20a968ce2
[NPU] Fix generate example (#12541) 2024-12-13 14:07:24 +08:00
Yuwen Hu
dbaf4abcb3
[NPU] Update C++ example with repetition_penalty & update Python code accordingly (#12528)
* Update c++ npu examples with repetition penalty

* Fit python with updated C++ API

* Style fix

* Small fix

* Small fix
2024-12-12 13:42:55 +08:00
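On the Python side, repetition_penalty is the standard transformers generate() argument; a hedged sketch of how the updated examples would pass it, assuming a model loaded as in the sketches above. The value 1.1 is illustrative.

```python
from transformers import AutoTokenizer

# Assumes `model` was loaded via ipex_llm.transformers.npu_model as in the
# loading sketches above.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct",
                                          trust_remote_code=True)
input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids
output = model.generate(
    input_ids,
    max_new_tokens=64,
    repetition_penalty=1.1,  # >1.0 discourages repeated tokens
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```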
binbin Deng
ea55235cbd
[NPU] Support glm-edge models (#12511) 2024-12-09 14:06:27 +08:00
binbin Deng
12c78978dd
[NPU C++] Update example with conversation mode support (#12510) 2024-12-06 12:46:37 +08:00
Jinhe
5e1416c9aa
fix readme for npu cpp examples and llama.cpp (#12505)
* fix cpp readme

* fix cpp readme

* fix cpp readme
2024-12-05 12:32:42 +08:00
Chu,Youcheng
ffa9a9e1b3
Update streaming in npu examples (#12495)
* feat: add streaming

* Update readme accordingly

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-12-04 17:51:10 +08:00
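The streaming added in this commit follows the usual transformers pattern; a minimal sketch with TextStreamer, again assuming a model loaded as in the sketches above.

```python
from transformers import AutoTokenizer, TextStreamer

# Assumes `model` was loaded via ipex_llm.transformers.npu_model as above.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct",
                                          trust_remote_code=True)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer("What is AI?", return_tensors="pt")
# Tokens are printed to stdout incrementally as they are generated.
model.generate(**inputs, max_new_tokens=64, streamer=streamer)
```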
Yuwen Hu
ef4028ac2d
[NPU] Support split lm_head for Qwen2 with CPP (#12491)
* Use split for Qwen2 lm_head instead of slice in optimize_pre

* Support split lm_head in Qwen2 python cpp backend

* Fit with Python acc lib pipeline

* Removed default mixed_precision=True in all-in-one and related examples

* Small fix

* Style fix

* Fix based on comments

* Fix based on comments

* Style fix
2024-12-04 14:41:08 +08:00
Jin, Qiao
7082844f3f
Fix NPU LLM example save/load tokenizer (#12485) 2024-12-03 16:30:55 +08:00
binbin Deng
ab01753b1c
[NPU] update save-load API usage (#12473) 2024-12-03 09:46:15 +08:00
Yuwen Hu
aee9acb303
Add NPU QuickStart & update example links (#12470)
* Add initial NPU quickstart (c++ part unfinished)

* Small update

* Update based on comments

* Update main readme

* Remove LLaMA description

* Small fix

* Small fix

* Remove subsection link in main README

* Small fix

* Update based on comments

* Small fix

* TOC update and other small fixes

* Update for Chinese main readme

* Update based on comments and other small fixes

* Change order
2024-12-02 17:03:10 +08:00
binbin Deng
c911026f03
[NPU C++] Update model support & examples & benchmark (#12466) 2024-11-29 13:35:58 +08:00
binbin Deng
14d8d3d8af
Integrate NPU C++ imple into ipex-llm (#12461) 2024-11-29 09:25:37 +08:00
Ruonan Wang
f8c2bb2943
[NPU] optimize qwen2 prefill performance for C++ (#12451) 2024-11-27 10:46:18 +08:00
Ruonan Wang
0e23bd779f
Add support of llama3.2 for NPU C++ (#12442)
* initial support of llama3.2

* update

* update

* fix style

* fix style

* fix

* small fix
2024-11-26 09:26:55 +08:00
Ruonan Wang
b9abb8a285
Support qwen2.5 3B for NPU & update related examples (#12438)
* update qwen2.5-3B

* update convert

* small fix

* replace load_in_low_bit with low_bit

* small fix
2024-11-25 16:38:31 +08:00
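The "replace load_in_low_bit with low_bit" bullet refers to the convert flow for the NPU C++ pipeline; a hedged sketch of the renamed kwarg follows, with the other arguments in the style of the NPU C++ convert script. The model id and output path are illustrative.

```python
import torch
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

# Sketch of the renamed kwarg in the C++ convert flow: `low_bit` replaces
# `load_in_low_bit` per this commit. Conversion artifacts land in
# save_directory for the C++ runtime to load.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",        # illustrative model id
    torch_dtype=torch.float16,
    optimize_model=True,
    low_bit="asym_int4",               # was load_in_low_bit
    save_directory="./qwen2.5-3b-npu", # illustrative output path
)
```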
Jinhe
b633fbf26c
add chinese prompt troubleshooting for npu cpp examples (#12437)
* add chinese prompt troubleshooting

* add chinese prompt troubleshooting
2024-11-25 15:28:47 +08:00
Ruonan Wang
f41405368a
Support minicpm for NPU C++ (#12434)
* support minicpm-1b

* update

* tune fused_layers

* update readme.md
2024-11-25 10:42:02 +08:00
Ruonan Wang
0819fad34e
support Llama2-7B / Llama3-8B for NPU C++ (#12431)
* support llama2

* update

* support fused_layers=4 for Llama2-7B
2024-11-22 18:47:19 +08:00
Ruonan Wang
4ffa6c752c
New convert support for C++ NPU (#12430)
* initial commit

* fix

* fix style

* fix style

* fix

* fix
2024-11-22 14:28:30 +08:00
Ruonan Wang
2935e97610
small fix of cpp readme (#12425) 2024-11-21 18:21:34 +08:00
Ruonan Wang
7288c759ce
Initial NPU C++ Example (#12417)
* temp save

* meet review, update

* update

* meet review, add license

* typo
2024-11-21 10:09:26 +08:00
Yina Chen
b2e69a896c
[NPU] Support Baichuan groupwise & gw code refactor (#12337)
* support minicpm 1b & qwen 1.5b gw

* support minicpm 1b

* baichuan part

* update

* update

* update

* baichuan support

* code refactor

* remove code

* fix style

* address comments

* revert
2024-11-08 11:42:42 +08:00
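The groupwise (gw) quantization these bullets refer to is controlled in the ipex-llm NPU examples through quantization_group_size; a hedged sketch follows. That 128 is a valid group size (with 0 meaning channel-wise) follows those examples, but treat the exact values as assumptions.

```python
import torch
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

# Sketch: enable groupwise quantization via quantization_group_size,
# as in the NPU examples (0 would mean channel-wise instead).
model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan2-7B-Chat",  # illustrative model id
    torch_dtype=torch.float16,
    trust_remote_code=True,
    optimize_model=True,
    load_in_low_bit="sym_int4",
    quantization_group_size=128,       # illustrative group size
)
```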
binbin Deng
812d5cc32e
[NPU L0] Support llama3.2 in L0 pipeline (#12361) 2024-11-08 10:01:23 +08:00
SONG Ge
a7b66683f1
[NPU] Add Optimized Support for Llama3.2-1B/3B on NPU (#12339)
* Add initial support for llama3.2-1b/3b

* move llama3.2 support into current llama_mp impl
2024-11-06 19:21:40 +08:00
Yina Chen
d872639395
[NPU] Llama3, Qwen2 1.5b, MiniCPM 1/2B groupwise support (#12327)
* support minicpm 1b & qwen 1.5b gw

* support minicpm 1b

* support minicpm 2b

* fix style & error

* fix style & update

* remove print
2024-11-05 15:51:31 +08:00
Kai Huang
c8679ad592
Qwen layernorm as input (#12309)
* qwen layernorm as input

* add group size
2024-11-04 09:51:15 +08:00
binbin Deng
d409d9d0eb
[NPU L0] Update streaming mode of example (#12312) 2024-11-01 15:38:10 +08:00
binbin Deng
eda764909c
Add minicpm-2b in L0 pipeline (#12308) 2024-11-01 09:30:01 +08:00
binbin Deng
4892df61c9
Add qwen2-1.5b in l0 pipeline example (#12306) 2024-10-31 16:44:25 +08:00
Kai Huang
416c19165c
Add Qwen pipeline and example (#12292)
* support qwen pipeline

* update error msg

* style

* meet review

* minor
2024-10-31 11:25:25 +08:00
binbin Deng
41b8064554
Support minicpm-1B in level0 pipeline (#12297) 2024-10-30 17:21:47 +08:00
Ruonan Wang
2b2cb9c693
[NPU pipeline] Support save & load and update examples (#12293)
* support save & load, update llama examples

* update baichuan2 example

* update readme
2024-10-30 10:02:00 +08:00
binbin Deng
3feb58d1e4
Support baichuan2 for level0 pipeline (#12289) 2024-10-29 19:24:16 +08:00
Yina Chen
4467645088
[NPU] Support l0 Llama groupwise (#12276)
* except lm_head

* remove

* support gw lm_head

* update

* fix

* remove run.bat

* fix style

* support llama3
2024-10-28 17:06:55 +08:00
Ruonan Wang
3fe2ea3081
[NPU] Reuse prefill of acc lib for pipeline (#12279)
* first commit

* update example

* fix style

* update example

* embedding as const

* fix generate

* code refactor

* meet code review

* fix style

* change max_output_len to max_context_len

* fix all-in-one

* fix example

* add check for new tokens
2024-10-28 16:05:49 +08:00
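The final bullet's rename from max_output_len to max_context_len changes what callers size: the whole context window (prompt plus generation) rather than the output alone. A hedged before/after sketch, with kwargs following the NPU examples:

```python
import torch
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

# Before this change, the older kwarg (max_output_len) bounded generation
# length; after it, max_context_len bounds prompt + generated tokens.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # illustrative model id
    torch_dtype=torch.float16,
    optimize_model=True,
    load_in_low_bit="sym_int4",
    max_context_len=1024,  # renamed from max_output_len per this commit
)
```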