Commit graph

79 commits

Author SHA1 Message Date
binbin Deng
71f03dcc39
Support qwen2-7b with fused decoderlayer optimization on NPU (#11912) 2024-08-29 13:34:20 +08:00
SONG Ge
5ca7390082
[NPU] Add minicpm-2b support for npu multi-processing (#11949)
* add minicpm-2b support

* update example for minicpm-2b

* add LNL NPU driver requirement in readme
2024-08-28 18:08:49 +08:00
Zijie Li
90f692937d
Update npu baichuan2 (#11939) 2024-08-27 16:56:26 +08:00
Jiao Wang
b4b6ddf73c
NPU Baichuan2 Multi- Process example (#11928) 2024-08-27 15:25:49 +08:00
SONG Ge
a81a329a5f
[NPU] Add example for NPU multi-processing minicpm-1b model (#11935)
* add minicpm example
2024-08-27 14:57:46 +08:00
Yina Chen
e246f1e258
update llama3 npu example (#11933) 2024-08-27 13:03:18 +08:00
binbin Deng
14dddfc0d6
Update NPU example readme (#11931) 2024-08-27 12:44:58 +08:00
Zijie Li
6c3eb1e1e8
refactor from_pretrained API for NPU (#11927) 2024-08-27 09:50:30 +08:00
binbin Deng
dd303776cf
Add troubleshooting about transpose value setting 2024-08-26 16:06:32 +08:00
Zijie Li
794abe2ce8
update npu-readme (#11900) 2024-08-22 17:49:35 +08:00
binbin Deng
72a7bf624b
Support qwen2-1.5b with fused decoderlayer optimization on NPU (#11888) 2024-08-22 11:09:12 +08:00
SONG Ge
8c5c7f32dd
Update doc for running npu generate example with ipex-llm[npu] (#11876)
* update doc for running npu generate example with ipex-llm[npu]

* switch max_prompt_len to 512 to fix compile error on mtl
2024-08-21 13:45:29 +08:00
SONG Ge
5b83493b1a
Add ipex-llm npu option in setup.py (#11858)
* add ipex-llm npu release

* update example doc

* meet latest release changes
2024-08-20 17:29:49 +08:00
SONG Ge
7380823f3f
Update Llama2 multi-processes example (#11852)
* update llama2 multi-processes examples

* update

* update readme

* update
2024-08-19 19:49:01 +08:00
Yang Wang
99b05ba1dc
separate prefill into a process (#11787)
* seperate prefill into a process

* using model.share_memory()

* might work

* worked

* use long prompt

* refactor

* cleanup

* fix bug

* clean up

* changable inter and intra process stages

* refactor

* add max output len

* fix npu_model changes that may cause generate down

* fix npu_model generate import error

* fix generare forward error

---------

Co-authored-by: sgwhat <ge.song@intel.com>
2024-08-19 17:53:36 +08:00
Yang Wang
51bcac1229
follow up on experimental support of fused decoder layer for llama2 (#11785)
* clean up and support transpose value cache

* refine

* fix style

* fix style
2024-08-13 18:53:55 -07:00
binbin Deng
23d3acdc77
Add experimental support of fused decoder layer for llama2 (#11768) 2024-08-13 14:41:36 +08:00
Jin, Qiao
c28b3389e6
Update npu multimodal example (#11773) 2024-08-13 14:14:59 +08:00
Jin, Qiao
05989ad0f9
Update npu example and all in one benckmark (#11766) 2024-08-12 16:46:46 +08:00
Jin, Qiao
a44ab32153
Switch to conhost when running on NPU (#11687) 2024-07-30 17:08:06 +08:00
Zhao Changmin
06745e5742
Add npu benchmark all-in-one script (#11571)
* npu benchmark
2024-07-15 10:42:37 +08:00
Zhao Changmin
b9c66994a5
add npu sdp (#11562) 2024-07-11 16:57:35 +08:00
Zhao Changmin
105e124752
optimize phi3-v encoder npu performance and add multimodal example (#11553)
* phi3-v

* readme
2024-07-11 13:59:14 +08:00
Zhao Changmin
3c16c9f725
Optimize baichuan on NPU (#11548)
* baichuan_npu
2024-07-10 13:18:48 +08:00
Zhao Changmin
76a5802acf
update NPU examples (#11540)
* update NPU examples
2024-07-09 17:19:42 +08:00
Yishuo Wang
319a3b36b2
fix npu llama2 (#11471) 2024-07-01 10:14:11 +08:00
Yishuo Wang
cf0f5c4322
change npu document (#11446) 2024-06-27 13:59:59 +08:00
Yishuo Wang
3b23de684a
update npu examples (#11422) 2024-06-25 13:32:53 +08:00
Zijie Li
ae452688c2
Add NPU HF example (#11358) 2024-06-19 18:07:28 +08:00