Yuwen Hu
aee9acb303
Add NPU QuickStart & update example links ( #12470 )
...
* Add initial NPU quickstart (c++ part unfinished)
* Small update
* Update based on comments
* Update main readme
* Remove LLaMA description
* Small fix
* Small fix
* Remove subsection link in main README
* Small fix
* Update based on comments
* Small fix
* TOC update and other small fixes
* Update for Chinese main readme
* Update based on comments and other small fixes
* Change order
2024-12-02 17:03:10 +08:00
Jin, Qiao
31c69a8d31
Fix MiniCPM-V models running on NPU ( #12478 )
2024-12-02 16:29:46 +08:00
binbin Deng
54d9a590d4
[NPU]Fix eos_token setting ( #12475 )
2024-12-02 14:18:22 +08:00
Guancheng Fu
59bd4a214f
add vLLM glm4 fix ( #12474 )
2024-12-02 14:05:16 +08:00
Ruonan Wang
4b6c3160be
Support imatrix-guided quantization for NPU CW ( #12468 )
...
* init commit
* remove print
* add interface
* fix
* fix
* fix style
2024-12-02 11:31:26 +08:00
binbin Deng
f99f188023
Hotfix of benchmark script ( #12467 )
2024-11-29 14:00:59 +08:00
binbin Deng
c911026f03
[NPU C++] Update model support & examples & benchmark ( #12466 )
2024-11-29 13:35:58 +08:00
binbin Deng
14d8d3d8af
Integrate NPU C++ implementation into ipex-llm ( #12461 )
2024-11-29 09:25:37 +08:00
Ruonan Wang
490bb0ca53
[NPU] update fused layers for GW ( #12459 )
...
* update fused layers for GW
* fix
* fix llama condition for glm model
* update
2024-11-28 17:14:30 +08:00
Yina Chen
1b533a105c
[NPU] Add env to enable scale search ( #12462 )
...
* add env enable scale search
* address comment
* move logic
2024-11-28 17:06:00 +08:00
Heyang Sun
d272f6b471
Remove NF4 unsupported comment in CPU finetuning ( #12460 )
...
Co-authored-by: Ariadne <wyn2000330@126.com>
2024-11-28 13:26:46 +08:00
Ruonan Wang
b29da30205
[NPU] Update C++ L0 ( #12458 )
...
* update
* fix style
2024-11-27 22:08:48 +08:00
Yishuo Wang
6f3441ba4c
fix glm4-9b overflow ( #12455 )
2024-11-27 17:39:13 +08:00
Ruonan Wang
281c9b0bb9
[NPU] Add L0 support for NPU C++ ( #12454 )
...
* add L0 models support
* meet review
* fix style
2024-11-27 17:04:13 +08:00
Chu,Youcheng
ce6fcaa9ba
update transformers version in example of glm4 ( #12453 )
...
* fix: update transformers version in example of glm4
* fix: textual adjustments
* fix: textual adjustment
2024-11-27 15:02:25 +08:00
Yuwen Hu
effb9bb41c
Small update to LangChain examples readme ( #12452 )
2024-11-27 14:02:25 +08:00
Chu,Youcheng
acd77d9e87
Remove env variable BIGDL_LLM_XMX_DISABLED in documentation ( #12445 )
...
* fix: remove BIGDL_LLM_XMX_DISABLED in mddocs
* fix: remove set SYCL_CACHE_PERSISTENT=1 in example
* fix: remove BIGDL_LLM_XMX_DISABLED in workflows
* fix: merge igpu and A-series Graphics
* fix: remove set BIGDL_LLM_XMX_DISABLED=1 in example
* fix: remove BIGDL_LLM_XMX_DISABLED in workflows
* fix: merge igpu and A-series Graphics
* fix: textual adjustment
* fix: textual adjustment
* fix: textual adjustment
2024-11-27 11:16:36 +08:00
Ruonan Wang
f8c2bb2943
[NPU] optimize qwen2 prefill performance for C++ ( #12451 )
2024-11-27 10:46:18 +08:00
Ruonan Wang
7b40f9b372
[NPU] Support GW for NPU C++ ( #12450 )
2024-11-26 17:46:40 +08:00
Jin, Qiao
c2efa264d9
Update LangChain examples to use upstream ( #12388 )
...
* Update LangChain examples to use upstream
* Update README and fix links
* Update LangChain CPU examples to use upstream
* Update LangChain CPU voice_assistant example
* Update CPU README
* Update GPU README
* Remove GPU Langchain vLLM example and fix comments
* Change langchain -> LangChain
* Add reference for both upstream llms and embeddings
* Fix comments
* Fix comments
* Fix comments
* Fix comments
* Fix comment
2024-11-26 16:43:15 +08:00
Ruonan Wang
24b46b2b19
[NPU] further fix of qwen2 int8 pipeline & C++ ( #12449 )
...
* fix
* fix style
2024-11-26 16:39:39 +08:00
Yuwen Hu
303b104c10
Fix abnormal output for Qwen2-7B when sym_int8 ( #12446 )
2024-11-26 15:53:04 +08:00
Ruonan Wang
52c17fe104
Optimize first token of C++ NPU by adding npu_dpu_groups ( #12443 )
...
* add npu_dpu_groups
* add check for env
* fix style
2024-11-26 11:41:32 +08:00
Jinhe
66bd7abae4
add sdxl and lora-lcm optimization ( #12444 )
...
* add sdxl and lora-lcm optimization
* fix openjourney speed drop
2024-11-26 11:38:09 +08:00
Ruonan Wang
0e23bd779f
Add support of llama3.2 for NPU C++ ( #12442 )
...
* initial support of llama3.2
* update
* update
* fix style
* fix style
* fix
* small fix
2024-11-26 09:26:55 +08:00
Yishuo Wang
cdd41f5e4c
optimize sdxl again ( #12441 )
2024-11-25 17:46:46 +08:00
Ruonan Wang
b9abb8a285
Support qwen2.5 3B for NPU & update related examples ( #12438 )
...
* update qwen2.5-3B
* update convert
* small fix
* replace load_in_low_bit with low_bit
* small fix
2024-11-25 16:38:31 +08:00
Jinhe
b633fbf26c
Add Chinese prompt troubleshooting for NPU C++ examples ( #12437 )
...
* add chinese prompt troubleshooting
* add chinese prompt troubleshooting
2024-11-25 15:28:47 +08:00
Yishuo Wang
8164aed802
small change ( #12439 )
2024-11-25 14:35:49 +08:00
Yishuo Wang
be132c4209
fix and optimize sd ( #12436 )
2024-11-25 14:09:48 +08:00
Ruonan Wang
f41405368a
Support minicpm for NPU C++ ( #12434 )
...
* support minicpm-1b
* update
* tune fused_layers
* update readme.md
2024-11-25 10:42:02 +08:00
Ruonan Wang
0819fad34e
support Llama2-7B / Llama3-8B for NPU C++ ( #12431 )
...
* support llama2
* update
* support fused_layers=4 for Llama2-7B
2024-11-22 18:47:19 +08:00
Ruonan Wang
4ffa6c752c
New convert support for C++ NPU ( #12430 )
...
* initial commit
* fix
* fix style
* fix style
* fix
* fix
2024-11-22 14:28:30 +08:00
Yuwen Hu
e61ae88c5b
Upgrade dependency for xpu_lnl and xpu_arl options ( #12424 )
2024-11-21 18:37:15 +08:00
Ruonan Wang
2935e97610
Small fix of cpp readme ( #12425 )
2024-11-21 18:21:34 +08:00
Yuwen Hu
8fdc36c140
Optimize with new batch kernel when batch_size=1 on LNL ( #12419 )
...
* Add use batch kernel condition for LNL
* Fix for other device judgement
* Fix based on comment
2024-11-21 16:21:35 +08:00
Jinhe
7e0a840f74
add optimization to openjourney ( #12423 )
...
* add optimization to openjourney
* add optimization to openjourney
2024-11-21 15:23:51 +08:00
Yishuo Wang
145e8b480f
update batch kernel condition ( #12421 )
2024-11-21 10:12:46 +08:00
Ruonan Wang
7288c759ce
Initial NPU C++ Example ( #12417 )
...
* temp save
* meet review, update
* update
* meet review, add license
* typo
2024-11-21 10:09:26 +08:00
Jinhe
d2a37b6ab2
add Stable diffusion examples ( #12418 )
...
* add openjourney example
* add timing
* add stable diffusion to model page
* 4.1 fix
* small fix
2024-11-20 17:18:36 +08:00
Ruonan Wang
54c62feb74
[NPU] dump prefill IR for further C++ solution ( #12402 )
...
* save prefill ir
* fix
* shorten convert time
* fix
* fix
* fix
* fix
* fix style
* dump config.json
* meet review
* small fix
2024-11-20 15:20:05 +08:00
SONG Ge
ff3f7cb25f
Fix speech_paraformer issue with unexpected changes ( #12416 )
...
* Fix speech_paraformer issue with unexpected changes
* Add paraformer version specified
2024-11-19 15:01:20 +08:00
Yuwen Hu
a69395f31f
Support performance mode of GLM4 model ( #12401 )
...
* Initial support of prepare generation args for transformers 4.45
* Small fix to chatglm4 model optimization
* Small fix
* fix glm4 position id
* fix glm4 error
* Small change in condition & fix based on comments
* Style fixes
---------
Co-authored-by: cyita <yitastudy@gmail.com>
2024-11-18 18:46:52 +08:00
Song Fuchang
d2c821d458
Add missing arguments in pipeline parallel generate method ( #12142 )
...
Add two arguments, negative_prompt_ids and negative_prompt_attention_mask, to the generate method in pipeline_parallel.py.
2024-11-18 13:50:18 +08:00
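The fix in #12142 threads two extra generation kwargs through the pipeline-parallel wrapper down to the underlying model's `generate`. A toy sketch of that pass-through pattern (the classes below are illustrative stand-ins, not the actual `pipeline_parallel.py` code):

```python
# Illustrative stand-in for the underlying model; the real code calls
# transformers' GenerationMixin.generate, which accepts these kwargs.
class BaseModel:
    def generate(self, input_ids, negative_prompt_ids=None,
                 negative_prompt_attention_mask=None, **kwargs):
        # Report which kwargs actually reached the underlying model.
        return {
            "negative_prompt_ids": negative_prompt_ids,
            "negative_prompt_attention_mask": negative_prompt_attention_mask,
        }


class PipelineParallelWrapper:
    def __init__(self, model):
        self.model = model

    def generate(self, input_ids, negative_prompt_ids=None,
                 negative_prompt_attention_mask=None, **kwargs):
        # The essence of the fix: forward the two negative_prompt_*
        # arguments instead of silently dropping them at the wrapper.
        return self.model.generate(
            input_ids,
            negative_prompt_ids=negative_prompt_ids,
            negative_prompt_attention_mask=negative_prompt_attention_mask,
            **kwargs,
        )


wrapper = PipelineParallelWrapper(BaseModel())
out = wrapper.generate([1, 2, 3], negative_prompt_ids=[4, 5],
                       negative_prompt_attention_mask=[1, 1])
print(out["negative_prompt_ids"])  # → [4, 5]
```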
Yishuo Wang
3d5fbf2069
update batch kernel condition ( #12408 )
2024-11-15 13:47:05 +08:00
binbin Deng
d4d949443f
[NPU] change attention_mask to fp16 ( #12400 )
2024-11-14 17:20:29 +08:00
Qiyuan Gong
7e50ff113c
Add padding_token=eos_token for GPU trl QLora example ( #12398 )
...
* Avoid "tokenizer doesn't have a padding token" error.
2024-11-14 10:51:30 +08:00
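The fix in #12398 addresses a common Hugging Face pitfall: many causal-LM tokenizers ship without a pad token, so trl's QLoRA training fails as soon as batching needs padding. A minimal sketch of the pattern; the `SimpleNamespace` tokenizer here is a stand-in for a real tokenizer loaded via `AutoTokenizer.from_pretrained`:

```python
from types import SimpleNamespace

# Stand-in for a Hugging Face tokenizer whose pad_token is unset.
tokenizer = SimpleNamespace(pad_token=None, eos_token="</s>")

# The pattern from the commit: reuse the EOS token for padding so
# batched training does not raise a missing-pad-token error.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print(tokenizer.pad_token)  # → </s>
```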
SONG Ge
d2cbcb060c
Add initial support for modeling_xlm encoder on NPU ( #12393 )
...
* Add initial support for modeling_xlm encoder on NPU
* Add EmbeddingModel class to keep the same usage with bce and npu fp16 linear convert
* Optimize current implementation to support EmbeddingModel.encode API and convert other torch modules to NPU
* Add related example and documents
2024-11-14 10:50:27 +08:00
Yina Chen
59b01fa7d2
small fix ( #12397 )
2024-11-14 10:03:36 +08:00
Yishuo Wang
00fce5c940
use new q4_0 batch kernel ( #12396 )
2024-11-13 18:37:34 +08:00