Commit graph

2020 commits

Author SHA1 Message Date
Chu,Youcheng
acd77d9e87
Remove env variable BIGDL_LLM_XMX_DISABLED in documentation (#12445)
* fix: remove BIGDL_LLM_XMX_DISABLED in mddocs

* fix: remove set SYCL_CACHE_PERSISTENT=1 in example

* fix: remove BIGDL_LLM_XMX_DISABLED in workflows

* fix: merge igpu and A-series Graphics

* fix: remove set BIGDL_LLM_XMX_DISABLED=1 in example

* fix: remove BIGDL_LLM_XMX_DISABLED in workflows

* fix: merge igpu and A-series Graphics

* fix: textual adjustment

* fix: textual adjustment

* fix: textual adjustment
2024-11-27 11:16:36 +08:00
Ruonan Wang
f8c2bb2943
[NPU] optimize qwen2 prefill performance for C++ (#12451) 2024-11-27 10:46:18 +08:00
Ruonan Wang
7b40f9b372
[NPU] Support GW for NPU C++ (#12450) 2024-11-26 17:46:40 +08:00
Jin, Qiao
c2efa264d9
Update LangChain examples to use upstream (#12388)
* Update LangChain examples to use upstream

* Update README and fix links

* Update LangChain CPU examples to use upstream

* Update LangChain CPU voice_assistant example

* Update CPU README

* Update GPU README

* Remove GPU Langchain vLLM example and fix comments

* Change langchain -> LangChain

* Add reference for both upstream llms and embeddings

* Fix comments

* Fix comments

* Fix comments

* Fix comments

* Fix comment
2024-11-26 16:43:15 +08:00
Ruonan Wang
24b46b2b19
[NPU] further fix of qwen2 int8 pipeline & C++ (#12449)
* fix

* fix style
2024-11-26 16:39:39 +08:00
Yuwen Hu
303b104c10
Fix abnormal output for Qwen2-7B when sym_int8 (#12446) 2024-11-26 15:53:04 +08:00
Ruonan Wang
52c17fe104
Optimize first token of C++ NPU by adding npu_dpu_groups (#12443)
* add npu_dpu_groups

* add check for env

* fix style
2024-11-26 11:41:32 +08:00
Jinhe
66bd7abae4
add sdxl and lora-lcm optimization (#12444)
* add sdxl and lora-lcm optimization

* fix openjourney speed drop
2024-11-26 11:38:09 +08:00
Ruonan Wang
0e23bd779f
Add support of llama3.2 for NPU C++ (#12442)
* initial support of  llama3.2

* update

* update

* fix style

* fix style

* fix

* small fix
2024-11-26 09:26:55 +08:00
Yishuo Wang
cdd41f5e4c
optimize sdxl again (#12441) 2024-11-25 17:46:46 +08:00
Ruonan Wang
b9abb8a285
Support qwen2.5 3B for NPU & update related examples (#12438)
* update qwen2.5-3B

* update convert

* small fix

* replace load_in_low_bit with low_bit

* small fix
2024-11-25 16:38:31 +08:00
Jinhe
b633fbf26c
add chinese prompt troubleshooting for npu cpp examples (#12437)
* add chinese prompt troubleshooting

* add chinese prompt troubleshooting
2024-11-25 15:28:47 +08:00
Yishuo Wang
8164aed802
small change (#12439) 2024-11-25 14:35:49 +08:00
Yishuo Wang
be132c4209
fix and optimize sd (#12436) 2024-11-25 14:09:48 +08:00
Ruonan Wang
f41405368a
Support minicpm for NPU C++ (#12434)
* support minicpm-1b

* update

* tune fused_layers

* update readme.md
2024-11-25 10:42:02 +08:00
Ruonan Wang
0819fad34e
support Llama2-7B / Llama3-8B for NPU C++ (#12431)
* support llama2

* update

* support fused_layers=4 for Llama2-7B
2024-11-22 18:47:19 +08:00
Ruonan Wang
4ffa6c752c
New convert support for C++ NPU (#12430)
* initial commit

* fix

* fix style

* fix style

* fix

* fix
2024-11-22 14:28:30 +08:00
Yuwen Hu
e61ae88c5b
Upgrade denpendency for xpu_lnl and xpu_arl option (#12424) 2024-11-21 18:37:15 +08:00
Ruonan Wang
2935e97610
small fix of cpp readme(#12425) 2024-11-21 18:21:34 +08:00
Yuwen Hu
8fdc36c140
Optimize with new batch kernel when batch_size=1 on LNL (#12419)
* Add use batch kernel condition for LNL

* Fix for other device judgement

* Fix based on comment
2024-11-21 16:21:35 +08:00
Jinhe
7e0a840f74
add optimization to openjourney (#12423)
* add optimization to openjourney

* add optimization to openjourney
2024-11-21 15:23:51 +08:00
Yishuo Wang
145e8b480f
update batch kernel condition (#12421) 2024-11-21 10:12:46 +08:00
Ruonan Wang
7288c759ce
Initial NPU C++ Example (#12417)
* temp save

* meet review, update

* update

* meet review, add license

* typo
2024-11-21 10:09:26 +08:00
Jinhe
d2a37b6ab2
add Stable diffusion examples (#12418)
* add openjourney example

* add timing

* add stable diffusion to model page

* 4.1 fix

* small fix
2024-11-20 17:18:36 +08:00
Ruonan Wang
54c62feb74
[NPU] dump prefill IR for further C++ solution (#12402)
* save prefill ir

* fix

* shorten convert time

* fix

* fix

* fix

* fix

* fix style

* dump config.json

* meet review

* small fix
2024-11-20 15:20:05 +08:00
SONG Ge
ff3f7cb25f
Fix speech_paraformer issue with unexpected changes (#12416)
* Fix speech_paraformer issue with unexpected changes

* Add paraformer version specified
2024-11-19 15:01:20 +08:00
Yuwen Hu
a69395f31f
Support performance mode of GLM4 model (#12401)
* Initial support of prepare generation args for transformers 445

* Small fix to chatglm4 model optimization

* Small fix

* fix glm4 position id

* fix glm4 error

* Small change in conditon & fix based on comments

* Style fixes

---------

Co-authored-by: cyita <yitastudy@gmail.com>
2024-11-18 18:46:52 +08:00
Song Fuchang
d2c821d458
Add missing arguments in pipeline parallel generate method (#12142)
Add two arguments: negative_prompt_ids and negative_prompt_attention_mask to generate method in pipeline_parallel.py.
2024-11-18 13:50:18 +08:00
Yishuo Wang
3d5fbf2069
update batch kernel condition (#12408) 2024-11-15 13:47:05 +08:00
binbin Deng
d4d949443f
[NPU] change attention_mask to fp16 (#12400) 2024-11-14 17:20:29 +08:00
Qiyuan Gong
7e50ff113c
Add padding_token=eos_token for GPU trl QLora example (#12398)
* Avoid tokenizer doesn't have a padding token error.
2024-11-14 10:51:30 +08:00
SONG Ge
d2cbcb060c
Add initial support for modeling_xlm encoder on NPU (#12393)
* Add initial support for modeling_xlm encoder on NPU

* Add EmbeddingModel class to keep the same usage with bce and npu fp16 linear convert

* Optimize currently implementation to support EmbeddingModel.encode API and convert other torch modules to NPU

* Add related example and documents
2024-11-14 10:50:27 +08:00
Yina Chen
59b01fa7d2
small fix (#12397) 2024-11-14 10:03:36 +08:00
Yishuo Wang
00fce5c940
use new q4_0 batch kernel (#12396) 2024-11-13 18:37:34 +08:00
Yina Chen
d6d63d6b84
[NPU] Qwen prefill attn_mask type hotfix (#12395)
* qwen prefill attn_mask type fp16

* update
2024-11-13 17:51:34 +08:00
Yina Chen
9220babaab
qwen prefill attn_mask type fp16 (#12394) 2024-11-13 17:45:26 +08:00
Yuwen Hu
1158f91648
Fix llava with multi-image inputs (#12384) 2024-11-13 09:27:50 +08:00
Guancheng Fu
0ee54fc55f
Upgrade to vllm 0.6.2 (#12338)
* Initial updates for vllm 0.6.2

* fix

* Change Dockerfile to support v062

* Fix

* fix examples

* Fix

* done

* fix

* Update engine.py

* Fix Dockerfile to original path

* fix

* add option

* fix

* fix

* fix

* fix

---------

Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
2024-11-12 20:35:34 +08:00
Ruonan Wang
6bf5a8c230
[NPU] Update qwen2 compile config (#12383)
* update

* fix
2024-11-12 16:59:44 +08:00
binbin Deng
7a97fbb779
Support vpm and resampler module of minicpm-v on NPU (#12375) 2024-11-12 15:59:55 +08:00
Yuwen Hu
e0918934c8
Add fused_mlp to glm4v models (#12378) 2024-11-11 17:10:25 +08:00
Yishuo Wang
dc34e8c51f
optimize glm4v vision attention (#12369) 2024-11-08 17:01:57 +08:00
Qiyuan Gong
2dfcc36825
Fix trl version and padding in trl qlora example (#12368)
* Change trl to 0.9.6
* Enable padding to avoid padding related errors.
2024-11-08 16:05:17 +08:00
Yishuo Wang
51f7f87768
fix ipex 2.3 bug (#12366) 2024-11-08 13:29:15 +08:00
Yina Chen
b2e69a896c
[NPU] Support Baichuan groupwise & gw code refactor (#12337)
* support minicpm 1b & qwen 1.5b gw

* support minicpm 1b

* baichuan part

* update

* support minicpm 1b & qwen 1.5b gw

* support minicpm 1b

* baichuan part

* update

* update

* update

* baichuan support

* code refactor

* remove code

* fix style

* address comments

* revert
2024-11-08 11:42:42 +08:00
binbin Deng
812d5cc32e
[NPU L0] Support llama3.2 in L0 pipeline (#12361) 2024-11-08 10:01:23 +08:00
Yuwen Hu
8fe294e01f
Small fix to all-in-one benchmark (#12362) 2024-11-07 18:56:34 +08:00
Yuwen Hu
1a6cbc473f
Add fused mlp optimizations to glm4 models (#12360)
* Add fused mlp to glm4 models

* Small fix
2024-11-07 18:52:47 +08:00
Yishuo Wang
ad68c56573
small improvement (#12359) 2024-11-07 15:57:41 +08:00
Yina Chen
d880e534d2
[NPU] acclib llama3.2 support groupwise (#12355)
* change inter_pp

* add comment
2024-11-07 11:19:55 +08:00