Commit graph

2052 commits

Author SHA1 Message Date
Kai Huang
b89ea1b0cf
Support save/load model for hf generate (#12499)
* change dummy model

* style

* meet review
2024-12-04 18:26:39 +08:00
Kai Huang
7d27f134dd
Fix hf generate for llama3.2 (#12497)
* fix kv condition

* meet review
2024-12-04 17:54:40 +08:00
Chu,Youcheng
ffa9a9e1b3
Update streaming in npu examples (#12495)
* feat: add streaming

* Update readme accordingly

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-12-04 17:51:10 +08:00
Yishuo Wang
a9e3f7f14c
optimize minicpm (#12496) 2024-12-04 17:14:16 +08:00
Yishuo Wang
e0bf0054e1
small fix (#12493) 2024-12-04 16:37:39 +08:00
Kai Huang
7ff4533b39
Support hf generate (#12477)
* generate

* style

* update

* remove timing

* style

* style

* combine generate api

* simple in kwargs
2024-12-04 16:31:09 +08:00
Yuwen Hu
ef4028ac2d
[NPU] Support split lm_head for Qwen2 with CPP (#12491)
* Use split for Qwen2 lm_head instead of slice in optimize_pre

* Support split lm_head in Qwen2 python cpp backend

* Fit with Python acc lib pipeline

* Removed default mixed_precision=True in all-in-one and related examples

* Small fix

* Style fix

* Fix based on comments

* Fix based on comments

* Style fix
2024-12-04 14:41:08 +08:00
Yishuo Wang
5629fdd518
optimize qwen2_vl multiple image input or video input (#12487) 2024-12-04 09:24:38 +08:00
binbin Deng
c59284418c
Hotfix of BCE-Embedding model (#12490) 2024-12-03 18:16:04 +08:00
Yuwen Hu
4ac66db034
[NPU] Support streaming in Python (cpp backend) (#12488)
* Support streaming in NPU Python (cpp backend)

* Small fix
2024-12-03 17:17:26 +08:00
Jin, Qiao
7082844f3f
Fix NPU LLM example save/load tokenizer (#12485) 2024-12-03 16:30:55 +08:00
Jin, Qiao
5fe766788e
Fix MiniCPM-V-2_6 running on NPU (#12486) 2024-12-03 16:16:29 +08:00
Ruonan Wang
598603bea6
small fix of imatrix (#12480) 2024-12-03 10:46:36 +08:00
binbin Deng
ab01753b1c
[NPU] update save-load API usage (#12473) 2024-12-03 09:46:15 +08:00
Yuwen Hu
26adb82ee3
[NPU] Remove hard code (#12479) 2024-12-02 18:26:07 +08:00
Yuwen Hu
b2e56a2e03
Add release support for option xpu_arc (#12422)
* Add release support for xpu-arc

* Dependency update
2024-12-02 17:16:04 +08:00
Yuwen Hu
aee9acb303
Add NPU QuickStart & update example links (#12470)
* Add initial NPU quickstart (c++ part unfinished)

* Small update

* Update based on comments

* Update main readme

* Remove LLaMA description

* Small fix

* Small fix

* Remove subsection link in main README

* Small fix

* Update based on comments

* Small fix

* TOC update and other small fixes

* Update for Chinese main readme

* Update based on comments and other small fixes

* Change order
2024-12-02 17:03:10 +08:00
Jin, Qiao
31c69a8d31
Fix MiniCPM-V models running on NPU (#12478) 2024-12-02 16:29:46 +08:00
binbin Deng
54d9a590d4
[NPU] Fix eos_token setting (#12475) 2024-12-02 14:18:22 +08:00
Guancheng Fu
59bd4a214f
add vLLM glm4 fix (#12474) 2024-12-02 14:05:16 +08:00
Ruonan Wang
4b6c3160be
Support imatrix-guided quantization for NPU CW (#12468)
* init commit

* remove print

* add interface

* fix

* fix

* fix style
2024-12-02 11:31:26 +08:00
binbin Deng
f99f188023
Hotfix of benchmark script (#12467) 2024-11-29 14:00:59 +08:00
binbin Deng
c911026f03
[NPU C++] Update model support & examples & benchmark (#12466) 2024-11-29 13:35:58 +08:00
binbin Deng
14d8d3d8af
Integrate NPU C++ implementation into ipex-llm (#12461) 2024-11-29 09:25:37 +08:00
Ruonan Wang
490bb0ca53
[NPU] update fused layers for GW (#12459)
* update fused layers for GW

* fix

* fix llama condition for glm model

* update
2024-11-28 17:14:30 +08:00
Yina Chen
1b533a105c
[NPU] Add env to enable scale search (#12462)
* add env enable scale search

* address comment

* move logic
2024-11-28 17:06:00 +08:00
Heyang Sun
d272f6b471
remove nf4 unsupport comment in cpu finetuning (#12460)
Co-authored-by: Ariadne <wyn2000330@126.com>
2024-11-28 13:26:46 +08:00
Ruonan Wang
b29da30205
[NPU] Update C++ L0 (#12458)
* update

* fix style
2024-11-27 22:08:48 +08:00
Yishuo Wang
6f3441ba4c
fix glm4-9b overflow (#12455) 2024-11-27 17:39:13 +08:00
Ruonan Wang
281c9b0bb9
[NPU] Add L0 support for NPU C++ (#12454)
* add L0 models support

* meet review

* fix style
2024-11-27 17:04:13 +08:00
Chu,Youcheng
ce6fcaa9ba
update transformers version in example of glm4 (#12453)
* fix: update transformers version in example of glm4

* fix: textual adjustments

* fix: textual adjustment
2024-11-27 15:02:25 +08:00
Yuwen Hu
effb9bb41c
Small update to LangChain examples readme (#12452) 2024-11-27 14:02:25 +08:00
Chu,Youcheng
acd77d9e87
Remove env variable BIGDL_LLM_XMX_DISABLED in documentation (#12445)
* fix: remove BIGDL_LLM_XMX_DISABLED in mddocs

* fix: remove set SYCL_CACHE_PERSISTENT=1 in example

* fix: remove BIGDL_LLM_XMX_DISABLED in workflows

* fix: merge igpu and A-series Graphics

* fix: remove set BIGDL_LLM_XMX_DISABLED=1 in example

* fix: remove BIGDL_LLM_XMX_DISABLED in workflows

* fix: merge igpu and A-series Graphics

* fix: textual adjustment

* fix: textual adjustment

* fix: textual adjustment
2024-11-27 11:16:36 +08:00
Ruonan Wang
f8c2bb2943
[NPU] optimize qwen2 prefill performance for C++ (#12451) 2024-11-27 10:46:18 +08:00
Ruonan Wang
7b40f9b372
[NPU] Support GW for NPU C++ (#12450) 2024-11-26 17:46:40 +08:00
Jin, Qiao
c2efa264d9
Update LangChain examples to use upstream (#12388)
* Update LangChain examples to use upstream

* Update README and fix links

* Update LangChain CPU examples to use upstream

* Update LangChain CPU voice_assistant example

* Update CPU README

* Update GPU README

* Remove GPU Langchain vLLM example and fix comments

* Change langchain -> LangChain

* Add reference for both upstream llms and embeddings

* Fix comments

* Fix comments

* Fix comments

* Fix comments

* Fix comment
2024-11-26 16:43:15 +08:00
Ruonan Wang
24b46b2b19
[NPU] further fix of qwen2 int8 pipeline & C++ (#12449)
* fix

* fix style
2024-11-26 16:39:39 +08:00
Yuwen Hu
303b104c10
Fix abnormal output for Qwen2-7B when sym_int8 (#12446) 2024-11-26 15:53:04 +08:00
Ruonan Wang
52c17fe104
Optimize first token of C++ NPU by adding npu_dpu_groups (#12443)
* add npu_dpu_groups

* add check for env

* fix style
2024-11-26 11:41:32 +08:00
Jinhe
66bd7abae4
add sdxl and lora-lcm optimization (#12444)
* add sdxl and lora-lcm optimization

* fix openjourney speed drop
2024-11-26 11:38:09 +08:00
Ruonan Wang
0e23bd779f
Add support of llama3.2 for NPU C++ (#12442)
* initial support of llama3.2

* update

* update

* fix style

* fix style

* fix

* small fix
2024-11-26 09:26:55 +08:00
Yishuo Wang
cdd41f5e4c
optimize sdxl again (#12441) 2024-11-25 17:46:46 +08:00
Ruonan Wang
b9abb8a285
Support qwen2.5 3B for NPU & update related examples (#12438)
* update qwen2.5-3B

* update convert

* small fix

* replace load_in_low_bit with low_bit

* small fix
2024-11-25 16:38:31 +08:00
Jinhe
b633fbf26c
add chinese prompt troubleshooting for npu cpp examples (#12437)
* add chinese prompt troubleshooting

* add chinese prompt troubleshooting
2024-11-25 15:28:47 +08:00
Yishuo Wang
8164aed802
small change (#12439) 2024-11-25 14:35:49 +08:00
Yishuo Wang
be132c4209
fix and optimize sd (#12436) 2024-11-25 14:09:48 +08:00
Ruonan Wang
f41405368a
Support minicpm for NPU C++ (#12434)
* support minicpm-1b

* update

* tune fused_layers

* update readme.md
2024-11-25 10:42:02 +08:00
Ruonan Wang
0819fad34e
support Llama2-7B / Llama3-8B for NPU C++ (#12431)
* support llama2

* update

* support fused_layers=4 for Llama2-7B
2024-11-22 18:47:19 +08:00
Ruonan Wang
4ffa6c752c
New convert support for C++ NPU (#12430)
* initial commit

* fix

* fix style

* fix style

* fix

* fix
2024-11-22 14:28:30 +08:00
Yuwen Hu
e61ae88c5b
Upgrade dependency for xpu_lnl and xpu_arl option (#12424) 2024-11-21 18:37:15 +08:00