Xu, Shuo
b0338c5529
Add --modelscope option for glm-v4, MiniCPM-V-2_6, glm-edge and internvl2 ( #12583 )
...
* Add --modelscope option for glm-v4 and MiniCPM-V-2_6
* glm-edge
* minicpm-v-2_6: don't use model_hub=modelscope when using lowbit; internvl2
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-12-20 13:54:17 +08:00
Xu, Shuo
47da3c999f
Add --modelscope in GPU examples for minicpm, minicpm3, baichuan2 ( #12564 )
...
* Add --modelscope for more models
* minicpm
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-12-19 17:25:46 +08:00
Xu, Shuo
47e90a362f
Add --modelscope in GPU examples for glm4, codegeex2, qwen2 and qwen2.5 ( #12561 )
...
* Add --modelscope for more models
* improve readme
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-12-19 10:00:39 +08:00
binbin Deng
680ea7e4a8
[NPU doc] Update configuration for different platforms ( #12554 )
2024-12-17 10:15:09 +08:00
Xu, Shuo
ccc18eefb5
Add Modelscope option for chatglm3 on GPU ( #12545 )
...
* Add Modelscope option for GPU model chatglm3
* Update readme
* Update readme
* Update readme
* Update readme
* format update
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-12-16 20:00:37 +08:00
Chu,Youcheng
a86487c539
Add GLM-Edge GPU example ( #12483 )
...
* feat: initial commit
* generate.py and README updates
* Update link for main readme
* Update based on comments
* Small fix
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-12-16 14:39:19 +08:00
Jun Wang
0b953e61ef
[REFINE] graph mode code ( #12540 )
2024-12-16 09:17:01 +08:00
binbin Deng
caf15cc5ef
[NPU] Add IPEX_LLM_NPU_MTL to enable support on MTL ( #12543 )
2024-12-13 17:01:13 +08:00
binbin Deng
d20a968ce2
[NPU] Fix generate example ( #12541 )
2024-12-13 14:07:24 +08:00
Heyang Sun
fa261b8af1
torch 2.3 inference docker ( #12517 )
...
* torch 2.3 inference docker
* Update README.md
* add convert code
* rename image
* remove 2.1 and add graph example
* Update README.md
2024-12-13 10:47:04 +08:00
Yuwen Hu
dbaf4abcb3
[NPU] Update C++ example with repetition_penalty & update Python code accordingly ( #12528 )
...
* Update c++ npu examples with repetition penalty
* Fit python with updated C++ API
* Style fix
* Small fix
* Small fix
2024-12-12 13:42:55 +08:00
binbin Deng
6fc27da9c1
[NPU] Update glm-edge support in docs ( #12529 )
2024-12-12 11:14:09 +08:00
binbin Deng
ea55235cbd
[NPU] Support glm-edge models ( #12511 )
2024-12-09 14:06:27 +08:00
binbin Deng
12c78978dd
[NPU C++] Update example with conversation mode support ( #12510 )
2024-12-06 12:46:37 +08:00
Jinhe
5e1416c9aa
fix readme for npu cpp examples and llama.cpp ( #12505 )
...
* fix cpp readme
* fix cpp readme
* fix cpp readme
2024-12-05 12:32:42 +08:00
Chu,Youcheng
ffa9a9e1b3
Update streaming in npu examples ( #12495 )
...
* feat: add streaming
* Update readme accordingly
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-12-04 17:51:10 +08:00
Yuwen Hu
ef4028ac2d
[NPU] Support split lm_head for Qwen2 with CPP ( #12491 )
...
* Use split for Qwen2 lm_head instead of slice in optimize_pre
* Support split lm_head in Qwen2 python cpp backend
* Fit with Python acc lib pipeline
* Removed default mixed_precision=True in all-in-one and related examples
* Small fix
* Style fix
* Fix based on comments
* Fix based on comments
* Style fix
2024-12-04 14:41:08 +08:00
Jin, Qiao
7082844f3f
Fix NPU LLM example save/load tokenizer ( #12485 )
2024-12-03 16:30:55 +08:00
binbin Deng
ab01753b1c
[NPU] update save-load API usage ( #12473 )
2024-12-03 09:46:15 +08:00
Yuwen Hu
aee9acb303
Add NPU QuickStart & update example links ( #12470 )
...
* Add initial NPU quickstart (c++ part unfinished)
* Small update
* Update based on comments
* Update main readme
* Remove LLaMA description
* Small fix
* Small fix
* Remove subsection link in main README
* Small fix
* Update based on comments
* Small fix
* TOC update and other small fixes
* Update for Chinese main readme
* Update based on comments and other small fixes
* Change order
2024-12-02 17:03:10 +08:00
binbin Deng
c911026f03
[NPU C++] Update model support & examples & benchmark ( #12466 )
2024-11-29 13:35:58 +08:00
binbin Deng
14d8d3d8af
Integrate NPU C++ impl into ipex-llm ( #12461 )
2024-11-29 09:25:37 +08:00
Heyang Sun
d272f6b471
remove NF4 unsupported comment in CPU finetuning ( #12460 )
...
Co-authored-by: Ariadne <wyn2000330@126.com>
2024-11-28 13:26:46 +08:00
Chu,Youcheng
ce6fcaa9ba
update transformers version in example of glm4 ( #12453 )
...
* fix: update transformers version in example of glm4
* fix: textual adjustments
* fix: textual adjustment
2024-11-27 15:02:25 +08:00
Yuwen Hu
effb9bb41c
Small update to LangChain examples readme ( #12452 )
2024-11-27 14:02:25 +08:00
Chu,Youcheng
acd77d9e87
Remove env variable BIGDL_LLM_XMX_DISABLED in documentation ( #12445 )
...
* fix: remove BIGDL_LLM_XMX_DISABLED in mddocs
* fix: remove set SYCL_CACHE_PERSISTENT=1 in example
* fix: remove BIGDL_LLM_XMX_DISABLED in workflows
* fix: merge igpu and A-series Graphics
* fix: remove set BIGDL_LLM_XMX_DISABLED=1 in example
* fix: remove BIGDL_LLM_XMX_DISABLED in workflows
* fix: merge igpu and A-series Graphics
* fix: textual adjustment
* fix: textual adjustment
* fix: textual adjustment
2024-11-27 11:16:36 +08:00
Ruonan Wang
f8c2bb2943
[NPU] optimize qwen2 prefill performance for C++ ( #12451 )
2024-11-27 10:46:18 +08:00
Jin, Qiao
c2efa264d9
Update LangChain examples to use upstream ( #12388 )
...
* Update LangChain examples to use upstream
* Update README and fix links
* Update LangChain CPU examples to use upstream
* Update LangChain CPU voice_assistant example
* Update CPU README
* Update GPU README
* Remove GPU Langchain vLLM example and fix comments
* Change langchain -> LangChain
* Add reference for both upstream llms and embeddings
* Fix comments
* Fix comments
* Fix comments
* Fix comments
* Fix comment
2024-11-26 16:43:15 +08:00
Jinhe
66bd7abae4
add sdxl and lora-lcm optimization ( #12444 )
...
* add sdxl and lora-lcm optimization
* fix openjourney speed drop
2024-11-26 11:38:09 +08:00
Ruonan Wang
0e23bd779f
Add support of llama3.2 for NPU C++ ( #12442 )
...
* initial support of llama3.2
* update
* update
* fix style
* fix style
* fix
* small fix
2024-11-26 09:26:55 +08:00
Ruonan Wang
b9abb8a285
Support qwen2.5 3B for NPU & update related examples ( #12438 )
...
* update qwen2.5-3B
* update convert
* small fix
* replace load_in_low_bit with low_bit
* small fix
2024-11-25 16:38:31 +08:00
Jinhe
b633fbf26c
add chinese prompt troubleshooting for npu cpp examples ( #12437 )
...
* add chinese prompt troubleshooting
* add chinese prompt troubleshooting
2024-11-25 15:28:47 +08:00
Ruonan Wang
f41405368a
Support minicpm for NPU C++ ( #12434 )
...
* support minicpm-1b
* update
* tune fused_layers
* update readme.md
2024-11-25 10:42:02 +08:00
Ruonan Wang
0819fad34e
support Llama2-7B / Llama3-8B for NPU C++ ( #12431 )
...
* support llama2
* update
* support fused_layers=4 for Llama2-7B
2024-11-22 18:47:19 +08:00
Ruonan Wang
4ffa6c752c
New convert support for C++ NPU ( #12430 )
...
* initial commit
* fix
* fix style
* fix style
* fix
* fix
2024-11-22 14:28:30 +08:00
Ruonan Wang
2935e97610
small fix of cpp readme ( #12425 )
2024-11-21 18:21:34 +08:00
Jinhe
7e0a840f74
add optimization to openjourney ( #12423 )
...
* add optimization to openjourney
* add optimization to openjourney
2024-11-21 15:23:51 +08:00
Ruonan Wang
7288c759ce
Initial NPU C++ Example ( #12417 )
...
* temp save
* meet review, update
* update
* meet review, add license
* typo
2024-11-21 10:09:26 +08:00
Jinhe
d2a37b6ab2
add Stable diffusion examples ( #12418 )
...
* add openjourney example
* add timing
* add stable diffusion to model page
* 4.1 fix
* small fix
2024-11-20 17:18:36 +08:00
SONG Ge
ff3f7cb25f
Fix speech_paraformer issue with unexpected changes ( #12416 )
...
* Fix speech_paraformer issue with unexpected changes
* Specify paraformer version
2024-11-19 15:01:20 +08:00
Qiyuan Gong
7e50ff113c
Add padding_token=eos_token for GPU trl QLora example ( #12398 )
...
* Avoid the "tokenizer doesn't have a padding token" error.
2024-11-14 10:51:30 +08:00
SONG Ge
d2cbcb060c
Add initial support for modeling_xlm encoder on NPU ( #12393 )
...
* Add initial support for modeling_xlm encoder on NPU
* Add EmbeddingModel class to keep the same usage with bce and npu fp16 linear convert
* Optimize current implementation to support EmbeddingModel.encode API and convert other torch modules to NPU
* Add related example and documents
2024-11-14 10:50:27 +08:00
Guancheng Fu
0ee54fc55f
Upgrade to vllm 0.6.2 ( #12338 )
...
* Initial updates for vllm 0.6.2
* fix
* Change Dockerfile to support v062
* Fix
* fix examples
* Fix
* done
* fix
* Update engine.py
* Fix Dockerfile to original path
* fix
* add option
* fix
* fix
* fix
* fix
---------
Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
2024-11-12 20:35:34 +08:00
binbin Deng
7a97fbb779
Support vpm and resampler module of minicpm-v on NPU ( #12375 )
2024-11-12 15:59:55 +08:00
Qiyuan Gong
2dfcc36825
Fix trl version and padding in trl qlora example ( #12368 )
...
* Change trl to 0.9.6
* Enable padding to avoid padding-related errors.
2024-11-08 16:05:17 +08:00
Yina Chen
b2e69a896c
[NPU] Support Baichuan groupwise & gw code refactor ( #12337 )
...
* support minicpm 1b & qwen 1.5b gw
* support minicpm 1b
* baichuan part
* update
* support minicpm 1b & qwen 1.5b gw
* support minicpm 1b
* baichuan part
* update
* update
* update
* baichuan support
* code refactor
* remove code
* fix style
* address comments
* revert
2024-11-08 11:42:42 +08:00
binbin Deng
812d5cc32e
[NPU L0] Support llama3.2 in L0 pipeline ( #12361 )
2024-11-08 10:01:23 +08:00
SONG Ge
a7b66683f1
[NPU] Add Optimized Support for Llama3.2-1B/3B on NPU ( #12339 )
...
* Add initial support for llama3.2-1b/3b
* move llama3.2 support into current llama_mp impl
2024-11-06 19:21:40 +08:00
Yina Chen
d872639395
[NPU] Llama3, Qwen2 1.5b, MiniCPM 1/2B groupwise support ( #12327 )
...
* support minicpm 1b & qwen 1.5b gw
* support minicpm 1b
* support minicpm 2b
* fix style & error
* fix style & update
* remove print
2024-11-05 15:51:31 +08:00
Jin, Qiao
82a61b5cf3
Limit trl version in example ( #12332 )
...
* Limit trl version in example
* Limit trl version in example
2024-11-05 14:50:10 +08:00
Kai Huang
c8679ad592
Qwen layernorm as input ( #12309 )
...
* qwen layernorm as input
* add group size
2024-11-04 09:51:15 +08:00
Zijie Li
cd5e22cee5
Update Llava GPU Example ( #12311 )
...
* update-llava-example
* add warmup
* small fix on llava example
* remove space& extra print prompt
* renew example
* small fix
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-11-01 17:06:00 +08:00
binbin Deng
d409d9d0eb
[NPU L0] Update streaming mode of example ( #12312 )
2024-11-01 15:38:10 +08:00
Jin, Qiao
126f95be80
Fix DPO finetuning example ( #12313 )
2024-11-01 13:29:44 +08:00
binbin Deng
eda764909c
Add minicpm-2b in L0 pipeline ( #12308 )
2024-11-01 09:30:01 +08:00
Jin, Qiao
3df6195cb0
Fix application quickstart ( #12305 )
...
* fix graphrag quickstart
* fix axolotl quickstart
* fix ragflow quickstart
* fix ragflow quickstart
* fix graphrag toc
* fix comments
* fix comment
* fix comments
2024-10-31 16:57:35 +08:00
binbin Deng
4892df61c9
Add qwen2-1.5b in l0 pipeline example ( #12306 )
2024-10-31 16:44:25 +08:00
Jinhe
30f668c206
updated transformers & accelerate requirements ( #12301 )
2024-10-31 15:59:40 +08:00
Kai Huang
416c19165c
Add Qwen pipeline and example ( #12292 )
...
* support qwen pipeline
* update error msg
* style
* meet review
* minor
2024-10-31 11:25:25 +08:00
Rahul Nair
4cf1ccc43a
Update DPO README.md ( #12162 )
...
bitsandbytes multi-backend is now available and is required; otherwise it would error out saying that no CUDA is available
2024-10-31 10:56:46 +08:00
Chu,Youcheng
29400e2e75
feat: change oneccl to internal ( #12296 )
...
* feat: change oneccl
* fix: restore llama-70b
* fix: remove tab
* fix: remove extra blank
* small fix
* add comments
* fix: add a blank space
2024-10-31 09:51:43 +08:00
Zijie Li
6f22133efc
Update AWQ and GPTQ GPU example ( #12300 )
2024-10-31 09:35:31 +08:00
binbin Deng
41b8064554
Support minicpm-1B in level0 pipeline ( #12297 )
2024-10-30 17:21:47 +08:00
Jinhe
46d8300f6b
bugfix for qlora finetuning on GPU ( #12298 )
...
* bugfix for qlora 100 step error
* indent fix
* annotation fix
2024-10-30 16:54:10 +08:00
Ruonan Wang
2b2cb9c693
[NPU pipeline] Support save & load and update examples ( #12293 )
...
* support save & load, update llama examples
* update baichuan2 example
* update readme
2024-10-30 10:02:00 +08:00
binbin Deng
3feb58d1e4
Support baichuan2 for level0 pipeline ( #12289 )
2024-10-29 19:24:16 +08:00
Yina Chen
4467645088
[NPU] Support l0 Llama groupwise ( #12276 )
...
* except lm_head
* remove
* support gw lm_head
* update
* fix
* remove run.bat
* fix style
* support llama3
2024-10-28 17:06:55 +08:00
Ruonan Wang
3fe2ea3081
[NPU] Reuse prefill of acc lib for pipeline ( #12279 )
...
* first commit
* update example
* fix style
* update example
* embedding as const
* fix generate
* code refactor
* meet code review
* fix style
* change max_output_len to max_context_len
* fix all-in-one
* fix example
* add check for new tokens
2024-10-28 16:05:49 +08:00
binbin Deng
ec362e6133
Add llama3 level0 example ( #12275 )
2024-10-28 09:24:51 +08:00
SONG Ge
a0c6432899
[NPU] Add support for loading a FunASR model ( #12073 )
...
* add support for loading funasr model
* add initial support for paraformer-encoder
* add npu ops impl
* add encoder-decoder npu pipeline
* move the first 30 layers of the paraformer encoder to NPU and keep the remaining layers on CPU
2024-10-25 17:22:01 +08:00
Ruonan Wang
854398f6e0
update example to reduce peak memory usage ( #12274 )
2024-10-25 17:09:26 +08:00
Ruonan Wang
ae57e23e4f
fix incompatibility between llama GW & llama pipeline ( #12267 )
...
* fix
* fix
2024-10-25 10:31:44 +08:00
Ruonan Wang
821fd96367
Initial integration of our L0 Llama impl into ipex-llm ( #12255 )
...
* temp save
* initial support
* fix
* simplify code
* fix style
* fix example
* make default value of pipeline as False
2024-10-24 09:49:27 +08:00
Jin, Qiao
8fa98e2742
Remove Qwen2-7b from NPU example for "Run Optimized Models (Experimental)" ( #12245 )
...
* Remove qwen2-7b from npu example readme
* fix
2024-10-22 17:07:51 +08:00
Yishuo Wang
9ea694484d
refactor to remove old rope usage ( #12224 )
2024-10-17 17:06:09 +08:00
Jiao Wang
667f0db466
Update Eagle example to Eagle2+ipex-llm integration ( #11717 )
...
* update to e2 example
* update
* update
2024-10-16 23:16:14 -07:00
Jinhe
f983f1a8f4
Add Qwen2-VL gpu example ( #12135 )
...
* qwen2-vl readme
* add qwen2-vl example
* fix
* fix
* fix
* add link
* Update regarding modules_to_not_convert and readme
* Further fix
* Small fix
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-10-11 18:25:23 +08:00
Ruonan Wang
4d93bb81fe
Initial support of NPU level0 Model ( #12177 )
...
* first commit to support load dll and init llm pipeline
* add init generate
* fix style
* small updates
* fix style and check tokens number
2024-10-11 09:45:53 +08:00
Zijie Li
3d044dbf53
add llama3.2-vision PyTorch example ( #12165 )
2024-10-09 09:20:42 +08:00
Ch1y0q
17c23cd759
add llama3.2 GPU example ( #12137 )
...
* add llama3.2 GPU example
* change prompt format reference url
* update
* add Meta-Llama-3.2-1B-Instruct sample output
* update wording
2024-09-29 14:41:54 +08:00
Yuwen Hu
f71b38a994
Update MiniCPM_V_26 GPU example with save & load ( #12127 )
2024-09-26 17:40:22 +08:00
Ch1y0q
2ea13d502f
Add minicpm3 gpu example ( #12114 )
...
* add minicpm3 gpu example
* update GPU example
* update
---------
Co-authored-by: Huang, Xinshengzi <xinshengzi.huang@intel.com>
2024-09-26 13:51:37 +08:00
Jin, Qiao
2bedb17be7
Add Qwen2.5 NPU Example ( #12110 )
...
* Add Qwen2.5 NPU Example
* fix
* Merge qwen2.py and qwen2.5.py into qwen.py
* Fix description
2024-09-25 15:20:03 +08:00
Yuwen Hu
828fa01ad3
[NPU] Add mixed_precision for Qwen2 7B ( #12098 )
...
* Add mix_precision argument to control whether to use INT8 lm_head for Qwen2-7B-Instruct
* Small fix
* Fixed on load low bit with mixed precision
* Small fix
* Update example accordingly
* Update for default prompt
* Update base on comments
* Final fix
2024-09-20 16:36:21 +08:00
Ch1y0q
2269768e71
add internvl2 example ( #12102 )
...
* add internvl2 example
* add to README.md
* update
* add link to zh-CN readme
2024-09-20 16:31:54 +08:00
Jin, Qiao
db7500bfd4
Add Qwen2.5 GPU example ( #12101 )
...
* Add Qwen2.5 GPU example
* fix end line
* fix description
2024-09-20 15:55:57 +08:00
Ch1y0q
b4b8c3e495
add lowbit_path for generate.py, fix npu_model ( #12077 )
...
* add `lowbit_path` for `generate.py`, fix `npu_model`
* update `README.md`
2024-09-13 17:28:05 +08:00
Wang, Jian4
d703e4f127
Enable vllm multimodal minicpm-v-2-6 ( #12074 )
...
* enable minicpm-v-2-6
* add image_url readme
2024-09-13 13:28:35 +08:00
Jinhe
e78e45ee01
update NPU readme: run conhost as administrator ( #12066 )
2024-09-11 17:54:04 +08:00
Jinhe
4ca330da15
Fix NPU load error message and add minicpm npu lowbit feat ( #12064 )
...
* fix npu_model raise sym_int4 error
* add load_lowbit
* remove print&perf
2024-09-11 16:56:35 +08:00
Jinhe
32e8362da7
added minicpm cpu examples ( #12027 )
...
* minicpm cpu examples
* add link for minicpm-2
2024-09-11 15:51:21 +08:00
Zijie Li
c5fdfde1bd
fix npu-model prompt ( #12057 )
2024-09-11 10:06:45 +08:00
Ch1y0q
73a4360f3f
update lowbit path for baichuan2, qwen2, generate.py ( #12051 )
...
* update lowbit path for baichuan2, qwen2, `generate.py`
* update readme
2024-09-10 15:35:24 +08:00
Yuwen Hu
f61b1785fb
Small update to NPU example readme ( #12034 )
...
* Small update to NPU example readme
* Small fix
2024-09-06 15:54:23 +08:00
Ruonan Wang
0d04531ae0
update NPU readme of Qwen2 ( #12032 )
...
* update readme
* update broadcast
2024-09-06 15:02:39 +08:00
binbin Deng
5b18bb3c4a
Add recommended version for MTL NPU ( #12024 )
2024-09-05 16:28:53 +08:00
Ch1y0q
820f8a4554
add --lowbit-path option for NPU llama example ( #12020 )
...
* add option" `--lowbit-path`
* add descriptions in `README.md` and formatting
* Update llama.py
2024-09-05 15:31:01 +08:00
Wang, Jian4
b3b2cd64b4
Support lightweight-serving glm-4v-9b ( #11994 )
...
* enable glm-4v-9b serving
* update readme
* update for no image input
2024-09-05 09:25:08 +08:00
Jinhe
164f47adbd
MiniCPM-V-2 & MiniCPM-Llama3-V-2_5 example updates ( #11988 )
...
* minicpm example updates
* --stream
2024-09-03 17:02:06 +08:00
Jin, Qiao
2e54f4402b
Rename MiniCPM-V-2_6 CPU example ( #11998 )
2024-09-03 16:50:42 +08:00