Commit graph

614 commits

Author SHA1 Message Date
Jin, Qiao
82a61b5cf3
Limit trl version in example (#12332)
* Limit trl version in example

* Limit trl version in example
2024-11-05 14:50:10 +08:00
Kai Huang
c8679ad592
Qwen layernorm as input (#12309)
* qwen layernorm as input

* add group size
2024-11-04 09:51:15 +08:00
Zijie Li
cd5e22cee5
Update Llava GPU Example (#12311)
* update-llava-example

* add warmup

* small fix on llava example

* remove space& extra print prompt

* renew example

* small fix

---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-11-01 17:06:00 +08:00
binbin Deng
d409d9d0eb
[NPU L0] Update streaming mode of example (#12312) 2024-11-01 15:38:10 +08:00
Jin, Qiao
126f95be80
Fix DPO finetuning example (#12313) 2024-11-01 13:29:44 +08:00
binbin Deng
eda764909c
Add minicpm-2b in L0 pipeline (#12308) 2024-11-01 09:30:01 +08:00
Jin, Qiao
3df6195cb0
Fix application quickstart (#12305)
* fix graphrag quickstart

* fix axolotl quickstart

* fix ragflow quickstart

* fix ragflow quickstart

* fix graphrag toc

* fix comments

* fix comment

* fix comments
2024-10-31 16:57:35 +08:00
binbin Deng
4892df61c9
Add qwen2-1.5b in l0 pipeline example (#12306) 2024-10-31 16:44:25 +08:00
Jinhe
30f668c206
updated transformers & accelerate requirements (#12301) 2024-10-31 15:59:40 +08:00
Kai Huang
416c19165c
Add Qwen pipeline and example (#12292)
* support qwen pipeline

* update error msg

* style

* meet review

* minor
2024-10-31 11:25:25 +08:00
Rahul Nair
4cf1ccc43a
Update DPO EADME.md (#12162)
bitsanbytes multi backend is now available and is required , otherwise would error out saying that no cuda is available
2024-10-31 10:56:46 +08:00
Chu,Youcheng
29400e2e75
feat: change oneccl to internal (#12296)
* feat: change oneccl

* fix: restore llama-70b

* fix: remove tab

* fix: remove extra blank

* small fix

* add comments

* fix: add a blank space
2024-10-31 09:51:43 +08:00
Zijie Li
6f22133efc
Update AWQ and GPTQ GPU example (#12300) 2024-10-31 09:35:31 +08:00
binbin Deng
41b8064554
Support minicpm-1B in level0 pipeline (#12297) 2024-10-30 17:21:47 +08:00
Jinhe
46d8300f6b
bugfix for qlora finetuning on GPU (#12298)
* bugfix for qlora 100 step error

* indent fix

* annotation fix
2024-10-30 16:54:10 +08:00
Ruonan Wang
2b2cb9c693
[NPU pipeline] Support save & load and update examples (#12293)
* support save & load, update llama examples

* update baichuan2 example

* update readme
2024-10-30 10:02:00 +08:00
binbin Deng
3feb58d1e4
Support baichuan2 for level0 pipeline (#12289) 2024-10-29 19:24:16 +08:00
Yina Chen
4467645088
[NPU] Support l0 Llama groupwise (#12276)
* except lm_head

* remove

* support gw lm_head

* update

* fix

* remove run.bat

* fix style

* support llama3
2024-10-28 17:06:55 +08:00
Ruonan Wang
3fe2ea3081
[NPU] Reuse prefill of acc lib for pipeline (#12279)
* first commit

* update example

* fix style

* update example

* embedding as const

* fix generate

* code  refactor

* meet code review

* fix style

* change max_output_len to max_context_len

* fix all-in-one

* fix example

* add check for new tokens
2024-10-28 16:05:49 +08:00
binbin Deng
ec362e6133
Add llama3 level0 example (#12275) 2024-10-28 09:24:51 +08:00
SONG Ge
a0c6432899
[NPU] Add support for loading a FunASR model (#12073)
* add support for loading funasr model

* add initial support for paraformer-encoder

* add npu ops impl

* add encoder-decoder npu pipeline

* move paraformer encoders prefix 30 layers  to npu and keep the rest layers on cpu
2024-10-25 17:22:01 +08:00
Ruonan Wang
854398f6e0
update example to reduce peak memory usage (#12274) 2024-10-25 17:09:26 +08:00
Ruonan Wang
ae57e23e4f
fix incompatibility between llama GW & llama pipeline (#12267)
* fix

* fix
2024-10-25 10:31:44 +08:00
Ruonan Wang
821fd96367
Initial integrate our L0 Llama impl into ipex-llm (#12255)
* temp save

* initial support

* fix

* simplify code

* fix style

* fix example

* make default value of pipeline as False
2024-10-24 09:49:27 +08:00
Jin, Qiao
8fa98e2742
Remove Qwen2-7b from NPU example for "Run Optimized Models (Experimental)" (#12245)
* Remove qwen2-7b from npu example readme

* fix
2024-10-22 17:07:51 +08:00
Yishuo Wang
9ea694484d
refactor ot remove old rope usage (#12224) 2024-10-17 17:06:09 +08:00
Jiao Wang
667f0db466
Update Eagle example to Eagle2+ipex-llm integration (#11717)
* update to e2 example

* update

* update
2024-10-16 23:16:14 -07:00
Jinhe
f983f1a8f4
Add Qwen2-VL gpu example (#12135)
* qwen2-vl readme

* add qwen2-vl example

* fix

* fix

* fix

* add link

* Update regarding modules_to_not_convert and readme

* Further fix

* Small fix

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-10-11 18:25:23 +08:00
Ruonan Wang
4d93bb81fe
Initial support of NPU level0 Model (#12177)
* first commit to support load dll and init llm pipeline

* add init generate

* fix style

* small updates

* fix style and check tokens number
2024-10-11 09:45:53 +08:00
Zijie Li
3d044dbf53
add llama3.2-vision Pytorch example (#12165) 2024-10-09 09:20:42 +08:00
Ch1y0q
17c23cd759
add llama3.2 GPU example (#12137)
* add llama3.2 GPU example

* change prompt format reference url

* update

* add Meta-Llama-3.2-1B-Instruct sample output

* update wording
2024-09-29 14:41:54 +08:00
Yuwen Hu
f71b38a994
Update MiniCPM_V_26 GPU example with save & load (#12127) 2024-09-26 17:40:22 +08:00
Ch1y0q
2ea13d502f
Add minicpm3 gpu example (#12114)
* add minicpm3 gpu example

* update GPU example

* update

---------

Co-authored-by: Huang, Xinshengzi <xinshengzi.huang@intel.com>
2024-09-26 13:51:37 +08:00
Jin, Qiao
2bedb17be7
Add Qwen2.5 NPU Example (#12110)
* Add Qwen2.5 NPU Example

* fix

* Merge qwen2.py and qwen2.5.py into qwen.py

* Fix description
2024-09-25 15:20:03 +08:00
Yuwen Hu
828fa01ad3
[NPU] Add mixed_precision for Qwen2 7B (#12098)
* Add mix_precision argument to control whether use INT8 lm_head for Qwen2-7B-Instruct

* Small fix

* Fixed on load low bit with mixed precision

* Small fix

* Update example accordingly

* Update for default prompt

* Update base on comments

* Final fix
2024-09-20 16:36:21 +08:00
Ch1y0q
2269768e71
add internvl2 example (#12102)
* add internvl2 example

* add to README.md

* update

* add link to zh-CN readme
2024-09-20 16:31:54 +08:00
Jin, Qiao
db7500bfd4
Add Qwen2.5 GPU example (#12101)
* Add Qwen2.5 GPU example

* fix end line

* fix description
2024-09-20 15:55:57 +08:00
Ch1y0q
b4b8c3e495
add lowbit_path for generate.py, fix npu_model (#12077)
* add `lowbit_path` for `generate.py`, fix `npu_model`

* update `README.md`
2024-09-13 17:28:05 +08:00
Wang, Jian4
d703e4f127
Enable vllm multimodal minicpm-v-2-6 (#12074)
* enable minicpm-v-2-6

* add image_url readme
2024-09-13 13:28:35 +08:00
Jinhe
e78e45ee01
update NPU readme: run conhost as administrator (#12066) 2024-09-11 17:54:04 +08:00
Jinhe
4ca330da15
Fix NPU load error message and add minicpm npu lowbit feat (#12064)
* fix npu_model raise sym_int4 error

* add load_lowbit

* remove print&perf
2024-09-11 16:56:35 +08:00
Jinhe
32e8362da7
added minicpm cpu examples (#12027)
* minicpm cpu examples

* add link for minicpm-2
2024-09-11 15:51:21 +08:00
Zijie Li
c5fdfde1bd
fix npu-model prompt (#12057) 2024-09-11 10:06:45 +08:00
Ch1y0q
73a4360f3f
update lowbit path for baichuan2, qwen2, generate.py (#12051)
* update lowbit path for baichuan2, qwen2, `generate.py`

* update readme
2024-09-10 15:35:24 +08:00
Yuwen Hu
f61b1785fb
Small update to NPU example readme (#12034)
* Small update to NPU example readme

* Small fix
2024-09-06 15:54:23 +08:00
Ruonan Wang
0d04531ae0
update NPU readme of Qwen2 (#12032)
* update readme

* update broadcast
2024-09-06 15:02:39 +08:00
binbin Deng
5b18bb3c4a
Add recommend version for mtl npu (#12024) 2024-09-05 16:28:53 +08:00
Ch1y0q
820f8a4554
add --lowbit-path option for NPU llama example (#12020)
* add option" `--lowbit-path`

* add descriptions in `README.md` and formatting

* Update llama.py
2024-09-05 15:31:01 +08:00
Wang, Jian4
b3b2cd64b4
Support lightweight-serving glm-4v-9b (#11994)
* enable glm-4v-9b serving

* update readme

* update for no image input
2024-09-05 09:25:08 +08:00
Jinhe
164f47adbd
MiniCPM-V-2 & MiniCPM-Llama3-V-2_5 example updates (#11988)
* minicpm example updates

* --stream
2024-09-03 17:02:06 +08:00