ipex-llm

Author	SHA1	Message	Date
Yishuo Wang	145e8b480f	update batch kernel condition (#12421 )	2024-11-21 10:12:46 +08:00
Ruonan Wang	7288c759ce	Initial NPU C++ Example (#12417 ) * temp save * meet review, update * update * meet review, add license * typo	2024-11-21 10:09:26 +08:00
Jinhe	d2a37b6ab2	add Stable diffusion examples (#12418 ) * add openjourney example * add timing * add stable diffusion to model page * 4.1 fix * small fix	2024-11-20 17:18:36 +08:00
Ruonan Wang	54c62feb74	[NPU] dump prefill IR for further C++ solution (#12402 ) * save prefill ir * fix * shorten convert time * fix * fix * fix * fix * fix style * dump config.json * meet review * small fix	2024-11-20 15:20:05 +08:00
SONG Ge	ff3f7cb25f	Fix speech_paraformer issue with unexpected changes (#12416 ) * Fix speech_paraformer issue with unexpected changes * Add paraformer version specified	2024-11-19 15:01:20 +08:00
Yuwen Hu	a69395f31f	Support performance mode of GLM4 model (#12401 ) * Initial support of prepare generation args for transformers 445 * Small fix to chatglm4 model optimization * Small fix * fix glm4 position id * fix glm4 error * Small change in conditon & fix based on comments * Style fixes --------- Co-authored-by: cyita <yitastudy@gmail.com>	2024-11-18 18:46:52 +08:00
Song Fuchang	d2c821d458	Add missing arguments in pipeline parallel generate method (#12142 ) Add two arguments: negative_prompt_ids and negative_prompt_attention_mask to generate method in pipeline_parallel.py.	2024-11-18 13:50:18 +08:00
Yishuo Wang	3d5fbf2069	update batch kernel condition (#12408 )	2024-11-15 13:47:05 +08:00
binbin Deng	d4d949443f	[NPU] change attention_mask to fp16 (#12400 )	2024-11-14 17:20:29 +08:00
Qiyuan Gong	7e50ff113c	Add padding_token=eos_token for GPU trl QLora example (#12398 ) * Avoid tokenizer doesn't have a padding token error.	2024-11-14 10:51:30 +08:00
SONG Ge	d2cbcb060c	Add initial support for modeling_xlm encoder on NPU (#12393 ) * Add initial support for modeling_xlm encoder on NPU * Add EmbeddingModel class to keep the same usage with bce and npu fp16 linear convert * Optimize currently implementation to support EmbeddingModel.encode API and convert other torch modules to NPU * Add related example and documents	2024-11-14 10:50:27 +08:00
Yina Chen	59b01fa7d2	small fix (#12397 )	2024-11-14 10:03:36 +08:00
Yishuo Wang	00fce5c940	use new q4_0 batch kernel (#12396 )	2024-11-13 18:37:34 +08:00
Yina Chen	d6d63d6b84	[NPU] Qwen prefill attn_mask type hotfix (#12395 ) * qwen prefill attn_mask type fp16 * update	2024-11-13 17:51:34 +08:00
Yina Chen	9220babaab	qwen prefill attn_mask type fp16 (#12394 )	2024-11-13 17:45:26 +08:00
Yuwen Hu	1158f91648	Fix llava with multi-image inputs (#12384 )	2024-11-13 09:27:50 +08:00
Guancheng Fu	0ee54fc55f	Upgrade to vllm 0.6.2 (#12338 ) * Initial updates for vllm 0.6.2 * fix * Change Dockerfile to support v062 * Fix * fix examples * Fix * done * fix * Update engine.py * Fix Dockerfile to original path * fix * add option * fix * fix * fix * fix --------- Co-authored-by: xiangyuT <xiangyu.tian@intel.com>	2024-11-12 20:35:34 +08:00
Ruonan Wang	6bf5a8c230	[NPU] Update qwen2 compile config (#12383 ) * update * fix	2024-11-12 16:59:44 +08:00
binbin Deng	7a97fbb779	Support vpm and resampler module of minicpm-v on NPU (#12375 )	2024-11-12 15:59:55 +08:00
Yuwen Hu	e0918934c8	Add fused_mlp to glm4v models (#12378 )	2024-11-11 17:10:25 +08:00
Yishuo Wang	dc34e8c51f	optimize glm4v vision attention (#12369 )	2024-11-08 17:01:57 +08:00
Qiyuan Gong	2dfcc36825	Fix trl version and padding in trl qlora example (#12368 ) * Change trl to 0.9.6 * Enable padding to avoid padding related errors.	2024-11-08 16:05:17 +08:00
Yishuo Wang	51f7f87768	fix ipex 2.3 bug (#12366 )	2024-11-08 13:29:15 +08:00
Yina Chen	b2e69a896c	[NPU] Support Baichuan groupwise & gw code refactor (#12337 ) * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * baichuan part * update * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * baichuan part * update * update * update * baichuan support * code refactor * remove code * fix style * address comments * revert	2024-11-08 11:42:42 +08:00
binbin Deng	812d5cc32e	[NPU L0] Support llama3.2 in L0 pipeline (#12361 )	2024-11-08 10:01:23 +08:00
Yuwen Hu	8fe294e01f	Small fix to all-in-one benchmark (#12362 )	2024-11-07 18:56:34 +08:00
Yuwen Hu	1a6cbc473f	Add fused mlp optimizations to glm4 models (#12360 ) * Add fused mlp to glm4 models * Small fix	2024-11-07 18:52:47 +08:00
Yishuo Wang	ad68c56573	small improvement (#12359 )	2024-11-07 15:57:41 +08:00
Yina Chen	d880e534d2	[NPU] acclib llama3.2 support groupwise (#12355 ) * change inter_pp * add comment	2024-11-07 11:19:55 +08:00
Jinhe	79f2877413	add minicpm-v models to `transformers_int4_npu_win` api (#12352 ) * add minicpm npu * optimize model	2024-11-07 10:05:10 +08:00
SONG Ge	a7b66683f1	[NPU] Add Optimized Support for Llama3.2-1B/3B on NPU (#12339 ) * Add initial support for llama3.2-1b/3b * move llama3.2 support into current llama_mp impl	2024-11-06 19:21:40 +08:00
Yuwen Hu	872a74481a	Small optimization to glm4 models (#12351 )	2024-11-06 19:16:58 +08:00
Ruonan Wang	c267355b35	fix three NPU benchmark issues (#12350 ) * fix three issues * limit mixed_precision for CW only	2024-11-06 19:01:01 +08:00
Yina Chen	f24352aef9	llama 3.1/3.2 support compresskv (#12347 ) * llama 3.1/3.2 support compresskv * update * fix transformers 4.45 error * fix style * fix typo * disable llama3.2 1b compresskv	2024-11-06 17:33:43 +08:00
Jin, Qiao	d984c0672a	Add MiniCPM-V-2_6 to arc perf test (#12349 )	2024-11-06 16:32:28 +08:00
Yishuo Wang	e23ef7d088	optimize glm4v's vision part (#12346 )	2024-11-06 15:43:40 +08:00
Yishuo Wang	c8b7265359	Add basic glm4v support (#12345 )	2024-11-06 13:50:10 +08:00
binbin Deng	69e3a56943	[NPU] Hot fix of load_low_bit (#12344 )	2024-11-06 10:07:00 +08:00
Jin, Qiao	7240c283a3	Add dummy model in iGPU perf (#12341 ) * Add dummy model in iGPU perf * Add dummy model in iGPU perf * Fix	2024-11-05 17:56:10 +08:00
Zhao Changmin	8e9a3a1158	fix chatglm2 cpu ut (#12336 )	2024-11-05 16:43:57 +08:00
Yina Chen	d872639395	[NPU] Llama3, Qwen2 1.5b, MiniCPM 1/2B groupwise support (#12327 ) * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * support minicpm 2b * fix style & error * fix style & update * remove print	2024-11-05 15:51:31 +08:00
Jin, Qiao	82a61b5cf3	Limit trl version in example (#12332 ) * Limit trl version in example * Limit trl version in example	2024-11-05 14:50:10 +08:00
Zijie Li	45b0d371aa	update benchmark readme (#12323 ) * update benchmark readme update new comment with memory usage included * Update README.md	2024-11-05 08:19:08 +08:00
Zhao Changmin	1b637e4477	Add chatglm2&3 fuse mlp (#12328 ) * add chatglm fuse mlp	2024-11-04 18:04:41 +08:00
Yina Chen	94c4ce389f	[NPU] Add env to disable compile opt (#12330 ) * add env to disable compile opt * fix style * fix style	2024-11-04 17:46:17 +08:00
Ch1y0q	e54af44ed6	Add `transformers_int4_npu_pipeline_win` in all-in-one benchmark (#12325 ) * add transformers_int4_npu_pipeline_win * bugfix * bugfix: wrong actual_output_len * fix format * bugfix & update `README.md`	2024-11-04 16:00:20 +08:00
binbin Deng	5ee6f97d6f	[NPU L0] Add layernorm weight as const / input setting (#12322 )	2024-11-04 15:46:24 +08:00
Chu,Youcheng	a01371f90b	Doc: update harness readme (#12324 )	2024-11-04 14:58:54 +08:00
Kai Huang	c8679ad592	Qwen layernorm as input (#12309 ) * qwen layernorm as input * add group size	2024-11-04 09:51:15 +08:00
Yuwen Hu	20755e8077	Small fix to all-in-one benchmark scripts (#12317 )	2024-11-01 19:16:25 +08:00
Ch1y0q	48123af463	add `npu_group_size` for `transformers_int4_npu_win` in all-in-one benchmark api (#12316 ) * add `npu_group_size` for `transformers_int4_npu_win` small bugfix * update	2024-11-01 18:44:27 +08:00
Zijie Li	cd5e22cee5	Update Llava GPU Example (#12311 ) * update-llava-example * add warmup * small fix on llava example * remove space& extra print prompt * renew example * small fix --------- Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>	2024-11-01 17:06:00 +08:00
binbin Deng	f53bb4ea0b	[NPU L0] Update 1st token generation (#12314 )	2024-11-01 17:02:07 +08:00
binbin Deng	d409d9d0eb	[NPU L0] Update streaming mode of example (#12312 )	2024-11-01 15:38:10 +08:00
Jin, Qiao	126f95be80	Fix DPO finetuning example (#12313 )	2024-11-01 13:29:44 +08:00
Yina Chen	05c5d0267a	[NPU] Llama2 prefill use ov sdp (#12310 ) * prefill use sdp * add param * update * fix style * fix style * meet comments	2024-11-01 11:05:20 +08:00
binbin Deng	eda764909c	Add minicpm-2b in L0 pipeline (#12308 )	2024-11-01 09:30:01 +08:00
Yishuo Wang	b9853f98b3	fix qwen2 attention_mask slice (#12307 )	2024-10-31 17:00:05 +08:00
Jin, Qiao	3df6195cb0	Fix application quickstart (#12305 ) * fix graphrag quickstart * fix axolotl quickstart * fix ragflow quickstart * fix ragflow quickstart * fix graphrag toc * fix comments * fix comment * fix comments	2024-10-31 16:57:35 +08:00
binbin Deng	4892df61c9	Add qwen2-1.5b in l0 pipeline example (#12306 )	2024-10-31 16:44:25 +08:00
Jinhe	30f668c206	updated transformers & accelerate requirements (#12301 )	2024-10-31 15:59:40 +08:00
Xin Qiu	97a0f7fd35	Codegeex support (#12303 ) * new codegeex attn * use kv cache * add compress/quantize kv * remove compress/quantize kv * fix style check * fix style * fix codegeex	2024-10-31 15:28:56 +08:00
Yishuo Wang	72605c7016	fix llama3.1/3.2 quantize kv check (#12302 )	2024-10-31 11:55:07 +08:00
Kai Huang	416c19165c	Add Qwen pipeline and example (#12292 ) * support qwen pipeline * update error msg * style * meet review * minor	2024-10-31 11:25:25 +08:00
Rahul Nair	4cf1ccc43a	Update DPO EADME.md (#12162 ) bitsanbytes multi backend is now available and is required , otherwise would error out saying that no cuda is available	2024-10-31 10:56:46 +08:00
Chu,Youcheng	29400e2e75	feat: change oneccl to internal (#12296 ) * feat: change oneccl * fix: restore llama-70b * fix: remove tab * fix: remove extra blank * small fix * add comments * fix: add a blank space	2024-10-31 09:51:43 +08:00
Zijie Li	6f22133efc	Update AWQ and GPTQ GPU example (#12300 )	2024-10-31 09:35:31 +08:00
Yina Chen	0763268e4c	[NPU]Qwen2 groupwise performance opt (#12299 ) * qwen2 gw performance opt * remove debug	2024-10-30 17:40:21 +08:00
binbin Deng	41b8064554	Support minicpm-1B in level0 pipeline (#12297 )	2024-10-30 17:21:47 +08:00
Jinhe	46d8300f6b	bugfix for qlora finetuning on GPU (#12298 ) * bugfix for qlora 100 step error * indent fix * annotation fix	2024-10-30 16:54:10 +08:00
Yina Chen	70037ad55f	Groupwise prefill optimization (#12291 ) * except lm_head * remove * support gw lm_head * update * fix * remove run.bat * fix style * support llama3 * slice -> split * remove debug * fix style * add dpu	2024-10-30 14:59:45 +08:00
Yishuo Wang	540eaeb12c	refactor attention_softmax (#12295 )	2024-10-30 13:20:50 +08:00
Ruonan Wang	2b2cb9c693	[NPU pipeline] Support save & load and update examples (#12293 ) * support save & load, update llama examples * update baichuan2 example * update readme	2024-10-30 10:02:00 +08:00
Yuwen Hu	5a15098835	Initial support for quantized forward on CPU when `quantization_group_size=0` (#12282 ) * Initial support for quantized forward on CPU when quantization_group_size=0 * Style fix * Style fix * Small fix * Small fix	2024-10-29 19:40:17 +08:00
binbin Deng	3feb58d1e4	Support baichuan2 for level0 pipeline (#12289 )	2024-10-29 19:24:16 +08:00
Zhao Changmin	546f455e8e	Patch sdpa check function in specific module attributes table (#12285 )	2024-10-29 18:41:09 +08:00
Ruonan Wang	821b0033ed	[NPU L0] update layernorm & code refactor (#12287 ) * update layernorm & code refactor * fix style * add common utils * change to Pool() * remove print	2024-10-29 15:01:45 +08:00
Yina Chen	4467645088	[NPU] Support l0 Llama groupwise (#12276 ) * except lm_head * remove * support gw lm_head * update * fix * remove run.bat * fix style * support llama3	2024-10-28 17:06:55 +08:00
Ruonan Wang	3fe2ea3081	[NPU] Reuse prefill of acc lib for pipeline (#12279 ) * first commit * update example * fix style * update example * embedding as const * fix generate * code refactor * meet code review * fix style * change max_output_len to max_context_len * fix all-in-one * fix example * add check for new tokens	2024-10-28 16:05:49 +08:00
binbin Deng	ec362e6133	Add llama3 level0 example (#12275 )	2024-10-28 09:24:51 +08:00
SONG Ge	08cb065370	hot-fix redundant import funasr (#12277 )	2024-10-25 19:40:39 +08:00
SONG Ge	a0c6432899	[NPU] Add support for loading a FunASR model (#12073 ) * add support for loading funasr model * add initial support for paraformer-encoder * add npu ops impl * add encoder-decoder npu pipeline * move paraformer encoders prefix 30 layers to npu and keep the rest layers on cpu	2024-10-25 17:22:01 +08:00
Ruonan Wang	854398f6e0	update example to reduce peak memory usage (#12274 )	2024-10-25 17:09:26 +08:00
Yuwen Hu	e713296090	Update all-in-one benchmark (#12272 ) * Update all-in-one benchmark * Small fix * Small fix * Small fix	2024-10-25 16:52:59 +08:00
Yuwen Hu	43b25a2fe7	Fix llama 3.2 vision on LNL (#12264 ) * Fix llama 3.2 vision on LNL * Small fix	2024-10-25 16:23:31 +08:00
Yuwen Hu	93895b2ac2	Openvino all in one benchmark small fix (#12269 ) * Small update for all-in-one benchmark readme to support OpenVINO tests * Small fix	2024-10-25 14:13:52 +08:00
Zijie Li	f7f62a3fef	Add OpenVINO performance tests to all-in-one benchmark (#12238 ) * add-openvino-to-all-in-one * update on openvino API * Update save_openvino.py * Update save_openvino.py * Update save_openvino.py * update on run.py and save_openvino * update references * Create openvino-requirements.txt * fix on comments * Small updates * Small fix * Fix --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-10-25 13:53:53 +08:00
Ruonan Wang	ae57e23e4f	fix incompatibility between llama GW & llama pipeline (#12267 ) * fix * fix	2024-10-25 10:31:44 +08:00
Yina Chen	b5e663854b	[NPU] Support llama groupwise (#12260 ) * support llama gw * support llama gw lm_head * fix style * remove unused code	2024-10-24 18:06:45 +08:00
Xin Qiu	39c9d1de52	fix code geex (#12261 )	2024-10-24 14:34:01 +08:00
Yishuo Wang	f3a2b20e6b	Optimize gpt2 (#12259 )	2024-10-24 13:44:24 +08:00
Ruonan Wang	821fd96367	Initial integrate our L0 Llama impl into ipex-llm (#12255 ) * temp save * initial support * fix * simplify code * fix style * fix example * make default value of pipeline as False	2024-10-24 09:49:27 +08:00
Yishuo Wang	cacc891962	Fix PR validation (#12253 )	2024-10-23 18:10:47 +08:00
binbin Deng	b685cf4349	Fix npu group size setting of optimize_model=False (#12256 )	2024-10-23 17:53:54 +08:00
binbin Deng	567b77a76b	Support IR and blob format for llama level0 pipeline (#12251 )	2024-10-23 16:02:35 +08:00
Yishuo Wang	578aef245d	Fix models auto choose SdpaAttention with ipex 2.3 (#12252 )	2024-10-23 15:33:45 +08:00
Yishuo Wang	88dc120a4c	fix fp16 linear (#12250 )	2024-10-23 14:35:19 +08:00
Yina Chen	e8cf7f32f5	npu gw small fix (#12249 )	2024-10-23 14:26:01 +08:00
Shaojun Liu	aae2490cb8	fix UT (#12247 ) * fix ut * Update test_transformers_api_attention.py * Update test_transformers_api_mlp.py	2024-10-23 14:13:06 +08:00
Yina Chen	e37f951cce	[NPU] Groupwise (#12241 ) * dq divide * fix * support attn divide * update qwen2 7b * divide down_proj & other linear * use concat & reduce sum * support scale after * support qwen2 * w/ mm * update reshape * spda * split * split 2+ * update * lm head-> 28 * no scale * update * update * update * fix style * fix style * to split linear * update * update code * address comments * fix style & remove redundant code & revert benchmark scripts * fix style & remove code * update save & load --------- Co-authored-by: Yang Wang <yang3.wang@intel.com>	2024-10-23 14:10:58 +08:00
Jin, Qiao	8fa98e2742	Remove Qwen2-7b from NPU example for "Run Optimized Models (Experimental)" (#12245 ) * Remove qwen2-7b from npu example readme * fix	2024-10-22 17:07:51 +08:00
Yina Chen	ec465fbcd7	Add lookup generate in load_low_bit (#12243 ) * add lookup generate in load_low_bit * update comment	2024-10-22 15:51:52 +08:00
Yuwen Hu	b3df47486d	Fix Gemma 2 on LNL (#12240 ) * Fix gemma 2 on LNL * Python style fix	2024-10-21 18:25:53 +08:00
Yuwen Hu	5935b25622	Further update windows gpu perf test regarding results integrity check (#12232 )	2024-10-18 18:15:13 +08:00
Yuwen Hu	b88c1df324	Add Llama 3.1 & 3.2 to Arc Performance test (#12225 ) * Add llama3.1 and llama3.2 in arc perf (#12202) * Add llama3.1 and llama3.2 in arc perf * Uninstall trl after arc test on transformers>=4.40 * Fix arc llama3 perf (#12212) * Fix pip uninstall * Uninstall trl after test on transformers==4.43.1 * Fix llama3 arc perf (#12218) --------- Co-authored-by: Jin, Qiao <89779290+JinBridger@users.noreply.github.com>	2024-10-17 21:12:45 +08:00
Yishuo Wang	9ea694484d	refactor ot remove old rope usage (#12224 )	2024-10-17 17:06:09 +08:00
Yishuo Wang	324bcb057e	refactor to reduce old rope usage (#12219 )	2024-10-17 14:45:09 +08:00
Jiao Wang	667f0db466	Update Eagle example to Eagle2+ipex-llm integration (#11717 ) * update to e2 example * update * update	2024-10-16 23:16:14 -07:00
Yishuo Wang	a4a758656a	refactor gemma to reduce old fuse rope usage (#12215 )	2024-10-16 17:40:28 +08:00
Yishuo Wang	9104a168f6	refactor phi-2 to reduce old fuse rope usage (#12214 )	2024-10-16 17:08:14 +08:00
Yishuo Wang	bb247e991b	refactor merge_qkv and attention_softmax (#12213 )	2024-10-16 15:58:14 +08:00
Yishuo Wang	e279148aa0	optimize llama3.2 vision again (#12211 )	2024-10-16 14:29:48 +08:00
Chu,Youcheng	f17cc4fdee	feat: add llama3.2-11b-vision in all in one (#12207 ) * feat: add llama3.2-11b-vision in all in one * fix: change model * fix: change name * fix: add a space * fix: switch import	2024-10-16 10:32:11 +08:00
Yuwen Hu	c9ac39fc1e	Add Llama 3.2 to iGPU performance test (`transformers 4.45`) (#12209 ) * Add Llama 3.2 to iGPU Perf (#12200) * Add Llama 3.2 to iGPU Perf * Downgrade accelerate after step * Temporarily disable model for test * Temporarily change ERRORLEVEL check (#12201) * Restore llama3.2 perf (#12206) * Revert "Temporarily change ERRORLEVEL check" This reverts commit 909dbbc930ab4283737161a55bb32006e6ca1991. * Revert "Temporarily disable model for test" This reverts commit 95322dc3c6429aa836f21bda0b5ba8d9b48592f8. --------- Co-authored-by: Jin, Qiao <89779290+JinBridger@users.noreply.github.com>	2024-10-15 17:44:46 +08:00
Yishuo Wang	f6611f9d3a	optimize llama3.2 vison attention again (#12204 )	2024-10-15 16:08:20 +08:00
Yishuo Wang	9b81236a2e	optimzie qwen2-vl vision (#12203 )	2024-10-15 15:54:25 +08:00
Yishuo Wang	d5344587ab	optimize internvl2 vision model's attention (#12198 )	2024-10-15 10:51:00 +08:00
Yuwen Hu	f8d1adc573	Fix Llama 3.2 & 3.1 on LNL (#12196 )	2024-10-14 17:39:20 +08:00
Yuwen Hu	516b578104	Support cpp release for ARL on Windows (#12189 ) * Support cpp Windows release for ARL * Temp commit for test * Remove temp commit	2024-10-14 17:20:31 +08:00
Zijie Li	7d80db710e	Add benchmark_util for `transformers >= 4.44.0` (#12171 ) * Create benchmark_util_4_45.py * Update __init__.py * Update lint-python * Update benchmark_util_4_45.py * Update benchmark_util_4_45.py * Create benchmark_util_4_44.py	2024-10-14 15:40:12 +08:00
Jin, Qiao	8e35800abe	Add llama 3.1 in igpu perf (#12194 )	2024-10-14 15:14:34 +08:00
Yuwen Hu	ddcdf47539	Support Windows ARL release (#12183 ) * Support release for ARL * Small fix * Small fix to doc * Temp for test * Remove temp commit for test	2024-10-11 18:30:52 +08:00
Jinhe	f983f1a8f4	Add Qwen2-VL gpu example (#12135 ) * qwen2-vl readme * add qwen2-vl example * fix * fix * fix * add link * Update regarding modules_to_not_convert and readme * Further fix * Small fix --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-10-11 18:25:23 +08:00
Ruonan Wang	310f18c8af	update NPU pipeline generate (#12182 ) * update * fix style	2024-10-11 17:39:20 +08:00
Shaojun Liu	724b2ae66d	add npu-level0 pipeline.dll to ipex-llm (#12181 ) * add npu-level0 pipeline.dll to ipex-llm * test * update runner label * fix * update * fix * fix	2024-10-11 16:05:20 +08:00
Ruonan Wang	4d93bb81fe	Initial support of NPU level0 Model (#12177 ) * first commit to support load dll and init llm pipeline * add init generate * fix style * small updates * fix style and check tokens number	2024-10-11 09:45:53 +08:00
Yuwen Hu	890662610b	Fix auto importer for LNL release (#12175 )	2024-10-10 15:17:43 +08:00
Yishuo Wang	535bee5381	fix qwen2 vl again (#12174 )	2024-10-10 13:50:01 +08:00
Yuwen Hu	aef1f671bd	Support LNL Windows release (#12169 ) * Release for LNL on Windows * Temp commit for release test * Change option name * Remove temp commit and change option name * temp commit for test again * Remove temp commit	2024-10-09 17:41:10 +08:00
Yishuo Wang	78d253165d	optimize qwen2 vl perf again (#12167 )	2024-10-09 16:43:48 +08:00
Zijie Li	3d044dbf53	add llama3.2-vision Pytorch example (#12165 )	2024-10-09 09:20:42 +08:00
Yishuo Wang	644af2a76e	add basic llama 3.2 vision support (#12163 )	2024-10-08 10:46:48 +08:00
Ch1y0q	17c23cd759	add llama3.2 GPU example (#12137 ) * add llama3.2 GPU example * change prompt format reference url * update * add Meta-Llama-3.2-1B-Instruct sample output * update wording	2024-09-29 14:41:54 +08:00
Yuwen Hu	f71b38a994	Update MiniCPM_V_26 GPU example with save & load (#12127 )	2024-09-26 17:40:22 +08:00
Yishuo Wang	669ff1a97b	fix sd1.5 (#12129 )	2024-09-26 17:15:16 +08:00
Yishuo Wang	a266528719	optimize llama 3.2 rope (#12128 )	2024-09-26 16:08:10 +08:00
Yishuo Wang	584c3489e7	add basic support for llama3.2 (#12125 )	2024-09-26 15:46:19 +08:00
Yishuo Wang	66f419f8b7	fix qwen2 vl (#12126 )	2024-09-26 15:44:02 +08:00
Ch1y0q	2ea13d502f	Add minicpm3 gpu example (#12114 ) * add minicpm3 gpu example * update GPU example * update --------- Co-authored-by: Huang, Xinshengzi <xinshengzi.huang@intel.com>	2024-09-26 13:51:37 +08:00
Yishuo Wang	77af9bc5fa	support passing None to low_bit in optimize_model (#12121 )	2024-09-26 11:09:35 +08:00
Yishuo Wang	47e0b83cbf	optimize sd 1.5 (#12119 )	2024-09-25 15:45:13 +08:00
Jin, Qiao	2bedb17be7	Add Qwen2.5 NPU Example (#12110 ) * Add Qwen2.5 NPU Example * fix * Merge qwen2.py and qwen2.5.py into qwen.py * Fix description	2024-09-25 15:20:03 +08:00
Yishuo Wang	5d63aef60b	optimize qwen2 vl again (#12109 )	2024-09-23 13:22:01 +08:00
Ruonan Wang	03bd01c99c	optimize npu qwen2 (#12107 )	2024-09-20 19:46:16 +08:00
Jinhe	02399021d6	add npu load_low_bit api in all-in-one benchmark (#12103 )	2024-09-20 17:56:08 +08:00
Yishuo Wang	9239fd4f12	add basic support and optimization for qwen2-vl (#12104 )	2024-09-20 17:23:06 +08:00
Yuwen Hu	828fa01ad3	[NPU] Add `mixed_precision` for Qwen2 7B (#12098 ) * Add mix_precision argument to control whether use INT8 lm_head for Qwen2-7B-Instruct * Small fix * Fixed on load low bit with mixed precision * Small fix * Update example accordingly * Update for default prompt * Update base on comments * Final fix	2024-09-20 16:36:21 +08:00
Ch1y0q	2269768e71	add internvl2 example (#12102 ) * add internvl2 example * add to README.md * update * add link to zh-CN readme	2024-09-20 16:31:54 +08:00
Ruonan Wang	09b8c80d9d	update code for NPU qwen2 (#12094 ) * update code * fix	2024-09-20 15:58:32 +08:00
Jin, Qiao	db7500bfd4	Add Qwen2.5 GPU example (#12101 ) * Add Qwen2.5 GPU example * fix end line * fix description	2024-09-20 15:55:57 +08:00

2099 commits