ipex-llm

Author	SHA1	Message	Date
Jinhe	5e1416c9aa	fix readme for npu cpp examples and llama.cpp (#12505 ) * fix cpp readme * fix cpp readme * fix cpp readme	2024-12-05 12:32:42 +08:00
binbin Deng	f56a111aa2	[NPU] Fix load-low-bit benchmark script (#12502 )	2024-12-05 10:01:32 +08:00
Yuwen Hu	84f1c4ad57	Small fix for NPU Python cpp simple generate regarding eos tokens (#12501 )	2024-12-04 18:54:06 +08:00
Kai Huang	d8b14a6305	Update save/load comments (#12500 )	2024-12-04 18:51:38 +08:00
Kai Huang	b89ea1b0cf	Support save/load model for hf generate (#12499 ) * change dummy model * style * meet review	2024-12-04 18:26:39 +08:00
Kai Huang	7d27f134dd	Fix hf generate for llama3.2 (#12497 ) * fix kv condition * meet review	2024-12-04 17:54:40 +08:00
Chu,Youcheng	ffa9a9e1b3	Update streaming in npu examples (#12495 ) * feat: add streaming * Update readme accordingly --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-12-04 17:51:10 +08:00
Yishuo Wang	a9e3f7f14c	optimize minicpm (#12496 )	2024-12-04 17:14:16 +08:00
Yishuo Wang	e0bf0054e1	small fix (#12493 )	2024-12-04 16:37:39 +08:00
Kai Huang	7ff4533b39	Support hf generate (#12477 ) * generate * style * update * remove timing * style * style * combine generate api * simple in kwargs	2024-12-04 16:31:09 +08:00
Yuwen Hu	ef4028ac2d	[NPU] Support split `lm_head` for Qwen2 with CPP (#12491 ) * Use split for Qwen2 lm_head instead of slice in optimize_pre * Support split lm_head in Qwen2 python cpp backend * Fit with Python acc lib pipeline * Removed default mixed_precision=True in all-in-one and related examples * Small fix * Style fix * Fix based on comments * Fix based on comments * Style fix	2024-12-04 14:41:08 +08:00
Yishuo Wang	5629fdd518	optimize qwen2_vl multiple image input or video input (#12487 )	2024-12-04 09:24:38 +08:00
binbin Deng	c59284418c	Hotfix of BCE-Embedding model (#12490 )	2024-12-03 18:16:04 +08:00
Yuwen Hu	4ac66db034	[NPU] Support streaming in Python (cpp backend) (#12488 ) * Support streaming in NPU Python (cpp backend) * Small fix	2024-12-03 17:17:26 +08:00
Jin, Qiao	7082844f3f	Fix NPU LLM example save/load tokenizer (#12485 )	2024-12-03 16:30:55 +08:00
Jin, Qiao	5fe766788e	Fix MiniCPM-V-2_6 running on NPU (#12486 )	2024-12-03 16:16:29 +08:00
Ruonan Wang	598603bea6	small fix of imatrix (#12480 )	2024-12-03 10:46:36 +08:00
binbin Deng	ab01753b1c	[NPU] update save-load API usage (#12473 )	2024-12-03 09:46:15 +08:00
Yuwen Hu	26adb82ee3	[NPU] Remove hard code (#12479 )	2024-12-02 18:26:07 +08:00
Yuwen Hu	b2e56a2e03	Add release support for option `xpu_arc` (#12422 ) * Add release support for xpu-arc * Dependency update	2024-12-02 17:16:04 +08:00
Yuwen Hu	aee9acb303	Add NPU QuickStart & update example links (#12470 ) * Add initial NPU quickstart (c++ part unfinished) * Small update * Update based on comments * Update main readme * Remove LLaMA description * Small fix * Small fix * Remove subsection link in main README * Small fix * Update based on comments * Small fix * TOC update and other small fixes * Update for Chinese main readme * Update based on comments and other small fixes * Change order	2024-12-02 17:03:10 +08:00
Jin, Qiao	31c69a8d31	Fix MiniCPM-V models running on NPU (#12478 )	2024-12-02 16:29:46 +08:00
binbin Deng	54d9a590d4	[NPU]Fix eos_token setting (#12475 )	2024-12-02 14:18:22 +08:00
Guancheng Fu	59bd4a214f	add vLLM glm4 fix (#12474 )	2024-12-02 14:05:16 +08:00
Ruonan Wang	4b6c3160be	Support imatrix-guided quantization for NPU CW (#12468 ) * init commit * remove print * add interface * fix * fix * fix style	2024-12-02 11:31:26 +08:00
binbin Deng	f99f188023	Hotfix of benchmark script (#12467 )	2024-11-29 14:00:59 +08:00
binbin Deng	c911026f03	[NPU C++] Update model support & examples & benchmark (#12466 )	2024-11-29 13:35:58 +08:00
binbin Deng	14d8d3d8af	Integrate NPU C++ imple into ipex-llm (#12461 )	2024-11-29 09:25:37 +08:00
Ruonan Wang	490bb0ca53	[NPU] update fused layers for GW (#12459 ) * update fused layers for GW * fix * fix llama condition for glm model * update	2024-11-28 17:14:30 +08:00
Yina Chen	1b533a105c	[NPU] Add env to enable scale search (#12462 ) * add env enable scale search * address comment * move logic	2024-11-28 17:06:00 +08:00
Heyang Sun	d272f6b471	remove nf4 unsupport comment in cpu finetuning (#12460 ) Co-authored-by: Ariadne <wyn2000330@126.com>	2024-11-28 13:26:46 +08:00
Ruonan Wang	b29da30205	[NPU] Update C++ L0 (#12458 ) * update * fix style	2024-11-27 22:08:48 +08:00
Yishuo Wang	6f3441ba4c	fix glm4-9b overflow (#12455 )	2024-11-27 17:39:13 +08:00
Ruonan Wang	281c9b0bb9	[NPU] Add L0 support for NPU C++ (#12454 ) * add L0 models support * meet review * fix style	2024-11-27 17:04:13 +08:00
Chu,Youcheng	ce6fcaa9ba	update transformers version in example of glm4 (#12453 ) * fix: update transformers version in example of glm4 * fix: textual adjustments * fix: textual adjustment	2024-11-27 15:02:25 +08:00
Yuwen Hu	effb9bb41c	Small update to LangChain examples readme (#12452 )	2024-11-27 14:02:25 +08:00
Chu,Youcheng	acd77d9e87	Remove env variable `BIGDL_LLM_XMX_DISABLED` in documentation (#12445 ) * fix: remove BIGDL_LLM_XMX_DISABLED in mddocs * fix: remove set SYCL_CACHE_PERSISTENT=1 in example * fix: remove BIGDL_LLM_XMX_DISABLED in workflows * fix: merge igpu and A-series Graphics * fix: remove set BIGDL_LLM_XMX_DISABLED=1 in example * fix: remove BIGDL_LLM_XMX_DISABLED in workflows * fix: merge igpu and A-series Graphics * fix: textual adjustment * fix: textual adjustment * fix: textual adjustment	2024-11-27 11:16:36 +08:00
Ruonan Wang	f8c2bb2943	[NPU] optimize qwen2 prefill performance for C++ (#12451 )	2024-11-27 10:46:18 +08:00
Ruonan Wang	7b40f9b372	[NPU] Support GW for NPU C++ (#12450 )	2024-11-26 17:46:40 +08:00
Jin, Qiao	c2efa264d9	Update LangChain examples to use upstream (#12388 ) * Update LangChain examples to use upstream * Update README and fix links * Update LangChain CPU examples to use upstream * Update LangChain CPU voice_assistant example * Update CPU README * Update GPU README * Remove GPU Langchain vLLM example and fix comments * Change langchain -> LangChain * Add reference for both upstream llms and embeddings * Fix comments * Fix comments * Fix comments * Fix comments * Fix comment	2024-11-26 16:43:15 +08:00
Ruonan Wang	24b46b2b19	[NPU] further fix of qwen2 int8 pipeline & C++ (#12449 ) * fix * fix style	2024-11-26 16:39:39 +08:00
Yuwen Hu	303b104c10	Fix abnormal output for Qwen2-7B when sym_int8 (#12446 )	2024-11-26 15:53:04 +08:00
Ruonan Wang	52c17fe104	Optimize first token of C++ NPU by adding npu_dpu_groups (#12443 ) * add npu_dpu_groups * add check for env * fix style	2024-11-26 11:41:32 +08:00
Jinhe	66bd7abae4	add sdxl and lora-lcm optimization (#12444 ) * add sdxl and lora-lcm optimization * fix openjourney speed drop	2024-11-26 11:38:09 +08:00
Ruonan Wang	0e23bd779f	Add support of llama3.2 for NPU C++ (#12442 ) * initial support of llama3.2 * update * update * fix style * fix style * fix * small fix	2024-11-26 09:26:55 +08:00
Yishuo Wang	cdd41f5e4c	optimize sdxl again (#12441 )	2024-11-25 17:46:46 +08:00
Ruonan Wang	b9abb8a285	Support qwen2.5 3B for NPU & update related examples (#12438 ) * update qwen2.5-3B * update convert * small fix * replace load_in_low_bit with low_bit * small fix	2024-11-25 16:38:31 +08:00
Jinhe	b633fbf26c	add chinese prompt troubleshooting for npu cpp examples (#12437 ) * add chinese prompt troubleshooting * add chinese prompt troubleshooting	2024-11-25 15:28:47 +08:00
Yishuo Wang	8164aed802	small change (#12439 )	2024-11-25 14:35:49 +08:00
Yishuo Wang	be132c4209	fix and optimize sd (#12436 )	2024-11-25 14:09:48 +08:00
Ruonan Wang	f41405368a	Support minicpm for NPU C++ (#12434 ) * support minicpm-1b * update * tune fused_layers * update readme.md	2024-11-25 10:42:02 +08:00
Ruonan Wang	0819fad34e	support Llama2-7B / Llama3-8B for NPU C++ (#12431 ) * support llama2 * update * support fused_layers=4 for Llama2-7B	2024-11-22 18:47:19 +08:00
Ruonan Wang	4ffa6c752c	New convert support for C++ NPU (#12430 ) * initial commit * fix * fix style * fix style * fix * fix	2024-11-22 14:28:30 +08:00
Yuwen Hu	e61ae88c5b	Upgrade dependency for xpu_lnl and xpu_arl option (#12424 )	2024-11-21 18:37:15 +08:00
Ruonan Wang	2935e97610	small fix of cpp readme(#12425 )	2024-11-21 18:21:34 +08:00
Yuwen Hu	8fdc36c140	Optimize with new batch kernel when `batch_size=1` on LNL (#12419 ) * Add use batch kernel condition for LNL * Fix for other device judgement * Fix based on comment	2024-11-21 16:21:35 +08:00
Jinhe	7e0a840f74	add optimization to openjourney (#12423 ) * add optimization to openjourney * add optimization to openjourney	2024-11-21 15:23:51 +08:00
Yishuo Wang	145e8b480f	update batch kernel condition (#12421 )	2024-11-21 10:12:46 +08:00
Ruonan Wang	7288c759ce	Initial NPU C++ Example (#12417 ) * temp save * meet review, update * update * meet review, add license * typo	2024-11-21 10:09:26 +08:00
Jinhe	d2a37b6ab2	add Stable diffusion examples (#12418 ) * add openjourney example * add timing * add stable diffusion to model page * 4.1 fix * small fix	2024-11-20 17:18:36 +08:00
Ruonan Wang	54c62feb74	[NPU] dump prefill IR for further C++ solution (#12402 ) * save prefill ir * fix * shorten convert time * fix * fix * fix * fix * fix style * dump config.json * meet review * small fix	2024-11-20 15:20:05 +08:00
SONG Ge	ff3f7cb25f	Fix speech_paraformer issue with unexpected changes (#12416 ) * Fix speech_paraformer issue with unexpected changes * Add paraformer version specified	2024-11-19 15:01:20 +08:00
Yuwen Hu	a69395f31f	Support performance mode of GLM4 model (#12401 ) * Initial support of prepare generation args for transformers 445 * Small fix to chatglm4 model optimization * Small fix * fix glm4 position id * fix glm4 error * Small change in condition & fix based on comments * Style fixes --------- Co-authored-by: cyita <yitastudy@gmail.com>	2024-11-18 18:46:52 +08:00
Song Fuchang	d2c821d458	Add missing arguments in pipeline parallel generate method (#12142 ) Add two arguments: negative_prompt_ids and negative_prompt_attention_mask to generate method in pipeline_parallel.py.	2024-11-18 13:50:18 +08:00
Yishuo Wang	3d5fbf2069	update batch kernel condition (#12408 )	2024-11-15 13:47:05 +08:00
binbin Deng	d4d949443f	[NPU] change attention_mask to fp16 (#12400 )	2024-11-14 17:20:29 +08:00
Qiyuan Gong	7e50ff113c	Add padding_token=eos_token for GPU trl QLora example (#12398 ) * Avoid tokenizer doesn't have a padding token error.	2024-11-14 10:51:30 +08:00
SONG Ge	d2cbcb060c	Add initial support for modeling_xlm encoder on NPU (#12393 ) * Add initial support for modeling_xlm encoder on NPU * Add EmbeddingModel class to keep the same usage with bce and npu fp16 linear convert * Optimize currently implementation to support EmbeddingModel.encode API and convert other torch modules to NPU * Add related example and documents	2024-11-14 10:50:27 +08:00
Yina Chen	59b01fa7d2	small fix (#12397 )	2024-11-14 10:03:36 +08:00
Yishuo Wang	00fce5c940	use new q4_0 batch kernel (#12396 )	2024-11-13 18:37:34 +08:00
Yina Chen	d6d63d6b84	[NPU] Qwen prefill attn_mask type hotfix (#12395 ) * qwen prefill attn_mask type fp16 * update	2024-11-13 17:51:34 +08:00
Yina Chen	9220babaab	qwen prefill attn_mask type fp16 (#12394 )	2024-11-13 17:45:26 +08:00
Yuwen Hu	1158f91648	Fix llava with multi-image inputs (#12384 )	2024-11-13 09:27:50 +08:00
Guancheng Fu	0ee54fc55f	Upgrade to vllm 0.6.2 (#12338 ) * Initial updates for vllm 0.6.2 * fix * Change Dockerfile to support v062 * Fix * fix examples * Fix * done * fix * Update engine.py * Fix Dockerfile to original path * fix * add option * fix * fix * fix * fix --------- Co-authored-by: xiangyuT <xiangyu.tian@intel.com>	2024-11-12 20:35:34 +08:00
Ruonan Wang	6bf5a8c230	[NPU] Update qwen2 compile config (#12383 ) * update * fix	2024-11-12 16:59:44 +08:00
binbin Deng	7a97fbb779	Support vpm and resampler module of minicpm-v on NPU (#12375 )	2024-11-12 15:59:55 +08:00
Yuwen Hu	e0918934c8	Add fused_mlp to glm4v models (#12378 )	2024-11-11 17:10:25 +08:00
Yishuo Wang	dc34e8c51f	optimize glm4v vision attention (#12369 )	2024-11-08 17:01:57 +08:00
Qiyuan Gong	2dfcc36825	Fix trl version and padding in trl qlora example (#12368 ) * Change trl to 0.9.6 * Enable padding to avoid padding related errors.	2024-11-08 16:05:17 +08:00
Yishuo Wang	51f7f87768	fix ipex 2.3 bug (#12366 )	2024-11-08 13:29:15 +08:00
Yina Chen	b2e69a896c	[NPU] Support Baichuan groupwise & gw code refactor (#12337 ) * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * baichuan part * update * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * baichuan part * update * update * update * baichuan support * code refactor * remove code * fix style * address comments * revert	2024-11-08 11:42:42 +08:00
binbin Deng	812d5cc32e	[NPU L0] Support llama3.2 in L0 pipeline (#12361 )	2024-11-08 10:01:23 +08:00
Yuwen Hu	8fe294e01f	Small fix to all-in-one benchmark (#12362 )	2024-11-07 18:56:34 +08:00
Yuwen Hu	1a6cbc473f	Add fused mlp optimizations to glm4 models (#12360 ) * Add fused mlp to glm4 models * Small fix	2024-11-07 18:52:47 +08:00
Yishuo Wang	ad68c56573	small improvement (#12359 )	2024-11-07 15:57:41 +08:00
Yina Chen	d880e534d2	[NPU] acclib llama3.2 support groupwise (#12355 ) * change inter_pp * add comment	2024-11-07 11:19:55 +08:00
Jinhe	79f2877413	add minicpm-v models to `transformers_int4_npu_win` api (#12352 ) * add minicpm npu * optimize model	2024-11-07 10:05:10 +08:00
SONG Ge	a7b66683f1	[NPU] Add Optimized Support for Llama3.2-1B/3B on NPU (#12339 ) * Add initial support for llama3.2-1b/3b * move llama3.2 support into current llama_mp impl	2024-11-06 19:21:40 +08:00
Yuwen Hu	872a74481a	Small optimization to glm4 models (#12351 )	2024-11-06 19:16:58 +08:00
Ruonan Wang	c267355b35	fix three NPU benchmark issues (#12350 ) * fix three issues * limit mixed_precision for CW only	2024-11-06 19:01:01 +08:00
Yina Chen	f24352aef9	llama 3.1/3.2 support compresskv (#12347 ) * llama 3.1/3.2 support compresskv * update * fix transformers 4.45 error * fix style * fix typo * disable llama3.2 1b compresskv	2024-11-06 17:33:43 +08:00
Jin, Qiao	d984c0672a	Add MiniCPM-V-2_6 to arc perf test (#12349 )	2024-11-06 16:32:28 +08:00
Yishuo Wang	e23ef7d088	optimize glm4v's vision part (#12346 )	2024-11-06 15:43:40 +08:00
Yishuo Wang	c8b7265359	Add basic glm4v support (#12345 )	2024-11-06 13:50:10 +08:00
binbin Deng	69e3a56943	[NPU] Hot fix of load_low_bit (#12344 )	2024-11-06 10:07:00 +08:00
Jin, Qiao	7240c283a3	Add dummy model in iGPU perf (#12341 ) * Add dummy model in iGPU perf * Add dummy model in iGPU perf * Fix	2024-11-05 17:56:10 +08:00
Zhao Changmin	8e9a3a1158	fix chatglm2 cpu ut (#12336 )	2024-11-05 16:43:57 +08:00
Yina Chen	d872639395	[NPU] Llama3, Qwen2 1.5b, MiniCPM 1/2B groupwise support (#12327 ) * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * support minicpm 2b * fix style & error * fix style & update * remove print	2024-11-05 15:51:31 +08:00
Jin, Qiao	82a61b5cf3	Limit trl version in example (#12332 ) * Limit trl version in example * Limit trl version in example	2024-11-05 14:50:10 +08:00
Zijie Li	45b0d371aa	update benchmark readme (#12323 ) * update benchmark readme update new comment with memory usage included * Update README.md	2024-11-05 08:19:08 +08:00
Zhao Changmin	1b637e4477	Add chatglm2&3 fuse mlp (#12328 ) * add chatglm fuse mlp	2024-11-04 18:04:41 +08:00
Yina Chen	94c4ce389f	[NPU] Add env to disable compile opt (#12330 ) * add env to disable compile opt * fix style * fix style	2024-11-04 17:46:17 +08:00
Ch1y0q	e54af44ed6	Add `transformers_int4_npu_pipeline_win` in all-in-one benchmark (#12325 ) * add transformers_int4_npu_pipeline_win * bugfix * bugfix: wrong actual_output_len * fix format * bugfix & update `README.md`	2024-11-04 16:00:20 +08:00
binbin Deng	5ee6f97d6f	[NPU L0] Add layernorm weight as const / input setting (#12322 )	2024-11-04 15:46:24 +08:00
Chu,Youcheng	a01371f90b	Doc: update harness readme (#12324 )	2024-11-04 14:58:54 +08:00
Kai Huang	c8679ad592	Qwen layernorm as input (#12309 ) * qwen layernorm as input * add group size	2024-11-04 09:51:15 +08:00
Yuwen Hu	20755e8077	Small fix to all-in-one benchmark scripts (#12317 )	2024-11-01 19:16:25 +08:00
Ch1y0q	48123af463	add `npu_group_size` for `transformers_int4_npu_win` in all-in-one benchmark api (#12316 ) * add `npu_group_size` for `transformers_int4_npu_win` small bugfix * update	2024-11-01 18:44:27 +08:00
Zijie Li	cd5e22cee5	Update Llava GPU Example (#12311 ) * update-llava-example * add warmup * small fix on llava example * remove space& extra print prompt * renew example * small fix --------- Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>	2024-11-01 17:06:00 +08:00
binbin Deng	f53bb4ea0b	[NPU L0] Update 1st token generation (#12314 )	2024-11-01 17:02:07 +08:00
binbin Deng	d409d9d0eb	[NPU L0] Update streaming mode of example (#12312 )	2024-11-01 15:38:10 +08:00
Jin, Qiao	126f95be80	Fix DPO finetuning example (#12313 )	2024-11-01 13:29:44 +08:00
Yina Chen	05c5d0267a	[NPU] Llama2 prefill use ov sdp (#12310 ) * prefill use sdp * add param * update * fix style * fix style * meet comments	2024-11-01 11:05:20 +08:00
binbin Deng	eda764909c	Add minicpm-2b in L0 pipeline (#12308 )	2024-11-01 09:30:01 +08:00
Yishuo Wang	b9853f98b3	fix qwen2 attention_mask slice (#12307 )	2024-10-31 17:00:05 +08:00
Jin, Qiao	3df6195cb0	Fix application quickstart (#12305 ) * fix graphrag quickstart * fix axolotl quickstart * fix ragflow quickstart * fix ragflow quickstart * fix graphrag toc * fix comments * fix comment * fix comments	2024-10-31 16:57:35 +08:00
binbin Deng	4892df61c9	Add qwen2-1.5b in l0 pipeline example (#12306 )	2024-10-31 16:44:25 +08:00
Jinhe	30f668c206	updated transformers & accelerate requirements (#12301 )	2024-10-31 15:59:40 +08:00
Xin Qiu	97a0f7fd35	Codegeex support (#12303 ) * new codegeex attn * use kv cache * add compress/quantize kv * remove compress/quantize kv * fix style check * fix style * fix codegeex	2024-10-31 15:28:56 +08:00
Yishuo Wang	72605c7016	fix llama3.1/3.2 quantize kv check (#12302 )	2024-10-31 11:55:07 +08:00
Kai Huang	416c19165c	Add Qwen pipeline and example (#12292 ) * support qwen pipeline * update error msg * style * meet review * minor	2024-10-31 11:25:25 +08:00
Rahul Nair	4cf1ccc43a	Update DPO README.md (#12162 ) bitsandbytes multi-backend is now available and is required; otherwise it errors out saying that no CUDA is available	2024-10-31 10:56:46 +08:00
Chu,Youcheng	29400e2e75	feat: change oneccl to internal (#12296 ) * feat: change oneccl * fix: restore llama-70b * fix: remove tab * fix: remove extra blank * small fix * add comments * fix: add a blank space	2024-10-31 09:51:43 +08:00
Zijie Li	6f22133efc	Update AWQ and GPTQ GPU example (#12300 )	2024-10-31 09:35:31 +08:00
Yina Chen	0763268e4c	[NPU]Qwen2 groupwise performance opt (#12299 ) * qwen2 gw performance opt * remove debug	2024-10-30 17:40:21 +08:00
binbin Deng	41b8064554	Support minicpm-1B in level0 pipeline (#12297 )	2024-10-30 17:21:47 +08:00
Jinhe	46d8300f6b	bugfix for qlora finetuning on GPU (#12298 ) * bugfix for qlora 100 step error * indent fix * annotation fix	2024-10-30 16:54:10 +08:00
Yina Chen	70037ad55f	Groupwise prefill optimization (#12291 ) * except lm_head * remove * support gw lm_head * update * fix * remove run.bat * fix style * support llama3 * slice -> split * remove debug * fix style * add dpu	2024-10-30 14:59:45 +08:00
Yishuo Wang	540eaeb12c	refactor attention_softmax (#12295 )	2024-10-30 13:20:50 +08:00
Ruonan Wang	2b2cb9c693	[NPU pipeline] Support save & load and update examples (#12293 ) * support save & load, update llama examples * update baichuan2 example * update readme	2024-10-30 10:02:00 +08:00
Yuwen Hu	5a15098835	Initial support for quantized forward on CPU when `quantization_group_size=0` (#12282 ) * Initial support for quantized forward on CPU when quantization_group_size=0 * Style fix * Style fix * Small fix * Small fix	2024-10-29 19:40:17 +08:00
binbin Deng	3feb58d1e4	Support baichuan2 for level0 pipeline (#12289 )	2024-10-29 19:24:16 +08:00
Zhao Changmin	546f455e8e	Patch sdpa check function in specific module attributes table (#12285 )	2024-10-29 18:41:09 +08:00
Ruonan Wang	821b0033ed	[NPU L0] update layernorm & code refactor (#12287 ) * update layernorm & code refactor * fix style * add common utils * change to Pool() * remove print	2024-10-29 15:01:45 +08:00
Yina Chen	4467645088	[NPU] Support l0 Llama groupwise (#12276 ) * except lm_head * remove * support gw lm_head * update * fix * remove run.bat * fix style * support llama3	2024-10-28 17:06:55 +08:00
Ruonan Wang	3fe2ea3081	[NPU] Reuse prefill of acc lib for pipeline (#12279 ) * first commit * update example * fix style * update example * embedding as const * fix generate * code refactor * meet code review * fix style * change max_output_len to max_context_len * fix all-in-one * fix example * add check for new tokens	2024-10-28 16:05:49 +08:00
binbin Deng	ec362e6133	Add llama3 level0 example (#12275 )	2024-10-28 09:24:51 +08:00
SONG Ge	08cb065370	hot-fix redundant import funasr (#12277 )	2024-10-25 19:40:39 +08:00
SONG Ge	a0c6432899	[NPU] Add support for loading a FunASR model (#12073 ) * add support for loading funasr model * add initial support for paraformer-encoder * add npu ops impl * add encoder-decoder npu pipeline * move paraformer encoders prefix 30 layers to npu and keep the rest layers on cpu	2024-10-25 17:22:01 +08:00
Ruonan Wang	854398f6e0	update example to reduce peak memory usage (#12274 )	2024-10-25 17:09:26 +08:00
Yuwen Hu	e713296090	Update all-in-one benchmark (#12272 ) * Update all-in-one benchmark * Small fix * Small fix * Small fix	2024-10-25 16:52:59 +08:00
Yuwen Hu	43b25a2fe7	Fix llama 3.2 vision on LNL (#12264 ) * Fix llama 3.2 vision on LNL * Small fix	2024-10-25 16:23:31 +08:00
Yuwen Hu	93895b2ac2	Openvino all in one benchmark small fix (#12269 ) * Small update for all-in-one benchmark readme to support OpenVINO tests * Small fix	2024-10-25 14:13:52 +08:00
Zijie Li	f7f62a3fef	Add OpenVINO performance tests to all-in-one benchmark (#12238 ) * add-openvino-to-all-in-one * update on openvino API * Update save_openvino.py * Update save_openvino.py * Update save_openvino.py * update on run.py and save_openvino * update references * Create openvino-requirements.txt * fix on comments * Small updates * Small fix * Fix --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-10-25 13:53:53 +08:00
Ruonan Wang	ae57e23e4f	fix incompatibility between llama GW & llama pipeline (#12267 ) * fix * fix	2024-10-25 10:31:44 +08:00
Yina Chen	b5e663854b	[NPU] Support llama groupwise (#12260 ) * support llama gw * support llama gw lm_head * fix style * remove unused code	2024-10-24 18:06:45 +08:00
Xin Qiu	39c9d1de52	fix code geex (#12261 )	2024-10-24 14:34:01 +08:00
Yishuo Wang	f3a2b20e6b	Optimize gpt2 (#12259 )	2024-10-24 13:44:24 +08:00
Ruonan Wang	821fd96367	Initial integrate our L0 Llama impl into ipex-llm (#12255 ) * temp save * initial support * fix * simplify code * fix style * fix example * make default value of pipeline as False	2024-10-24 09:49:27 +08:00
Yishuo Wang	cacc891962	Fix PR validation (#12253 )	2024-10-23 18:10:47 +08:00

2156 commits