ipex-llm

Author	SHA1	Message	Date
Qiyuan Gong	7e50ff113c	Add padding_token=eos_token for GPU trl QLora example (#12398 ) * Avoid tokenizer doesn't have a padding token error.	2024-11-14 10:51:30 +08:00
SONG Ge	d2cbcb060c	Add initial support for modeling_xlm encoder on NPU (#12393 ) * Add initial support for modeling_xlm encoder on NPU * Add EmbeddingModel class to keep the same usage with bce and npu fp16 linear convert * Optimize currently implementation to support EmbeddingModel.encode API and convert other torch modules to NPU * Add related example and documents	2024-11-14 10:50:27 +08:00
Yina Chen	59b01fa7d2	small fix (#12397 )	2024-11-14 10:03:36 +08:00
Yishuo Wang	00fce5c940	use new q4_0 batch kernel (#12396 )	2024-11-13 18:37:34 +08:00
Yina Chen	d6d63d6b84	[NPU] Qwen prefill attn_mask type hotfix (#12395 ) * qwen prefill attn_mask type fp16 * update	2024-11-13 17:51:34 +08:00
Yina Chen	9220babaab	qwen prefill attn_mask type fp16 (#12394 )	2024-11-13 17:45:26 +08:00
Yuwen Hu	1158f91648	Fix llava with multi-image inputs (#12384 )	2024-11-13 09:27:50 +08:00
Guancheng Fu	0ee54fc55f	Upgrade to vllm 0.6.2 (#12338 ) * Initial updates for vllm 0.6.2 * fix * Change Dockerfile to support v062 * Fix * fix examples * Fix * done * fix * Update engine.py * Fix Dockerfile to original path * fix * add option * fix * fix * fix * fix --------- Co-authored-by: xiangyuT <xiangyu.tian@intel.com>	2024-11-12 20:35:34 +08:00
Ruonan Wang	6bf5a8c230	[NPU] Update qwen2 compile config (#12383 ) * update * fix	2024-11-12 16:59:44 +08:00
binbin Deng	7a97fbb779	Support vpm and resampler module of minicpm-v on NPU (#12375 )	2024-11-12 15:59:55 +08:00
Yuwen Hu	e0918934c8	Add fused_mlp to glm4v models (#12378 )	2024-11-11 17:10:25 +08:00
Yishuo Wang	dc34e8c51f	optimize glm4v vision attention (#12369 )	2024-11-08 17:01:57 +08:00
Qiyuan Gong	2dfcc36825	Fix trl version and padding in trl qlora example (#12368 ) * Change trl to 0.9.6 * Enable padding to avoid padding related errors.	2024-11-08 16:05:17 +08:00
Yishuo Wang	51f7f87768	fix ipex 2.3 bug (#12366 )	2024-11-08 13:29:15 +08:00
Yina Chen	b2e69a896c	[NPU] Support Baichuan groupwise & gw code refactor (#12337 ) * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * baichuan part * update * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * baichuan part * update * update * update * baichuan support * code refactor * remove code * fix style * address comments * revert	2024-11-08 11:42:42 +08:00
binbin Deng	812d5cc32e	[NPU L0] Support llama3.2 in L0 pipeline (#12361 )	2024-11-08 10:01:23 +08:00
Yuwen Hu	8fe294e01f	Small fix to all-in-one benchmark (#12362 )	2024-11-07 18:56:34 +08:00
Yuwen Hu	1a6cbc473f	Add fused mlp optimizations to glm4 models (#12360 ) * Add fused mlp to glm4 models * Small fix	2024-11-07 18:52:47 +08:00
Yishuo Wang	ad68c56573	small improvement (#12359 )	2024-11-07 15:57:41 +08:00
Yina Chen	d880e534d2	[NPU] acclib llama3.2 support groupwise (#12355 ) * change inter_pp * add comment	2024-11-07 11:19:55 +08:00
Jinhe	79f2877413	add minicpm-v models to `transformers_int4_npu_win` api (#12352 ) * add minicpm npu * optimize model	2024-11-07 10:05:10 +08:00
SONG Ge	a7b66683f1	[NPU] Add Optimized Support for Llama3.2-1B/3B on NPU (#12339 ) * Add initial support for llama3.2-1b/3b * move llama3.2 support into current llama_mp impl	2024-11-06 19:21:40 +08:00
Yuwen Hu	872a74481a	Small optimization to glm4 models (#12351 )	2024-11-06 19:16:58 +08:00
Ruonan Wang	c267355b35	fix three NPU benchmark issues (#12350 ) * fix three issues * limit mixed_precision for CW only	2024-11-06 19:01:01 +08:00
Yina Chen	f24352aef9	llama 3.1/3.2 support compresskv (#12347 ) * llama 3.1/3.2 support compresskv * update * fix transformers 4.45 error * fix style * fix typo * disable llama3.2 1b compresskv	2024-11-06 17:33:43 +08:00
Jin, Qiao	d984c0672a	Add MiniCPM-V-2_6 to arc perf test (#12349 )	2024-11-06 16:32:28 +08:00
Yishuo Wang	e23ef7d088	optimize glm4v's vision part (#12346 )	2024-11-06 15:43:40 +08:00
Yishuo Wang	c8b7265359	Add basic glm4v support (#12345 )	2024-11-06 13:50:10 +08:00
binbin Deng	69e3a56943	[NPU] Hot fix of load_low_bit (#12344 )	2024-11-06 10:07:00 +08:00
Jin, Qiao	7240c283a3	Add dummy model in iGPU perf (#12341 ) * Add dummy model in iGPU perf * Add dummy model in iGPU perf * Fix	2024-11-05 17:56:10 +08:00
Zhao Changmin	8e9a3a1158	fix chatglm2 cpu ut (#12336 )	2024-11-05 16:43:57 +08:00
Yina Chen	d872639395	[NPU] Llama3, Qwen2 1.5b, MiniCPM 1/2B groupwise support (#12327 ) * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * support minicpm 2b * fix style & error * fix style & update * remove print	2024-11-05 15:51:31 +08:00
Jin, Qiao	82a61b5cf3	Limit trl version in example (#12332 ) * Limit trl version in example * Limit trl version in example	2024-11-05 14:50:10 +08:00
Zijie Li	45b0d371aa	update benchmark readme (#12323 ) * update benchmark readme update new comment with memory usage included * Update README.md	2024-11-05 08:19:08 +08:00
Zhao Changmin	1b637e4477	Add chatglm2&3 fuse mlp (#12328 ) * add chatglm fuse mlp	2024-11-04 18:04:41 +08:00
Yina Chen	94c4ce389f	[NPU] Add env to disable compile opt (#12330 ) * add env to disable compile opt * fix style * fix style	2024-11-04 17:46:17 +08:00
Ch1y0q	e54af44ed6	Add `transformers_int4_npu_pipeline_win` in all-in-one benchmark (#12325 ) * add transformers_int4_npu_pipeline_win * bugfix * bugfix: wrong actual_output_len * fix format * bugfix & update `README.md`	2024-11-04 16:00:20 +08:00
binbin Deng	5ee6f97d6f	[NPU L0] Add layernorm weight as const / input setting (#12322 )	2024-11-04 15:46:24 +08:00
Chu,Youcheng	a01371f90b	Doc: update harness readme (#12324 )	2024-11-04 14:58:54 +08:00
Kai Huang	c8679ad592	Qwen layernorm as input (#12309 ) * qwen layernorm as input * add group size	2024-11-04 09:51:15 +08:00
Yuwen Hu	20755e8077	Small fix to all-in-one benchmark scripts (#12317 )	2024-11-01 19:16:25 +08:00
Ch1y0q	48123af463	add `npu_group_size` for `transformers_int4_npu_win` in all-in-one benchmark api (#12316 ) * add `npu_group_size` for `transformers_int4_npu_win` small bugfix * update	2024-11-01 18:44:27 +08:00
Zijie Li	cd5e22cee5	Update Llava GPU Example (#12311 ) * update-llava-example * add warmup * small fix on llava example * remove space& extra print prompt * renew example * small fix --------- Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>	2024-11-01 17:06:00 +08:00
binbin Deng	f53bb4ea0b	[NPU L0] Update 1st token generation (#12314 )	2024-11-01 17:02:07 +08:00
binbin Deng	d409d9d0eb	[NPU L0] Update streaming mode of example (#12312 )	2024-11-01 15:38:10 +08:00
Jin, Qiao	126f95be80	Fix DPO finetuning example (#12313 )	2024-11-01 13:29:44 +08:00
Yina Chen	05c5d0267a	[NPU] Llama2 prefill use ov sdp (#12310 ) * prefill use sdp * add param * update * fix style * fix style * meet comments	2024-11-01 11:05:20 +08:00
binbin Deng	eda764909c	Add minicpm-2b in L0 pipeline (#12308 )	2024-11-01 09:30:01 +08:00
Yishuo Wang	b9853f98b3	fix qwen2 attention_mask slice (#12307 )	2024-10-31 17:00:05 +08:00
Jin, Qiao	3df6195cb0	Fix application quickstart (#12305 ) * fix graphrag quickstart * fix axolotl quickstart * fix ragflow quickstart * fix ragflow quickstart * fix graphrag toc * fix comments * fix comment * fix comments	2024-10-31 16:57:35 +08:00

1990 commits