ipex-llm

Author	SHA1	Message	Date
Chu,Youcheng	acd77d9e87	Remove env variable `BIGDL_LLM_XMX_DISABLED` in documentation (#12445) * fix: remove BIGDL_LLM_XMX_DISABLED in mddocs * fix: remove set SYCL_CACHE_PERSISTENT=1 in example * fix: remove BIGDL_LLM_XMX_DISABLED in workflows * fix: merge igpu and A-series Graphics * fix: remove set BIGDL_LLM_XMX_DISABLED=1 in example * fix: remove BIGDL_LLM_XMX_DISABLED in workflows * fix: merge igpu and A-series Graphics * fix: textual adjustment * fix: textual adjustment * fix: textual adjustment	2024-11-27 11:16:36 +08:00
Ruonan Wang	f8c2bb2943	[NPU] optimize qwen2 prefill performance for C++ (#12451)	2024-11-27 10:46:18 +08:00
Ruonan Wang	7b40f9b372	[NPU] Support GW for NPU C++ (#12450)	2024-11-26 17:46:40 +08:00
Jin, Qiao	c2efa264d9	Update LangChain examples to use upstream (#12388) * Update LangChain examples to use upstream * Update README and fix links * Update LangChain CPU examples to use upstream * Update LangChain CPU voice_assistant example * Update CPU README * Update GPU README * Remove GPU Langchain vLLM example and fix comments * Change langchain -> LangChain * Add reference for both upstream llms and embeddings * Fix comments * Fix comments * Fix comments * Fix comments * Fix comment	2024-11-26 16:43:15 +08:00
Ruonan Wang	24b46b2b19	[NPU] further fix of qwen2 int8 pipeline & C++ (#12449) * fix * fix style	2024-11-26 16:39:39 +08:00
Yuwen Hu	303b104c10	Fix abnormal output for Qwen2-7B when sym_int8 (#12446)	2024-11-26 15:53:04 +08:00
Ruonan Wang	52c17fe104	Optimize first token of C++ NPU by adding npu_dpu_groups (#12443) * add npu_dpu_groups * add check for env * fix style	2024-11-26 11:41:32 +08:00
Jinhe	66bd7abae4	add sdxl and lora-lcm optimization (#12444) * add sdxl and lora-lcm optimization * fix openjourney speed drop	2024-11-26 11:38:09 +08:00
Ruonan Wang	0e23bd779f	Add support of llama3.2 for NPU C++ (#12442) * initial support of llama3.2 * update * update * fix style * fix style * fix * small fix	2024-11-26 09:26:55 +08:00
Yishuo Wang	cdd41f5e4c	optimize sdxl again (#12441)	2024-11-25 17:46:46 +08:00
Ruonan Wang	b9abb8a285	Support qwen2.5 3B for NPU & update related examples (#12438) * update qwen2.5-3B * update convert * small fix * replace load_in_low_bit with low_bit * small fix	2024-11-25 16:38:31 +08:00
Jinhe	b633fbf26c	add chinese prompt troubleshooting for npu cpp examples (#12437) * add chinese prompt troubleshooting * add chinese prompt troubleshooting	2024-11-25 15:28:47 +08:00
Yishuo Wang	8164aed802	small change (#12439)	2024-11-25 14:35:49 +08:00
Yishuo Wang	be132c4209	fix and optimize sd (#12436)	2024-11-25 14:09:48 +08:00
Ruonan Wang	f41405368a	Support minicpm for NPU C++ (#12434) * support minicpm-1b * update * tune fused_layers * update readme.md	2024-11-25 10:42:02 +08:00
Ruonan Wang	0819fad34e	support Llama2-7B / Llama3-8B for NPU C++ (#12431) * support llama2 * update * support fused_layers=4 for Llama2-7B	2024-11-22 18:47:19 +08:00
Ruonan Wang	4ffa6c752c	New convert support for C++ NPU (#12430) * initial commit * fix * fix style * fix style * fix * fix	2024-11-22 14:28:30 +08:00
Yuwen Hu	e61ae88c5b	Upgrade dependency for xpu_lnl and xpu_arl option (#12424)	2024-11-21 18:37:15 +08:00
Ruonan Wang	2935e97610	small fix of cpp readme (#12425)	2024-11-21 18:21:34 +08:00
Yuwen Hu	8fdc36c140	Optimize with new batch kernel when `batch_size=1` on LNL (#12419) * Add use batch kernel condition for LNL * Fix for other device judgement * Fix based on comment	2024-11-21 16:21:35 +08:00
Jinhe	7e0a840f74	add optimization to openjourney (#12423) * add optimization to openjourney * add optimization to openjourney	2024-11-21 15:23:51 +08:00
Yishuo Wang	145e8b480f	update batch kernel condition (#12421)	2024-11-21 10:12:46 +08:00
Ruonan Wang	7288c759ce	Initial NPU C++ Example (#12417) * temp save * meet review, update * update * meet review, add license * typo	2024-11-21 10:09:26 +08:00
Jinhe	d2a37b6ab2	add Stable diffusion examples (#12418) * add openjourney example * add timing * add stable diffusion to model page * 4.1 fix * small fix	2024-11-20 17:18:36 +08:00
Ruonan Wang	54c62feb74	[NPU] dump prefill IR for further C++ solution (#12402) * save prefill ir * fix * shorten convert time * fix * fix * fix * fix * fix style * dump config.json * meet review * small fix	2024-11-20 15:20:05 +08:00
SONG Ge	ff3f7cb25f	Fix speech_paraformer issue with unexpected changes (#12416) * Fix speech_paraformer issue with unexpected changes * Add paraformer version specified	2024-11-19 15:01:20 +08:00
Yuwen Hu	a69395f31f	Support performance mode of GLM4 model (#12401) * Initial support of prepare generation args for transformers 445 * Small fix to chatglm4 model optimization * Small fix * fix glm4 position id * fix glm4 error * Small change in condition & fix based on comments * Style fixes --------- Co-authored-by: cyita <yitastudy@gmail.com>	2024-11-18 18:46:52 +08:00
Song Fuchang	d2c821d458	Add missing arguments in pipeline parallel generate method (#12142) Add two arguments: negative_prompt_ids and negative_prompt_attention_mask to generate method in pipeline_parallel.py.	2024-11-18 13:50:18 +08:00
Yishuo Wang	3d5fbf2069	update batch kernel condition (#12408)	2024-11-15 13:47:05 +08:00
binbin Deng	d4d949443f	[NPU] change attention_mask to fp16 (#12400)	2024-11-14 17:20:29 +08:00
Qiyuan Gong	7e50ff113c	Add padding_token=eos_token for GPU trl QLora example (#12398) * Avoid the error raised when the tokenizer doesn't have a padding token.	2024-11-14 10:51:30 +08:00
SONG Ge	d2cbcb060c	Add initial support for modeling_xlm encoder on NPU (#12393) * Add initial support for modeling_xlm encoder on NPU * Add EmbeddingModel class to keep the same usage with bce and npu fp16 linear convert * Optimize current implementation to support EmbeddingModel.encode API and convert other torch modules to NPU * Add related example and documents	2024-11-14 10:50:27 +08:00
Yina Chen	59b01fa7d2	small fix (#12397)	2024-11-14 10:03:36 +08:00
Yishuo Wang	00fce5c940	use new q4_0 batch kernel (#12396)	2024-11-13 18:37:34 +08:00
Yina Chen	d6d63d6b84	[NPU] Qwen prefill attn_mask type hotfix (#12395) * qwen prefill attn_mask type fp16 * update	2024-11-13 17:51:34 +08:00
Yina Chen	9220babaab	qwen prefill attn_mask type fp16 (#12394)	2024-11-13 17:45:26 +08:00
Yuwen Hu	1158f91648	Fix llava with multi-image inputs (#12384)	2024-11-13 09:27:50 +08:00
Guancheng Fu	0ee54fc55f	Upgrade to vllm 0.6.2 (#12338) * Initial updates for vllm 0.6.2 * fix * Change Dockerfile to support v062 * Fix * fix examples * Fix * done * fix * Update engine.py * Fix Dockerfile to original path * fix * add option * fix * fix * fix * fix --------- Co-authored-by: xiangyuT <xiangyu.tian@intel.com>	2024-11-12 20:35:34 +08:00
Ruonan Wang	6bf5a8c230	[NPU] Update qwen2 compile config (#12383) * update * fix	2024-11-12 16:59:44 +08:00
binbin Deng	7a97fbb779	Support vpm and resampler module of minicpm-v on NPU (#12375)	2024-11-12 15:59:55 +08:00
Yuwen Hu	e0918934c8	Add fused_mlp to glm4v models (#12378)	2024-11-11 17:10:25 +08:00
Yishuo Wang	dc34e8c51f	optimize glm4v vision attention (#12369)	2024-11-08 17:01:57 +08:00
Qiyuan Gong	2dfcc36825	Fix trl version and padding in trl qlora example (#12368) * Change trl to 0.9.6 * Enable padding to avoid padding related errors.	2024-11-08 16:05:17 +08:00
Yishuo Wang	51f7f87768	fix ipex 2.3 bug (#12366)	2024-11-08 13:29:15 +08:00
Yina Chen	b2e69a896c	[NPU] Support Baichuan groupwise & gw code refactor (#12337) * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * baichuan part * update * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * baichuan part * update * update * update * baichuan support * code refactor * remove code * fix style * address comments * revert	2024-11-08 11:42:42 +08:00
binbin Deng	812d5cc32e	[NPU L0] Support llama3.2 in L0 pipeline (#12361)	2024-11-08 10:01:23 +08:00
Yuwen Hu	8fe294e01f	Small fix to all-in-one benchmark (#12362)	2024-11-07 18:56:34 +08:00
Yuwen Hu	1a6cbc473f	Add fused mlp optimizations to glm4 models (#12360) * Add fused mlp to glm4 models * Small fix	2024-11-07 18:52:47 +08:00
Yishuo Wang	ad68c56573	small improvement (#12359)	2024-11-07 15:57:41 +08:00
Yina Chen	d880e534d2	[NPU] acclib llama3.2 support groupwise (#12355) * change inter_pp * add comment	2024-11-07 11:19:55 +08:00

2020 commits