ipex-llm

Author	SHA1	Message	Date
Heyang Sun	d272f6b471	remove nf4 unsupport comment in cpu finetuning (#12460 ) Co-authored-by: Ariadne <wyn2000330@126.com>	2024-11-28 13:26:46 +08:00
Ruonan Wang	b29da30205	[NPU] Update C++ L0 (#12458 ) * update * fix style	2024-11-27 22:08:48 +08:00
Yuwen Hu	a2272b70d3	Small fix in llama.cpp troubleshooting guide (#12457 )	2024-11-27 19:22:11 +08:00
Yishuo Wang	6f3441ba4c	fix glm4-9b overflow (#12455 )	2024-11-27 17:39:13 +08:00
Ruonan Wang	281c9b0bb9	[NPU] Add L0 support for NPU C++ (#12454 ) * add L0 models support * meet review * fix style	2024-11-27 17:04:13 +08:00
Chu,Youcheng	ce6fcaa9ba	update transformers version in example of glm4 (#12453 ) * fix: update transformers version in example of glm4 * fix: textual adjustments * fix: textual adjustment	2024-11-27 15:02:25 +08:00
Yuwen Hu	effb9bb41c	Small update to LangChain examples readme (#12452 )	2024-11-27 14:02:25 +08:00
Chu,Youcheng	acd77d9e87	Remove env variable `BIGDL_LLM_XMX_DISABLED` in documentation (#12445 ) * fix: remove BIGDL_LLM_XMX_DISABLED in mddocs * fix: remove set SYCL_CACHE_PERSISTENT=1 in example * fix: remove BIGDL_LLM_XMX_DISABLED in workflows * fix: merge igpu and A-series Graphics * fix: remove set BIGDL_LLM_XMX_DISABLED=1 in example * fix: remove BIGDL_LLM_XMX_DISABLED in workflows * fix: merge igpu and A-series Graphics * fix: textual adjustment * fix: textual adjustment * fix: textual adjustment	2024-11-27 11:16:36 +08:00
Ruonan Wang	f8c2bb2943	[NPU] optimize qwen2 prefill performance for C++ (#12451 )	2024-11-27 10:46:18 +08:00
Guancheng Fu	8331875f34	Fix (#12390 )	2024-11-27 10:41:58 +08:00
Jun Wang	cb7b08948b	update vllm-docker-quick-start for vllm0.6.2 (#12392 ) * update vllm-docker-quick-start for vllm0.6.2 * [UPDATE] rm max-num-seqs parameter in vllm-serving script	2024-11-27 08:47:03 +08:00
Ruonan Wang	7b40f9b372	[NPU] Support GW for NPU C++ (#12450 )	2024-11-26 17:46:40 +08:00
Jin, Qiao	c2efa264d9	Update LangChain examples to use upstream (#12388 ) * Update LangChain examples to use upstream * Update README and fix links * Update LangChain CPU examples to use upstream * Update LangChain CPU voice_assistant example * Update CPU README * Update GPU README * Remove GPU Langchain vLLM example and fix comments * Change langchain -> LangChain * Add reference for both upstream llms and embeddings * Fix comments * Fix comments * Fix comments * Fix comments * Fix comment	2024-11-26 16:43:15 +08:00
Ruonan Wang	24b46b2b19	[NPU] further fix of qwen2 int8 pipeline & C++ (#12449 ) * fix * fix style	2024-11-26 16:39:39 +08:00
Yuwen Hu	303b104c10	Fix abnormal output for Qwen2-7B when sym_int8 (#12446 )	2024-11-26 15:53:04 +08:00
Pepijn de Vos	71e1f11aa6	update serving image runtime (#12433 )	2024-11-26 14:55:30 +08:00
Ruonan Wang	52c17fe104	Optimize first token of C++ NPU by adding npu_dpu_groups (#12443 ) * add npu_dpu_groups * add check for env * fix style	2024-11-26 11:41:32 +08:00
Jinhe	66bd7abae4	add sdxl and lora-lcm optimization (#12444 ) * add sdxl and lora-lcm optimization * fix openjourney speed drop	2024-11-26 11:38:09 +08:00
Ruonan Wang	0e23bd779f	Add support of llama3.2 for NPU C++ (#12442 ) * initial support of llama3.2 * update * update * fix style * fix style * fix * small fix	2024-11-26 09:26:55 +08:00
Yishuo Wang	cdd41f5e4c	optimize sdxl again (#12441 )	2024-11-25 17:46:46 +08:00
Ruonan Wang	b9abb8a285	Support qwen2.5 3B for NPU & update related examples (#12438 ) * update qwen2.5-3B * update convert * small fix * replace load_in_low_bit with low_bit * small fix	2024-11-25 16:38:31 +08:00
Jinhe	b633fbf26c	add chinese prompt troubleshooting for npu cpp examples (#12437 ) * add chinese prompt troubleshooting * add chinese prompt troubleshooting	2024-11-25 15:28:47 +08:00
Yishuo Wang	8164aed802	small change (#12439 )	2024-11-25 14:35:49 +08:00
Yishuo Wang	be132c4209	fix and optimize sd (#12436 )	2024-11-25 14:09:48 +08:00
Ruonan Wang	f41405368a	Support minicpm for NPU C++ (#12434 ) * support minicpm-1b * update * tune fused_layers * update readme.md	2024-11-25 10:42:02 +08:00
Ruonan Wang	0819fad34e	support Llama2-7B / Llama3-8B for NPU C++ (#12431 ) * support llama2 * update * support fused_layers=4 for Llama2-7B	2024-11-22 18:47:19 +08:00
Ruonan Wang	4ffa6c752c	New convert support for C++ NPU (#12430 ) * initial commit * fix * fix style * fix style * fix * fix	2024-11-22 14:28:30 +08:00
Shaojun Liu	c089b6c10d	Update english prompt to 34k (#12429 )	2024-11-22 11:20:35 +08:00
Yuwen Hu	e61ae88c5b	Upgrade dependency for xpu_lnl and xpu_arl option (#12424 )	2024-11-21 18:37:15 +08:00
Ruonan Wang	2935e97610	small fix of cpp readme(#12425 )	2024-11-21 18:21:34 +08:00
Yuwen Hu	8fdc36c140	Optimize with new batch kernel when `batch_size=1` on LNL (#12419 ) * Add use batch kernel condition for LNL * Fix for other device judgement * Fix based on comment	2024-11-21 16:21:35 +08:00
Jinhe	7e0a840f74	add optimization to openjourney (#12423 ) * add optimization to openjourney * add optimization to openjourney	2024-11-21 15:23:51 +08:00
Yishuo Wang	145e8b480f	update batch kernel condition (#12421 )	2024-11-21 10:12:46 +08:00
Ruonan Wang	7288c759ce	Initial NPU C++ Example (#12417 ) * temp save * meet review, update * update * meet review, add license * typo	2024-11-21 10:09:26 +08:00
Jinhe	d2a37b6ab2	add Stable diffusion examples (#12418 ) * add openjourney example * add timing * add stable diffusion to model page * 4.1 fix * small fix	2024-11-20 17:18:36 +08:00
Ruonan Wang	54c62feb74	[NPU] dump prefill IR for further C++ solution (#12402 ) * save prefill ir * fix * shorten convert time * fix * fix * fix * fix * fix style * dump config.json * meet review * small fix	2024-11-20 15:20:05 +08:00
Wang, Jian4	1bfcbc0640	Add multimodal benchmark (#12415 ) * add benchmark multimodal * update * update * update	2024-11-20 14:21:13 +08:00
SONG Ge	ff3f7cb25f	Fix speech_paraformer issue with unexpected changes (#12416 ) * Fix speech_paraformer issue with unexpected changes * Add paraformer version specified	2024-11-19 15:01:20 +08:00
joan726	a9cb70a71c	Add install_windows_gpu.zh-CN.md and install_linux_gpu.zh-CN.md (#12409 ) * Add install_linux_gpu.zh-CN.md * Add install_windows_gpu.zh-CN.md * Update llama_cpp_quickstart.zh-CN.md Related links updated to zh-CN version. * Update install_linux_gpu.zh-CN.md Added link to English version. * Update install_windows_gpu.zh-CN.md Add the link to English version. * Update install_windows_gpu.md Add the link to CN version. * Update install_linux_gpu.md Add the link to CN version. * Update README.zh-CN.md Modified the related link to zh-CN version.	2024-11-19 14:39:53 +08:00
Guancheng Fu	d6057f6dd2	Update benchmark_vllm_throughput.py (#12414 )	2024-11-19 10:41:43 +08:00
Yuwen Hu	a69395f31f	Support performance mode of GLM4 model (#12401 ) * Initial support of prepare generation args for transformers 445 * Small fix to chatglm4 model optimization * Small fix * fix glm4 position id * fix glm4 error * Small change in condition & fix based on comments * Style fixes --------- Co-authored-by: cyita <yitastudy@gmail.com>	2024-11-18 18:46:52 +08:00
Song Fuchang	d2c821d458	Add missing arguments in pipeline parallel generate method (#12142 ) Add two arguments: negative_prompt_ids and negative_prompt_attention_mask to generate method in pipeline_parallel.py.	2024-11-18 13:50:18 +08:00
Yishuo Wang	3d5fbf2069	update batch kernel condition (#12408 )	2024-11-15 13:47:05 +08:00
Ruonan Wang	6c5e8fc70c	fix again (#12407 )	2024-11-15 11:57:58 +08:00
Ruonan Wang	fcc0fa7316	fix workflow again (#12406 ) * fix again * fix name	2024-11-15 11:01:35 +08:00
Yuwen Hu	d1cde7fac4	Tiny doc fix (#12405 )	2024-11-15 10:28:38 +08:00
Ruonan Wang	548dec5185	fix npu pipeline workflow (#12404 )	2024-11-15 10:01:33 +08:00
binbin Deng	d4d949443f	[NPU] change attention_mask to fp16 (#12400 )	2024-11-14 17:20:29 +08:00
Qiyuan Gong	7e50ff113c	Add padding_token=eos_token for GPU trl QLora example (#12398 ) * Avoid tokenizer doesn't have a padding token error.	2024-11-14 10:51:30 +08:00
SONG Ge	d2cbcb060c	Add initial support for modeling_xlm encoder on NPU (#12393 ) * Add initial support for modeling_xlm encoder on NPU * Add EmbeddingModel class to keep the same usage with bce and npu fp16 linear convert * Optimize currently implementation to support EmbeddingModel.encode API and convert other torch modules to NPU * Add related example and documents	2024-11-14 10:50:27 +08:00

3712 commits