ipex-llm

Author	SHA1	Message	Date
Shaojun Liu	c92d76b997	Update oneccl-binding.patch (#12377 ) * Add files via upload * upload oneccl-binding.patch * Update Dockerfile	2024-11-11 22:34:08 +08:00
Yuwen Hu	e0918934c8	Add fused_mlp to glm4v models (#12378 )	2024-11-11 17:10:25 +08:00
Yishuo Wang	dc34e8c51f	optimize glm4v vision attention (#12369 )	2024-11-08 17:01:57 +08:00
Qiyuan Gong	2dfcc36825	Fix trl version and padding in trl qlora example (#12368 ) * Change trl to 0.9.6 * Enable padding to avoid padding related errors.	2024-11-08 16:05:17 +08:00
Shaojun Liu	fad15c8ca0	Update fastchat demo script (#12367 ) * Update README.md * Update vllm_docker_quickstart.md	2024-11-08 15:42:17 +08:00
Yishuo Wang	51f7f87768	fix ipex 2.3 bug (#12366 )	2024-11-08 13:29:15 +08:00
Yina Chen	b2e69a896c	[NPU] Support Baichuan groupwise & gw code refactor (#12337 ) * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * baichuan part * update * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * baichuan part * update * update * update * baichuan support * code refactor * remove code * fix style * address comments * revert	2024-11-08 11:42:42 +08:00
binbin Deng	812d5cc32e	[NPU L0] Support llama3.2 in L0 pipeline (#12361 )	2024-11-08 10:01:23 +08:00
Xin Qiu	7ef7696956	update linux installation doc (#12365 ) * update linux doc * update	2024-11-08 09:44:58 +08:00
Yuwen Hu	8fe294e01f	Small fix to all-in-one benchmark (#12362 )	2024-11-07 18:56:34 +08:00
Yuwen Hu	1a6cbc473f	Add fused mlp optimizations to glm4 models (#12360 ) * Add fused mlp to glm4 models * Small fix	2024-11-07 18:52:47 +08:00
Xin Qiu	520af4e9b5	Update install_linux_gpu.md (#12353 )	2024-11-07 16:08:01 +08:00
Yishuo Wang	ad68c56573	small improvement (#12359 )	2024-11-07 15:57:41 +08:00
Jinhe	71ea539351	Add troubleshootings for ollama and llama.cpp (#12358 ) * add ollama troubleshoot en * zh ollama troubleshoot * llamacpp trouble shoot * llamacpp trouble shoot * fix * save gpu memory	2024-11-07 15:49:20 +08:00
Xu, Shuo	ce0c6ae423	Update Readme for FastChat docker demo (#12354 ) * update Readme for FastChat docker demo * update readme * add 'Serving with FastChat' part in docs * polish docs --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-11-07 15:22:42 +08:00
Yina Chen	d880e534d2	[NPU] acclib llama3.2 support groupwise (#12355 ) * change inter_pp * add comment	2024-11-07 11:19:55 +08:00
Jinhe	79f2877413	add minicpm-v models to `transformers_int4_npu_win` api (#12352 ) * add minicpm npu * optimize model	2024-11-07 10:05:10 +08:00
SONG Ge	a7b66683f1	[NPU] Add Optimized Support for Llama3.2-1B/3B on NPU (#12339 ) * Add initial support for llama3.2-1b/3b * move llama3.2 support into current llama_mp impl	2024-11-06 19:21:40 +08:00
Yuwen Hu	872a74481a	Small optimization to glm4 models (#12351 )	2024-11-06 19:16:58 +08:00
Ruonan Wang	c267355b35	fix three NPU benchmark issues (#12350 ) * fix three issues * limit mixed_precision for CW only	2024-11-06 19:01:01 +08:00
Yina Chen	f24352aef9	llama 3.1/3.2 support compresskv (#12347 ) * llama 3.1/3.2 support compresskv * update * fix transformers 4.45 error * fix style * fix typo * disable llama3.2 1b compresskv	2024-11-06 17:33:43 +08:00
Jin, Qiao	d984c0672a	Add MiniCPM-V-2_6 to arc perf test (#12349 )	2024-11-06 16:32:28 +08:00
Yishuo Wang	e23ef7d088	optimize glm4v's vision part (#12346 )	2024-11-06 15:43:40 +08:00
Yishuo Wang	c8b7265359	Add basic glm4v support (#12345 )	2024-11-06 13:50:10 +08:00
binbin Deng	69e3a56943	[NPU] Hot fix of load_low_bit (#12344 )	2024-11-06 10:07:00 +08:00
Xu, Shuo	899a30331a	Replace gradio_web_server.patch to adjust webui (#12329 ) * replace gradio_web_server.patch to adjust webui * fix patch problem --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-11-06 09:16:32 +08:00
Jin, Qiao	7240c283a3	Add dummy model in iGPU perf (#12341 ) * Add dummy model in iGPU perf * Add dummy model in iGPU perf * Fix	2024-11-05 17:56:10 +08:00
Zhao Changmin	8e9a3a1158	fix chatglm2 cpu ut (#12336 )	2024-11-05 16:43:57 +08:00
Yina Chen	d872639395	[NPU] Llama3, Qwen2 1.5b, MiniCPM 1/2B groupwise support (#12327 ) * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * support minicpm 2b * fix style & error * fix style & update * remove print	2024-11-05 15:51:31 +08:00
Jin, Qiao	82a61b5cf3	Limit trl version in example (#12332 ) * Limit trl version in example * Limit trl version in example	2024-11-05 14:50:10 +08:00
Yuwen Hu	923d696854	Small fix to LNL performance tests (#12333 )	2024-11-05 13:24:58 +08:00
Zijie Li	45b0d371aa	update benchmark readme (#12323 ) * update benchmark readme update new comment with memory usage included * Update README.md	2024-11-05 08:19:08 +08:00
Yuwen Hu	e2adc974fd	Small fix to LNL performance tests (#12331 )	2024-11-04 19:22:41 +08:00
Yuwen Hu	522cdf8e9d	Add initial support for LNL nightly performance tests (#12326 ) * Add initial support for LNL nightly performance tests * Small fix	2024-11-04 18:53:51 +08:00
Zhao Changmin	1b637e4477	Add chatglm2&3 fuse mlp (#12328 ) * add chatglm fuse mlp	2024-11-04 18:04:41 +08:00
Yina Chen	94c4ce389f	[NPU] Add env to disable compile opt (#12330 ) * add env to disable compile opt * fix style * fix style	2024-11-04 17:46:17 +08:00
Ch1y0q	e54af44ed6	Add `transformers_int4_npu_pipeline_win` in all-in-one benchmark (#12325 ) * add transformers_int4_npu_pipeline_win * bugfix * bugfix: wrong actual_output_len * fix format * bugfix & update `README.md`	2024-11-04 16:00:20 +08:00
binbin Deng	5ee6f97d6f	[NPU L0] Add layernorm weight as const / input setting (#12322 )	2024-11-04 15:46:24 +08:00
Chu,Youcheng	a01371f90b	Doc: update harness readme (#12324 )	2024-11-04 14:58:54 +08:00
Yuwen Hu	4644cb640c	Perf test further fix regarding trl version (#12321 )	2024-11-04 11:01:25 +08:00
Ruonan Wang	8fe01c9e4d	[NPU pipeline] update cmake usage of pipeline (#12320 )	2024-11-04 10:30:03 +08:00
Kai Huang	c8679ad592	Qwen layernorm as input (#12309 ) * qwen layernorm as input * add group size	2024-11-04 09:51:15 +08:00
Yuwen Hu	94ce447794	Fix performance tests regarding `trl` version (#12319 ) * Fix performance tests regarding trl version * Small fix	2024-11-04 09:42:18 +08:00
Yuwen Hu	20755e8077	Small fix to all-in-one benchmark scripts (#12317 )	2024-11-01 19:16:25 +08:00
Ch1y0q	48123af463	add `npu_group_size` for `transformers_int4_npu_win` in all-in-one benchmark api (#12316 ) * add `npu_group_size` for `transformers_int4_npu_win` small bugfix * update	2024-11-01 18:44:27 +08:00
Zijie Li	cd5e22cee5	Update Llava GPU Example (#12311 ) * update-llava-example * add warmup * small fix on llava example * remove space& extra print prompt * renew example * small fix --------- Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>	2024-11-01 17:06:00 +08:00
binbin Deng	f53bb4ea0b	[NPU L0] Update 1st token generation (#12314 )	2024-11-01 17:02:07 +08:00
binbin Deng	d409d9d0eb	[NPU L0] Update streaming mode of example (#12312 )	2024-11-01 15:38:10 +08:00
Jin, Qiao	126f95be80	Fix DPO finetuning example (#12313 )	2024-11-01 13:29:44 +08:00
Yina Chen	05c5d0267a	[NPU] Llama2 prefill use ov sdp (#12310 ) * prefill use sdp * add param * update * fix style * fix style * meet comments	2024-11-01 11:05:20 +08:00

3649 commits