ipex-llm

Author	SHA1	Message	Date
SONG Ge	ef4b6519fb	Add phi-3 model support for pipeline parallel inference (#11334 ) * add phi-3 model support * add phi3 example	2024-06-17 17:44:24 +08:00
hxsz1997	99b309928b	Add lookahead in test_api: transformer_int4_fp16_gpu (#11337 ) * add lookahead in test_api:transformer_int4_fp16_gpu * change the short prompt of summarize * change short prompt to cnn_64 * change short prompt of summarize	2024-06-17 17:41:41 +08:00
Qiyuan Gong	5d7c9bf901	Upgrade accelerate to 0.23.0 (#11331 ) * Upgrade accelerate to 0.23.0	2024-06-17 15:03:11 +08:00
Xin Qiu	183e0c6cf5	glm-4v-9b support (#11327 ) * chatglm4v support * fix style check * update glm4v	2024-06-17 13:52:37 +08:00
Wenjing Margaret Mao	bca5cbd96c	Modify arc nightly perf to fp16 (#11275 ) * change api * move to pr mode and remove the build * add batch4 yaml and remove the bigcode * remove batch4 * revert the starcode * remove the exclude * revert --------- Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>	2024-06-17 13:47:22 +08:00
binbin Deng	6ea1e71af0	Update PP inference benchmark script (#11323 )	2024-06-17 09:59:36 +08:00
SONG Ge	be00380f1a	Fix pipeline parallel inference past_key_value error in Baichuan (#11318 ) * fix past_key_value error * add baichuan2 example * fix style * update doc * add script link in doc * fix import error * update	2024-06-17 09:29:32 +08:00
Yina Chen	0af0102e61	Add quantization scale search switch (#11326 ) * add scale_search switch * remove llama3 instruct * remove print	2024-06-14 18:46:52 +08:00
Ruonan Wang	8a3247ac71	support batch forward for q4_k, q6_k (#11325 )	2024-06-14 18:25:50 +08:00
Yishuo Wang	e8dd8e97ef	fix chatglm lookahead on ARC (#11320 )	2024-06-14 16:26:11 +08:00
Shaojun Liu	f5ef94046e	exclude dolly-v2-12b for arc perf test (#11315 ) * test arc perf * test * test * exclude dolly-v2-12b:2048 * revert changes	2024-06-14 15:35:56 +08:00
Xiangyu Tian	4359ab3172	LLM: Add /generate_stream endpoint for Pipeline-Parallel-FastAPI example (#11187 ) Add /generate_stream and OpenAI-formatted endpoint for Pipeline-Parallel-FastAPI example	2024-06-14 15:15:32 +08:00
Jin Qiao	0e7a31a09c	ChatGLM Examples Restructure regarding Installation Steps (#11285 ) * merge install step in glm examples * fix section * fix section * fix tiktoken	2024-06-14 12:37:05 +08:00
Yishuo Wang	91965b5d05	add glm_sdpa back to fix chatglm-6b (#11313 )	2024-06-14 10:31:43 +08:00
Yishuo Wang	7f65836cb9	fix chatglm2/3-32k/128k fp16 (#11311 )	2024-06-14 09:58:07 +08:00
Xin Qiu	1b0c4c8cb8	use new rotary two in chatglm4 (#11312 ) * use new rotary two in chatglm4 * remove	2024-06-13 19:02:18 +08:00
Xin Qiu	f1410d6823	refactor chatglm4 (#11301 ) * glm4 * remove useless code * style * add rope_ratio * update * fix fp16 * fix style	2024-06-13 18:06:04 +08:00
Yishuo Wang	5e25766855	fix and optimize chatglm2-32k and chatglm3-128k (#11306 )	2024-06-13 17:37:58 +08:00
binbin Deng	60cb1dac7c	Support PP for qwen1.5 (#11300 )	2024-06-13 17:35:24 +08:00
binbin Deng	f97cce2642	Fix import error of ds autotp (#11307 )	2024-06-13 16:22:52 +08:00
Jin Qiao	3682c6a979	add glm4 and qwen2 to igpu perf (#11304 )	2024-06-13 16:16:35 +08:00
Yishuo Wang	a24666b8f3	fix chatglm3-6b-32k (#11303 )	2024-06-13 16:01:34 +08:00
Yishuo Wang	01fe0fc1a2	refactor chatglm2/3 (#11290 )	2024-06-13 12:22:58 +08:00
Guancheng Fu	57a023aadc	Fix vllm tp (#11297 )	2024-06-13 10:47:48 +08:00
Ruonan Wang	986af21896	fix perf test(#11295 )	2024-06-13 10:35:48 +08:00
binbin Deng	220151e2a1	Refactor pipeline parallel multi-stage implementation (#11286 )	2024-06-13 10:00:23 +08:00
Ruonan Wang	14b1e6b699	Fix gguf_q4k (#11293 ) * update embedding parameter * update benchmark	2024-06-12 20:43:08 +08:00
Yuwen Hu	8edcdeb0e7	Fix bug that torch.ops.torch_ipex.matmul_bias_out cannot work on Linux MTL for short input (#11292 )	2024-06-12 19:12:57 +08:00
Wenjing Margaret Mao	b61f6e3ab1	Add update_parent_folder for nightly_perf_test (#11287 ) * add update_parent_folder and change the workflow file * add update_parent_folder and change the workflow file * move to pr mode and comment the test * use one model per config * revert --------- Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>	2024-06-12 17:58:13 +08:00
Xin Qiu	592f7aa61e	Refine glm1-4 sdp (#11276 ) * chatglm * update * update * change chatglm * update sdpa * update * fix style * fix * fix glm * update glm2-32k * update glm2-32k * fix cpu * update * change lower_bound	2024-06-12 17:11:56 +08:00
Yuwen Hu	cffb932f05	Expose timeout for streamer for fastchat worker (#11288 ) * Expose timeout for streamer for fastchat worker * Change to read from env variables	2024-06-12 17:02:40 +08:00
ivy-lv11	e7a4e2296f	Add Stable Diffusion examples on GPU and CPU (#11166 ) * add sdxl and lcm-lora * readme * modify * add cpu * add license * modify * add file	2024-06-12 16:33:25 +08:00
Jin Qiao	f224e98297	Add GLM-4 CPU example (#11223 ) * Add GLM-4 example * add tiktoken dependency * fix * fix	2024-06-12 15:30:51 +08:00
Zijie Li	40fc8704c4	Add GPU example for GLM-4 (#11267 ) * Add GPU example for GLM-4 * Update streamchat.py * Fix pretrained arguments Fix pretrained arguments in generate and streamchat.py * Update Readme Update install tiktoken required for GLM-4 * Update comments in generate.py	2024-06-12 14:29:50 +08:00
Qiyuan Gong	0d9cc9c106	Remove duplicate check for ipex (#11281 ) * Replacing builtin.import is causing lots of unpredicted problems. Remove this function.	2024-06-12 13:52:02 +08:00
Yishuo Wang	10e480ee96	refactor internlm and internlm2 (#11274 )	2024-06-11 14:19:19 +08:00
Yuwen Hu	fac49f15e3	Remove manual importing ipex in all-in-one benchmark (#11272 )	2024-06-11 09:32:13 +08:00
Wenjing Margaret Mao	70b17c87be	Merge multiple batches (#11264 ) * add merge steps * move to pr mode * remove build + add merge.py * add tohtml and change cp * change test_batch folder path * change merge_temp path * change to html folder * revert * change place * revert 437 * revert space --------- Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>	2024-06-07 18:38:45 +08:00
Xiangyu Tian	4b07712fd8	LLM: Fix vLLM CPU model convert mismatch (#11254 ) Fix vLLM CPU model convert mismatch.	2024-06-07 15:54:34 +08:00
Yishuo Wang	42fab480ea	support stablm2 12b (#11265 )	2024-06-07 15:46:00 +08:00
Xin Qiu	dbc3c2d72d	glm4 sdp (#11253 ) * glm4 sdp * fix style * update comment	2024-06-07 15:42:23 +08:00
Xin Qiu	151fcf37bb	check device name in use_flash_attention (#11263 )	2024-06-07 15:07:47 +08:00
Yishuo Wang	2623944604	qwen2 sdpa small fix (#11261 )	2024-06-07 14:42:18 +08:00
Yishuo Wang	ea0d03fd28	Refactor baichuan1 7B and 13B (#11258 )	2024-06-07 14:29:20 +08:00
Qiyuan Gong	1aa9c9597a	Avoid duplicate import in IPEX auto importer (#11227 ) * Add custom import to avoid ipex duplicate importing * Add scope limitation	2024-06-07 14:08:00 +08:00
Wang, Jian4	6f2684e5c9	Update pp llama.py to save memory (#11233 )	2024-06-07 13:18:16 +08:00
Yishuo Wang	ef8e9b2ecd	Refactor qwen2 moe (#11244 )	2024-06-07 13:14:54 +08:00
Zijie Li	7b753dc8ca	Update sample output for HF Qwen2 GPU and CPU (#11257 )	2024-06-07 11:36:22 +08:00
Zhao Changmin	b7948671de	[WIP] Add look up table in 1st token stage (#11193 ) * lookuptb	2024-06-07 10:51:05 +08:00
Yuwen Hu	8c36b5bdde	Add qwen2 example (#11252 ) * Add GPU example for Qwen2 * Update comments in README * Update README for Qwen2 GPU example * Add CPU example for Qwen2 Sample Output under README pending * Update generate.py and README for CPU Qwen2 * Update GPU example for Qwen2 * Small update * Small fix * Add Qwen2 table * Update README for Qwen2 CPU and GPU Update sample output under README --------- Co-authored-by: Zijie Li <michael20001122@gmail.com>	2024-06-07 10:29:33 +08:00

1468 commits