Guoqiong Song | c44b1942ed | 2024-06-18 13:39:35 -07:00
fix mistral for transformers>=4.39 (#11191)
* fix mistral for transformers>=4.39

Yishuo Wang | 83082e5cc7 | 2024-06-18 16:07:16 +08:00
add initial support for intel npu acceleration library (#11347)

Yina Chen | 5dad33e5af | 2024-06-18 11:47:43 +08:00
Support fp8_e4m3 scale search (#11339)
* fp8e4m3 switch off
* fix style

binbin Deng | e50c890e1f | 2024-06-18 09:55:40 +08:00
Support finishing PP inference once eos_token_id is found (#11336)

SONG Ge | ef4b6519fb | 2024-06-17 17:44:24 +08:00
Add phi-3 model support for pipeline parallel inference (#11334)
* add phi-3 model support
* add phi3 example

Xin Qiu | 183e0c6cf5 | 2024-06-17 13:52:37 +08:00
glm-4v-9b support (#11327)
* chatglm4v support
* fix style check
* update glm4v

binbin Deng | 6ea1e71af0 | 2024-06-17 09:59:36 +08:00
Update PP inference benchmark script (#11323)

SONG Ge | be00380f1a | 2024-06-17 09:29:32 +08:00
Fix pipeline parallel inference past_key_value error in Baichuan (#11318)
* fix past_key_value error
* add baichuan2 example
* fix style
* update doc
* add script link in doc
* fix import error
* update

Yina Chen | 0af0102e61 | 2024-06-14 18:46:52 +08:00
Add quantization scale search switch (#11326)
* add scale_search switch
* remove llama3 instruct
* remove print

Ruonan Wang | 8a3247ac71 | 2024-06-14 18:25:50 +08:00
support batch forward for q4_k, q6_k (#11325)

Yishuo Wang | e8dd8e97ef | 2024-06-14 16:26:11 +08:00
fix chatglm lookahead on ARC (#11320)

Yishuo Wang | 91965b5d05 | 2024-06-14 10:31:43 +08:00
add glm_sdpa back to fix chatglm-6b (#11313)

Yishuo Wang | 7f65836cb9 | 2024-06-14 09:58:07 +08:00
fix chatglm2/3-32k/128k fp16 (#11311)

Xin Qiu | 1b0c4c8cb8 | 2024-06-13 19:02:18 +08:00
use new rotary two in chatglm4 (#11312)
* use new rotary two in chatglm4
* remove

Xin Qiu | f1410d6823 | 2024-06-13 18:06:04 +08:00
refactor chatglm4 (#11301)
* glm4
* remove useless code
* style
* add rope_ratio
* update
* fix fp16
* fix style

Yishuo Wang | 5e25766855 | 2024-06-13 17:37:58 +08:00
fix and optimize chatglm2-32k and chatglm3-128k (#11306)

binbin Deng | 60cb1dac7c | 2024-06-13 17:35:24 +08:00
Support PP for qwen1.5 (#11300)

Yishuo Wang | a24666b8f3 | 2024-06-13 16:01:34 +08:00
fix chatglm3-6b-32k (#11303)

Yishuo Wang | 01fe0fc1a2 | 2024-06-13 12:22:58 +08:00
refactor chatglm2/3 (#11290)

Guancheng Fu | 57a023aadc | 2024-06-13 10:47:48 +08:00
Fix vllm tp (#11297)

binbin Deng | 220151e2a1 | 2024-06-13 10:00:23 +08:00
Refactor pipeline parallel multi-stage implementation (#11286)

Ruonan Wang | 14b1e6b699 | 2024-06-12 20:43:08 +08:00
Fix gguf_q4k (#11293)
* update embedding parameter
* update benchmark

Yuwen Hu | 8edcdeb0e7 | 2024-06-12 19:12:57 +08:00
Fix bug that torch.ops.torch_ipex.matmul_bias_out cannot work on Linux MTL for short input (#11292)

Xin Qiu | 592f7aa61e | 2024-06-12 17:11:56 +08:00
Refine glm1-4 sdp (#11276)
* chatglm
* update
* update
* change chatglm
* update sdpa
* update
* fix style
* fix
* fix glm
* update glm2-32k
* update glm2-32k
* fix cpu
* update
* change lower_bound

Yishuo Wang | 10e480ee96 | 2024-06-11 14:19:19 +08:00
refactor internlm and internlm2 (#11274)

Yishuo Wang | 42fab480ea | 2024-06-07 15:46:00 +08:00
support stablelm2 12b (#11265)

Xin Qiu | dbc3c2d72d | 2024-06-07 15:42:23 +08:00
glm4 sdp (#11253)
* glm4 sdp
* fix style
* update comment

Xin Qiu | 151fcf37bb | 2024-06-07 15:07:47 +08:00
check device name in use_flash_attention (#11263)

Yishuo Wang | 2623944604 | 2024-06-07 14:42:18 +08:00
qwen2 sdpa small fix (#11261)

Yishuo Wang | ea0d03fd28 | 2024-06-07 14:29:20 +08:00
Refactor baichuan1 7B and 13B (#11258)

Yishuo Wang | ef8e9b2ecd | 2024-06-07 13:14:54 +08:00
Refactor qwen2 moe (#11244)

Zhao Changmin | b7948671de | 2024-06-07 10:51:05 +08:00
[WIP] Add look up table in 1st token stage (#11193)
* lookuptb

Xin Qiu | 2f809116e2 | 2024-06-06 18:25:20 +08:00
optimize Chatglm4 (#11239)
* chatglm4
* update
* update
* add rms norm
* chatglm4

Yishuo Wang | 2e4ccd541c | 2024-06-06 16:24:19 +08:00
fix qwen2 cpu (#11240)

Yishuo Wang | e738ec38f4 | 2024-06-06 14:08:39 +08:00
disable quantize kv in specific qwen model (#11238)

Yishuo Wang | c4e5806e01 | 2024-06-06 14:02:17 +08:00
add latest optimization in starcoder2 (#11236)

Yishuo Wang | ba27e750b1 | 2024-06-06 13:17:54 +08:00
refactor yuan2 (#11235)

Guoqiong Song | f6d5c6af78 | 2024-06-05 13:35:57 -07:00
fix issue 1407 (#11171)

Yina Chen | ed67435491 | 2024-06-05 17:34:36 +08:00
Support Fp6 k in ipex-llm (#11222)
* support fp6_k
* support fp6_k
* remove
* fix style

binbin Deng | a6674f5bce | 2024-06-05 15:56:10 +08:00
Fix should_use_fuse_rope error of Qwen1.5-MoE-A2.7B-Chat (#11216)

Xin Qiu | 566691c5a3 | 2024-06-05 09:15:25 +08:00
quantized attention forward for minicpm (#11200)
* quantized minicpm
* fix style check

Jiao Wang | bb83bc23fd | 2024-06-04 10:05:40 -07:00
Fix Starcoder issue on CPU on transformers 4.36+ (#11190)
* fix starcoder for sdpa
* update
* style

Xiangyu Tian | ac3d53ff5d | 2024-06-04 19:10:23 +08:00
LLM: Fix vLLM CPU version error (#11206)
Fix vLLM CPU version error

Ruonan Wang | 1dde204775 | 2024-06-04 17:14:33 +08:00
update q6k (#11205)

Yishuo Wang | 6454655dcc | 2024-06-04 15:39:00 +08:00
use sdp in baichuan2 13b (#11198)

Yishuo Wang | d90cd977d0 | 2024-06-04 13:14:43 +08:00
refactor stablelm (#11195)

Xin Qiu | 5f13700c9f | 2024-06-03 18:28:29 +08:00
optimize Minicpm (#11189)
* minicpm optimize
* update

Shaojun Liu | 401013a630 | 2024-05-31 17:03:11 +08:00
Remove chatglm_C Module to Eliminate LGPL Dependency (#11178)
* remove chatglm_C.**.pyd to solve ngsolve weak copyright vuln
* fix style check error
* remove chatglm native int4 from langchain

Ruonan Wang | 50b5f4476f | 2024-05-31 11:36:53 +08:00
update q4k convert (#11179)

ZehuaCao | 4127b99ed6 | 2024-05-30 16:16:10 +08:00
Fix null pointer dereference error. (#11125)
* delete unused function on tgi_server
* update
* update
* fix style