ipex-llm

Author	SHA1	Message	Date
Yishuo Wang	a945500a98	fix internlm xcomposser stream chat (#11564 )	2024-07-11 18:21:17 +08:00
Cengguang Zhang	70ab1a6f1a	LLM: unify memory optimization env variables. (#11549 ) * LLM: unify memory optimization env variables. * fix comments.	2024-07-11 11:01:28 +08:00
Yishuo Wang	994e49a510	optimize internlm xcomposser performance again (#11551 )	2024-07-10 17:08:56 +08:00
binbin Deng	60de428b37	Support pipeline parallel for qwen-vl (#11503 )	2024-07-04 18:03:57 +08:00
Xin Qiu	f84ca99b9f	optimize gemma2 rmsnorm (#11500 )	2024-07-03 15:21:03 +08:00
binbin Deng	9274282ef7	Support pipeline parallel for glm-4-9b-chat (#11463 )	2024-07-03 14:25:28 +08:00
Shaojun Liu	ab9f7f3ac5	FIX: Qwen1.5-GPTQ-Int4 inference error (#11432 ) * merge_qkv if quant_method is 'gptq' * fix python style checks * refactor * update GPU example	2024-06-26 15:36:22 +08:00
Guancheng Fu	99cd16ef9f	Fix error while using pipeline parallism (#11434 )	2024-06-26 15:33:47 +08:00
Xin Qiu	9e4ee61737	rename BIGDL_OPTIMIZE_LM_HEAD to IPEX_LLM_LAST_LM_HEAD and add qwen2 (#11418 )	2024-06-24 18:42:37 +08:00
Yishuo Wang	abe53eaa4f	optimize qwen1.5/2 memory usage when running long input with fp16 (#11403 )	2024-06-24 13:43:04 +08:00
Guoqiong Song	7507000ef2	Fix 1383 Llama model on transformers=4.41[WIP] (#11280 )	2024-06-21 11:24:10 -07:00
Yishuo Wang	f0fdfa081b	Optimize qwen 1.5 14B batch performance (#11370 )	2024-06-20 17:23:39 +08:00
Qiyuan Gong	1eb884a249	IPEX Duplicate importer V2 (#11310 ) * Add gguf support. * Avoid error when import ipex-llm for multiple times. * Add check to avoid duplicate replace and revert. * Add calling from check to avoid raising exceptions in the submodule. * Add BIGDL_CHECK_DUPLICATE_IMPORT for controlling duplicate checker. Default is true.	2024-06-19 16:29:19 +08:00
Guoqiong Song	c44b1942ed	fix mistral for transformers>=4.39 (#11191 ) * fix mistral for transformers>=4.39	2024-06-18 13:39:35 -07:00
Yina Chen	5dad33e5af	Support fp8_e4m3 scale search (#11339 ) * fp8e4m3 switch off * fix style	2024-06-18 11:47:43 +08:00
Xin Qiu	183e0c6cf5	glm-4v-9b support (#11327 ) * chatglm4v support * fix style check * update glm4v	2024-06-17 13:52:37 +08:00
Yina Chen	0af0102e61	Add quantization scale search switch (#11326 ) * add scale_search switch * remove llama3 instruct * remove print	2024-06-14 18:46:52 +08:00
Yishuo Wang	5e25766855	fix and optimize chatglm2-32k and chatglm3-128k (#11306 )	2024-06-13 17:37:58 +08:00
Guancheng Fu	57a023aadc	Fix vllm tp (#11297 )	2024-06-13 10:47:48 +08:00
Yishuo Wang	10e480ee96	refactor internlm and internlm2 (#11274 )	2024-06-11 14:19:19 +08:00
Yishuo Wang	ea0d03fd28	Refactor baichuan1 7B and 13B (#11258 )	2024-06-07 14:29:20 +08:00
Yishuo Wang	ef8e9b2ecd	Refactor qwen2 moe (#11244 )	2024-06-07 13:14:54 +08:00
Xin Qiu	2f809116e2	optimize Chatglm4 (#11239 ) * chatglm4 * update * update * add rms norm * chatglm4	2024-06-06 18:25:20 +08:00
Yishuo Wang	2e4ccd541c	fix qwen2 cpu (#11240 )	2024-06-06 16:24:19 +08:00
Yishuo Wang	ba27e750b1	refactor yuan2 (#11235 )	2024-06-06 13:17:54 +08:00
Guoqiong Song	f6d5c6af78	fix issue 1407 (#11171 )	2024-06-05 13:35:57 -07:00
Xin Qiu	566691c5a3	quantized attention forward for minicpm (#11200 ) * quantized minicpm * fix style check	2024-06-05 09:15:25 +08:00
Jiao Wang	bb83bc23fd	Fix Starcoder issue on CPU on transformers 4.36+ (#11190 ) * fix starcoder for sdpa * update * style	2024-06-04 10:05:40 -07:00
Xiangyu Tian	ac3d53ff5d	LLM: Fix vLLM CPU version error (#11206 ) Fix vLLM CPU version error	2024-06-04 19:10:23 +08:00
Xin Qiu	5f13700c9f	optimize Minicpm (#11189 ) * minicpm optimize * update	2024-06-03 18:28:29 +08:00
ZehuaCao	4127b99ed6	Fix null pointer dereferences error. (#11125 ) * delete unused function on tgi_server * update * update * fix style	2024-05-30 16:16:10 +08:00
Guancheng Fu	50ee004ac7	Fix vllm condition (#11169 ) * add use-vllm * done * fix style * fix done	2024-05-30 15:23:17 +08:00
Zhao Changmin	65f4212f89	Fix qwen 14b run into register attention fwd (#11128 ) * fix qwen 14b	2024-05-24 14:45:07 +08:00
Yishuo Wang	797dbc48b8	fix phi-2 and phi-3 convert (#11116 )	2024-05-23 17:37:37 +08:00
Yishuo Wang	37b98a531f	support running internlm xcomposer2 on gpu and add sdp optimization (#11115 )	2024-05-23 17:26:24 +08:00
Zhao Changmin	c5e8b90c8d	Add Qwen register attention implemention (#11110 ) * qwen_register	2024-05-23 17:17:45 +08:00
Yishuo Wang	0e53f20edb	support running internlm-xcomposer2 on cpu (#11111 )	2024-05-23 16:36:09 +08:00
Yishuo Wang	cd4dff09ee	support phi-3 vision (#11101 )	2024-05-22 17:43:50 +08:00
Yishuo Wang	f00625f9a4	refactor qwen2 (#11087 )	2024-05-21 16:53:42 +08:00
Yishuo Wang	d830a63bb7	refactor qwen (#11074 )	2024-05-20 18:08:37 +08:00
Ruonan Wang	f1156e6b20	support gguf_q4k_m / gguf_q4k_s (#10887 ) * initial commit * UPDATE * fix style * fix style * add gguf_q4k_s * update comment * fix	2024-05-17 14:30:09 +08:00
Yishuo Wang	981d668be6	refactor baichuan2-7b (#11062 )	2024-05-17 13:01:34 +08:00
SONG Ge	192ae35012	Add support for llama2 quantize_kv with transformers 4.38.0 (#11054 ) * add support for llama2 quantize_kv with transformers 4.38.0 * fix code style * fix code style	2024-05-16 22:23:39 +08:00
Yishuo Wang	8cae897643	use new rope in phi3 (#11047 )	2024-05-16 15:12:35 +08:00
SONG Ge	9942a4ba69	[WIP] Support llama2 with transformers==4.38.0 (#11024 ) * support llama2 with transformers==4.38.0 * add supprot for quantize_qkv * add original support for 4.38.0 now * code style fix	2024-05-15 18:07:00 +08:00
Yishuo Wang	ee325e9cc9	fix phi3 (#11022 )	2024-05-15 09:32:12 +08:00
Zhao Changmin	0a732bebe7	Add phi3 cached RotaryEmbedding (#11013 ) * phi3cachedrotaryembed * pep8	2024-05-15 08:16:43 +08:00
Zhao Changmin	b03c859278	Add phi3RMS (#10988 ) * phi3RMS	2024-05-14 15:16:27 +08:00
Yishuo Wang	1b3c7a6928	remove phi3 empty cache (#10997 )	2024-05-13 14:09:55 +08:00
Kai Huang	a6342cc068	Empty cache after phi first attention to support 4k input (#10972 ) * empty cache * fix style	2024-05-09 19:50:04 +08:00


73 commits