ipex-llm

Author	SHA1	Message	Date
Yishuo Wang	78d253165d	optimize qwen2 vl perf again (#12167)	2024-10-09 16:43:48 +08:00
Yishuo Wang	644af2a76e	add basic llama 3.2 vision support (#12163)	2024-10-08 10:46:48 +08:00
Yishuo Wang	584c3489e7	add basic support for llama3.2 (#12125)	2024-09-26 15:46:19 +08:00
Yishuo Wang	77af9bc5fa	support passing None to low_bit in optimize_model (#12121)	2024-09-26 11:09:35 +08:00
Yishuo Wang	9239fd4f12	add basic support and optimization for qwen2-vl (#12104)	2024-09-20 17:23:06 +08:00
Wang, Jian4	40e463c66b	Enable vllm load gptq model (#12083) * enable vllm load gptq model * update * update * update * update style	2024-09-18 14:41:00 +08:00
Yishuo Wang	d8c044e79d	optimize minicpm3 kv cache (#12052)	2024-09-10 16:51:21 +08:00
Guancheng Fu	69c8d36f16	Switching from vLLM v0.3.3 to vLLM 0.5.4 (#12042) * Enable single card sync engine * enable ipex-llm optimizations for vllm * enable optimizations for lm_head * Fix chatglm multi-reference problem * Remove duplicate layer * LLM: Update vLLM to v0.5.4 (#11746) * Enable single card sync engine * enable ipex-llm optimizations for vllm * enable optimizations for lm_head * Fix chatglm multi-reference problem * update 0.5.4 api_server * add dockerfile * fix * fix * refine * fix --------- Co-authored-by: gc-fu <guancheng.fu@intel.com> * Add vllm-0.5.4 Dockerfile (#11838) * Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957) * Fix vLLM not convert issues (#11817) (#11918) * Fix not convert issues * refine Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com> * Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969) * init * update mlp forward * fix minicpm error in vllm 0.5.4 * fix dependabot alerts (#12008) * Update 0.5.4 dockerfile (#12021) * Add vllm awq loading logic (#11987) * [ADD] Add vllm awq loading logic * [FIX] fix the module.linear_method path * [FIX] fix quant_config path error * Enable Qwen padding mlp to 256 to support batch_forward (#12030) * Enable padding mlp * padding to 256 * update style * Install 27191 runtime in 0.5.4 docker image (#12040) * fix rebase error * fix rebase error * vLLM: format for 0.5.4 rebase (#12043) * format * Update model_convert.py * Fix serving docker related modifications (#12046) * Fix undesired modifications (#12048) * fix * Refine offline_inference arguments --------- Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com> Co-authored-by: Jun Wang <thoughts.times@gmail.com> Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com> Co-authored-by: liu-shaojun <johnssalyn@outlook.com> Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>	2024-09-10 15:37:43 +08:00
Yishuo Wang	abc370728c	optimize minicpm3 again (#12047)	2024-09-10 14:19:57 +08:00
Yishuo Wang	048b4590aa	add basic minicpm3 optimization (#12039)	2024-09-09 17:25:08 +08:00
Yuwen Hu	a9e485eb1b	Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer (#11963) * Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer * Style fixes	2024-08-29 19:22:09 +08:00
Guancheng Fu	0a7bd274e2	Add vllm awq loading logic (#11950) * add vllm awq loading logic * fix * refine	2024-08-28 16:46:18 +08:00
Yina Chen	23631cd357	disable lm_head opt for baichuan2-13b (#11905)	2024-08-23 15:39:47 +08:00
hxsz1997	650e6e6ce4	Merge pull request #11891 from hxsz1997/baichuan2-compresskv Add compress_kv for Baichuan2	2024-08-23 06:09:58 +03:00
Huang, Xinshengzi	4cf03d6212	update baichuan-7b	2024-08-22 18:16:33 +08:00
Guancheng Fu	278b191dc1	Fix optimize lm head error (#11899)	2024-08-22 17:45:26 +08:00
Huang, Xinshengzi	86248b0505	add compress_kv for baichuan2	2024-08-22 10:59:08 +08:00
Yina Chen	0236de3ac2	set IPEX_LLM_LAST_LM_HEAD=1 as default (#11885)	2024-08-21 15:06:12 +08:00
Yishuo Wang	2946420e14	add minicpmv 2.6 load_low_bit workaround (#11856)	2024-08-20 11:16:02 +08:00
Zhao Changmin	6841a9ac8f	fix load low bit com dtype (#11832)	2024-08-19 13:43:19 +08:00
Yishuo Wang	e966e85df8	force lm_head optimization in any model if set environment variable (#11830)	2024-08-16 16:48:45 +08:00
Yishuo Wang	17a0beb21f	optimize qwen2-audio again (#11825)	2024-08-16 11:11:35 +08:00
Guancheng Fu	e70ae0638e	Fix vLLM not convert issues (#11817) * Fix not convert issues * refine	2024-08-15 19:04:05 +08:00
Yishuo Wang	750d4ad5dc	fix minicpm-v-2 fp16 (#11819)	2024-08-15 18:34:40 +08:00
Yishuo Wang	4e178f0c5d	rewrite minicpmv optimization (#11816)	2024-08-15 17:27:12 +08:00
Yishuo Wang	07b7f13982	support and optimize qwen2-audio (#11809)	2024-08-15 14:59:04 +08:00
Yishuo Wang	9a93808fc5	fix and optimize minicpm v 2 (#11799)	2024-08-14 17:27:23 +08:00
Yishuo Wang	3d6cfa291d	optimize minicpm v 2.5 (#11793)	2024-08-14 16:07:24 +08:00
Yishuo Wang	cb79dcda93	refactor llama convert to fix minicpm-v 2.5 optimization (#11783)	2024-08-14 09:29:57 +08:00
Yishuo Wang	a184b120c9	fix minicpm-v 2.5 (#11780)	2024-08-13 16:14:00 +08:00
Yishuo Wang	a1eb793f70	optimize minicpm v 2_6 firs token perf (#11770)	2024-08-13 09:51:18 +08:00
Yishuo Wang	54cc9353db	support and optimize minicpm-v-2_6 (#11738)	2024-08-07 18:21:16 +08:00
Ruonan Wang	00a5574c8a	Use `merge_qkv` to replace `fused_qkv` for llama2 (#11727) * update 4.38 * support new versions * update * fix style * fix style * update rope * temp test sdpa * fix style * fix cpu ut	2024-08-07 18:04:01 +08:00
Yishuo Wang	bbdff6edeb	optimize internvl2 4b performance (#11720)	2024-08-06 14:25:08 +08:00
Yishuo Wang	f44b732aa8	support internvl2-4b (#11718)	2024-08-06 13:36:32 +08:00
Ruonan Wang	aa98ef96fe	change mixed_precision to q6_k (#11706)	2024-08-02 15:55:16 +08:00
Guancheng Fu	afeca38a47	Fix import vllm condition (#11682)	2024-07-31 13:50:01 +08:00
Ruonan Wang	54bf3a23a6	add fallback for unsupported k-quants (#11691) * add fallback * fix style * fix	2024-07-31 11:39:58 +08:00
Yishuo Wang	c02003925b	add mlp for gemma2 (#11678)	2024-07-29 16:10:23 +08:00
Yishuo Wang	6f999e6e90	add sdp for gemma2 (#11677)	2024-07-29 15:15:47 +08:00
Yishuo Wang	7f88ce23cd	add more gemma2 optimization (#11673)	2024-07-29 11:13:00 +08:00
Yishuo Wang	3e8819734b	add basic gemma2 optimization (#11672)	2024-07-29 10:46:51 +08:00
Yina Chen	fc7f8feb83	Support compress kv (#11642) * mistral snapkv * update * mtl update * update * update * update * add comments * style fix * fix style * support llama * llama use compress kv * support mistral 4.40 * fix style * support diff transformers versions * move snapkv util to kv * fix style * meet comments & small fix * revert all in one * fix indent --------- Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>	2024-07-26 16:02:00 +08:00
Guancheng Fu	a4d30a8211	Change logic for detecting if vllm is available (#11657) * fix * fix	2024-07-25 15:24:19 +08:00
Xiangyu Tian	4499d25c26	LLM: Fix ParallelLMHead convert in vLLM cpu (#11654)	2024-07-25 13:07:19 +08:00
Yishuo Wang	1b3b46e54d	fix chatglm new model (#11639)	2024-07-23 13:44:56 +08:00
Yishuo Wang	d020ad6397	add save_low_bit support for DiskEmbedding (#11621)	2024-07-19 10:34:53 +08:00
Guoqiong Song	380717f50d	fix gemma for 4.41 (#11531) * fix gemma for 4.41	2024-07-18 15:02:50 -07:00
Guoqiong Song	5a6211fd56	fix minicpm for transformers>=4.39 (#11533) * fix minicpm for transformers>=4.39	2024-07-18 15:01:57 -07:00
Yishuo Wang	0209427cf4	Add disk_embedding parameter to support put Embedding layer on CPU (#11617)	2024-07-18 17:06:06 +08:00

127 commits