ipex-llm

Author	SHA1	Message	Date
Wang, Jian4	d703e4f127	Enable vllm multimodal minicpm-v-2-6 (#12074) * enable minicpm-v-2-6 * add image_url readme	2024-09-13 13:28:35 +08:00
Wang, Jian4	c75f3dd874	vllm no padding glm4 to avoid nan error (#12062) * no padding glm4 * add codegeex	2024-09-11 13:44:40 +08:00
Wang, Jian4	30a8680645	Update for vllm one card padding (#12058)	2024-09-11 10:52:55 +08:00
Wang, Jian4	5d3ab16a80	Add vllm glm and baichuan padding (#12053)	2024-09-10 15:57:28 +08:00
Guancheng Fu	69c8d36f16	Switching from vLLM v0.3.3 to vLLM 0.5.4 (#12042) * Enable single card sync engine * enable ipex-llm optimizations for vllm * enable optimizations for lm_head * Fix chatglm multi-reference problem * Remove duplicate layer * LLM: Update vLLM to v0.5.4 (#11746) * Enable single card sync engine * enable ipex-llm optimizations for vllm * enable optimizations for lm_head * Fix chatglm multi-reference problem * update 0.5.4 api_server * add dockerfile * fix * fix * refine * fix --------- Co-authored-by: gc-fu <guancheng.fu@intel.com> * Add vllm-0.5.4 Dockerfile (#11838) * Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957) * Fix vLLM not convert issues (#11817) (#11918) * Fix not convert issues * refine Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com> * Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969) * init * update mlp forward * fix minicpm error in vllm 0.5.4 * fix dependabot alerts (#12008) * Update 0.5.4 dockerfile (#12021) * Add vllm awq loading logic (#11987) * [ADD] Add vllm awq loading logic * [FIX] fix the module.linear_method path * [FIX] fix quant_config path error * Enable Qwen padding mlp to 256 to support batch_forward (#12030) * Enable padding mlp * padding to 256 * update style * Install 27191 runtime in 0.5.4 docker image (#12040) * fix rebase error * fix rebase error * vLLM: format for 0.5.4 rebase (#12043) * format * Update model_convert.py * Fix serving docker related modifications (#12046) * Fix undesired modifications (#12048) * fix * Refine offline_inference arguments --------- Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com> Co-authored-by: Jun Wang <thoughts.times@gmail.com> Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com> Co-authored-by: liu-shaojun <johnssalyn@outlook.com> Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>	2024-09-10 15:37:43 +08:00
Wang, Jian4	2b993ad479	vllm update for glm-4 model automatic not_convert (#12003)	2024-09-04 13:50:32 +08:00
Wang, Jian4	7d103417b8	Fix glm4-9b-chat nan error on vllm 0.3.3 (#11970) * fix nan value * update	2024-08-30 09:50:18 +08:00
Xiangyu Tian	7ca557aada	LLM: Fix vLLM CPU convert error (#11926)	2024-08-27 09:22:19 +08:00
Guancheng Fu	537c0d2767	fix vllm qwen2 models (#11879)	2024-08-21 11:05:24 +08:00
Guancheng Fu	e70ae0638e	Fix vLLM not convert issues (#11817) * Fix not convert issues * refine	2024-08-15 19:04:05 +08:00
Xiangyu Tian	044e486480	Fix vLLM CPU /chat endpoint (#11748)	2024-08-09 10:33:52 +08:00
Guancheng Fu	06930ab258	Enable ipex-llm optimization for lm head (#11589) * basic * Modify convert.py * fix	2024-07-16 16:48:44 +08:00
Xiangyu Tian	b30bf7648e	Fix vLLM CPU api_server params (#11384)	2024-06-21 13:00:06 +08:00
Xiangyu Tian	4b07712fd8	LLM: Fix vLLM CPU model convert mismatch (#11254) Fix vLLM CPU model convert mismatch.	2024-06-07 15:54:34 +08:00
Xiangyu Tian	ac3d53ff5d	LLM: Fix vLLM CPU version error (#11206) Fix vLLM CPU version error	2024-06-04 19:10:23 +08:00
Xiangyu Tian	b3f6faa038	LLM: Add CPU vLLM entrypoint (#11083) Add CPU vLLM entrypoint and update CPU vLLM serving example.	2024-05-24 09:16:59 +08:00
Guancheng Fu	990535b1cf	Add tensor parallel for vLLM (#10879) * initial * test initial tp * initial sup * fix format * fix * fix	2024-04-26 17:10:49 +08:00
Guancheng Fu	47bd5f504c	[vLLM]Remove vllm-v1, refactor v2 (#10842) * remove vllm-v1 * fix format	2024-04-22 17:51:32 +08:00
Xiangyu Tian	08018a18df	Remove not-imported MistralConfig (#10670)	2024-04-07 10:32:05 +08:00
Jiao Wang	69bdbf5806	Fix vllm print error message issue (#10664) * update chatglm readme * Add condition to invalidInputError * update * update * style	2024-04-05 15:08:13 -07:00
Shaojun Liu	a10f5a1b8d	add python style check (#10620) * add python style check * fix style checks * update runner * add ipex-llm-finetune-qlora-cpu-k8s to manually_build workflow * update tag to 2.1.0-SNAPSHOT	2024-04-02 16:17:56 +08:00
Wang, Jian4	9df70d95eb	Refactor bigdl.llm to ipex_llm (#24) * Rename bigdl/llm to ipex_llm * rm python/llm/src/bigdl * from bigdl.llm to from ipex_llm	2024-03-22 15:41:21 +08:00

22 commits