Commit graph

20 commits. Each entry: author · SHA1 · date, followed by the commit message.

Wang, Jian4 · 16fa778e65 · 2025-04-27 17:10:56 +08:00
enable glm4v and gemma-3 on vllm 083 (#13114)
    * enable glm4v and gemma-3
    * update
    * add qwen2.5-vl

Wang, Jian4 · c9ecb7a113 · 2025-03-14 14:43:54 +08:00
Fix qwen nan value issue on vllm (#12971)
    * add to fix qwen nan value issue
    * update

Wang, Jian4 · 348dc8056d · 2025-02-20 16:27:23 +08:00
Fix vllm gptq awq error (#12863)
    * fix gptq awq error
    * fix python style

Guancheng Fu · 4eed0c7d99 · 2025-02-19 19:45:34 +08:00
initial implementation for low_bit_loader vLLM (#12838)
    * initial
    * add logic for handling tensor parallel models
    * fix
    * Add some comments
    * add doc
    * fix done
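
For context, the loader in #12838 brings to vLLM the save-once / load-many flow that ipex-llm already exposes through its transformers API: convert and persist low-bit weights once, then map them directly on later runs (per tensor-parallel rank when sharded). A minimal sketch using that API; the model id and paths are placeholders, and the vLLM-side wiring itself lives in the PR.

```python
# Sketch of the save-once / load-low-bit flow, shown with ipex-llm's
# transformers API (save_low_bit / load_low_bit). Model id and paths
# are placeholders; the vLLM low_bit_loader hook may differ in detail.
from ipex_llm.transformers import AutoModelForCausalLM

# First run: quantize while loading, then persist the converted weights.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # placeholder model id
    load_in_low_bit="sym_int4",
    trust_remote_code=True,
)
model.save_low_bit("./llama2-7b-sym-int4")

# Later runs: skip the fp16 load and conversion, map the low-bit
# checkpoint directly (each tensor-parallel rank loads its own shard).
model = AutoModelForCausalLM.load_low_bit(
    "./llama2-7b-sym-int4", trust_remote_code=True
)
```
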
Wang, Jian4 · e1809a6295 · 2025-02-19 10:04:42 +08:00
Update multimodal on vllm 0.6.6 (#12816)
    * add glm4v and minicpmv example
    * fix

Guancheng Fu · af693425f1 · 2025-02-12 16:47:51 +08:00
Upgrade to vLLM 0.6.6 (#12796)
    * init
    * update engine init
    * fix serving load_in_low_bit problem
    * temp
    * temp
    * temp
    * temp
    * temp
    * fix
    * fixed
    * done
    * fix
    * fix all arguments
    * fix
    * fix throughput script
    * fix
    * fix
    * use official ipex-llm
    * Fix readme
    * fix
    Co-authored-by: hzjane <a1015616934@qq.com>
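
The load_in_low_bit argument touched in this squash is the knob that selects ipex-llm weight quantization when the engine starts. A minimal offline-inference sketch in the style of the project's published XPU examples; the IPEXLLMClass import path and the argument set may shift between vLLM upgrades, so treat both as assumptions.

```python
# Minimal sketch, modeled on ipex-llm's vLLM offline_inference examples.
# Import path and load_in_low_bit usage follow the project's XPU docs;
# treat them as assumptions for your installed version.
from vllm import SamplingParams
from ipex_llm.vllm.xpu.engine import IPEXLLMClass as LLM

llm = LLM(
    model="meta-llama/Llama-2-7b-chat-hf",  # placeholder model id
    device="xpu",
    dtype="float16",
    enforce_eager=True,
    load_in_low_bit="sym_int4",             # ipex-llm weight quantization
    tensor_parallel_size=1,
)
params = SamplingParams(temperature=0.8, max_tokens=64)
print(llm.generate(["What is AI?"], params)[0].outputs[0].text)
```
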
Wang, Jian4 · 6711a48a36 · 2025-01-03 14:49:36 +08:00
Enable internvl2-8b on vllm (#12645)

Guancheng Fu · 59bd4a214f · 2024-12-02 14:05:16 +08:00
add vLLM glm4 fix (#12474)

Guancheng Fu · 0ee54fc55f · 2024-11-12 20:35:34 +08:00
Upgrade to vllm 0.6.2 (#12338)
    * Initial updates for vllm 0.6.2
    * fix
    * Change Dockerfile to support v062
    * Fix
    * fix examples
    * Fix
    * done
    * fix
    * Update engine.py
    * Fix Dockerfile to original path
    * fix
    * add option
    * fix
    * fix
    * fix
    * fix
    Co-authored-by: xiangyuT <xiangyu.tian@intel.com>

Wang, Jian4 · d703e4f127 · 2024-09-13 13:28:35 +08:00
Enable vllm multimodal minicpm-v-2-6 (#12074)
    * enable minicpm-v-2-6
    * add image_url readme

Wang, Jian4 · c75f3dd874 · 2024-09-11 13:44:40 +08:00
vllm no padding glm4 to avoid nan error (#12062)
    * no padding glm4
    * add codegeex

Wang, Jian4 · 30a8680645 · 2024-09-11 10:52:55 +08:00
Update for vllm one card padding (#12058)

Wang, Jian4 · 5d3ab16a80 · 2024-09-10 15:57:28 +08:00
Add vllm glm and baichuan padding (#12053)
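
The padding commits above, like the Qwen padding-to-256 change folded into the 0.5.4 squash below, rely on one trick: zero-pad the MLP intermediate dimension to an alignment boundary so batched GEMMs take the aligned fast path, and switch it off where it misbehaves (the glm4 NaNs in #12062). A minimal PyTorch sketch of why the zero padding is numerically free; dimensions are illustrative, not the repo's code.

```python
# Sketch of MLP intermediate-size padding (cf. #12030/#12053/#12058).
# Zero rows/columns leave the math unchanged: the extra up-projection
# outputs are silu(0) = 0, and the down projection multiplies those
# zeros by zero weights. Illustrative only.
import torch
import torch.nn.functional as F

def pad_intermediate(up_w, down_w, align=256):
    """up_w: [inter, hidden], down_w: [hidden, inter] (nn.Linear layout)."""
    inter = up_w.shape[0]
    pad = (-inter) % align                # amount to reach a multiple of align
    if pad == 0:
        return up_w, down_w
    up_w = F.pad(up_w, (0, 0, 0, pad))    # extra output rows, all zeros
    down_w = F.pad(down_w, (0, pad))      # extra input columns, all zeros
    return up_w, down_w

# Padding preserves the MLP output exactly.
hidden, inter = 64, 300
x = torch.randn(2, hidden)
up = torch.randn(inter, hidden)
down = torch.randn(hidden, inter)
up_p, down_p = pad_intermediate(up, down)   # inter: 300 -> 512
ref = F.silu(x @ up.t()) @ down.t()
out = F.silu(x @ up_p.t()) @ down_p.t()
assert torch.allclose(ref, out, atol=1e-5)
```
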
Guancheng Fu · 69c8d36f16 · 2024-09-10 15:37:43 +08:00
Switching from vLLM v0.3.3 to vLLM 0.5.4 (#12042)
    * Enable single card sync engine
    * enable ipex-llm optimizations for vllm
    * enable optimizations for lm_head
    * Fix chatglm multi-reference problem
    * Remove duplicate layer
    * LLM: Update vLLM to v0.5.4 (#11746)
        * Enable single card sync engine
        * enable ipex-llm optimizations for vllm
        * enable optimizations for lm_head
        * Fix chatglm multi-reference problem
        * update 0.5.4 api_server
        * add dockerfile
        * fix
        * fix
        * refine
        * fix
        Co-authored-by: gc-fu <guancheng.fu@intel.com>
    * Add vllm-0.5.4 Dockerfile (#11838)
    * Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957)
    * Fix vLLM not convert issues (#11817) (#11918)
        * Fix not convert issues
        * refine
        Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
    * Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969)
        * init
        * update mlp forward
        * fix minicpm error in vllm 0.5.4
    * fix dependabot alerts (#12008)
    * Update 0.5.4 dockerfile (#12021)
    * Add vllm awq loading logic (#11987)
        * [ADD] Add vllm awq loading logic
        * [FIX] fix the module.linear_method path
        * [FIX] fix quant_config path error
    * Enable Qwen padding mlp to 256 to support batch_forward (#12030)
        * Enable padding mlp
        * padding to 256
        * update style
    * Install 27191 runtime in 0.5.4 docker image (#12040)
    * fix rebase error
    * fix rebase error
    * vLLM: format for 0.5.4 rebase (#12043)
        * format
        * Update model_convert.py
    * Fix serving docker related modifications (#12046)
    * Fix undesired modifications (#12048)
    * fix
    * Refine offline_inference arguments
    Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
    Co-authored-by: Jun Wang <thoughts.times@gmail.com>
    Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com>
    Co-authored-by: liu-shaojun <johnssalyn@outlook.com>
    Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>

Wang, Jian4 · 2b993ad479 · 2024-09-04 13:50:32 +08:00
vllm update for glm-4 model automatic not_convert (#12003)

Wang, Jian4 · 7d103417b8 · 2024-08-30 09:50:18 +08:00
Fix glm4-9b-chat nan error on vllm 0.3.3 (#11970)
    * fix nan value
    * update

Guancheng Fu · 537c0d2767 · 2024-08-21 11:05:24 +08:00
fix vllm qwen2 models (#11879)

Guancheng Fu · e70ae0638e · 2024-08-15 19:04:05 +08:00
Fix vLLM not convert issues (#11817)
    * Fix not convert issues
    * refine

Guancheng Fu · 06930ab258 · 2024-07-16 16:48:44 +08:00
Enable ipex-llm optimization for lm head (#11589)
    * basic
    * Modify convert.py
    * fix
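
The lm_head is typically the single largest matmul in the decode step (hidden size by vocabulary size), so swapping it for a low-bit linear saves memory and bandwidth where it counts. A pure-PyTorch int8 stand-in showing the shape of the change; ipex-llm's actual kernels and quantization format differ, and Int8Linear below is hypothetical.

```python
# Hypothetical int8 stand-in for the lm_head swap; ipex-llm's real
# low-bit kernels and formats differ. Per-output-row symmetric scales.
import torch

class Int8Linear(torch.nn.Module):
    def __init__(self, linear: torch.nn.Linear):
        super().__init__()
        w = linear.weight.detach()                                  # [out, in]
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        self.register_buffer("q", torch.round(w / scale).to(torch.int8))
        self.register_buffer("scale", scale)

    def forward(self, x):
        # Dequantize-then-matmul keeps the sketch simple; fused low-bit
        # kernels avoid materializing the fp32 weight at all.
        return x @ (self.q.float() * self.scale).t()

lm_head = torch.nn.Linear(1024, 8192, bias=False)   # hidden -> vocab stand-in
quantized = Int8Linear(lm_head)
x = torch.randn(4, 1024)
assert torch.allclose(lm_head(x), quantized(x), atol=0.1)  # coarse agreement
```
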
Xiangyu Tian · b3f6faa038 · 2024-05-24 09:16:59 +08:00
LLM: Add CPU vLLM entrypoint (#11083)
    Add CPU vLLM entrypoint and update CPU vLLM serving example.
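
Assuming the entrypoint follows vLLM's usual OpenAI-compatible HTTP API, any OpenAI-style client can talk to it. A minimal request sketch against a server presumed to be running on localhost:8000; the model id is a placeholder.

```python
# Minimal client sketch against an OpenAI-compatible vLLM server,
# assumed to be running locally on port 8000. Model id is a placeholder.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "meta-llama/Llama-2-7b-chat-hf",  # placeholder
        "prompt": "San Francisco is a",
        "max_tokens": 32,
        "temperature": 0.0,
    },
)
print(resp.json()["choices"][0]["text"])
```
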
Renamed from python/llm/src/ipex_llm/vllm/model_convert.py