Wang, Jian4 | d703e4f127 | 2024-09-13 13:28:35 +08:00
Enable vllm multimodal minicpm-v-2-6 (#12074)
* enable minicpm-v-2-6
* add image_url readme

Wang, Jian4 | c75f3dd874 | 2024-09-11 13:44:40 +08:00
vLLM: skip padding for glm4 to avoid NaN error (#12062)
* no padding glm4
* add codegeex

Wang, Jian4 | 30a8680645 | 2024-09-11 10:52:55 +08:00
Update for vllm one card padding (#12058)

Wang, Jian4 | 5d3ab16a80 | 2024-09-10 15:57:28 +08:00
Add vllm glm and baichuan padding (#12053)

Guancheng Fu | 69c8d36f16 | 2024-09-10 15:37:43 +08:00
Switching from vLLM v0.3.3 to vLLM v0.5.4 (#12042)
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* Remove duplicate layer
* LLM: Update vLLM to v0.5.4 (#11746)
  * Enable single card sync engine
  * enable ipex-llm optimizations for vllm
  * enable optimizations for lm_head
  * Fix chatglm multi-reference problem
  * update 0.5.4 api_server
  * add dockerfile
  * fix
  * fix
  * refine
  * fix
  Co-authored-by: gc-fu <guancheng.fu@intel.com>
* Add vllm-0.5.4 Dockerfile (#11838)
* Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957)
* Fix vLLM not convert issues (#11817) (#11918)
  * Fix not convert issues
  * refine
  Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
* Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969)
  * init
  * update mlp forward
  * fix minicpm error in vllm 0.5.4
* fix dependabot alerts (#12008)
* Update 0.5.4 dockerfile (#12021)
* Add vllm awq loading logic (#11987)
  * [ADD] Add vllm awq loading logic
  * [FIX] fix the module.linear_method path
  * [FIX] fix quant_config path error
* Enable Qwen padding mlp to 256 to support batch_forward (#12030)
  * Enable padding mlp
  * padding to 256
  * update style
* Install 27191 runtime in 0.5.4 docker image (#12040)
* fix rebase error
* fix rebase error
* vLLM: format for 0.5.4 rebase (#12043)
  * format
  * Update model_convert.py
* Fix serving docker related modifications (#12046)
* Fix undesired modifications (#12048)
* fix
* Refine offline_inference arguments
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: Jun Wang <thoughts.times@gmail.com>
Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com>
Co-authored-by: liu-shaojun <johnssalyn@outlook.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>

Wang, Jian4 | 2b993ad479 | 2024-09-04 13:50:32 +08:00
vLLM update: automatic not_convert for glm-4 model (#12003)

Wang, Jian4 | 7d103417b8 | 2024-08-30 09:50:18 +08:00
Fix glm4-9b-chat nan error on vllm 0.3.3 (#11970)
* fix nan value
* update

Xiangyu Tian | 7ca557aada | 2024-08-27 09:22:19 +08:00
LLM: Fix vLLM CPU convert error (#11926)

Guancheng Fu | 537c0d2767 | 2024-08-21 11:05:24 +08:00
fix vllm qwen2 models (#11879)

Guancheng Fu | e70ae0638e | 2024-08-15 19:04:05 +08:00
Fix vLLM not convert issues (#11817)
* Fix not convert issues
* refine

Xiangyu Tian | 044e486480 | 2024-08-09 10:33:52 +08:00
Fix vLLM CPU /chat endpoint (#11748)

Guancheng Fu | 06930ab258 | 2024-07-16 16:48:44 +08:00
Enable ipex-llm optimization for lm_head (#11589)
* basic
* Modify convert.py
* fix

Xiangyu Tian | b30bf7648e | 2024-06-21 13:00:06 +08:00
Fix vLLM CPU api_server params (#11384)

Xiangyu Tian | 4b07712fd8 | 2024-06-07 15:54:34 +08:00
LLM: Fix vLLM CPU model convert mismatch (#11254)

Xiangyu Tian | ac3d53ff5d | 2024-06-04 19:10:23 +08:00
LLM: Fix vLLM CPU version error (#11206)

Xiangyu Tian | b3f6faa038 | 2024-05-24 09:16:59 +08:00
LLM: Add CPU vLLM entrypoint (#11083)
Add CPU vLLM entrypoint and update CPU vLLM serving example.

Guancheng Fu | 990535b1cf | 2024-04-26 17:10:49 +08:00
Add tensor parallel for vLLM (#10879)
* initial
* test initial tp
* initial sup
* fix format
* fix
* fix

Guancheng Fu | 47bd5f504c | 2024-04-22 17:51:32 +08:00
[vLLM] Remove vllm-v1, refactor v2 (#10842)
* remove vllm-v1
* fix format

Xiangyu Tian | 08018a18df | 2024-04-07 10:32:05 +08:00
Remove not-imported MistralConfig (#10670)

Jiao Wang | 69bdbf5806 | 2024-04-05 15:08:13 -07:00
Fix vllm print error message issue (#10664)
* update chatglm readme
* Add condition to invalidInputError
* update
* update
* style

Shaojun Liu | a10f5a1b8d | 2024-04-02 16:17:56 +08:00
add python style check (#10620)
* add python style check
* fix style checks
* update runner
* add ipex-llm-finetune-qlora-cpu-k8s to manually_build workflow
* update tag to 2.1.0-SNAPSHOT

Wang, Jian4 | 9df70d95eb | 2024-03-22 15:41:21 +08:00
Refactor bigdl.llm to ipex_llm (#24)
* Rename bigdl/llm to ipex_llm
* rm python/llm/src/bigdl
* from bigdl.llm to from ipex_llm