Author | Commit | Subject | Date
Yishuo Wang | ea65e4fecc | remove falcon support and related UT (#12656) | 2025-01-07 09:26:00 +08:00
Yishuo Wang | 502461d836 | remove unnecessary ipex kernel usage (#12649) | 2025-01-03 16:45:24 +08:00
Yina Chen | 8e5328e9b4 | add disable opts for awq (#12641) | 2025-01-02 15:45:22 +08:00
Yishuo Wang | 2d08155513 | remove bmm, which is only required in ipex 2.0 (#12630) | 2024-12-27 17:28:57 +08:00
Yishuo Wang | c72a5db757 | remove unused code again (#12624) | 2024-12-27 14:17:11 +08:00
Yishuo Wang | 1604b4ead8 | small fix (#12616) | 2024-12-26 11:35:12 +08:00
Yishuo Wang | 6249c1e373 | rewrite llama optimization (#12609) | 2024-12-25 17:04:32 +08:00
Yishuo Wang | 073f936c37 | refactor mistral and phi3 (#12605) | 2024-12-24 17:52:32 +08:00
Yishuo Wang | 3eeb02f1be | support Megrez-3B-Omni (#12582) | 2024-12-19 17:23:01 +08:00
Yishuo Wang | a608f26cc8 | use new fused layer norm (#12553) | 2024-12-17 13:52:35 +08:00
Yishuo Wang | ffce86d69f | add basic glm-edge-v support (#12533) | 2024-12-12 17:25:48 +08:00
Yishuo Wang | 3e0823d2ae | add basic glm-edge support (#12531) | 2024-12-12 16:02:22 +08:00
Yishuo Wang | 77404d2a63 | support new model (#12523) | 2024-12-11 13:41:15 +08:00
Yishuo Wang | a9e3f7f14c | optimize minicpm (#12496) | 2024-12-04 17:14:16 +08:00
Yishuo Wang | 6f3441ba4c | fix glm4-9b overflow (#12455) | 2024-11-27 17:39:13 +08:00
Yishuo Wang | cdd41f5e4c | optimize sdxl again (#12441) | 2024-11-25 17:46:46 +08:00
Yishuo Wang | 8164aed802 | small change (#12439) | 2024-11-25 14:35:49 +08:00
Yishuo Wang | be132c4209 | fix and optimize sd (#12436) | 2024-11-25 14:09:48 +08:00
Yuwen Hu | e0918934c8 | Add fused_mlp to glm4v models (#12378) | 2024-11-11 17:10:25 +08:00
Yuwen Hu | 1a6cbc473f | Add fused mlp optimizations to glm4 models (#12360) | 2024-11-07 18:52:47 +08:00
  * Add fused mlp to glm4 models
  * Small fix
Yuwen Hu | 872a74481a | Small optimization to glm4 models (#12351) | 2024-11-06 19:16:58 +08:00
Yishuo Wang | e23ef7d088 | optimize glm4v's vision part (#12346) | 2024-11-06 15:43:40 +08:00
Yishuo Wang | c8b7265359 | Add basic glm4v support (#12345) | 2024-11-06 13:50:10 +08:00
Zhao Changmin | 1b637e4477 | Add chatglm2&3 fuse mlp (#12328) | 2024-11-04 18:04:41 +08:00
  * add chatglm fuse mlp
Xin Qiu | 97a0f7fd35 | Codegeex support (#12303) | 2024-10-31 15:28:56 +08:00
  * new codegeex attn
  * use kv cache
  * add compress/quantize kv
  * remove compress/quantize kv
  * fix style check
  * fix style
  * fix codegeex
Yuwen Hu | 43b25a2fe7 | Fix llama 3.2 vision on LNL (#12264) | 2024-10-25 16:23:31 +08:00
  * Fix llama 3.2 vision on LNL
  * Small fix
Yishuo Wang | f3a2b20e6b | Optimize gpt2 (#12259) | 2024-10-24 13:44:24 +08:00
Yuwen Hu | b3df47486d | Fix Gemma 2 on LNL (#12240) | 2024-10-21 18:25:53 +08:00
  * Fix gemma 2 on LNL
  * Python style fix
Yishuo Wang | a4a758656a | refactor gemma to reduce old fuse rope usage (#12215) | 2024-10-16 17:40:28 +08:00
Yishuo Wang | e279148aa0 | optimize llama3.2 vision again (#12211) | 2024-10-16 14:29:48 +08:00
Yishuo Wang | d5344587ab | optimize internvl2 vision model's attention (#12198) | 2024-10-15 10:51:00 +08:00
Yuwen Hu | f8d1adc573 | Fix Llama 3.2 & 3.1 on LNL (#12196) | 2024-10-14 17:39:20 +08:00
Yishuo Wang | 535bee5381 | fix qwen2 vl again (#12174) | 2024-10-10 13:50:01 +08:00
Yishuo Wang | 78d253165d | optimize qwen2 vl perf again (#12167) | 2024-10-09 16:43:48 +08:00
Yishuo Wang | 644af2a76e | add basic llama 3.2 vision support (#12163) | 2024-10-08 10:46:48 +08:00
Yishuo Wang | 584c3489e7 | add basic support for llama3.2 (#12125) | 2024-09-26 15:46:19 +08:00
Yishuo Wang | 77af9bc5fa | support passing None to low_bit in optimize_model (#12121) | 2024-09-26 11:09:35 +08:00
Yishuo Wang | 9239fd4f12 | add basic support and optimization for qwen2-vl (#12104) | 2024-09-20 17:23:06 +08:00
Wang, Jian4 | 40e463c66b | Enable vllm load gptq model (#12083) | 2024-09-18 14:41:00 +08:00
  * enable vllm load gptq model
  * update
  * update
  * update
  * update style
Yishuo Wang | d8c044e79d | optimize minicpm3 kv cache (#12052) | 2024-09-10 16:51:21 +08:00
Guancheng Fu | 69c8d36f16 | Switching from vLLM v0.3.3 to vLLM 0.5.4 (#12042) | 2024-09-10 15:37:43 +08:00
  * Enable single card sync engine
  * enable ipex-llm optimizations for vllm
  * enable optimizations for lm_head
  * Fix chatglm multi-reference problem
  * Remove duplicate layer
  * LLM: Update vLLM to v0.5.4 (#11746)
  * Enable single card sync engine
  * enable ipex-llm optimizations for vllm
  * enable optimizations for lm_head
  * Fix chatglm multi-reference problem
  * update 0.5.4 api_server
  * add dockerfile
  * fix
  * fix
  * refine
  * fix
  ---------
  Co-authored-by: gc-fu <guancheng.fu@intel.com>
  * Add vllm-0.5.4 Dockerfile (#11838)
  * Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957)
  * Fix vLLM not convert issues (#11817) (#11918)
  * Fix not convert issues
  * refine
  Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
  * Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969)
  * init
  * update mlp forward
  * fix minicpm error in vllm 0.5.4
  * fix dependabot alerts (#12008)
  * Update 0.5.4 dockerfile (#12021)
  * Add vllm awq loading logic (#11987)
  * [ADD] Add vllm awq loading logic
  * [FIX] fix the module.linear_method path
  * [FIX] fix quant_config path error
  * Enable Qwen padding mlp to 256 to support batch_forward (#12030)
  * Enable padding mlp
  * padding to 256
  * update style
  * Install 27191 runtime in 0.5.4 docker image (#12040)
  * fix rebase error
  * fix rebase error
  * vLLM: format for 0.5.4 rebase (#12043)
  * format
  * Update model_convert.py
  * Fix serving docker related modifications (#12046)
  * Fix undesired modifications (#12048)
  * fix
  * Refine offline_inference arguments
  ---------
  Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
  Co-authored-by: Jun Wang <thoughts.times@gmail.com>
  Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com>
  Co-authored-by: liu-shaojun <johnssalyn@outlook.com>
  Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>
Yishuo Wang | abc370728c | optimize minicpm3 again (#12047) | 2024-09-10 14:19:57 +08:00
Yishuo Wang | 048b4590aa | add basic minicpm3 optimization (#12039) | 2024-09-09 17:25:08 +08:00
Yuwen Hu | a9e485eb1b | Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer (#11963) | 2024-08-29 19:22:09 +08:00
  * Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer
  * Style fixes
Guancheng Fu | 0a7bd274e2 | Add vllm awq loading logic (#11950) | 2024-08-28 16:46:18 +08:00
  * add vllm awq loading logic
  * fix
  * refine
Yina Chen | 23631cd357 | disable lm_head opt for baichuan2-13b (#11905) | 2024-08-23 15:39:47 +08:00
hxsz1997 | 650e6e6ce4 | Merge pull request #11891 from hxsz1997/baichuan2-compresskv | 2024-08-23 06:09:58 +03:00
  Add compress_kv for Baichuan2
Huang, Xinshengzi | 4cf03d6212 | update baichuan-7b | 2024-08-22 18:16:33 +08:00
Guancheng Fu | 278b191dc1 | Fix optimize lm head error (#11899) | 2024-08-22 17:45:26 +08:00
Huang, Xinshengzi | 86248b0505 | add compress_kv for baichuan2 | 2024-08-22 10:59:08 +08:00