Guancheng Fu
57a023aadc
Fix vllm tp ( #11297 )
2024-06-13 10:47:48 +08:00
binbin Deng
220151e2a1
Refactor pipeline parallel multi-stage implementation ( #11286 )
2024-06-13 10:00:23 +08:00
Ruonan Wang
14b1e6b699
Fix gguf_q4k ( #11293 )
...
* update embedding parameter
* update benchmark
2024-06-12 20:43:08 +08:00
Yuwen Hu
8edcdeb0e7
Fix bug that torch.ops.torch_ipex.matmul_bias_out cannot work on Linux MTL for short input ( #11292 )
2024-06-12 19:12:57 +08:00
Xin Qiu
592f7aa61e
Refine glm1-4 sdp ( #11276 )
...
* chatglm
* update
* update
* change chatglm
* update sdpa
* update
* fix style
* fix
* fix glm
* update glm2-32k
* update glm2-32k
* fix cpu
* update
* change lower_bound
2024-06-12 17:11:56 +08:00
Yuwen Hu
cffb932f05
Expose timeout for streamer for fastchat worker ( #11288 )
...
* Expose timeout for streamer for fastchat worker
* Change to read from env variables
2024-06-12 17:02:40 +08:00
Qiyuan Gong
0d9cc9c106
Remove duplicate check for ipex ( #11281 )
...
* Replacing builtin.import is causing lots of unpredicted problems. Remove this function.
2024-06-12 13:52:02 +08:00
Yishuo Wang
10e480ee96
refactor internlm and internlm2 ( #11274 )
2024-06-11 14:19:19 +08:00
Xiangyu Tian
4b07712fd8
LLM: Fix vLLM CPU model convert mismatch ( #11254 )
...
Fix vLLM CPU model convert mismatch.
2024-06-07 15:54:34 +08:00
Yishuo Wang
42fab480ea
support stablelm2 12b ( #11265 )
2024-06-07 15:46:00 +08:00
Xin Qiu
dbc3c2d72d
glm4 sdp ( #11253 )
...
* glm4 sdp
* fix style
* update comment
2024-06-07 15:42:23 +08:00
Xin Qiu
151fcf37bb
check device name in use_flash_attention ( #11263 )
2024-06-07 15:07:47 +08:00
Yishuo Wang
2623944604
qwen2 sdpa small fix ( #11261 )
2024-06-07 14:42:18 +08:00
Yishuo Wang
ea0d03fd28
Refactor baichuan1 7B and 13B ( #11258 )
2024-06-07 14:29:20 +08:00
Qiyuan Gong
1aa9c9597a
Avoid duplicate import in IPEX auto importer ( #11227 )
...
* Add custom import to avoid ipex duplicate importing
* Add scope limitation
2024-06-07 14:08:00 +08:00
Yishuo Wang
ef8e9b2ecd
Refactor qwen2 moe ( #11244 )
2024-06-07 13:14:54 +08:00
Zhao Changmin
b7948671de
[WIP] Add look up table in 1st token stage ( #11193 )
...
* lookuptb
2024-06-07 10:51:05 +08:00
Xin Qiu
2f809116e2
optimize Chatglm4 ( #11239 )
...
* chatglm4
* update
* update
* add rms norm
* chatglm4
2024-06-06 18:25:20 +08:00
Yishuo Wang
2e4ccd541c
fix qwen2 cpu ( #11240 )
2024-06-06 16:24:19 +08:00
Yishuo Wang
e738ec38f4
disable quantize kv in specific qwen model ( #11238 )
2024-06-06 14:08:39 +08:00
Yishuo Wang
c4e5806e01
add latest optimization in starcoder2 ( #11236 )
2024-06-06 14:02:17 +08:00
Yishuo Wang
ba27e750b1
refactor yuan2 ( #11235 )
2024-06-06 13:17:54 +08:00
Guoqiong Song
f6d5c6af78
fix issue 1407 ( #11171 )
2024-06-05 13:35:57 -07:00
Yina Chen
ed67435491
Support Fp6 k in ipex-llm ( #11222 )
...
* support fp6_k
* support fp6_k
* remove
* fix style
2024-06-05 17:34:36 +08:00
binbin Deng
a6674f5bce
Fix should_use_fuse_rope error of Qwen1.5-MoE-A2.7B-Chat ( #11216 )
2024-06-05 15:56:10 +08:00
Xin Qiu
566691c5a3
quantized attention forward for minicpm ( #11200 )
...
* quantized minicpm
* fix style check
2024-06-05 09:15:25 +08:00
Jiao Wang
bb83bc23fd
Fix Starcoder issue on CPU on transformers 4.36+ ( #11190 )
...
* fix starcoder for sdpa
* update
* style
2024-06-04 10:05:40 -07:00
Xiangyu Tian
ac3d53ff5d
LLM: Fix vLLM CPU version error ( #11206 )
...
Fix vLLM CPU version error
2024-06-04 19:10:23 +08:00
Ruonan Wang
1dde204775
update q6k ( #11205 )
2024-06-04 17:14:33 +08:00
Qiyuan Gong
ce3f08b25a
Fix IPEX auto importer ( #11192 )
...
* Fix ipex auto importer with Python builtins.
* Raise errors if the user imports ipex manually before importing ipex_llm. Do nothing if they import ipex after importing ipex_llm.
* Remove import ipex in examples.
2024-06-04 16:57:18 +08:00
Yishuo Wang
6454655dcc
use sdp in baichuan2 13b ( #11198 )
2024-06-04 15:39:00 +08:00
Yishuo Wang
d90cd977d0
refactor stablelm ( #11195 )
2024-06-04 13:14:43 +08:00
Xin Qiu
5f13700c9f
optimize Minicpm ( #11189 )
...
* minicpm optimize
* update
2024-06-03 18:28:29 +08:00
Shaojun Liu
401013a630
Remove chatglm_C Module to Eliminate LGPL Dependency ( #11178 )
...
* remove chatglm_C.**.pyd to solve ngsolve weak copyright vuln
* fix style check error
* remove chatglm native int4 from langchain
2024-05-31 17:03:11 +08:00
Ruonan Wang
50b5f4476f
update q4k convert ( #11179 )
2024-05-31 11:36:53 +08:00
ZehuaCao
4127b99ed6
Fix null pointer dereference error. ( #11125 )
...
* delete unused function on tgi_server
* update
* update
* fix style
2024-05-30 16:16:10 +08:00
Guancheng Fu
50ee004ac7
Fix vllm condition ( #11169 )
...
* add use-vllm
* done
* fix style
* fix done
2024-05-30 15:23:17 +08:00
Ruonan Wang
9bfbf78bf4
update api usage of xe_batch & fp16 ( #11164 )
...
* update api usage
* update setup.py
2024-05-29 15:15:14 +08:00
Yina Chen
e29e2f1c78
Support new fp8 e4m3 ( #11158 )
2024-05-29 14:27:14 +08:00
Yishuo Wang
bc5008f0d5
disable sdp_causal in phi-3 to fix overflow ( #11157 )
2024-05-28 17:25:53 +08:00
SONG Ge
33852bd23e
Refactor pipeline parallel device config ( #11149 )
...
* refactor pipeline parallel device config
* meet comments
* update example
* add warnings and update code doc
2024-05-28 16:52:46 +08:00
Yishuo Wang
d307622797
fix first token sdp with batch ( #11153 )
2024-05-28 15:03:06 +08:00
Yina Chen
3464440839
fix qwen import error ( #11154 )
2024-05-28 14:50:12 +08:00
Yina Chen
b6b70d1ba0
Divide core-xe packages ( #11131 )
...
* temp
* add batch
* fix style
* update package name
* fix style
* add workflow
* use temp version to run uts
* trigger performance test
* trigger win igpu perf
* revert workflow & setup
2024-05-28 12:00:18 +08:00
binbin Deng
c9168b85b7
Fix error during merging adapter ( #11145 )
2024-05-27 19:41:42 +08:00
Guancheng Fu
daf7b1cd56
[Docker] Fix image using two cards error ( #11144 )
...
* fix all
* done
2024-05-27 16:20:13 +08:00
binbin Deng
367de141f2
Fix mixtral-8x7b with transformers=4.37.0 ( #11132 )
2024-05-27 09:50:54 +08:00
Guancheng Fu
fabc395d0d
add langchain vllm interface ( #11121 )
...
* done
* fix
* fix
* add vllm
* add langchain vllm examples
* add docs
* temp
2024-05-24 17:19:27 +08:00
ZehuaCao
63e95698eb
[LLM]Reopen autotp generate_stream ( #11120 )
...
* reopen autotp generate_stream
* fix style error
* update
2024-05-24 17:16:14 +08:00
Yishuo Wang
1dc680341b
fix phi-3-vision import ( #11129 )
2024-05-24 15:57:15 +08:00