Qiyuan Gong
1eb884a249
IPEX Duplicate importer V2 ( #11310 )
* Add gguf support.
* Avoid error when importing ipex-llm multiple times.
* Add check to avoid duplicate replace and revert.
* Add calling from check to avoid raising exceptions in the submodule.
* Add BIGDL_CHECK_DUPLICATE_IMPORT for controlling duplicate checker. Default is true.
2024-06-19 16:29:19 +08:00
Yishuo Wang
ae7b662ed2
add fp16 NPU Linear support and fix intel_npu_acceleration_library version 1.0 support ( #11352 )
2024-06-19 09:14:59 +08:00
Guoqiong Song
c44b1942ed
fix mistral for transformers>=4.39 ( #11191 )
* fix mistral for transformers>=4.39
2024-06-18 13:39:35 -07:00
Yishuo Wang
83082e5cc7
add initial support for intel npu acceleration library ( #11347 )
2024-06-18 16:07:16 +08:00
Yina Chen
5dad33e5af
Support fp8_e4m3 scale search ( #11339 )
* fp8e4m3 switch off
* fix style
2024-06-18 11:47:43 +08:00
binbin Deng
e50c890e1f
Support finishing PP inference once eos_token_id is found ( #11336 )
2024-06-18 09:55:40 +08:00
SONG Ge
ef4b6519fb
Add phi-3 model support for pipeline parallel inference ( #11334 )
* add phi-3 model support
* add phi3 example
2024-06-17 17:44:24 +08:00
Xin Qiu
183e0c6cf5
glm-4v-9b support ( #11327 )
* chatglm4v support
* fix style check
* update glm4v
2024-06-17 13:52:37 +08:00
binbin Deng
6ea1e71af0
Update PP inference benchmark script ( #11323 )
2024-06-17 09:59:36 +08:00
SONG Ge
be00380f1a
Fix pipeline parallel inference past_key_value error in Baichuan ( #11318 )
* fix past_key_value error
* add baichuan2 example
* fix style
* update doc
* add script link in doc
* fix import error
* update
2024-06-17 09:29:32 +08:00
Yina Chen
0af0102e61
Add quantization scale search switch ( #11326 )
* add scale_search switch
* remove llama3 instruct
* remove print
2024-06-14 18:46:52 +08:00
Ruonan Wang
8a3247ac71
support batch forward for q4_k, q6_k ( #11325 )
2024-06-14 18:25:50 +08:00
Yishuo Wang
e8dd8e97ef
fix chatglm lookahead on ARC ( #11320 )
2024-06-14 16:26:11 +08:00
Yishuo Wang
91965b5d05
add glm_sdpa back to fix chatglm-6b ( #11313 )
2024-06-14 10:31:43 +08:00
Yishuo Wang
7f65836cb9
fix chatglm2/3-32k/128k fp16 ( #11311 )
2024-06-14 09:58:07 +08:00
Xin Qiu
1b0c4c8cb8
use new rotary two in chatglm4 ( #11312 )
* use new rotary two in chatglm4
* remove
2024-06-13 19:02:18 +08:00
Xin Qiu
f1410d6823
refactor chatglm4 ( #11301 )
* glm4
* remove useless code
* style
* add rope_ratio
* update
* fix fp16
* fix style
2024-06-13 18:06:04 +08:00
Yishuo Wang
5e25766855
fix and optimize chatglm2-32k and chatglm3-128k ( #11306 )
2024-06-13 17:37:58 +08:00
binbin Deng
60cb1dac7c
Support PP for qwen1.5 ( #11300 )
2024-06-13 17:35:24 +08:00
Yishuo Wang
a24666b8f3
fix chatglm3-6b-32k ( #11303 )
2024-06-13 16:01:34 +08:00
Yishuo Wang
01fe0fc1a2
refactor chatglm2/3 ( #11290 )
2024-06-13 12:22:58 +08:00
Guancheng Fu
57a023aadc
Fix vllm tp ( #11297 )
2024-06-13 10:47:48 +08:00
binbin Deng
220151e2a1
Refactor pipeline parallel multi-stage implementation ( #11286 )
2024-06-13 10:00:23 +08:00
Ruonan Wang
14b1e6b699
Fix gguf_q4k ( #11293 )
* update embedding parameter
* update benchmark
2024-06-12 20:43:08 +08:00
Yuwen Hu
8edcdeb0e7
Fix bug that torch.ops.torch_ipex.matmul_bias_out cannot work on Linux MTL for short input ( #11292 )
2024-06-12 19:12:57 +08:00
Xin Qiu
592f7aa61e
Refine glm1-4 sdp ( #11276 )
* chatglm
* update
* update
* change chatglm
* update sdpa
* update
* fix style
* fix
* fix glm
* update glm2-32k
* update glm2-32k
* fix cpu
* update
* change lower_bound
2024-06-12 17:11:56 +08:00
Yuwen Hu
cffb932f05
Expose timeout for streamer for fastchat worker ( #11288 )
* Expose timeout for streamer for fastchat worker
* Change to read from env variables
2024-06-12 17:02:40 +08:00
Qiyuan Gong
0d9cc9c106
Remove duplicate check for ipex ( #11281 )
* Replacing builtin.import causes many unpredictable problems. Remove this function.
2024-06-12 13:52:02 +08:00
Yishuo Wang
10e480ee96
refactor internlm and internlm2 ( #11274 )
2024-06-11 14:19:19 +08:00
Xiangyu Tian
4b07712fd8
LLM: Fix vLLM CPU model convert mismatch ( #11254 )
Fix vLLM CPU model convert mismatch.
2024-06-07 15:54:34 +08:00
Yishuo Wang
42fab480ea
support stablelm2 12b ( #11265 )
2024-06-07 15:46:00 +08:00
Xin Qiu
dbc3c2d72d
glm4 sdp ( #11253 )
* glm4 sdp
* fix style
* update comment
2024-06-07 15:42:23 +08:00
Xin Qiu
151fcf37bb
check device name in use_flash_attention ( #11263 )
2024-06-07 15:07:47 +08:00
Yishuo Wang
2623944604
qwen2 sdpa small fix ( #11261 )
2024-06-07 14:42:18 +08:00
Yishuo Wang
ea0d03fd28
Refactor baichuan1 7B and 13B ( #11258 )
2024-06-07 14:29:20 +08:00
Qiyuan Gong
1aa9c9597a
Avoid duplicate import in IPEX auto importer ( #11227 )
* Add custom import to avoid ipex duplicate importing
* Add scope limitation
2024-06-07 14:08:00 +08:00
Yishuo Wang
ef8e9b2ecd
Refactor qwen2 moe ( #11244 )
2024-06-07 13:14:54 +08:00
Zhao Changmin
b7948671de
[WIP] Add look up table in 1st token stage ( #11193 )
* lookuptb
2024-06-07 10:51:05 +08:00
Xin Qiu
2f809116e2
optimize Chatglm4 ( #11239 )
* chatglm4
* update
* update
* add rms norm
* chatglm4
2024-06-06 18:25:20 +08:00
Yishuo Wang
2e4ccd541c
fix qwen2 cpu ( #11240 )
2024-06-06 16:24:19 +08:00
Yishuo Wang
e738ec38f4
disable quantize kv in specific qwen model ( #11238 )
2024-06-06 14:08:39 +08:00
Yishuo Wang
c4e5806e01
add latest optimization in starcoder2 ( #11236 )
2024-06-06 14:02:17 +08:00
Yishuo Wang
ba27e750b1
refactor yuan2 ( #11235 )
2024-06-06 13:17:54 +08:00
Guoqiong Song
f6d5c6af78
fix issue 1407 ( #11171 )
2024-06-05 13:35:57 -07:00
Yina Chen
ed67435491
Support Fp6 k in ipex-llm ( #11222 )
* support fp6_k
* support fp6_k
* remove
* fix style
2024-06-05 17:34:36 +08:00
binbin Deng
a6674f5bce
Fix should_use_fuse_rope error of Qwen1.5-MoE-A2.7B-Chat ( #11216 )
2024-06-05 15:56:10 +08:00
Xin Qiu
566691c5a3
quantized attention forward for minicpm ( #11200 )
* quantized minicpm
* fix style check
2024-06-05 09:15:25 +08:00
Jiao Wang
bb83bc23fd
Fix Starcoder issue on CPU on transformers 4.36+ ( #11190 )
* fix starcoder for sdpa
* update
* style
2024-06-04 10:05:40 -07:00
Xiangyu Tian
ac3d53ff5d
LLM: Fix vLLM CPU version error ( #11206 )
Fix vLLM CPU version error
2024-06-04 19:10:23 +08:00
Ruonan Wang
1dde204775
update q6k ( #11205 )
2024-06-04 17:14:33 +08:00