Xin Qiu | 30795bdfbc | 2024-02-23 10:07:24 +08:00
Gemma optimization: rms_norm, kv_cache, fused_rope, fused_rope+qkv (#10212)
* gemma optimization
* update
* update
* fix style
* meet code review

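The fused-RoPE commits in this log (Gemma here, GPT-J and Mixtral below) all target the same rotary-position-embedding step. As a reference for what gets fused, here is a minimal eager-mode sketch in plain PyTorch; the function and tensor names are illustrative, not bigdl-llm's API:

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Split the head dimension in half and rotate: (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q, k, cos, sin):
    # q, k: [batch, heads, seq, head_dim]; cos/sin broadcast over batch/heads.
    # A fused kernel computes these multiplies and adds in a single pass.
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin
```
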
Guoqiong Song | 63681af97e | 2024-02-22 17:04:40 -08:00
falcon for transformers 4.36 (#9960)
* falcon for transformers 4.36

Yina Chen | ce5840a8b7 | 2024-02-22 16:25:12 +08:00
GPT-J rope optimization on xpu (#10182)
* optimize
* update
* fix style & move use_fuse_rope
* add ipex version check
* fix style
* update
* fix style
* meet comments
* address comments
* fix style

Xiangyu Tian | f445217d02 | 2024-02-22 16:01:11 +08:00
LLM: Update IPEX to 2.2.0+cpu and Refactor for _ipex_optimize (#10189)
Update IPEX to 2.2.0+cpu and refactor for _ipex_optimize.

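For context on the IPEX bump above: the stock entry point such refactors typically wrap is `ipex.optimize`. A minimal CPU-inference usage sketch follows; the model id is illustrative, and bigdl-llm's internal `_ipex_optimize` may apply different options:

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model.eval()
# Applies operator fusion and weight prepacking for CPU inference.
model = ipex.optimize(model, dtype=torch.bfloat16)
```
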
Heyang Sun | c876d9b5ca | 2024-02-22 15:16:31 +08:00
Support for MPT rotary embedding (#10208)

Ruonan Wang | 5e1fee5e05 | 2024-02-22 14:18:45 +08:00
LLM: add GGUF-IQ2 examples (#10207)
* add iq2 examples
* small fix
* meet code review
* fix
* meet review
* small fix

SONG Ge | ca1166a0e5 | 2024-02-22 13:43:35 +08:00
[LLM] Add quantize kv_cache for Baichuan2-13B (#10203)
* add quantize kv_cache for baichuan2-13b
* style fix

Ruonan Wang | 34ee1aa91f | 2024-02-22 13:37:16 +08:00
LLM: add esimd sdp support for chatglm3 (#10205)
* add esimd sdp support
* fix style

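The ESIMD SDP kernel above fuses scaled-dot-product attention on Intel GPUs. The reference computation it replaces can be sketched with PyTorch's built-in (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

# q, k, v: [batch, num_heads, seq_len, head_dim]
q, k, v = (torch.randn(1, 32, 128, 64) for _ in range(3))

# One fused call for softmax(q @ k.T / sqrt(head_dim)) @ v with a causal mask.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```
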
Ruonan Wang | f7c96b19ef | 2024-02-21 16:00:29 +08:00
LLM: support iq2 for mixtral (#10191)
* support name mapping for mixtral
* support mixtral mixed quantization
* fix style
* fix

Xin Qiu | 56ad781f2f | 2024-02-21 11:23:51 +08:00
qwen2 cpu fix (#10187)

Zhao Changmin | 4fbf449c2d | 2024-02-21 10:11:10 +08:00
for rwkv4 (#10179)

Ruonan Wang | 3288acb8de | 2024-02-20 16:56:57 +08:00
LLM: Support embedding quantization (only q2k now) (#10170)
* basic logic added
* basic support
* support save&load, update mixed strategy
* fix style
* use int8 for lm_head
* add check for xpu

binbin Deng | 2bb96c775c | 2024-02-20 09:52:59 +08:00
LLM: fix device setting during saving optimized model (#10154)

Xin Qiu | 1f6d5b9f30 | 2024-02-20 08:33:09 +08:00
enable fused rmsnorm and rope qwen2 (#10163)
* qwen2
* change convert
* cleanup

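The fused rmsnorm above replaces the eager reference, which is small enough to state inline (a generic sketch, not the Qwen2 module itself):

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    # Scale by the reciprocal root-mean-square of the last dim; no mean-centering,
    # which is what distinguishes RMSNorm from LayerNorm.
    variance = x.float().pow(2).mean(dim=-1, keepdim=True)
    return (x.float() * torch.rsqrt(variance + eps)).to(x.dtype) * weight
```
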
Zhao Changmin | f8730e8dc1 | 2024-02-19 15:56:42 +08:00
Skip rescale rwkv linear when load_low_bit (#10164)
* rwkv_ld

Heyang Sun | 3e2af5ec0a | 2024-02-19 15:27:34 +08:00
Fix IPEX Baichuan Speculative (#10162)
* Fix IPEX Baichuan Speculative
* compatible with 13B
* Update speculative.py

Yina Chen | 23c91cdce6 | 2024-02-19 14:31:41 +08:00
[LLM] Add min_step_draft in speculative decoding (#10142)
* Fix gptj kvcache & position id
* Add min_draft_tokens in speculative decoding
* fix style
* update

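Several commits in this log (min_step_draft here; GPT-J, Mistral, and Baichuan elsewhere) concern speculative decoding: draft a few tokens with a cheap model, then verify them with the target model in a single forward pass. A schematic greedy loop, assuming HF-style models that return `.logits`; all names are illustrative rather than bigdl-llm's API:

```python
import torch

@torch.no_grad()
def speculative_step(target_model, draft_model, input_ids, n_draft=4):
    # 1) Draft n_draft tokens greedily with the cheap model.
    draft = input_ids
    for _ in range(n_draft):
        logits = draft_model(draft).logits
        draft = torch.cat([draft, logits[:, -1:].argmax(-1)], dim=-1)

    # 2) Verify all drafted tokens with one forward pass of the target model.
    preds = target_model(draft).logits.argmax(-1)

    # 3) Accept the longest prefix where the target agrees with the draft.
    n_input = input_ids.shape[1]
    accepted = n_input
    for i in range(n_input, draft.shape[1]):
        if draft[0, i] != preds[0, i - 1]:
            break
        accepted = i + 1
    # Always gain at least one token from the target's own prediction.
    return torch.cat([draft[:, :accepted], preds[:, accepted - 1:accepted]], dim=-1)
```
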
Wang, Jian4 | f2417e083c | 2024-02-19 13:38:32 +08:00
LLM: enable chatglm3-6b target_model ipex (#10085)
* init
* always make casual_mask
* not return last tensor
* update
* optimize_model = False
* enable optimized=False
* enable optimized_model=true
* speed_up ipex target_model
* remove if True
* use group_size
* update python style
* update
* update

Yina Chen | 1508d6b089 | 2024-02-18 10:02:49 +08:00
Fix gptj kvcache & position id (#10141)

Yishuo Wang | 4d33aac7f9 | 2024-02-08 17:04:59 +08:00
quick fix qwen2 fp8 kv cache (#10135)

Cengguang Zhang | 39d90839aa | 2024-02-08 16:49:22 +08:00
LLM: add quantize kv cache for llama. (#10086)
* feat: add quantize kv cache for llama.
* fix style.
* add quantized attention forward function.
* revert style.
* fix style.
* fix style.
* update quantized kv cache and add quantize_qkv
* fix style.
* fix style.
* optimize quantize kv cache.
* fix style.

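The quantize-kv-cache commits (llama here; qwen2 and Baichuan2-13B above) share one idea: store K/V activations in 8-bit with per-token scales and dequantize on read, roughly quartering cache memory versus fp32. A minimal symmetric-int8 sketch; the layout and names are illustrative, and the log also shows an fp8 variant for qwen2:

```python
import torch

def quantize_kv(t: torch.Tensor):
    # t: [batch, heads, seq, head_dim]; one scale per token per head.
    scale = t.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    return torch.round(t / scale).to(torch.int8), scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale
```
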
Yishuo Wang | d848efe17c | 2024-02-08 16:17:21 +08:00
add quantize kv cache support for qwen2 (#10134)

SONG Ge | 3f79128ed7 | 2024-02-08 14:20:26 +08:00
[LLM] Enable kv_cache optimization for Qwen2 on transformers-v4.37.0 (#10131)
* add support for kv_cache optimization on transformers-v4.37.0
* enable attention forward
* style fix
* disable rotary for now

Ruonan Wang | 063dc145ac | 2024-02-08 13:52:01 +08:00
LLM: basic support for q2k (#10132)
* basic support for q2k
* fix style

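q2k and the IQ2 formats above are llama.cpp-family 2-bit block quantizations. The core recipe, a per-block scale and minimum plus 2-bit indices, can be sketched as below; the real k-quant formats add super-blocks and quantized scales that are omitted here:

```python
import torch

def quantize_2bit(w: torch.Tensor, block: int = 16):
    # w: 1-D weights, length divisible by `block`; 4 levels (2 bits) per value.
    w = w.view(-1, block)
    wmin = w.amin(dim=1, keepdim=True)
    scale = (w.amax(dim=1, keepdim=True) - wmin).clamp(min=1e-8) / 3.0
    q = torch.clamp(torch.round((w - wmin) / scale), 0, 3).to(torch.uint8)
    return q, scale, wmin

def dequantize_2bit(q, scale, wmin):
    return q.float() * scale + wmin
```
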
Cengguang Zhang | 0cf6a12691 | 2024-02-08 10:24:16 +08:00
LLM: add default torch_dtype for fp16. (#10124)
* set default torch_dtype for fp16.
* fix style.
* bug fix.
* update bug fix.

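The torch_dtype default above decides the precision weights are loaded in. With stock transformers the explicit equivalent is shown below; the commit makes fp16 the implicit default inside bigdl-llm so callers need not pass it (the model id is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# Load weights directly in half precision instead of fp32-then-cast.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
)
```
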
Yishuo Wang | 1aa0c623ce | 2024-02-08 10:20:01 +08:00
disable fused layer norm on UHD (#10130)

Yuwen Hu | a8450fc300 | 2024-02-08 09:15:34 +08:00
[LLM] Support MLP optimization for Qwen1.5 (#10123)

binbin Deng | 925f82107e | 2024-02-07 16:46:36 +08:00
LLM: support models hosted by modelscope (#10106)

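For the modelscope support above, a common integration pattern is to fetch the snapshot with the public modelscope package and point transformers at the local path. A hedged sketch; how bigdl-llm wires this internally may differ:

```python
from modelscope import snapshot_download
from transformers import AutoModelForCausalLM

# Download from the ModelScope hub, then load the local snapshot.
model_dir = snapshot_download("qwen/Qwen-7B-Chat")  # illustrative model id
model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True)
```
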
Xiangyu Tian | 8953acd7d6 | 2024-02-07 10:27:10 +08:00
[LLM] Fix log condition for BIGDL_OPT_IPEX (#10115)
Fix log condition for BIGDL_OPT_IPEX

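BIGDL_OPT_IPEX is an environment flag, and the fix above concerns when a message about it is logged. The usual guard pattern looks like this (a generic sketch, not the repo's code):

```python
import os
import logging

logger = logging.getLogger(__name__)

# Log the IPEX notice only when the flag is set to a truthy value.
if os.environ.get("BIGDL_OPT_IPEX", "").lower() in ("1", "true", "yes"):
    logger.info("BIGDL_OPT_IPEX enabled; using IPEX-optimized decoding paths.")
```
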
Yuwen Hu | 518ef95abc | 2024-02-06 14:58:52 +08:00
Small fix for NoneType error (#10104)

Ruonan Wang | d61f4905ac | 2024-02-06 14:58:32 +08:00
LLM: 2bit quantization initial support (#10042)
* basic quantize support
* fix new module name
* small update
* add mixed int4 with iq2_xxs
* remove print
* code refactor
* fix style
* meet code review

Jiao Wang | 33b9e7744d | 2024-02-05 15:07:38 -08:00
fix dimension (#10097)

Zhicun | 7d2be7994f | 2024-02-05 11:12:47 +08:00
add phixtral and optimize phi-moe (#10052)

Zhicun | 676d6923f2 | 2024-02-05 10:42:10 +08:00
LLM: modify transformersembeddings.embed() in langchain (#10051)

Jin Qiao | ad050107b3 | 2024-02-05 10:17:07 +08:00
LLM: fix mpt load_low_bit issue (#10075)
* fix
* retry
* retry

Ruonan Wang | 8e33cb0f38 | 2024-02-04 13:26:42 +08:00
LLM: support speecht5_tts (#10077)
* support speecht5_tts
* fix

ivy-lv11 | 428b7105f6 | 2024-02-04 10:25:55 +08:00
Add HF and PyTorch example for InternLM2 (#10061)

Yina Chen | 77be19bb97 | 2024-02-02 14:54:55 +08:00
LLM: Support gpt-j in speculative decoding (#10067)
* gptj
* support gptj in speculative decoding
* fix
* update readme
* small fix

Xin Qiu | 6e0f1a1e92 | 2024-02-01 15:40:49 +08:00
use apply_rotary_pos_emb_cache_freq_xpu in mixtral (#10060)
* use apply_rotary_pos_emb_cache_freq_xpu in mixtral
* fix style

Heyang Sun | 601024f418 | 2024-02-01 10:52:32 +08:00
Mistral CPU example of speculative decoding (#10024)
* Mistral CPU example of speculative decoding
* update transformers version
* update example
* Update README.md

Heyang Sun | 968e70544d | 2024-02-01 10:48:16 +08:00
Enable IPEX Mistral in Speculative (#10059)

Yina Chen | 3ca03d4e97 | 2024-02-01 09:57:02 +08:00
Add deepmind sample into bigdl-llm speculative decoding (#10041)
* migrate deepmind sample
* update
* meet comments
* fix style
* fix style

Wang, Jian4 | 7e5cd42a5c | 2024-01-31 10:59:55 +08:00
LLM: Update optimize ipex bf16 (#10038)
* use 4.35.2 and remove
* update rmsnorm
* remove
* remove
* update python style
* update
* update python style
* update
* fix style
* update
* remove whitespace

Ruonan Wang | 3685622f29 | 2024-01-31 10:31:10 +08:00
LLM: fix llama 4.36 forward (#10047)

Yishuo Wang | 53a5140eff | 2024-01-31 10:01:11 +08:00
Optimize rwkv v5 rest token again (#10043)

Ruonan Wang | 6b63ba23d1 | 2024-01-30 14:43:07 +08:00
LLM: add full module name during convert (#10035)

Yishuo Wang | 7dfa6dbe46 | 2024-01-30 14:10:55 +08:00
add rwkv time shift optimization (#10032)

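RWKV's time shift, targeted by the optimization above, mixes each token's features with its predecessor's. The core operation is a one-position shift along the sequence axis (a generic sketch of the mechanism, not the optimized kernel):

```python
import torch
import torch.nn.functional as F

def time_shift(x: torch.Tensor) -> torch.Tensor:
    # x: [batch, seq_len, channels]; shift the sequence one step forward in time,
    # zero-padding position 0 and dropping the last step (RWKV's token shift).
    return F.pad(x, (0, 0, 1, -1))

x = torch.randn(1, 8, 512)
mu = torch.rand(512)  # learned mixing coefficients in the real model
mixed = x * mu + time_shift(x) * (1 - mu)
```
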
Xiangyu Tian | f57d0fda8b | 2024-01-30 09:11:06 +08:00
[LLM] Use IPEX Optimization for Self Speculative Decoding (#9997)
Use IPEX Optimization for Self Speculative Decoding

Ruonan Wang | ccf8f613fb | 2024-01-29 18:25:26 +08:00
LLM: update fp16 Linear on ARC/FLEX (#10023)

Shaojun Liu | 824c8029d7 | 2024-01-29 16:18:04 +08:00
Fix "local variable 'model' referenced before assignment" (#10022)

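The last fix above is the classic Python conditional-binding bug: a local is assigned on only one branch and read afterwards. A minimal reproduction and the usual fix (illustrative code, not the repo's):

```python
def build_default():
    return "default-model"

def build_optimized():
    return "optimized-model"

def load(optimize: bool):
    if optimize:
        model = build_optimized()  # 'model' is bound only on this branch
    # When optimize is False, the next line raises:
    # UnboundLocalError: local variable 'model' referenced before assignment
    return model

def load_fixed(optimize: bool):
    model = None  # bind the name on every path
    if optimize:
        model = build_optimized()
    return model if model is not None else build_default()
```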