ipex-llm

Author	SHA1	Message	Date
Ruonan Wang	f7c96b19ef	LLM: support iq2 for mixtral (#10191 ) * support name mapping for mixtral * support mixtral mixed quantization * fix style * fix	2024-02-21 16:00:29 +08:00
Xin Qiu	56ad781f2f	qwen2 cpu fix (#10187 )	2024-02-21 11:23:51 +08:00
Zhao Changmin	4fbf449c2d	for rwkv4 (#10179 )	2024-02-21 10:11:10 +08:00
Ruonan Wang	3288acb8de	LLM : Support embedding quantization (only q2k now) (#10170 ) * basic logic added * basic support * support save&load, update mixed strategy * fix style * use int8 for lm_head * add check for xpu	2024-02-20 16:56:57 +08:00
binbin Deng	2bb96c775c	LLM: fix device setting during saving optimized model (#10154 )	2024-02-20 09:52:59 +08:00
Xin Qiu	1f6d5b9f30	enable fused rmsnorm and rope qwen2 (#10163 ) * qwen2 * change convert * cleanup	2024-02-20 08:33:09 +08:00
Zhao Changmin	f8730e8dc1	Skip rescale rwkv linear when load_low_bit (#10164 ) * rwkv_ld	2024-02-19 15:56:42 +08:00
Heyang Sun	3e2af5ec0a	Fix IPEX Baichuan Speculative (#10162 ) * Fix IPEX Baichuan Speculative * compatible with 13B * Update speculative.py	2024-02-19 15:27:34 +08:00
Yina Chen	23c91cdce6	[LLM] Add min_step_draft in speculative decoding (#10142 ) * Fix gptj kvcache & position id * Add min_draft_tokens in speculative decoding * fix style * update	2024-02-19 14:31:41 +08:00
Wang, Jian4	f2417e083c	LLM: enable chatglm3-6b target_model ipex (#10085 ) * init * always make casual_mask * not return last tensor * update * optimize_model = False * enable optimized=False * enable optimized_model=true * speed_up ipex target_model * remove if True * use group_size * update python style * update * update	2024-02-19 13:38:32 +08:00
Yina Chen	1508d6b089	Fix gptj kvcache & position id (#10141 )	2024-02-18 10:02:49 +08:00
Yishuo Wang	4d33aac7f9	quick fix qwen2 fp8 kv cache (#10135 )	2024-02-08 17:04:59 +08:00
Cengguang Zhang	39d90839aa	LLM: add quantize kv cache for llama. (#10086 ) * feat: add quantize kv cache for llama. * fix style. * add quantized attention forward function. * revert style. * fix style. * fix style. * update quantized kv cache and add quantize_qkv * fix style. * fix style. * optimize quantize kv cache. * fix style.	2024-02-08 16:49:22 +08:00
Yishuo Wang	d848efe17c	add quantize kv cache support for qwen2 (#10134 )	2024-02-08 16:17:21 +08:00
SONG Ge	3f79128ed7	[LLM] Enable kv_cache optimization for Qwen2 on transformers-v4.37.0 (#10131 ) * add support for kv_cache optimization on transformers-v4.37.0 * enable attention forward * style fix * disable rotary for now	2024-02-08 14:20:26 +08:00
Ruonan Wang	063dc145ac	LLM: basic support for q2k (#10132 ) * basic support for q2k * fix style	2024-02-08 13:52:01 +08:00
Cengguang Zhang	0cf6a12691	LLM: add default torch_dtype for fp16. (#10124 ) * set default torch_dtype for fp16. * fix style. * bug fix. * update bug fix.	2024-02-08 10:24:16 +08:00
Yishuo Wang	1aa0c623ce	disable fused layer norm on UHD (#10130 )	2024-02-08 10:20:01 +08:00
Yuwen Hu	a8450fc300	[LLM] Support MLP optimization for Qwen1.5 (#10123 )	2024-02-08 09:15:34 +08:00
binbin Deng	925f82107e	LLM: support models hosted by modelscope (#10106 )	2024-02-07 16:46:36 +08:00
Xiangyu Tian	8953acd7d6	[LLM] Fix log condition for BIGDL_OPT_IPEX (#10115 ) Fix log condition for BIGDL_OPT_IPEX	2024-02-07 10:27:10 +08:00
Yuwen Hu	518ef95abc	Small fix for Nonetype error (#10104 )	2024-02-06 14:58:52 +08:00
Ruonan Wang	d61f4905ac	LLM: 2bit quantization initial support (#10042 ) * basis quantize support * fix new module name * small update * and mixed int4 with iq2_xxs * remove print * code refactor * fix style * meet code review	2024-02-06 14:58:32 +08:00
Jiao Wang	33b9e7744d	fix dimension (#10097 )	2024-02-05 15:07:38 -08:00
Zhicun	7d2be7994f	add phixtral and optimize phi-moe (#10052 )	2024-02-05 11:12:47 +08:00
Zhicun	676d6923f2	LLM: modify transformersembeddings.embed() in langchain (#10051 )	2024-02-05 10:42:10 +08:00
Jin Qiao	ad050107b3	LLM: fix mpt load_low_bit issue (#10075 ) * fix * retry * retry	2024-02-05 10:17:07 +08:00
Ruonan Wang	8e33cb0f38	LLM: support speecht5_tts (#10077 ) * support speecht5_tts * fix	2024-02-04 13:26:42 +08:00
ivy-lv11	428b7105f6	Add HF and PyTorch example InternLM2 (#10061 )	2024-02-04 10:25:55 +08:00
Yina Chen	77be19bb97	LLM: Support gpt-j in speculative decoding (#10067 ) * gptj * support gptj in speculative decoding * fix * update readme * small fix	2024-02-02 14:54:55 +08:00
Xin Qiu	6e0f1a1e92	use apply_rotary_pos_emb_cache_freq_xpu in mixtral (#10060 ) * use apply_rotary_pos_emb_cache_freq_xpu in mixtral * fix style	2024-02-01 15:40:49 +08:00
Heyang Sun	601024f418	Mistral CPU example of speculative decoding (#10024 ) * Mistral CPU example of speculative decoding * update transformres version * update example * Update README.md	2024-02-01 10:52:32 +08:00
Heyang Sun	968e70544d	Enable IPEX Mistral in Speculative (#10059 )	2024-02-01 10:48:16 +08:00
Yina Chen	3ca03d4e97	Add deepmind sample into bigdl-llm speculative decoding (#10041 ) * migrate deepmind sample * update * meet comments * fix style * fix style	2024-02-01 09:57:02 +08:00
Wang, Jian4	7e5cd42a5c	LLM : Update optimize ipex bf16 (#10038 ) * use 4.35.2 and remove * update rmsnorm * remove * remove * update python style * update * update python style * update * fix style * update * remove whitespace	2024-01-31 10:59:55 +08:00
Ruonan Wang	3685622f29	LLM: fix llama 4.36 forward(#10047 )	2024-01-31 10:31:10 +08:00
Yishuo Wang	53a5140eff	Optimize rwkv v5 rest token again (#10043 )	2024-01-31 10:01:11 +08:00
Ruonan Wang	6b63ba23d1	LLM: add full module name during convert (#10035 )	2024-01-30 14:43:07 +08:00
Yishuo Wang	7dfa6dbe46	add rwkv time shift optimization (#10032 )	2024-01-30 14:10:55 +08:00
Xiangyu Tian	f57d0fda8b	[LLM] Use IPEX Optimization for Self Speculative Decoding (#9997 ) Use IPEX Optimization for Self Speculative Decoding	2024-01-30 09:11:06 +08:00
Ruonan Wang	ccf8f613fb	LLM: update fp16 Linear on ARC/FLEX (#10023 )	2024-01-29 18:25:26 +08:00
Shaojun Liu	824c8029d7	Fix "local variable 'model' referenced before assignment" (#10022 )	2024-01-29 16:18:04 +08:00
Xiangyu Tian	f37e4702bc	[LLM] Use IPEX Optimization for BF16 Model (#9988 ) Use IPEX Optimization for BF16 Model by env BIGDL_OPT_IPEX=true	2024-01-29 11:28:25 +08:00
Yishuo Wang	d720554d43	simplify quantize kv cache api (#10011 )	2024-01-29 09:23:57 +08:00
Yina Chen	a3322e2a6c	add fp8 e5 to use_xmx (#10015 )	2024-01-26 18:29:46 +08:00
Qiyuan Gong	9e18ea187f	[LLM] Avoid KV Cache OOM when seq len is larger than 1 (#10006 ) * Avoid OOM during muti-round streaming chat with kv cache * For llama like kv cache, i.e., [bs, n_head, seq_len, head_dim], use is_enough_kv_cache_room_4_31. * Other models need to compare kv cache size with kv_len.	2024-01-26 17:30:08 +08:00
Ruonan Wang	a00efa0564	LLM: add mlp & qkv fusion for FP16 Llama-7B (#9932 ) * add mlp fusion for llama * add mlp fusion * fix style * update * add mm_qkv_out * fix style * update * meet code review * meet code review	2024-01-26 11:50:38 +08:00
Wang, Jian4	98ea3459e5	LLM : Fix llama draft_model dtype error (#10005 ) * fix llama draft_model dtype error * updat	2024-01-26 10:59:48 +08:00
Yishuo Wang	aae1870096	fix qwen kv cache length (#9998 )	2024-01-26 10:15:01 +08:00
Yishuo Wang	24b34b6e46	change xmx condition (#10000 )	2024-01-25 17:48:11 +08:00

1 2 3 4 5 ...

432 commits