ipex-llm

Author	SHA1	Message	Date
Xin Qiu	399843faf0	Baichuan 7b fp16 sdp and qwen2 pvc sdp (#10435 ) * add baichuan sdp * update * baichuan2 * fix * fix style * revert 13b * revert	2024-03-18 10:15:34 +08:00
Yishuo Wang	bd64488b2a	add mask support for llama/chatglm fp8 sdp (#10433 ) * add mask support for fp8 sdp * fix chatglm2 dtype * update	2024-03-15 17:36:52 +08:00
Xin Qiu	24473e331a	Qwen2 fp16 sdp (#10427 ) * qwen2 sdp and refine * update * update * fix style * remove use_flash_attention	2024-03-15 13:12:03 +08:00
Ruonan Wang	b036205be2	LLM: add fp8 sdp for chatglm2/3 (#10411 ) * add fp8 sdp for chatglm2 * fix style	2024-03-15 09:38:18 +08:00
Wang, Jian4	fe8976a00f	LLM: Support gguf models use low_bit and fix no json(#10408 ) * support others model use low_bit * update readme * update to add *.json	2024-03-15 09:34:18 +08:00
Xin Qiu	cda38f85a9	Qwen fp16 sdp (#10401 ) * qwen sdp * fix * update * update * update sdp * update * fix style check * add to origin type	2024-03-15 08:51:50 +08:00
dingbaorong	1c0f7ed3fa	add xpu support (#10419 )	2024-03-14 17:13:48 +08:00
Heyang Sun	7d29765092	refactor qwen2 forward to enable XPU (#10409 ) * refactor awen2 forward to enable XPU * Update qwen2.py	2024-03-14 11:03:05 +08:00
ZehuaCao	f66329e35d	Fix multiple get_enable_ipex function error (#10400 ) * fix multiple get_enable_ipex function error * remove get_enable_ipex_low_bit function	2024-03-14 10:14:13 +08:00
Kai Huang	76e30d8ec8	Empty cache for lm_head (#10317 ) * empty cache * add comments	2024-03-13 20:31:53 +08:00
Yishuo Wang	06a851afa9	support new baichuan model (#10404 )	2024-03-13 17:45:50 +08:00
Yishuo Wang	b268baafd6	use fp8 sdp in llama (#10396 )	2024-03-13 16:45:38 +08:00
Xiangyu Tian	60043a3ae8	LLM: Support Baichuan2-13b in BigDL-vLLM (#10398 ) Support Baichuan2-13b in BigDL-vLLM.	2024-03-13 16:21:06 +08:00
Xiangyu Tian	e10de2c42d	[Fix] LLM: Fix condition check error for speculative decoding on CPU (#10402 ) Fix condition check error for speculative decoding on CPU	2024-03-13 16:05:06 +08:00
Heyang Sun	d72c0fad0d	Qwen2 SDPA forward on CPU (#10395 ) * Fix Qwen1.5 CPU forward * Update convert.py * Update qwen2.py	2024-03-13 13:10:03 +08:00
Wang, Jian4	0193f29411	LLM : Enable gguf float16 and Yuan2 model (#10372 ) * enable float16 * add yun files * enable yun * enable set low_bit on yuan2 * update * update license * update generate * update readme * update python style * update	2024-03-13 10:19:18 +08:00
Yina Chen	f5d65203c0	First token lm_head optimization (#10318 ) * add lm head linear * update * address comments and fix style * address comment	2024-03-13 10:11:32 +08:00
Xin Qiu	28c4a8cf5c	Qwen fused qkv (#10368 ) * fused qkv + rope for qwen * quantized kv cache * fix * update qwen * fixed quantized qkv * fix * meet code review * update split * convert.py * extend when no enough kv * fix	2024-03-12 17:39:00 +08:00
Yishuo Wang	741c2bf1df	use new rms norm (#10384 )	2024-03-12 17:29:51 +08:00
Xiangyu Tian	0ded0b4b13	LLM: Enable BigDL IPEX optimization for int4 (#10319 ) Enable BigDL IPEX optimization for int4	2024-03-12 17:08:50 +08:00
Zhao Changmin	df2b84f7de	Enable kv cache on arc batch (#10308 )	2024-03-12 16:46:04 +08:00
Guancheng Fu	cc4148636d	[FastChat-integration] Add initial implementation for loader (#10323 ) * add initial implementation for loader * add test method for model_loader * data * Refine	2024-03-12 10:54:59 +08:00
binbin Deng	dbcfc5c2fa	LLM: fix error of 'AI-ModelScope/phi-2' hosted by ModelScope hub (#10364 )	2024-03-11 16:19:17 +08:00
Chen, Zhentao	a425eaabfc	fix from_pretrained when device_map=None (#10361 ) * pr trigger * fix error when device_map=None * fix device_map=None	2024-03-11 16:06:12 +08:00
Yina Chen	d7b765fd3f	serving xpu memory opt (#10358 )	2024-03-11 15:21:22 +08:00
Ruonan Wang	be29833b2b	LLM: fix qwen2 (#10356 )	2024-03-11 09:29:08 +08:00
Zhicun	9026c08633	Fix llamaindex AutoTokenizer bug (#10345 ) * fix tokenizer * fix AutoTokenizer bug * modify code style	2024-03-08 16:24:50 +08:00
Keyan (Kyrie) Zhang	7a621a4db0	Fix device_map bug by raise an error when using device_map=xpu (#10340 ) * Fix device_map bug by raise an error when using device_map=xpu * Fix sync error * Fix python style * Use invalidInputError instead of invalidOperationError	2024-03-08 13:38:52 +08:00
Yishuo Wang	1ac193ba02	add rope theta argument (#10343 )	2024-03-07 17:27:19 +08:00
Cengguang Zhang	496d18ab6d	LLM: add quantize kv cache support for baichuan 7b and 13b. (#10330 ) * add quantize kv cache for baichuan 7b and 13b. * fix typo. * fix. * fix style. * fix style.	2024-03-07 16:17:38 +08:00
Yina Chen	9ea499ca68	Optimize speculative decoding PVC memory usage (#10329 ) * optimize memory * update * update * update * support other models * update * fix style	2024-03-06 09:54:21 +08:00
dingbaorong	cc796848ea	fix typos (#10274 ) Co-authored-by: Ariadne <wyn2000330@126.com>	2024-03-05 18:38:22 +08:00
Yishuo Wang	0011ff9f64	optimize bge large performance (#10324 )	2024-03-05 17:06:03 +08:00
Cengguang Zhang	30d009bca7	LLM: support quantized kv cache for Mistral in transformers >=4.36.0 (#10326 ) * support quantize kv for mistral in transformers 4.36 * update mistral support. * fix style.	2024-03-05 16:23:50 +08:00
dingbaorong	1e6f0c6f1a	Add llamaindex gpu example (#10314 ) * add llamaindex example * fix core dump * refine readme * add trouble shooting * refine readme --------- Co-authored-by: Ariadne <wyn2000330@126.com>	2024-03-05 13:36:00 +08:00
dingbaorong	fc7f10cd12	add langchain gpu example (#10277 ) * first draft * fix * add readme for transformer_int4_gpu * fix doc * check device_map * add arc ut test * fix ut test * fix langchain ut * Refine README * fix gpu mem too high * fix ut test --------- Co-authored-by: Ariadne <wyn2000330@126.com>	2024-03-05 13:33:57 +08:00
Cengguang Zhang	ab9fc2485f	LLM: add quantize kv support for llama transformer 4.36 (#10298 ) * add quantize kv support for llama transformer 4.36 * fix style. * fix style.	2024-03-04 10:33:35 +08:00
SONG Ge	0ab40917fb	[LLM] Split merged_qk to separated q/k linear (#10299 ) * modify merge_qk_linear to separated q/k linear * update	2024-03-01 16:48:55 +08:00
Yang Wang	f4d7dbcde2	use fused qkv forward in qwen2 (#10185 ) * use fused qkv forward in qwen2 * support both * fix style * fix rope * remove pring * fix style * clean up	2024-03-01 16:46:35 +08:00
Wang, Jian4	beb9433cec	LLM: Reduce speculative _ipex_optimize_model memory use (#10281 ) * use tpp * update ipex	2024-03-01 13:48:23 +08:00
Yuwen Hu	f0ff0eebe1	[LLM] Support quantize kv cache for Baichuan2 7B (#10280 ) * Add quatized kv cache framework for Baichuan2 7B * Support quantize kv cache for baichuan2 * Small fix * Fix python style	2024-03-01 13:35:42 +08:00
SONG Ge	273de341d7	hot-fix silu error import (#10292 )	2024-03-01 10:11:37 +08:00
Xin Qiu	232273a1b5	Enable Gemma fused mlp + Gelu (#10276 ) * update llama mlp forward * add all * fix style check * split * update * update * update * fix style	2024-02-29 16:53:24 +08:00
Guancheng Fu	2d930bdca8	Add vLLM bf16 support (#10278 ) * add argument load_in_low_bit * add docs * modify gpu doc * done --------- Co-authored-by: ivy-lv11 <lvzc@lamda.nju.edu.cn>	2024-02-29 16:33:42 +08:00
SONG Ge	13b0bc9075	[LLM] Add quantize_kv optimization for yuan2 model (#10243 ) * add initial quantize_kv support for yuan2 model * fix yuan2 quantize_kv generation * apply fp16 conv layer optimizations * disable mlp for quantize_kv	2024-02-29 16:33:26 +08:00
Zhicun	4e6cc424f1	Add LlamaIndex RAG (#10263 ) * run demo * format code * add llamaindex * add custom LLM with bigdl * update * add readme * begin ut * add unit test * add license * add license * revised * update * modify docs * remove data folder * update * modify prompt * fixed * fixed * fixed	2024-02-29 15:21:19 +08:00
Ruonan Wang	a9fd20b6ba	LLM: Update qkv fusion for GGUF-IQ2 (#10271 ) * first commit * update mistral * fix transformers==4.36.0 * fix * disable qk for mixtral now * fix style	2024-02-29 12:49:53 +08:00
Ruonan Wang	4b08bc1417	LLM: relax batch check of flash atttention by double check attention mask (#10270 ) * relax batch check * fix * fix style	2024-02-29 09:39:55 +08:00
Yina Chen	07f36fbfcc	Fix gptj failed to extend (#10269 )	2024-02-29 09:39:27 +08:00
Yishuo Wang	cccb02dad1	fix baichuan2 13b 2k input (#10267 )	2024-02-28 17:20:20 +08:00

504 commits