Commit graph

536 commits

Kai Huang
c8679ad592
Qwen layernorm as input (#12309)
* qwen layernorm as input

* add group size
2024-11-04 09:51:15 +08:00
Ch1y0q
48123af463
add npu_group_size for transformers_int4_npu_win in all-in-one benchmark api (#12316)
* add `npu_group_size` for `transformers_int4_npu_win`

* small bugfix

* update
2024-11-01 18:44:27 +08:00
binbin Deng
f53bb4ea0b
[NPU L0] Update 1st token generation (#12314) 2024-11-01 17:02:07 +08:00
binbin Deng
d409d9d0eb
[NPU L0] Update streaming mode of example (#12312) 2024-11-01 15:38:10 +08:00
Yina Chen
05c5d0267a
[NPU] Llama2 prefill use ov sdp (#12310)
* prefill use sdp

* add param

* update

* fix style

* fix style

* meet comments
2024-11-01 11:05:20 +08:00
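
The "prefill use sdp" change above routes Llama2's prefill attention through a fused scaled-dot-product kernel. Below is a minimal PyTorch sketch of that technique; the commit itself uses an OpenVINO sdp op whose exact interface is not shown in this log, so all names here are illustrative.

```python
# Minimal sketch of fused scaled-dot-product attention for the prefill
# pass. Stand-in for the OpenVINO sdp op referenced in the commit;
# function and parameter names are ours.
import torch
import torch.nn.functional as F

def prefill_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                      attn_mask: torch.Tensor = None) -> torch.Tensor:
    # q, k, v: [batch, num_heads, seq_len, head_dim]
    # With no explicit mask, let the kernel apply the causal mask itself,
    # which is what prefill over the whole prompt needs.
    return F.scaled_dot_product_attention(
        q, k, v, attn_mask=attn_mask, is_causal=attn_mask is None)
```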
binbin Deng
eda764909c
Add minicpm-2b in L0 pipeline (#12308) 2024-11-01 09:30:01 +08:00
Yishuo Wang
b9853f98b3
fix qwen2 attention_mask slice (#12307) 2024-10-31 17:00:05 +08:00
binbin Deng
4892df61c9
Add qwen2-1.5b in l0 pipeline example (#12306) 2024-10-31 16:44:25 +08:00
Xin Qiu
97a0f7fd35
Codegeex support (#12303)
* new codegeex attn

* use kv cache

* add compress/quantize kv

* remove compress/quantize kv

* fix style check

* fix style

* fix codegeex
2024-10-31 15:28:56 +08:00
Yishuo Wang
72605c7016
fix llama3.1/3.2 quantize kv check (#12302) 2024-10-31 11:55:07 +08:00
Kai Huang
416c19165c
Add Qwen pipeline and example (#12292)
* support qwen pipeline

* update error msg

* style

* meet review

* minor
2024-10-31 11:25:25 +08:00
Yina Chen
0763268e4c
[NPU] Qwen2 groupwise performance opt (#12299)
* qwen2 gw performance opt

* remove debug
2024-10-30 17:40:21 +08:00
binbin Deng
41b8064554
Support minicpm-1B in level0 pipeline (#12297) 2024-10-30 17:21:47 +08:00
Jinhe
46d8300f6b
bugfix for qlora finetuning on GPU (#12298)
* bugfix for qlora 100 step error

* indent fix

* annotation fix
2024-10-30 16:54:10 +08:00
Yina Chen
70037ad55f
Groupwise prefill optimization (#12291)
* except lm_head

* remove

* support gw lm_head

* update

* fix

* remove run.bat

* fix style

* support llama3

* slice -> split

* remove debug

* fix style

* add dpu
2024-10-30 14:59:45 +08:00
Yishuo Wang
540eaeb12c
refactor attention_softmax (#12295) 2024-10-30 13:20:50 +08:00
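
A refactor like the one above typically consolidates the per-model softmax code into one shared helper. A minimal sketch under that assumption (the signature is ours, not necessarily the repo's): add the mask, upcast to fp32 for numerical stability, softmax, then cast back.

```python
# Assumed shape of a shared attention_softmax helper; illustrative only.
from typing import Optional
import torch

def attention_softmax(attn_weights: torch.Tensor,
                      attention_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
    if attention_mask is not None:
        attn_weights = attn_weights + attention_mask
    # upcast to fp32 for a numerically stable softmax, then cast back
    return torch.softmax(attn_weights, dim=-1,
                         dtype=torch.float32).to(attn_weights.dtype)
```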
Ruonan Wang
2b2cb9c693
[NPU pipeline] Support save & load and update examples (#12293)
* support save & load, update llama examples

* update baichuan2 example

* update readme
2024-10-30 10:02:00 +08:00
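
The save & load support above builds on ipex-llm's general low-bit flow. A hedged usage sketch of that flow follows; the model id is an assumed example, and the NPU pipeline variants in the updated examples take further arguments not shown here.

```python
# Hedged sketch of ipex-llm's low-bit save & load; NPU pipeline
# arguments (e.g. pipeline=True, max_context_len) are omitted.
from ipex_llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",   # assumed example model
    load_in_low_bit="sym_int4",
    trust_remote_code=True,
)
model.save_low_bit("./llama2-7b-sym-int4")   # persist converted weights
# Later runs skip the original checkpoint and the conversion step:
model = AutoModelForCausalLM.load_low_bit(
    "./llama2-7b-sym-int4", trust_remote_code=True)
```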
Yuwen Hu
5a15098835
Initial support for quantized forward on CPU when quantization_group_size=0 (#12282)
* Initial support for quantized forward on CPU when quantization_group_size=0

* Style fix

* Style fix

* Small fix

* Small fix
2024-10-29 19:40:17 +08:00
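
Under our reading of the commit above, `quantization_group_size=0` selects one scale per output channel, while a positive group size scales fixed-size groups within each row. A sketch of that distinction, with int4 packing and the actual CPU kernel omitted:

```python
# Illustrative symmetric int4 quantization; names and numeric details
# are ours, not the repo's kernel.
import torch

def quantize_weight(w: torch.Tensor, group_size: int = 0):
    # w: [out_features, in_features], symmetric int4 range [-8, 7]
    if group_size == 0:
        # group size 0: one scale per output channel (row)
        scale = (w.abs().amax(dim=1, keepdim=True) / 7.0).clamp(min=1e-8)
        q = torch.clamp(torch.round(w / scale), -8, 7)
    else:
        # positive group size: one scale per fixed-size group within a row
        g = w.reshape(w.shape[0], -1, group_size)
        scale = (g.abs().amax(dim=2, keepdim=True) / 7.0).clamp(min=1e-8)
        q = torch.clamp(torch.round(g / scale), -8, 7).reshape_as(w)
    return q.to(torch.int8), scale
```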
binbin Deng
3feb58d1e4
Support baichuan2 for level0 pipeline (#12289) 2024-10-29 19:24:16 +08:00
Zhao Changmin
546f455e8e
Patch sdpa check function in specific module attributes table (#12285) 2024-10-29 18:41:09 +08:00
Ruonan Wang
821b0033ed
[NPU L0] update layernorm & code refactor (#12287)
* update layernorm & code refactor

* fix style

* add common utils

* change to Pool()

* remove print
2024-10-29 15:01:45 +08:00
Yina Chen
4467645088
[NPU] Support l0 Llama groupwise (#12276)
* except lm_head

* remove

* support gw lm_head

* update

* fix

* remove run.bat

* fix style

* support llama3
2024-10-28 17:06:55 +08:00
Ruonan Wang
3fe2ea3081
[NPU] Reuse prefill of acc lib for pipeline (#12279)
* first commit

* update example

* fix style

* update example

* embedding as const

* fix generate

* code refactor

* meet code review

* fix style

* change max_output_len to max_context_len

* fix all-in-one

* fix example

* add check for new tokens
2024-10-28 16:05:49 +08:00
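
The "max_output_len to max_context_len" rename and the new-token check above suggest a bound on the prompt plus all generated tokens together. A hypothetical sketch of such a check, with function and argument names that are ours rather than the repo's:

```python
# Hypothetical guard implied by the rename; illustrative only.
def check_new_tokens(prompt_len: int, max_new_tokens: int,
                     max_context_len: int) -> None:
    # max_context_len caps prompt + generated tokens together
    if prompt_len + max_new_tokens > max_context_len:
        raise ValueError(
            f"prompt ({prompt_len}) + max_new_tokens ({max_new_tokens}) "
            f"exceeds max_context_len ({max_context_len})")
```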
SONG Ge
08cb065370
hot-fix redundant import funasr (#12277) 2024-10-25 19:40:39 +08:00
SONG Ge
a0c6432899
[NPU] Add support for loading a FunASR model (#12073)
* add support for loading funasr model

* add initial support for paraformer-encoder

* add npu ops impl

* add encoder-decoder npu pipeline

* move the first 30 layers of the paraformer encoder to npu and keep the remaining layers on cpu
2024-10-25 17:22:01 +08:00
Yuwen Hu
43b25a2fe7
Fix llama 3.2 vision on LNL (#12264)
* Fix llama 3.2 vision on LNL

* Small fix
2024-10-25 16:23:31 +08:00
Ruonan Wang
ae57e23e4f
fix incompatibility between llama GW & llama pipeline (#12267)
* fix

* fix
2024-10-25 10:31:44 +08:00
Yina Chen
b5e663854b
[NPU] Support llama groupwise (#12260)
* support llama gw

* support llama gw lm_head

* fix style

* remove unused code
2024-10-24 18:06:45 +08:00
Xin Qiu
39c9d1de52
fix code geex (#12261) 2024-10-24 14:34:01 +08:00
Yishuo Wang
f3a2b20e6b
Optimize gpt2 (#12259) 2024-10-24 13:44:24 +08:00
Ruonan Wang
821fd96367
Initial integration of our L0 Llama impl into ipex-llm (#12255)
* temp save

* initial support

* fix

* simplify code

* fix style

* fix example

* set default value of pipeline to False
2024-10-24 09:49:27 +08:00
Yishuo Wang
cacc891962
Fix PR validation (#12253) 2024-10-23 18:10:47 +08:00
binbin Deng
b685cf4349
Fix npu group size setting when optimize_model=False (#12256) 2024-10-23 17:53:54 +08:00
binbin Deng
567b77a76b
Support IR and blob format for llama level0 pipeline (#12251) 2024-10-23 16:02:35 +08:00
Yishuo Wang
578aef245d
Fix models auto-choosing SdpaAttention with ipex 2.3 (#12252) 2024-10-23 15:33:45 +08:00
Yishuo Wang
88dc120a4c
fix fp16 linear (#12250) 2024-10-23 14:35:19 +08:00
Yina Chen
e8cf7f32f5
npu gw small fix (#12249) 2024-10-23 14:26:01 +08:00
Yina Chen
e37f951cce
[NPU] Groupwise (#12241)
* dq divide

* fix

* support attn divide

* update qwen2 7b

* divide down_proj & other linear

* use concat & reduce sum

* support scale after

* support qwen2

* w/ mm

* update reshape

* sdpa

* split

* split 2+

* update

* lm head -> 28

* no scale

* update

* update

* update

* fix style

* fix style

* to split linear

* update

* update code

* address comments

* fix style & remove redundant code & revert benchmark scripts

* fix style & remove code

* update save & load

---------

Co-authored-by: Yang Wang <yang3.wang@intel.com>
2024-10-23 14:10:58 +08:00
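
The "to split linear" and "use concat & reduce sum" steps above describe dividing a large matmul along the input dimension into chunks that fit the device, then summing the partial products. An illustrative sketch of that technique, mirroring the idea rather than the repo's actual kernels:

```python
# Split a linear layer's matmul into n_splits partial products and
# reduce-sum them; names are illustrative.
import torch

def split_linear(x: torch.Tensor, w: torch.Tensor, n_splits: int) -> torch.Tensor:
    # x: [..., in_features], w: [out_features, in_features]
    x_chunks = x.chunk(n_splits, dim=-1)
    w_chunks = w.chunk(n_splits, dim=-1)
    partials = [xc @ wc.t() for xc, wc in zip(x_chunks, w_chunks)]
    # stack the partial products, then reduce-sum over the split axis
    return torch.stack(partials, dim=0).sum(dim=0)
```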
Yina Chen
ec465fbcd7
Add lookup generate in load_low_bit (#12243)
* add lookup generate in load_low_bit

* update comment
2024-10-22 15:51:52 +08:00
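
A rough sketch of the idea behind lookup generate, i.e. prompt-lookup candidate generation: match the trailing n-gram of the sequence against the earlier context and propose the tokens that followed it as a draft. Simplified; the real path verifies the draft tokens with the model, and all names here are ours.

```python
# Hypothetical prompt-lookup draft step; illustrative only.
from typing import List

def lookup_candidates(tokens: List[int], ngram: int = 3, n_draft: int = 8) -> List[int]:
    key = tokens[-ngram:]
    # scan backwards, skipping the trailing n-gram itself
    for i in range(len(tokens) - ngram - 1, -1, -1):
        if tokens[i:i + ngram] == key:
            return tokens[i + ngram:i + ngram + n_draft]
    return []  # no match: fall back to normal decoding
```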
Yuwen Hu
b3df47486d
Fix Gemma 2 on LNL (#12240)
* Fix gemma 2 on LNL

* Python style fix
2024-10-21 18:25:53 +08:00
Yishuo Wang
9ea694484d
refactor to remove old rope usage (#12224) 2024-10-17 17:06:09 +08:00
Yishuo Wang
324bcb057e
refactor to reduce old rope usage (#12219) 2024-10-17 14:45:09 +08:00
Yishuo Wang
a4a758656a
refactor gemma to reduce old fuse rope usage (#12215) 2024-10-16 17:40:28 +08:00
Yishuo Wang
9104a168f6
refactor phi-2 to reduce old fuse rope usage (#12214) 2024-10-16 17:08:14 +08:00
Yishuo Wang
bb247e991b
refactor merge_qkv and attention_softmax (#12213) 2024-10-16 15:58:14 +08:00
Yishuo Wang
e279148aa0
optimize llama3.2 vision again (#12211) 2024-10-16 14:29:48 +08:00
Yishuo Wang
f6611f9d3a
optimize llama3.2 vision attention again (#12204) 2024-10-15 16:08:20 +08:00
Yishuo Wang
9b81236a2e
optimize qwen2-vl vision (#12203) 2024-10-15 15:54:25 +08:00
Yishuo Wang
d5344587ab
optimize internvl2 vision model's attention (#12198) 2024-10-15 10:51:00 +08:00
Yuwen Hu
f8d1adc573
Fix Llama 3.2 & 3.1 on LNL (#12196) 2024-10-14 17:39:20 +08:00