ipex-llm

Author	SHA1	Message	Date
Kai Huang	4c112ee70c	Rename qwen in model name for arc perf test (#9712 )	2023-12-18 20:34:31 +08:00
Cengguang Zhang	4d22add4af	LLM: fix qwen efficiency issue in perf-test.	2023-12-18 18:32:54 +08:00
Ruonan Wang	8ed89557e5	LLM: add mlp optimization of mixtral (#9709 )	2023-12-18 16:59:52 +08:00
Chen, Zhentao	b3647507c0	Fix harness workflow (#9704 ) * error when larger than 0.001 * fix env setup * fix typo * fix typo	2023-12-18 15:42:10 +08:00
binbin Deng	12df70953e	LLM: add resume_from_checkpoint related section (#9705 )	2023-12-18 12:27:02 +08:00
Xin Qiu	320110d158	handle empty fused norm result (#9688 ) * handle empty fused norm result * remove fast_rms_norm * fix style	2023-12-18 09:56:11 +08:00
Lilac09	a5c481fedd	add chat.py denpendency in Dockerfile (#9699 )	2023-12-18 09:00:22 +08:00
Yuwen Hu	c00a9144c4	[LLM] Small fixes to llm gpu installation guide (#9700 )	2023-12-15 23:13:33 +08:00
Ziteng Zhang	67cc155771	[LLM] Correct chat format of llama and add llama_stream_chat in chat.py * correct chat format of llama * add llama_stream_chat	2023-12-15 16:36:46 +08:00
Ziteng Zhang	0d41b7ba7b	[LLM] Correct chat format & add stop words for chatglm3 in chat.py * correct chat format of chatglm3 * correct stop words of chatglm3	2023-12-15 16:35:17 +08:00
Ziteng Zhang	d57efd8eb9	[LM] Add stop_word for Qwen model and correct qwen chat format in chat.py (#9642 ) * add stop words list for qwen * change qwen chat format	2023-12-15 14:53:58 +08:00
SONG Ge	d5b81af7bd	Support mixtral attention optimization on transformers-v4.36.0 (#9674 ) * add example code to support mistral/mixtral attention on transformers v4.36.0 * update * style fix * add update for seen-tokens * support mixtral * rm mistral change * small fix * add more comments and remove use_cache part --------- Co-authored-by: plusbang <binbin1.deng@intel.com>	2023-12-15 14:30:23 +08:00
Cengguang Zhang	adbef56001	LLM: update qwen attention forward. (#9695 ) * feat: update qwen attention forward. * fix: style.	2023-12-15 14:06:15 +08:00
Wang, Jian4	b8437a1c1e	LLM: Add gguf mistral model support (#9691 ) * add mistral support * need to upgrade transformers version * update	2023-12-15 13:37:39 +08:00
Wang, Jian4	496bb2e845	LLM: Support load BaiChuan model family gguf model (#9685 ) * support baichuan model family gguf model * update gguf generate.py * add verify models * add support model_family * update * update style * update type * update readme * update * remove support model_family	2023-12-15 13:34:33 +08:00
Lilac09	3afed99216	fix path issue (#9696 )	2023-12-15 11:21:49 +08:00
Jason Dai	37f509bb95	Update readme (#9692 )	2023-12-14 19:50:21 +08:00
WeiguangHan	1f0245039d	LLM: check the final csv results for arc perf test (#9684 ) * LLM: check the final csv results for arc perf test * delete useless python script * change threshold * revert the llm_performance_tests.yml	2023-12-14 19:46:08 +08:00
Yuwen Hu	68d0c255fc	[LLM] Update GPU installation doc (#9693 ) * Temp * Add win 2.1 doc supports * Reorg layout * Fix based on comments * Small fix	2023-12-14 18:32:51 +08:00
Yishuo Wang	9a330bfc2b	fix fuse mlp when using q5_0 or fp8 (#9689 )	2023-12-14 16:16:05 +08:00
Yuwen Hu	82ac2dbf55	[LLM] Small fixes for win igpu test for ipex 2.1 (#9686 ) * Fixes to install for igpu performance tests * Small update for core performance tests model lists	2023-12-14 15:39:51 +08:00
WeiguangHan	3e8d198b57	LLM: add eval func (#9662 ) * Add eval func * add left eval	2023-12-14 14:59:02 +08:00
Ziteng Zhang	21c7503a42	[LLM] Correct prompt format of Qwen in generate.py (#9678 ) * Change qwen prompt format to chatml	2023-12-14 14:01:30 +08:00
Qiyuan Gong	223c9622f7	[LLM] Mixtral CPU examples (#9673 ) * Mixtral CPU PyTorch and hugging face examples, based on #9661 and #9671	2023-12-14 10:35:11 +08:00
Xin Qiu	5e46e0e5af	fix baichuan2-7b 1st token performance regression on xpu (#9683 ) * fix baichuan2-7b 1st token performance regression * add comments * fix style	2023-12-14 09:58:32 +08:00
ZehuaCao	877229f3be	[LLM]Add Yi-34B-AWQ to verified AWQ model. (#9676 ) * verfiy Yi-34B-AWQ * update	2023-12-14 09:55:47 +08:00
binbin Deng	68a4be762f	remove disco mixtral, update oneapi version (#9671 )	2023-12-13 23:24:59 +08:00
Ruonan Wang	1456d30765	LLM: add dot to option name in setup (#9682 )	2023-12-13 20:57:27 +08:00
Yuwen Hu	cbdd49f229	[LLM] win igpu performance for ipex 2.1 and oneapi 2024.0 (#9679 ) * Change igpu win tests for ipex 2.1 and oneapi 2024.0 * Qwen model repo id updates; updates model list for 512-64 * Add .eval for win igpu all-in-one benchmark for best performance	2023-12-13 18:52:29 +08:00
Mingyu Wei	16febc949c	[LLM] Add exclude option in all-in-one performance test (#9632 ) * add exclude option in all-in-one perf test * update arc-perf-test.yaml * Exclude in_out_pairs in main function * fix some bugs * address Kai's comments * define excludes at the beginning * add bloomz:2048 to exclude	2023-12-13 18:13:06 +08:00
Ruonan Wang	9b9cd51de1	LLM: update setup to provide new install option to support ipex 2.1 & oneapi 2024 (#9647 ) * update setup * default to 2.0 now * meet code review	2023-12-13 17:31:56 +08:00
Yishuo Wang	09ca540f9b	use fuse mlp in qwen (#9672 )	2023-12-13 17:20:08 +08:00
Ruonan Wang	c7741c4e84	LLM: update moe block convert to optimize rest token latency of Mixtral (#9669 ) * update moe block convert * further accelerate final_hidden_states * fix style * fix style	2023-12-13 16:17:06 +08:00
ZehuaCao	503880809c	verfiy codeLlama (#9668 )	2023-12-13 15:39:31 +08:00
Xiangyu Tian	1c6499e880	[LLM] vLLM: Support Mixtral Model (#9670 ) Add Mixtral support for BigDL vLLM.	2023-12-13 14:44:47 +08:00
Ruonan Wang	dc5b1d7e9d	LLM: integrate sdp kernel for FP16 rest token inference on GPU [DG2/ATSM] (#9633 ) * integrate sdp * update api * fix style * meet code review * fix * distinguish mtl from arc * small fix	2023-12-13 11:29:57 +08:00
Qiyuan Gong	5b0e7e308c	[LLM] Add support for empty activation (#9664 ) * Add support for empty activation, e.g., [0, 4096]. Empty activation is allowed by PyTorch. * Add comments.	2023-12-13 11:07:45 +08:00
SONG Ge	284e7697b1	[LLM] Optimize ChatGLM2 kv_cache to support beam_search on ARC (#9579 ) * optimize kv_cache to support beam_search on Arc * correctness test update * fix query_length issue * simplify implementation * only enable the optimization on gpu device * limit the beam_search support only enabled with gpu device and batch_size > 1 * add comments for beam_search case and revert ut change * meet comments * add more comments to describe the differece between multi-cases	2023-12-13 11:02:14 +08:00
Heyang Sun	c64e2248ef	fix str returned by get_int_from_str rather than expected int (#9667 )	2023-12-13 11:01:21 +08:00
binbin Deng	bf1bcf4a14	add official Mixtral model support (#9663 )	2023-12-12 22:27:07 +08:00
Ziteng Zhang	8931f2eb62	[LLM] Fix transformer qwen size mismatch and rename causal_mask (#9655 ) * Fix size mismatching caused by context_layer * Change registered_causal_mask to causal_mask	2023-12-12 20:57:40 +08:00
binbin Deng	2fe38b4b9b	LLM: add mixtral GPU examples (#9661 )	2023-12-12 20:26:36 +08:00
Yuwen Hu	968d99e6f5	Remove empty cache between each iteration of generation (#9660 )	2023-12-12 17:24:06 +08:00
Xin Qiu	0e639b920f	disable test_optimized_model.py temporarily due to out of memory on A730M(pr validation machine) (#9658 ) * disable test_optimized_model.py * disable seq2seq	2023-12-12 17:13:52 +08:00
binbin Deng	59ce86d292	LLM: support `optimize_model=True` for Mixtral model (#9657 )	2023-12-12 16:41:26 +08:00
Yuwen Hu	017932a7fb	Small fix for html generation (#9656 )	2023-12-12 14:06:18 +08:00
WeiguangHan	1e25499de0	LLM: test new oneapi (#9654 ) * test new oneapi * revert llm_performance_tests.yml	2023-12-12 11:12:14 +08:00
Yuwen Hu	d272b6dc47	[LLM] Enable generation of html again for win igpu tests (#9652 ) * Enable generation of html again and comment out rwkv for 32-512 as it is not very stable * Small fix	2023-12-11 19:15:17 +08:00
WeiguangHan	afa895877c	LLM: fix the issue that may generate blank html (#9650 ) * LLM: fix the issue that may generate blank html * reslove some comments	2023-12-11 19:14:57 +08:00
Yining Wang	a04a027b4c	Edit gpu doc (#9583 ) * harness: run llama2-7b * harness: run llama2-7b * harness: run llama2-7b * harness: run llama2-7b * edit-gpu-doc * fix some format problem * fix spelling problems * fix evaluation yml * delete redundant space * fix some problems * address comments * change link	2023-12-11 14:59:07 +08:00

2050 commits