ipex-llm

Author	SHA1	Message	Date
Xin Qiu	30795bdfbc	Gemma optimization: rms_norm, kv_cache, fused_rope, fused_rope+qkv (#10212 ) * gemma optimization * update * update * fix style * meet code review	2024-02-23 10:07:24 +08:00
Guoqiong Song	63681af97e	falcon for transformers 4.36 (#9960 ) * falcon for transformers 4.36	2024-02-22 17:04:40 -08:00
Jason Dai	84d5f40936	Update README.md (#10213 )	2024-02-22 17:22:59 +08:00
Yina Chen	ce5840a8b7	GPT-J rope optimization on xpu (#10182 ) * optimize * update * fix style & move use_fuse_rope * add ipex version check * fix style * update * fix style * meet comments * address comments * fix style	2024-02-22 16:25:12 +08:00
Xiangyu Tian	f445217d02	LLM: Update IPEX to 2.2.0+cpu and Refactor for _ipex_optimize (#10189 ) Update IPEX to 2.2.0+cpu and refactor for _ipex_optimize.	2024-02-22 16:01:11 +08:00
Heyang Sun	c876d9b5ca	Support for MPT rotary embedding (#10208 )	2024-02-22 15:16:31 +08:00
Ruonan Wang	5e1fee5e05	LLM: add GGUF-IQ2 examples (#10207 ) * add iq2 examples * small fix * meet code review * fix * meet review * small fix	2024-02-22 14:18:45 +08:00
Yuwen Hu	21de2613ce	[LLM] Add model loading time record for all-in-one benchmark (#10201 ) * Add model loading time record in csv for all-in-one benchmark * Small fix * Small fix to number after .	2024-02-22 13:57:18 +08:00
Ovo233	60e11b6739	LLM: Add mlp layer unit tests (#10200 ) * add mlp layer unit tests * add download baichuan-13b * exclude llama for now * install additional packages * rename bash file * switch to Baichuan2 * delete attention related code * fix name errors in yml file	2024-02-22 13:44:45 +08:00
SONG Ge	ca1166a0e5	[LLM] Add quantize kv_cache for Baichuan2-13B (#10203 ) * add quantize kv_cache for baichuan2-13b * style fix	2024-02-22 13:43:35 +08:00
Ruonan Wang	34ee1aa91f	LLM: add esimd sdp support for chatglm3 (#10205 ) * add esimd sdp support * fix style	2024-02-22 13:37:16 +08:00
Yuxuan Xia	7cbc2429a6	Fix C-Eval ChatGLM loading issue (#10206 ) * Add c-eval workflow and modify running files * Modify the chatglm evaluator file * Modify the ceval workflow for triggering test * Modify the ceval workflow file * Modify the ceval workflow file * Modify ceval workflow * Adjust the ceval dataset download * Add ceval workflow dependencies * Modify ceval workflow dataset download * Add ceval test dependencies * Add ceval test dependencies * Correct the result print * Fix the nightly test trigger time * Fix ChatGLM loading issue	2024-02-22 10:00:43 +08:00
Yuwen Hu	94cb16fe40	[LLM] Small updates to Win GPU Install Doc (#10199 ) * Make Offline installer as default for win gpu doc for oneAPI * Small other fixes	2024-02-21 17:58:40 +08:00
binbin Deng	9975b029c5	LLM: add qlora finetuning example using `trl.SFTTrainer` (#10183 )	2024-02-21 16:40:04 +08:00
Ruonan Wang	f7c96b19ef	LLM: support iq2 for mixtral (#10191 ) * support name mapping for mixtral * support mixtral mixed quantization * fix style * fix	2024-02-21 16:00:29 +08:00
yb-peng	b1a97b71a9	Harness eval: Add is_last parameter and fix logical operator in highlight_vals (#10192 ) * Add is_last parameter and fix logical operator in highlight_vals * Add script to update HTML files in parent folder * Add running update_html_in_parent_folder.py in summarize step * Add licence info * Remove update_html_in_parent_folder.py in Summarize the results for pull request	2024-02-21 14:45:32 +08:00
Zhicun	c7e839e66c	Add Qwen1.5-7B-Chat (#10113 ) * add Qwen1.5-7B-Chat * modify Qwen1.5 example * update README * update prompt format * update folder name and example README * add Chinese prompt sample output * update link in README * correct the link * update transformer version	2024-02-21 13:29:29 +08:00
Xin Qiu	56ad781f2f	qwen2 cpu fix (#10187 )	2024-02-21 11:23:51 +08:00
Chen, Zhentao	39d37bd042	upgrade harness package version in workflow (#10188 ) * upgrade harness * update readme	2024-02-21 11:21:30 +08:00
Yuwen Hu	001c13243e	[LLM] Add support for `low_low_bit` benchmark on Windows GPU (#10167 ) * Add support for low_low_bit performance test on Windows GPU * Small fix * Small fix * Save memory during converting model process * Drop the results for first time when loading in low bit on mtl igpu for better performance * Small fix	2024-02-21 10:51:52 +08:00
Ziteng Zhang	276ef0e885	Speculative Ziya on CPU (#10160 ) * Speculative Ziya on CPU * Without part of Accelerate with BIGDL_OPT_IPEX	2024-02-21 10:30:39 +08:00
Zhao Changmin	4fbf449c2d	for rwkv4 (#10179 )	2024-02-21 10:11:10 +08:00
yb-peng	de3dc609ee	Modify harness evaluation workflow (#10174 ) * Modify table head in harness * Specify the file path of fp16.csv * change run to run nightly and run pr to debug * Modify the way to get fp16.csv to downloading from github * Change the method to calculate diff in html table * Change the method to calculate diff in html table * Re-arrange job order * Re-arrange job order * Change limit * Change fp16.csv path * Change highlight rules * Change limit	2024-02-20 18:55:43 +08:00
Ruonan Wang	3288acb8de	LLM : Support embedding quantization (only q2k now) (#10170 ) * basic logic added * basic support * support save&load, update mixed strategy * fix style * use int8 for lm_head * add check for xpu	2024-02-20 16:56:57 +08:00
hxsz1997	6e10d98a8d	Fix some typos (#10175 ) * add llm-ppl workflow * update the DATASET_DIR * test multiple precisions * modify nightly test * match the updated ppl code * add matrix.include * fix the include error * update the include * add more model * update the precision of include * update nightly time and add more models * fix the workflow_dispatch description, change default model of pr and modify the env * modify workflow_dispatch language options * modify options * modify language options * modeify workflow_dispatch type * modify type * modify the type of language * change seq_len type * fix some typos * revert changes to stress_test.txt	2024-02-20 14:14:53 +08:00
Zhicun	add3899311	Add ziya CPU example (#10114 ) * ziya on CPU * add README for ziya * specify use_cache * add arc CPU * update prompt format * update link * add comments to emphasize use_cache * update pip cmd	2024-02-20 13:59:52 +08:00
binbin Deng	2bb96c775c	LLM: fix device setting during saving optimized model (#10154 )	2024-02-20 09:52:59 +08:00
Xin Qiu	1f6d5b9f30	enable fused rmsnorm and rope qwen2 (#10163 ) * qwen2 * change convert * cleanup	2024-02-20 08:33:09 +08:00
yb-peng	e31210ba00	Modify html table style and add fp16.csv in harness (#10169 ) * Specify the version of pandas in harness evaluation workflow * Specify the version of pandas in harness evaluation workflow * Modify html table style and add fp16.csv in harness * Modify comments	2024-02-19 18:13:40 +08:00
WeiguangHan	6c09aed90d	LLM: add qwen_1.5_7b model for arc perf test (#10166 ) * LLM: add qwen_1.5_7b model for arc perf test * small fix * revert some codes	2024-02-19 17:21:00 +08:00
Yuxuan Xia	209122559a	Add Ceval workflow and modify the result printing (#10140 ) * Add c-eval workflow and modify running files * Modify the chatglm evaluator file * Modify the ceval workflow for triggering test * Modify the ceval workflow file * Modify the ceval workflow file * Modify ceval workflow * Adjust the ceval dataset download * Add ceval workflow dependencies * Modify ceval workflow dataset download * Add ceval test dependencies * Add ceval test dependencies * Correct the result print	2024-02-19 17:06:53 +08:00
Zhao Changmin	f8730e8dc1	Skip rescale rwkv linear when load_low_bit (#10164 ) * rwkv_ld	2024-02-19 15:56:42 +08:00
Heyang Sun	3e2af5ec0a	Fix IPEX Baichuan Speculative (#10162 ) * Fix IPEX Baichuan Speculative * compatible with 13B * Update speculative.py	2024-02-19 15:27:34 +08:00
Yina Chen	23c91cdce6	[LLM] Add min_step_draft in speculative decoding (#10142 ) * Fix gptj kvcache & position id * Add min_draft_tokens in speculative decoding * fix style * update	2024-02-19 14:31:41 +08:00
Chen, Zhentao	14ba2c5135	Harness: remove deprecated files (#10165 )	2024-02-19 14:27:49 +08:00
Wang, Jian4	d3591383d5	LLM : Add CPU chatglm3 speculative example (#10004 ) * init chatglm * update * update	2024-02-19 13:38:52 +08:00
Wang, Jian4	f2417e083c	LLM: enable chatglm3-6b target_model ipex (#10085 ) * init * always make casual_mask * not return last tensor * update * optimize_model = False * enable optimized=False * enable optimized_model=true * speed_up ipex target_model * remove if True * use group_size * update python style * update * update	2024-02-19 13:38:32 +08:00
Heyang Sun	177273c1a4	IPEX Speculative Support for Baichuan2 7B (#10112 ) * IPEX Speculative Support for Baichuan2 7B * fix license problems * refine	2024-02-19 09:12:57 +08:00
Yina Chen	1508d6b089	Fix gptj kvcache & position id (#10141 )	2024-02-18 10:02:49 +08:00
yb-peng	b4dc33def6	In harness-evaluation workflow, add statistical tables (#10118 ) * chnage storage * fix typo * change label * change label to arc03 * change needs in the last step * add generate csv in harness/make_table_results.py * modify needs in the last job * add csv to html * mfix path issue in llm-harness-summary-nightly * modify output_path * modify args in make_table_results.py * modify make table command in summary * change pr env label * remove irrelevant code in summary; add set output path step; add limit in harness run * re-organize code structure * modify limit in run harness * modify csv_to_html input path * modify needs in summary-nightly	2024-02-08 19:01:05 +08:00
Yishuo Wang	4d33aac7f9	quick fix qwen2 fp8 kv cache (#10135 )	2024-02-08 17:04:59 +08:00
Cengguang Zhang	39d90839aa	LLM: add quantize kv cache for llama. (#10086 ) * feat: add quantize kv cache for llama. * fix style. * add quantized attention forward function. * revert style. * fix style. * fix style. * update quantized kv cache and add quantize_qkv * fix style. * fix style. * optimize quantize kv cache. * fix style.	2024-02-08 16:49:22 +08:00
Yishuo Wang	d848efe17c	add quantize kv cache support for qwen2 (#10134 )	2024-02-08 16:17:21 +08:00
SONG Ge	3f79128ed7	[LLM] Enable kv_cache optimization for Qwen2 on transformers-v4.37.0 (#10131 ) * add support for kv_cache optimization on transformers-v4.37.0 * enable attention forward * style fix * disable rotary for now	2024-02-08 14:20:26 +08:00
Ruonan Wang	063dc145ac	LLM: basic support for q2k (#10132 ) * basic support for q2k * fix style	2024-02-08 13:52:01 +08:00
binbin Deng	11fe5a87ec	LLM: add Modelscope model example (#10126 )	2024-02-08 11:18:07 +08:00
Cengguang Zhang	0cf6a12691	LLM: add default torch_dtype for fp16. (#10124 ) * set default torch_dtype for fp16. * fix style. * bug fix. * update bug fix.	2024-02-08 10:24:16 +08:00
Yishuo Wang	1aa0c623ce	disable fused layer norm on UHD (#10130 )	2024-02-08 10:20:01 +08:00
Yuwen Hu	a8450fc300	[LLM] Support MLP optimization for Qwen1.5 (#10123 )	2024-02-08 09:15:34 +08:00
Yuwen Hu	81ed65fbe7	[LLM] Add qwen1.5-7B in iGPU perf (#10127 ) * Add qwen1.5 test config yaml with transformers 4.37.0 * Update for yaml file	2024-02-07 22:31:20 +08:00
Jin Qiao	0fcfbfaf6f	LLM: add rwkv5 eagle GPU HF example (#10122 ) * LLM: add rwkv5 eagle example * fix * fix link	2024-02-07 16:58:29 +08:00
binbin Deng	925f82107e	LLM: support models hosted by modelscope (#10106 )	2024-02-07 16:46:36 +08:00
binbin Deng	c1ec3d8921	LLM: update FAQ about too many open files (#10119 )	2024-02-07 15:02:24 +08:00
Keyan (Kyrie) Zhang	2e80701f58	Unit test on final logits and the logits of the last attention layer (#10093 ) * Add unit test on final logits and attention * Add unit test on final logits and attention * Modify unit test on final logits and attention	2024-02-07 14:25:36 +08:00
Yuxuan Xia	3832eb0ce0	Add ChatGLM C-Eval Evaluator (#10095 ) * Add ChatGLM ceval evaluator * Modify ChatGLM Evaluator Reference	2024-02-07 11:27:06 +08:00
Jin Qiao	63050c954d	fix (#10117 )	2024-02-07 11:05:11 +08:00
Jin Qiao	d3d2ee1b63	LLM: add speech T5 GPU example (#10090 ) * add speech t5 example * fix * fix	2024-02-07 10:50:02 +08:00
Jin Qiao	2f4c754759	LLM: add bark gpu example (#10091 ) * add bark gpu example * fix * fix license * add bark * add example * fix * another way	2024-02-07 10:47:11 +08:00
Xiangyu Tian	8953acd7d6	[LLM] Fix log condition for BIGDL_OPT_IPEX (#10115 ) Fix log condition for BIGDL_OPT_IPEX	2024-02-07 10:27:10 +08:00
SONG Ge	0eccb94d75	remove text-generation-webui from bigdl repo (#10107 )	2024-02-06 17:46:52 +08:00
Ovo233	2aaa21c41d	LLM: Update ppl tests (#10092 ) * update ppl tests * use load_dataset api * add exception handling * add language argument * address comments	2024-02-06 17:31:48 +08:00
Yuwen Hu	3a46b57253	[LLM] Add RWKV4 HF GPU Example (#10105 ) * Add GPU HF example for RWKV 4 * Add link to rwkv4 * fix	2024-02-06 16:30:24 +08:00
Yuwen Hu	518ef95abc	Small fix for Nonetype error (#10104 )	2024-02-06 14:58:52 +08:00
Ruonan Wang	d61f4905ac	LLM: 2bit quantization initial support (#10042 ) * basis quantize support * fix new module name * small update * and mixed int4 with iq2_xxs * remove print * code refactor * fix style * meet code review	2024-02-06 14:58:32 +08:00
dingbaorong	36c9442c6d	Arc Stable version test (#10087 ) * add batch_size in stable version test * add batch_size in excludes * add excludes for batch_size * fix ci * triger regression test * fix xpu version * disable ci * address kai's comment --------- Co-authored-by: Ariadne <wyn2000330@126.com>	2024-02-06 10:23:50 +08:00
Jiao Wang	33b9e7744d	fix dimension (#10097 )	2024-02-05 15:07:38 -08:00
SONG Ge	4b02ff188b	[WebUI] Add prompt format and stopping words for Qwen (#10066 ) * add prompt format and stopping_words for qwen mdoel * performance optimization * optimize * update * meet comments	2024-02-05 18:23:13 +08:00
WeiguangHan	0aecd8637b	LLM: small fix for the html script (#10094 )	2024-02-05 17:27:34 +08:00
Zhicun	7d2be7994f	add phixtral and optimize phi-moe (#10052 )	2024-02-05 11:12:47 +08:00
Zhicun	676d6923f2	LLM: modify transformersembeddings.embed() in langchain (#10051 )	2024-02-05 10:42:10 +08:00
Jin Qiao	ad050107b3	LLM: fix mpt load_low_bit issue (#10075 ) * fix * retry * retry	2024-02-05 10:17:07 +08:00
SONG Ge	9050991e4e	fix gradio check issue temply (#10082 )	2024-02-04 16:46:29 +08:00
WeiguangHan	c2e562d037	LLM: add batch_size to the csv and html (#10080 ) * LLM: add batch_size to the csv and html * small fix	2024-02-04 16:35:44 +08:00
binbin Deng	7e49fbc5dd	LLM: make finetuning examples more common for other models (#10078 )	2024-02-04 16:03:52 +08:00
Heyang Sun	90f004b80b	remove benchmarkwrapper form deepspeed example (#10079 )	2024-02-04 15:42:15 +08:00
Ruonan Wang	8e33cb0f38	LLM: support speecht5_tts (#10077 ) * support speecht5_tts * fix	2024-02-04 13:26:42 +08:00
ivy-lv11	428b7105f6	Add HF and PyTorch example InternLM2 (#10061 )	2024-02-04 10:25:55 +08:00
Yina Chen	77be19bb97	LLM: Support gpt-j in speculative decoding (#10067 ) * gptj * support gptj in speculative decoding * fix * update readme * small fix	2024-02-02 14:54:55 +08:00
SONG Ge	19183ef476	[WebUI] Reset bigdl-llm loader options with default value (#10064 ) * reset bigdl-llm loader options with default value * remove options which maybe complex for naive users	2024-02-01 15:45:39 +08:00
Xin Qiu	6e0f1a1e92	use apply_rotary_pos_emb_cache_freq_xpu in mixtral (#10060 ) * use apply_rotary_pos_emb_cache_freq_xpu in mixtral * fix style	2024-02-01 15:40:49 +08:00
binbin Deng	aae20d728e	LLM: Add initial DPO finetuning example (#10021 )	2024-02-01 14:18:08 +08:00
Heyang Sun	601024f418	Mistral CPU example of speculative decoding (#10024 ) * Mistral CPU example of speculative decoding * update transformres version * update example * Update README.md	2024-02-01 10:52:32 +08:00
Heyang Sun	968e70544d	Enable IPEX Mistral in Speculative (#10059 )	2024-02-01 10:48:16 +08:00
Yina Chen	3ca03d4e97	Add deepmind sample into bigdl-llm speculative decoding (#10041 ) * migrate deepmind sample * update * meet comments * fix style * fix style	2024-02-01 09:57:02 +08:00
WeiguangHan	d2d3f6b091	LLM: ensure the result of daily arc perf test (#10016 ) * ensure the result of daily arc perf test * small fix * small fix * small fix * small fix * small fix * small fix * small fix * small fix * small fix * small fix * concat more csvs * small fix * revert some files	2024-01-31 18:26:21 +08:00
WeiguangHan	9724939499	temporarily disable bloom 2k input (#10056 )	2024-01-31 17:49:12 +08:00
Jin Qiao	8c8fc148c9	LLM: add rwkv 5 (#10048 )	2024-01-31 15:54:55 +08:00
WeiguangHan	a9018a0e95	LLM: modify the GPU example for redpajama model (#10044 ) * LLM: modify the GPU example for redpajama model * small fix	2024-01-31 14:32:08 +08:00
Yuxuan Xia	95636cad97	Add AutoGen CPU and XPU Example (#9980 ) * Add AutoGen example * Adjust AutoGen README * Adjust AutoGen README * Change AutoGen README * Change AutoGen README	2024-01-31 11:31:18 +08:00
Heyang Sun	7284edd9b7	Vicuna CPU example of speculative decoding (#10018 ) * Vicuna CPU example of speculative decoding * Update speculative.py * Update README.md * add requirements for ipex * Update README.md * Update speculative.py * Update speculative.py	2024-01-31 11:23:50 +08:00
Wang, Jian4	7e5cd42a5c	LLM : Update optimize ipex bf16 (#10038 ) * use 4.35.2 and remove * update rmsnorm * remove * remove * update python style * update * update python style * update * fix style * update * remove whitespace	2024-01-31 10:59:55 +08:00
Wang, Jian4	fb53b994f8	LLM : Add llama ipex optimized (#10046 ) * init ipex * remove padding	2024-01-31 10:38:46 +08:00
Ruonan Wang	3685622f29	LLM: fix llama 4.36 forward(#10047 )	2024-01-31 10:31:10 +08:00
Yishuo Wang	53a5140eff	Optimize rwkv v5 rest token again (#10043 )	2024-01-31 10:01:11 +08:00
Heyang Sun	b1ff28ceb6	LLama2 CPU example of speculative decoding (#9962 ) * LLama2 example of speculative decoding * add docs * Update speculative.py * Update README.md * Update README.md * Update speculative.py * remove autocast	2024-01-31 09:45:20 +08:00
WeiguangHan	0fcad6ce14	LLM: add gpu example for redpajama models (#10040 )	2024-01-30 19:39:28 +08:00
Xiangyu Tian	9978089796	[LLM] Enable BIGDL_OPT_IPEX in speculative baichuan2 13b example (#10028 ) Enable BIGDL_OPT_IPEX in speculative baichuan2 13b example	2024-01-30 17:11:37 +08:00
Ovo233	226f398c2a	fix ppl test errors (#10036 )	2024-01-30 16:26:21 +08:00
Xin Qiu	13e61738c5	hide detail memory for each token in benchmark_utils.py (#10037 )	2024-01-30 16:04:17 +08:00
Ruonan Wang	6b63ba23d1	LLM: add full module name during convert (#10035 )	2024-01-30 14:43:07 +08:00

962 commits