ipex-llm

Author	SHA1	Message	Date
Jinhe	da3d7a3a53	delete transformers version requirement (#11845 ) * delete transformers version requirement * delete transformers version requirement	2024-08-19 17:53:02 +08:00
Ruonan Wang	a0fbda5bc8	add MiniCPM-Llama3-V-2_5 into all-in-one benchmark (#11849 )	2024-08-19 17:51:16 +08:00
Yishuo Wang	9490781aec	optimize phi3 memory usage again (#11848 )	2024-08-19 17:26:59 +08:00
Yina Chen	3cd4e87168	Support compress KV with quantize KV (#11812 ) * update llama * support llama 4.41 * fix style * support minicpm * support qwen2 * support minicpm & update * support chatglm4 * support chatglm * remove print * add DynamicCompressFp8Cache & support qwen * support llama * support minicpm phi3 * update chatglm2/4 * small fix & support qwen 4.42 * remove print	2024-08-19 15:32:32 +08:00
Zhao Changmin	6841a9ac8f	fix load low bit com dtype (#11832 )	2024-08-19 13:43:19 +08:00
Yuwen Hu	cfc959defa	Fixes regarding utf-8 in all-in-one benchmark (#11839 )	2024-08-19 10:38:00 +08:00
Chu,Youcheng	46a1cbfa64	feat: add mixed_precision argument on ppl longbench evaluation (#11837 ) * feat: add mixed_precision argument on ppl longbench evaluation * fix: delete two spaces --------- Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>	2024-08-19 10:00:44 +08:00
Yuwen Hu	580c94d0e2	Remove gemma-2-9b-it 3k input from igpu-perf (#11834 )	2024-08-17 13:10:05 +08:00
Jin, Qiao	9f17234f3b	Add MiniCPM-V-2_6 to iGPU Perf (#11810 ) * Add MiniCPM-V-2_6 to iGPU Perf * keep last model in yaml * fix MINICPM_V_IDS * Restore tested model list * Small fix --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-08-16 18:41:21 +08:00
Yuwen Hu	96796f95cb	Update all-in-one benchmark prompts for `continuation` task & lookup update for minicpmv (#11827 ) * Update all-in-one benchmark prompts for continuation task * Small fix * Add pure-text benchmark support for minicpm-v-2_6 * Support lookahead for model.llm generate of minicpmv * Add prompt reference * Small update * Small fix	2024-08-16 17:16:35 +08:00
Yishuo Wang	e966e85df8	force lm_head optimization in any model if set environment variable (#11830 )	2024-08-16 16:48:45 +08:00
RyuKosei	3b630fb9df	updated ppl README (#11807 ) * edit README.md * update the branch * edited README.md * updated * updated description --------- Co-authored-by: jenniew <jenniewang123@gmail.com>	2024-08-16 15:49:25 +08:00
Jinhe	e07a55665c	Codegeex2 tokenization fix (#11831 ) * updated tokenizer file * updated tokenizer file * updated tokenizer file * updated tokenizer file * new folder	2024-08-16 15:48:47 +08:00
Jinhe	adfbb9124a	Reorganize MiniCPM-V-2_6 example & update other MiniCPM-V-2 examples (#11815 ) * model to fp16 & 2_6 reorganize * revisions * revisions * half * deleted transformer version requirements * deleted transformer version requirements --------- Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>	2024-08-16 14:48:56 +08:00
Chu,Youcheng	f463268e36	fix: add run oneAPI instruction for the example of codeshell (#11828 ) * fix: delete ipex extension import in ppl wikitext evaluation * feat: add mixed_precision argument on ppl wikitext evaluation * fix: delete mix_precision command in perplex evaluation for wikitext * fix: remove fp16 mixed-precision argument * fix: Add a space. * fix: add run oneAPI instruction for the example of codeshell * fix: textual adjustments * fix: Textual adjustment --------- Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>	2024-08-16 14:29:06 +08:00
Yishuo Wang	17a0beb21f	optimize qwen2-audio again (#11825 )	2024-08-16 11:11:35 +08:00
Yuwen Hu	9e9086cc2a	Update `IPEX_LLM_PERFORMANCE_MODE` (#11823 )	2024-08-16 09:48:36 +08:00
Wang, Jian4	5a80fd2633	Fix lightweight-serving no streaming resp on mtl (#11822 )	2024-08-16 09:43:03 +08:00
Guancheng Fu	e70ae0638e	Fix vLLM not convert issues (#11817 ) * Fix not convert issues * refine	2024-08-15 19:04:05 +08:00
Yishuo Wang	750d4ad5dc	fix minicpm-v-2 fp16 (#11819 )	2024-08-15 18:34:40 +08:00
Yuwen Hu	6543321f04	Remove 4k igpu perf on gemma-2-9b-it (#11820 )	2024-08-15 18:06:19 +08:00
Chu,Youcheng	28d1c972da	add mixed_precision argument on ppl wikitext evaluation (#11813 ) * fix: delete ipex extension import in ppl wikitext evaluation * feat: add mixed_precision argument on ppl wikitext evaluation * fix: delete mix_precision command in perplex evaluation for wikitext * fix: remove fp16 mixed-precision argument * fix: Add a space. --------- Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>	2024-08-15 17:58:53 +08:00
Yishuo Wang	828ab16537	fix phi3 and minicpmv cpu (#11818 )	2024-08-15 17:43:29 +08:00
Yishuo Wang	4e178f0c5d	rewrite minicpmv optimization (#11816 )	2024-08-15 17:27:12 +08:00
Ch1y0q	447c8ed324	update transformers version for `replit-code-v1-3b`, `internlm2-chat-… (#11811 ) * update transformers version for `replit-code-v1-3b`, `internlm2-chat-7b` and mistral * remove for default transformers version	2024-08-15 16:40:48 +08:00
Jinhe	2fbbb51e71	transformers==4.37, yi & yuan2 & vicuna (#11805 ) * transformers==4.37 * added yi model * added yi model * xxxx * delete prompt template * / and delete	2024-08-15 15:39:24 +08:00
Jinhe	f43da2d455	deletion of specification of transformers version (#11808 )	2024-08-15 15:23:32 +08:00
Yishuo Wang	07b7f13982	support and optimize qwen2-audio (#11809 )	2024-08-15 14:59:04 +08:00
Chu,Youcheng	3ac83f8396	fix: delete ipex extension import in ppl wikitext evaluation (#11806 ) Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>	2024-08-15 13:40:01 +08:00
Yishuo Wang	9a93808fc5	fix and optimize minicpm v 2 (#11799 )	2024-08-14 17:27:23 +08:00
Jinhe	d8d887edd2	added minicpm-v-2_6 (#11794 )	2024-08-14 16:23:44 +08:00
Yishuo Wang	3d6cfa291d	optimize minicpm v 2.5 (#11793 )	2024-08-14 16:07:24 +08:00
Yuwen Hu	356281cb80	Further all-in-one benchmark update `continuation` task (#11784 ) * Further update prompt for continuation task, and disable lookup candidate update strategy on MTL * style fix	2024-08-14 14:39:34 +08:00
Ruonan Wang	43cca3be27	fix gemma2 runtime error caused by sliding window (#11788 ) * fix runtime error * revert workflow	2024-08-14 10:43:33 +08:00
Yang Wang	51bcac1229	follow up on experimental support of fused decoder layer for llama2 (#11785 ) * clean up and support transpose value cache * refine * fix style * fix style	2024-08-13 18:53:55 -07:00
Yishuo Wang	cb79dcda93	refactor llama convert to fix minicpm-v 2.5 optimization (#11783 )	2024-08-14 09:29:57 +08:00
Yina Chen	7cd6ec9723	MiniCPM-V support compresskv (#11779 ) * fix check error * fix other models * remove print	2024-08-13 19:03:40 +08:00
Qiyuan Gong	3998de14f0	Fix mistral forward_qkv in q4_0 (#11781 ) * Fix mistral forward_qkv without self.rotary_emb.base in q4_0. * Replace apply_rotary_pos_emb_no_cache_xpu with rotary_half_inplaced. * Revert https://github.com/intel-analytics/ipex-llm/pull/11765	2024-08-13 16:48:19 +08:00
Heyang Sun	70c828b87c	deepspeed zero3 QLoRA finetuning (#11625 ) * deepspeed zero3 QLoRA finetuning * Update convert.py * Update low_bit_linear.py * Update utils.py * Update qlora_finetune_llama2_13b_arch_2_card.sh * Update low_bit_linear.py * Update alpaca_qlora_finetuning.py * Update low_bit_linear.py * Update utils.py * Update convert.py * Update alpaca_qlora_finetuning.py * Update alpaca_qlora_finetuning.py * Update low_bit_linear.py * Update deepspeed_zero3.json * Update qlora_finetune_llama2_13b_arch_2_card.sh * Update low_bit_linear.py * Update low_bit_linear.py * Update utils.py * fix style * fix style * Update alpaca_qlora_finetuning.py * Update qlora_finetune_llama2_13b_arch_2_card.sh * Update convert.py * Update low_bit_linear.py * Update model.py * Update alpaca_qlora_finetuning.py * Update low_bit_linear.py * Update low_bit_linear.py * Update low_bit_linear.py	2024-08-13 16:15:29 +08:00
Yishuo Wang	a184b120c9	fix minicpm-v 2.5 (#11780 )	2024-08-13 16:14:00 +08:00
Yuwen Hu	ec184af243	Add `gemma-2-2b-it` and `gemma-2-9b-it` to igpu nightly performance test (#11778 ) * add yaml and modify `concat_csv.py` for `transformers` 4.43.1 (#11758) * add yaml and modify `concat_csv.py` for `transformers` 4.43.1 * remove 4.43 for arc; fix; * remove 4096-512 for 4.43 * comment some models * Small fix * uncomment models (#11777) --------- Co-authored-by: Ch1y0q <qiyue2001@gmail.com>	2024-08-13 15:39:56 +08:00
Qiyuan Gong	a88c132e54	Reduce Mistral softmax memory only in low memory mode (#11775 ) * Reduce Mistral softmax memory only in low memory mode	2024-08-13 14:50:54 +08:00
Yishuo Wang	aa861df066	use new fp32 softmax kernel (#11776 )	2024-08-13 14:48:11 +08:00
binbin Deng	23d3acdc77	Add experimental support of fused decoder layer for llama2 (#11768 )	2024-08-13 14:41:36 +08:00
Jin, Qiao	c28b3389e6	Update npu multimodal example (#11773 )	2024-08-13 14:14:59 +08:00
Yuwen Hu	81824ff8c9	Fix stdout in all-in-one benchmark to utf-8 (#11772 )	2024-08-13 10:51:08 +08:00
Yishuo Wang	a1eb793f70	optimize minicpm v 2_6 first token perf (#11770 )	2024-08-13 09:51:18 +08:00
Yina Chen	841dbcdf3a	Fix compresskv with lookahead issue (#11767 ) * fix compresskv + lookahead attn_mask qwen2 * support llama chatglm * support mistral & chatglm * address comments * revert run.py	2024-08-12 18:53:55 +08:00
Yuwen Hu	f97a77ea4e	Update all-in-one benchmark for `continuation` task input preparation (#11760 ) * All use 8192.txt for prompt preparation for now * Small fix * Fix text encoding mode to utf-8 * Small update	2024-08-12 17:49:45 +08:00
Xu, Shuo	1b05caba2b	Set mistral fuse rope to false except fp6 & fp16 (#11765 ) * set mistral fuse rope to false except fp6 & fp16 * lint * lint --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-08-12 17:25:07 +08:00

1712 commits