ipex-llm

Author	SHA1	Message	Date
Ch1y0q	447c8ed324	update transformers version for `replit-code-v1-3b`, `internlm2-chat-… (#11811 ) * update transformers version for `replit-code-v1-3b`, `internlm2-chat-7b` and mistral * remove for default transformers version	2024-08-15 16:40:48 +08:00
Jinhe	2fbbb51e71	transformers==4.37, yi & yuan2 & vicuna (#11805 ) * transformers==4.37 * added yi model * added yi model * xxxx * delete prompt template * / and delete	2024-08-15 15:39:24 +08:00
Jinhe	f43da2d455	deletion of specification of transformers version (#11808 )	2024-08-15 15:23:32 +08:00
Yishuo Wang	07b7f13982	support and optimize qwen2-audio (#11809 )	2024-08-15 14:59:04 +08:00
Chu,Youcheng	3ac83f8396	fix: delete ipex extension import in ppl wikitext evaluation (#11806 ) Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>	2024-08-15 13:40:01 +08:00
Yuwen Hu	016e840eed	Fix performance tests (#11802 ) * Fix performance tests * Small fix	2024-08-15 01:37:01 +08:00
Shaojun Liu	e3c1dae619	Fix Windows Unit Test (#11801 ) * Update llm_unit_tests.yml * remove debug information * Delete .github/actions/llm/cli-test-windows directory	2024-08-14 19:16:48 +08:00
Yishuo Wang	9a93808fc5	fix and optimize minicpm v 2 (#11799 )	2024-08-14 17:27:23 +08:00
Jinhe	d8d887edd2	added minicpm-v-2_6 (#11794 )	2024-08-14 16:23:44 +08:00
Yishuo Wang	3d6cfa291d	optimize minicpm v 2.5 (#11793 )	2024-08-14 16:07:24 +08:00
Yuwen Hu	356281cb80	Further all-in-one benchmark update `continuation` task (#11784 ) * Further update prompt for continuation task, and disable lookup candidate update strategy on MTL * style fix	2024-08-14 14:39:34 +08:00
Ruonan Wang	43cca3be27	fix gemma2 runtime error caused by sliding window (#11788 ) * fix runtime error * revert workflow	2024-08-14 10:43:33 +08:00
Jinhe	dbd14251dd	Troubleshoot for sycl not found (#11774 ) * added troubleshoot for sycl not found problem * added troubleshoot for sycl not found problem * revision on troubleshoot * revision on troubleshoot	2024-08-14 10:26:01 +08:00
Yang Wang	51bcac1229	follow up on experimental support of fused decoder layer for llama2 (#11785 ) * clean up and support transpose value cache * refine * fix style * fix style	2024-08-13 18:53:55 -07:00
Yishuo Wang	cb79dcda93	refactor llama convert to fix minicpm-v 2.5 optimization (#11783 )	2024-08-14 09:29:57 +08:00
Yina Chen	7cd6ec9723	MiniCPM-V support compresskv (#11779 ) * fix check error * fix other models * remove print	2024-08-13 19:03:40 +08:00
Qiyuan Gong	3998de14f0	Fix mistral forward_qkv in q4_0 (#11781 ) * Fix mistral forward_qkv without self.rotary_emb.base in q4_0. * Replace apply_rotary_pos_emb_no_cache_xpu with rotary_half_inplaced. * Revert https://github.com/intel-analytics/ipex-llm/pull/11765	2024-08-13 16:48:19 +08:00
Heyang Sun	70c828b87c	deepspeed zero3 QLoRA finetuning (#11625 ) * deepspeed zero3 QLoRA finetuning * Update convert.py * Update low_bit_linear.py * Update utils.py * Update qlora_finetune_llama2_13b_arch_2_card.sh * Update low_bit_linear.py * Update alpaca_qlora_finetuning.py * Update low_bit_linear.py * Update utils.py * Update convert.py * Update alpaca_qlora_finetuning.py * Update alpaca_qlora_finetuning.py * Update low_bit_linear.py * Update deepspeed_zero3.json * Update qlora_finetune_llama2_13b_arch_2_card.sh * Update low_bit_linear.py * Update low_bit_linear.py * Update utils.py * fix style * fix style * Update alpaca_qlora_finetuning.py * Update qlora_finetune_llama2_13b_arch_2_card.sh * Update convert.py * Update low_bit_linear.py * Update model.py * Update alpaca_qlora_finetuning.py * Update low_bit_linear.py * Update low_bit_linear.py * Update low_bit_linear.py	2024-08-13 16:15:29 +08:00
Yishuo Wang	a184b120c9	fix minicpm-v 2.5 (#11780 )	2024-08-13 16:14:00 +08:00
Yuwen Hu	ec184af243	Add `gemma-2-2b-it` and `gemma-2-9b-it` to igpu nightly performance test (#11778 ) * add yaml and modify `concat_csv.py` for `transformers` 4.43.1 (#11758) * add yaml and modify `concat_csv.py` for `transformers` 4.43.1 * remove 4.43 for arc; fix; * remove 4096-512 for 4.43 * comment some models * Small fix * uncomment models (#11777) --------- Co-authored-by: Ch1y0q <qiyue2001@gmail.com>	2024-08-13 15:39:56 +08:00
Qiyuan Gong	a88c132e54	Reduce Mistral softmax memory only in low memory mode (#11775 ) * Reduce Mistral softmax memory only in low memory mode	2024-08-13 14:50:54 +08:00
Yishuo Wang	aa861df066	use new fp32 softmax kernel (#11776 )	2024-08-13 14:48:11 +08:00
binbin Deng	23d3acdc77	Add experimental support of fused decoder layer for llama2 (#11768 )	2024-08-13 14:41:36 +08:00
Jin, Qiao	c28b3389e6	Update npu multimodal example (#11773 )	2024-08-13 14:14:59 +08:00
Yuwen Hu	81824ff8c9	Fix stdout in all-in-one benchmark to utf-8 (#11772 )	2024-08-13 10:51:08 +08:00
Yishuo Wang	a1eb793f70	optimize minicpm v 2_6 first token perf (#11770 )	2024-08-13 09:51:18 +08:00
Yina Chen	841dbcdf3a	Fix compresskv with lookahead issue (#11767 ) * fix compresskv + lookahead attn_mask qwen2 * support llama chatglm * support mistral & chatglm * address comments * revert run.py	2024-08-12 18:53:55 +08:00
Yuwen Hu	f97a77ea4e	Update all-in-one benchmark for `continuation` task input preparation (#11760 ) * All use 8192.txt for prompt preparation for now * Small fix * Fix text encoding mode to utf-8 * Small update	2024-08-12 17:49:45 +08:00
Xu, Shuo	1b05caba2b	Set mistral fuse rope to false except fp6 & fp16 (#11765 ) * set mistral fuse rope to false except fp6 & fp16 * lint * lint --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-08-12 17:25:07 +08:00
Ruonan Wang	8db34057b4	optimize lookahead init time (#11769 )	2024-08-12 17:19:12 +08:00
Jin, Qiao	05989ad0f9	Update npu example and all in one benchmark (#11766 )	2024-08-12 16:46:46 +08:00
Yishuo Wang	57d177738d	optimize minicpm-v-2_6 repetition penalty (#11763 )	2024-08-12 14:10:10 +08:00
Shaojun Liu	fac4c01a6e	Revert to use out-of-tree GPU driver (#11761 ) * Revert to use out-of-tree GPU driver since the performance with out-of-tree driver is better than upstream's * add spaces * add troubleshooting case * update Troubleshooting	2024-08-12 13:41:47 +08:00
Wang, Jian4	245dba0abc	Fix lightweight-serving codegeex error (#11759 )	2024-08-12 10:35:37 +08:00
Ruonan Wang	66fe2ee464	initial support of `IPEX_LLM_PERFORMANCE_MODE` (#11754 ) * add perf mode * update * fix style	2024-08-09 19:04:09 +08:00
Yina Chen	4b9c57cc60	Support compress kv with lookahead (#11752 ) * support compress kv with lookahead * enough kv miss param	2024-08-09 17:39:57 +08:00
Yishuo Wang	93455aac09	fix minicpm V 2.6 repeat output (#11753 )	2024-08-09 17:39:24 +08:00
Ruonan Wang	7e917d6cfb	fix gptq of llama (#11749 ) * fix gptq of llama * small fix	2024-08-09 16:39:25 +08:00
Yina Chen	dd46c141bd	Phi3 support compresskv (#11733 ) * phi3 support compresskv * fix phi3 mtl error * fix conflict with quant kv * fix abnormal on mtl * fix style * use slide windows size to compress kv * support sliding window * fix style * fix style * temp: partial support quant kv * support quant kv with compress kv, todo: model check * temp * fix style * fix style * remove prepare * address comment * default -> 1.8k	2024-08-09 15:43:43 +08:00
Qiyuan Gong	d8808cc2e3	Mistral apply_rotary_pos_emb_no_cache_xpu use rope_theta from config (#11747 ) mistral-7B-instruct-v0.2 and mistral-7B-instruct-v0.1 use different rope_theta (0.2 is 1e, 0.1 is 1e5). Pass self.config.rope_theta to apply_rotary_pos_emb_no_cache_xpu to avoid output difference.	2024-08-09 10:35:51 +08:00
Xiangyu Tian	044e486480	Fix vLLM CPU /chat endpoint (#11748 )	2024-08-09 10:33:52 +08:00
Jinhe	27b4b104ed	Add `qwen2-1.5b-instruct` into igpu performance (#11735 ) * updated qwen1.5B to all transformer==4.37 yaml * updated qwen1.5B to all transformer==4.37 yaml	2024-08-08 16:42:18 +08:00
Shaojun Liu	107f7aafd0	enable inference mode for deepspeed tp serving (#11742 )	2024-08-08 14:38:30 +08:00
Zijie Li	9e65cf00b3	Add openai-whisper pytorch gpu (#11736 ) * Add openai-whisper pytorch gpu * Update README.md * Update README.md * fix typo * fix names update readme * Update README.md	2024-08-08 12:32:59 +08:00
Yuwen Hu	7e61fa1af7	Revise GPU driver related guide for Windows users (#11740 )	2024-08-08 11:26:26 +08:00
Jinhe	d0c89fb715	updated llama.cpp and ollama quickstart (#11732 ) * updated llama.cpp and ollama quickstart.md * added qwen2-1.5B sample output * revision on quickstart updates * revision on quickstart updates * revision on qwen2 readme * added 2 troubleshoots * troubleshoot revision	2024-08-08 11:04:01 +08:00
Yishuo Wang	54cc9353db	support and optimize minicpm-v-2_6 (#11738 )	2024-08-07 18:21:16 +08:00
Yina Chen	e956e71fc1	fix conflict with quant kv (#11737 )	2024-08-07 18:10:30 +08:00
Ruonan Wang	00a5574c8a	Use `merge_qkv` to replace `fused_qkv` for llama2 (#11727 ) * update 4.38 * support new versions * update * fix style * fix style * update rope * temp test sdpa * fix style * fix cpu ut	2024-08-07 18:04:01 +08:00
Yina Chen	d2abc9711b	Fix MTL 4k input qwen2 compresskv error (#11734 ) * fix * fix style	2024-08-07 16:21:57 +08:00

3278 commits