Commit graph

3295 commits

Author SHA1 Message Date
Yuwen Hu
96796f95cb
Update all-in-one benchmark prompts for continuation task & lookup update for minicpmv (#11827)
* Update all-in-one benchmark prompts for continuation task

* Small fix

* Add pure-text benchmark support for minicpm-v-2_6

* Support lookahead for model.llm generate of minicpmv

* Add prompt reference

* Small update

* Small fix
2024-08-16 17:16:35 +08:00
Yishuo Wang
e966e85df8
force lm_head optimization in any model if set environment variable (#11830) 2024-08-16 16:48:45 +08:00
RyuKosei
3b630fb9df
updated ppl README (#11807)
* edit README.md

* update the branch

* edited README.md

* updated

* updated description

---------

Co-authored-by: jenniew <jenniewang123@gmail.com>
2024-08-16 15:49:25 +08:00
Jinhe
e07a55665c
Codegeex2 tokenization fix (#11831)
* updated tokenizer file

* updated tokenizer file

* updated tokenizer file

* updated tokenizer file

* new folder
2024-08-16 15:48:47 +08:00
Jinhe
a508b0a902
added link to minicpm-v-2_6 example (#11829) 2024-08-16 14:49:23 +08:00
Jinhe
adfbb9124a
Reorganize MiniCPM-V-2_6 example & update others MiniCPM-V-2 exmaples (#11815)
* model to fp16 & 2_6 reorganize

* revisions

* revisions

* half

* deleted transformer version requirements

* deleted transformer version requirements

---------

Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>
2024-08-16 14:48:56 +08:00
Chu,Youcheng
f463268e36
fix: add run oneAPI instruction for the example of codeshell (#11828)
* fix: delete ipex extension import in ppl wikitext evaluation

* feat: add mixed_precision argument on ppl wikitext evaluation

* fix: delete mix_precision command in perplex evaluation for wikitext

* fix: remove fp16 mixed-presicion argument

* fix: Add a space.

* fix: add run oneAPI instruction for the example of codeshell

* fix: textual adjustments

* fix: Textual adjustment

---------

Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-16 14:29:06 +08:00
Yishuo Wang
17a0beb21f
optimize qwen2-audio again (#11825) 2024-08-16 11:11:35 +08:00
Jason Dai
6a8d07ddb4
Update README.md (#11824) 2024-08-16 10:22:02 +08:00
Yuwen Hu
9e9086cc2a
Update IPEX_LLM_PERFORMANCE_MODE (#11823) 2024-08-16 09:48:36 +08:00
Wang, Jian4
5a80fd2633
Fix lightweight-serving no streaming resp on mtl (#11822) 2024-08-16 09:43:03 +08:00
Guancheng Fu
e70ae0638e
Fix vLLM not convert issues (#11817)
* Fix not convert issues

* refine
2024-08-15 19:04:05 +08:00
Yishuo Wang
750d4ad5dc
fix minicpm-v-2 fp16 (#11819) 2024-08-15 18:34:40 +08:00
Yuwen Hu
6543321f04
Remove 4k igpu perf on gemma-2-9b-it (#11820) 2024-08-15 18:06:19 +08:00
Chu,Youcheng
28d1c972da
add mixed_precision argument on ppl wikitext evaluation (#11813)
* fix: delete ipex extension import in ppl wikitext evaluation

* feat: add mixed_precision argument on ppl wikitext evaluation

* fix: delete mix_precision command in perplex evaluation for wikitext

* fix: remove fp16 mixed-presicion argument

* fix: Add a space.

---------

Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-15 17:58:53 +08:00
Yishuo Wang
828ab16537
fix phi3 and minicpmv cpu (#11818) 2024-08-15 17:43:29 +08:00
Yishuo Wang
4e178f0c5d
rewrite minicpmv optimization (#11816) 2024-08-15 17:27:12 +08:00
Ch1y0q
447c8ed324
update transformers version for replit-code-v1-3b, `internlm2-chat-… (#11811)
* update transformers version for `replit-code-v1-3b`, `internlm2-chat-7b` and mistral

* remove for default transformers version
2024-08-15 16:40:48 +08:00
Jinhe
2fbbb51e71
transformers==4.37, yi & yuan2 & vicuna (#11805)
* transformers==4.37

* added yi model

* added yi model

* xxxx

* delete prompt template

* / and delete
2024-08-15 15:39:24 +08:00
Jinhe
f43da2d455
deletion of specification of transformers version (#11808) 2024-08-15 15:23:32 +08:00
Yishuo Wang
07b7f13982
support and optimize qwen2-audio (#11809) 2024-08-15 14:59:04 +08:00
Chu,Youcheng
3ac83f8396
fix: delete ipex extension import in ppl wikitext evaluation (#11806)
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-15 13:40:01 +08:00
Yuwen Hu
016e840eed
Fix performance tests (#11802)
* Fix performance tests

* Small fix
2024-08-15 01:37:01 +08:00
Shaojun Liu
e3c1dae619
Fix Windows Unit Test (#11801)
* Update llm_unit_tests.yml

* remove debug information

* Delete .github/actions/llm/cli-test-windows directory
2024-08-14 19:16:48 +08:00
Yishuo Wang
9a93808fc5
fix and optimize minicpm v 2 (#11799) 2024-08-14 17:27:23 +08:00
Jinhe
d8d887edd2
added minicpm-v-2_6 (#11794) 2024-08-14 16:23:44 +08:00
Yishuo Wang
3d6cfa291d
optimize minicpm v 2.5 (#11793) 2024-08-14 16:07:24 +08:00
Yuwen Hu
356281cb80
Further all-in-one benchmark update continuation task (#11784)
* Further update prompt for continuation task, and disable lookup candidate update strategy on MTL

* style fix
2024-08-14 14:39:34 +08:00
Ruonan Wang
43cca3be27
fix gemma2 runtime error caused by sliding window (#11788)
* fix runtime error

* revert workflow
2024-08-14 10:43:33 +08:00
Jinhe
dbd14251dd
Troubleshoot for sycl not found (#11774)
* added troubleshoot for sycl not found problem

* added troubleshoot for sycl not found problem

* revision on troubleshoot

* revision on troubleshoot
2024-08-14 10:26:01 +08:00
Yang Wang
51bcac1229
follow up on experimental support of fused decoder layer for llama2 (#11785)
* clean up and support transpose value cache

* refine

* fix style

* fix style
2024-08-13 18:53:55 -07:00
Yishuo Wang
cb79dcda93
refactor llama convert to fix minicpm-v 2.5 optimization (#11783) 2024-08-14 09:29:57 +08:00
Yina Chen
7cd6ec9723
MiniCPM-V support compresskv (#11779)
* fix check error

* fix other models

* remove print
2024-08-13 19:03:40 +08:00
Qiyuan Gong
3998de14f0
Fix mistral forward_qkv in q4_0 (#11781)
* Fix mistral forward_qkv without self.rotary_emb.base in q4_0.
* Replace apply_rotary_pos_emb_no_cache_xpu with rotary_half_inplaced.
* Revert https://github.com/intel-analytics/ipex-llm/pull/11765
2024-08-13 16:48:19 +08:00
Heyang Sun
70c828b87c
deepspeed zero3 QLoRA finetuning (#11625)
* deepspeed zero3 QLoRA finetuning

* Update convert.py

* Update low_bit_linear.py

* Update utils.py

* Update qlora_finetune_llama2_13b_arch_2_card.sh

* Update low_bit_linear.py

* Update alpaca_qlora_finetuning.py

* Update low_bit_linear.py

* Update utils.py

* Update convert.py

* Update alpaca_qlora_finetuning.py

* Update alpaca_qlora_finetuning.py

* Update low_bit_linear.py

* Update deepspeed_zero3.json

* Update qlora_finetune_llama2_13b_arch_2_card.sh

* Update low_bit_linear.py

* Update low_bit_linear.py

* Update utils.py

* fix style

* fix style

* Update alpaca_qlora_finetuning.py

* Update qlora_finetune_llama2_13b_arch_2_card.sh

* Update convert.py

* Update low_bit_linear.py

* Update model.py

* Update alpaca_qlora_finetuning.py

* Update low_bit_linear.py

* Update low_bit_linear.py

* Update low_bit_linear.py
2024-08-13 16:15:29 +08:00
Yishuo Wang
a184b120c9
fix minicpm-v 2.5 (#11780) 2024-08-13 16:14:00 +08:00
Yuwen Hu
ec184af243
Add gemma-2-2b-it and gemma-2-9b-it to igpu nightly performance test (#11778)
* add yaml and modify `concat_csv.py` for `transformers` 4.43.1 (#11758)

* add yaml and modify `concat_csv.py` for `transformers` 4.43.1

* remove 4.43 for arc; fix;

* remove 4096-512 for 4.43

* comment some models

* Small fix

* uncomment models (#11777)

---------

Co-authored-by: Ch1y0q <qiyue2001@gmail.com>
2024-08-13 15:39:56 +08:00
Qiyuan Gong
a88c132e54
Reduce Mistral softmax memory only in low memory mode (#11775)
* Reduce Mistral softmax memory only in low memory mode
2024-08-13 14:50:54 +08:00
Yishuo Wang
aa861df066
use new fp32 softmax kernel (#11776) 2024-08-13 14:48:11 +08:00
binbin Deng
23d3acdc77
Add experimental support of fused decoder layer for llama2 (#11768) 2024-08-13 14:41:36 +08:00
Jin, Qiao
c28b3389e6
Update npu multimodal example (#11773) 2024-08-13 14:14:59 +08:00
Yuwen Hu
81824ff8c9
Fix stdout in all-in-one benchmark to utf-8 (#11772) 2024-08-13 10:51:08 +08:00
Yishuo Wang
a1eb793f70
optimize minicpm v 2_6 firs token perf (#11770) 2024-08-13 09:51:18 +08:00
Yina Chen
841dbcdf3a
Fix compresskv with lookahead issue (#11767)
* fix compresskv + lookahead attn_mask qwen2

* support llama chatglm

* support mistral & chatglm

* address comments

* revert run.py
2024-08-12 18:53:55 +08:00
Yuwen Hu
f97a77ea4e
Update all-in-one benchmark for continuation task input preparation (#11760)
* All use 8192.txt for prompt preparation for now

* Small fix

* Fix text encoding mode to utf-8

* Small update
2024-08-12 17:49:45 +08:00
Xu, Shuo
1b05caba2b
Set mistral fuse rope to false except fp6 & fp16 (#11765)
* set mistral fuse rope to false except fp6 & fp16

* lint

* lint

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-08-12 17:25:07 +08:00
Ruonan Wang
8db34057b4
optimize lookahead init time (#11769) 2024-08-12 17:19:12 +08:00
Jin, Qiao
05989ad0f9
Update npu example and all in one benckmark (#11766) 2024-08-12 16:46:46 +08:00
Yishuo Wang
57d177738d
optimize minicpm-v-2_6 repetition penalty (#11763) 2024-08-12 14:10:10 +08:00
Shaojun Liu
fac4c01a6e
Revert to use out-of-tree GPU driver (#11761)
* Revert to use out-of-tree GPU driver since the performance with out-of-tree driver is better than upsteam's

* add spaces

* add troubleshooting case

* update Troubleshooting
2024-08-12 13:41:47 +08:00