Heyang Sun
ee6852c915
Fix typo ( #11862 )
2024-08-20 16:38:11 +08:00
Yishuo Wang
2946420e14
add minicpmv 2.6 load_low_bit workaround ( #11856 )
2024-08-20 11:16:02 +08:00
SONG Ge
7380823f3f
Update Llama2 multi-processes example ( #11852 )
...
* update llama2 multi-processes examples
* update
* update readme
* update
2024-08-19 19:49:01 +08:00
Yang Wang
99b05ba1dc
separate prefill into a process ( #11787 )
...
* separate prefill into a process
* using model.share_memory()
* might work
* worked
* use long prompt
* refactor
* cleanup
* fix bug
* clean up
* changeable inter and intra process stages
* refactor
* add max output len
* fix npu_model changes that may cause generate down
* fix npu_model generate import error
* fix generate forward error
---------
Co-authored-by: sgwhat <ge.song@intel.com>
2024-08-19 17:53:36 +08:00
Jinhe
da3d7a3a53
delete transformers version requirement ( #11845 )
...
* delete transformers version requirement
* delete transformers version requirement
2024-08-19 17:53:02 +08:00
Ruonan Wang
a0fbda5bc8
add MiniCPM-Llama3-V-2_5 into all-in-one benchmark ( #11849 )
2024-08-19 17:51:16 +08:00
Yishuo Wang
9490781aec
optimize phi3 memory usage again ( #11848 )
2024-08-19 17:26:59 +08:00
Yina Chen
3cd4e87168
Support compress KV with quantize KV ( #11812 )
...
* update llama
* support llama 4.41
* fix style
* support minicpm
* support qwen2
* support minicpm & update
* support chatglm4
* support chatglm
* remove print
* add DynamicCompressFp8Cache & support qwen
* support llama
* support minicpm phi3
* update chatglm2/4
* small fix & support qwen 4.42
* remove print
2024-08-19 15:32:32 +08:00
Zhao Changmin
6841a9ac8f
fix load low bit com dtype ( #11832 )
2024-08-19 13:43:19 +08:00
Yuwen Hu
cfc959defa
Fixes regarding utf-8 in all-in-one benchmark ( #11839 )
2024-08-19 10:38:00 +08:00
Chu,Youcheng
46a1cbfa64
feat: add mixed_precision argument on ppl longbench evaluation ( #11837 )
...
* feat: add mixed_precision argument on ppl longbench evaluation
* fix: delete two spaces
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-19 10:00:44 +08:00
Yuwen Hu
580c94d0e2
Remove gemma-2-9b-it 3k input from igpu-perf ( #11834 )
2024-08-17 13:10:05 +08:00
Jin, Qiao
9f17234f3b
Add MiniCPM-V-2_6 to iGPU Perf ( #11810 )
...
* Add MiniCPM-V-2_6 to iGPU Perf
* keep last model in yaml
* fix MINICPM_V_IDS
* Restore tested model list
* Small fix
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-08-16 18:41:21 +08:00
Yuwen Hu
96796f95cb
Update all-in-one benchmark prompts for continuation task & lookup update for minicpmv ( #11827 )
...
* Update all-in-one benchmark prompts for continuation task
* Small fix
* Add pure-text benchmark support for minicpm-v-2_6
* Support lookahead for model.llm generate of minicpmv
* Add prompt reference
* Small update
* Small fix
2024-08-16 17:16:35 +08:00
Yishuo Wang
e966e85df8
force lm_head optimization in any model if set environment variable ( #11830 )
2024-08-16 16:48:45 +08:00
RyuKosei
3b630fb9df
updated ppl README ( #11807 )
...
* edit README.md
* update the branch
* edited README.md
* updated
* updated description
---------
Co-authored-by: jenniew <jenniewang123@gmail.com>
2024-08-16 15:49:25 +08:00
Jinhe
e07a55665c
Codegeex2 tokenization fix ( #11831 )
...
* updated tokenizer file
* updated tokenizer file
* updated tokenizer file
* updated tokenizer file
* new folder
2024-08-16 15:48:47 +08:00
Jinhe
a508b0a902
added link to minicpm-v-2_6 example ( #11829 )
2024-08-16 14:49:23 +08:00
Jinhe
adfbb9124a
Reorganize MiniCPM-V-2_6 example & update other MiniCPM-V-2 examples ( #11815 )
...
* model to fp16 & 2_6 reorganize
* revisions
* revisions
* half
* deleted transformer version requirements
* deleted transformer version requirements
---------
Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>
2024-08-16 14:48:56 +08:00
Chu,Youcheng
f463268e36
fix: add run oneAPI instruction for the example of codeshell ( #11828 )
...
* fix: delete ipex extension import in ppl wikitext evaluation
* feat: add mixed_precision argument on ppl wikitext evaluation
* fix: delete mix_precision command in perplex evaluation for wikitext
* fix: remove fp16 mixed-precision argument
* fix: Add a space.
* fix: add run oneAPI instruction for the example of codeshell
* fix: textual adjustments
* fix: Textual adjustment
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-16 14:29:06 +08:00
Yishuo Wang
17a0beb21f
optimize qwen2-audio again ( #11825 )
2024-08-16 11:11:35 +08:00
Jason Dai
6a8d07ddb4
Update README.md ( #11824 )
2024-08-16 10:22:02 +08:00
Yuwen Hu
9e9086cc2a
Update IPEX_LLM_PERFORMANCE_MODE ( #11823 )
2024-08-16 09:48:36 +08:00
Wang, Jian4
5a80fd2633
Fix lightweight-serving no streaming resp on mtl ( #11822 )
2024-08-16 09:43:03 +08:00
Guancheng Fu
e70ae0638e
Fix vLLM not convert issues ( #11817 )
...
* Fix not convert issues
* refine
2024-08-15 19:04:05 +08:00
Yishuo Wang
750d4ad5dc
fix minicpm-v-2 fp16 ( #11819 )
2024-08-15 18:34:40 +08:00
Yuwen Hu
6543321f04
Remove 4k igpu perf on gemma-2-9b-it ( #11820 )
2024-08-15 18:06:19 +08:00
Chu,Youcheng
28d1c972da
add mixed_precision argument on ppl wikitext evaluation ( #11813 )
...
* fix: delete ipex extension import in ppl wikitext evaluation
* feat: add mixed_precision argument on ppl wikitext evaluation
* fix: delete mix_precision command in perplex evaluation for wikitext
* fix: remove fp16 mixed-precision argument
* fix: Add a space.
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-15 17:58:53 +08:00
Yishuo Wang
828ab16537
fix phi3 and minicpmv cpu ( #11818 )
2024-08-15 17:43:29 +08:00
Yishuo Wang
4e178f0c5d
rewrite minicpmv optimization ( #11816 )
2024-08-15 17:27:12 +08:00
Ch1y0q
447c8ed324
update transformers version for replit-code-v1-3b, `internlm2-chat-… ( #11811 )
...
* update transformers version for `replit-code-v1-3b`, `internlm2-chat-7b` and mistral
* remove for default transformers version
2024-08-15 16:40:48 +08:00
Jinhe
2fbbb51e71
transformers==4.37, yi & yuan2 & vicuna ( #11805 )
...
* transformers==4.37
* added yi model
* added yi model
* xxxx
* delete prompt template
* / and delete
2024-08-15 15:39:24 +08:00
Jinhe
f43da2d455
deletion of specification of transformers version ( #11808 )
2024-08-15 15:23:32 +08:00
Yishuo Wang
07b7f13982
support and optimize qwen2-audio ( #11809 )
2024-08-15 14:59:04 +08:00
Chu,Youcheng
3ac83f8396
fix: delete ipex extension import in ppl wikitext evaluation ( #11806 )
...
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-15 13:40:01 +08:00
Yuwen Hu
016e840eed
Fix performance tests ( #11802 )
...
* Fix performance tests
* Small fix
2024-08-15 01:37:01 +08:00
Shaojun Liu
e3c1dae619
Fix Windows Unit Test ( #11801 )
...
* Update llm_unit_tests.yml
* remove debug information
* Delete .github/actions/llm/cli-test-windows directory
2024-08-14 19:16:48 +08:00
Yishuo Wang
9a93808fc5
fix and optimize minicpm v 2 ( #11799 )
2024-08-14 17:27:23 +08:00
Jinhe
d8d887edd2
added minicpm-v-2_6 ( #11794 )
2024-08-14 16:23:44 +08:00
Yishuo Wang
3d6cfa291d
optimize minicpm v 2.5 ( #11793 )
2024-08-14 16:07:24 +08:00
Yuwen Hu
356281cb80
Further all-in-one benchmark update continuation task ( #11784 )
...
* Further update prompt for continuation task, and disable lookup candidate update strategy on MTL
* style fix
2024-08-14 14:39:34 +08:00
Ruonan Wang
43cca3be27
fix gemma2 runtime error caused by sliding window ( #11788 )
...
* fix runtime error
* revert workflow
2024-08-14 10:43:33 +08:00
Jinhe
dbd14251dd
Troubleshoot for sycl not found ( #11774 )
...
* added troubleshoot for sycl not found problem
* added troubleshoot for sycl not found problem
* revision on troubleshoot
* revision on troubleshoot
2024-08-14 10:26:01 +08:00
Yang Wang
51bcac1229
follow up on experimental support of fused decoder layer for llama2 ( #11785 )
...
* clean up and support transpose value cache
* refine
* fix style
* fix style
2024-08-13 18:53:55 -07:00
Yishuo Wang
cb79dcda93
refactor llama convert to fix minicpm-v 2.5 optimization ( #11783 )
2024-08-14 09:29:57 +08:00
Yina Chen
7cd6ec9723
MiniCPM-V support compresskv ( #11779 )
...
* fix check error
* fix other models
* remove print
2024-08-13 19:03:40 +08:00
Qiyuan Gong
3998de14f0
Fix mistral forward_qkv in q4_0 ( #11781 )
...
* Fix mistral forward_qkv without self.rotary_emb.base in q4_0.
* Replace apply_rotary_pos_emb_no_cache_xpu with rotary_half_inplaced.
* Revert https://github.com/intel-analytics/ipex-llm/pull/11765
2024-08-13 16:48:19 +08:00
Heyang Sun
70c828b87c
deepspeed zero3 QLoRA finetuning ( #11625 )
...
* deepspeed zero3 QLoRA finetuning
* Update convert.py
* Update low_bit_linear.py
* Update utils.py
* Update qlora_finetune_llama2_13b_arch_2_card.sh
* Update low_bit_linear.py
* Update alpaca_qlora_finetuning.py
* Update low_bit_linear.py
* Update utils.py
* Update convert.py
* Update alpaca_qlora_finetuning.py
* Update alpaca_qlora_finetuning.py
* Update low_bit_linear.py
* Update deepspeed_zero3.json
* Update qlora_finetune_llama2_13b_arch_2_card.sh
* Update low_bit_linear.py
* Update low_bit_linear.py
* Update utils.py
* fix style
* fix style
* Update alpaca_qlora_finetuning.py
* Update qlora_finetune_llama2_13b_arch_2_card.sh
* Update convert.py
* Update low_bit_linear.py
* Update model.py
* Update alpaca_qlora_finetuning.py
* Update low_bit_linear.py
* Update low_bit_linear.py
* Update low_bit_linear.py
2024-08-13 16:15:29 +08:00
Yishuo Wang
a184b120c9
fix minicpm-v 2.5 ( #11780 )
2024-08-13 16:14:00 +08:00
Yuwen Hu
ec184af243
Add gemma-2-2b-it and gemma-2-9b-it to igpu nightly performance test ( #11778 )
...
* add yaml and modify `concat_csv.py` for `transformers` 4.43.1 (#11758 )
* add yaml and modify `concat_csv.py` for `transformers` 4.43.1
* remove 4.43 for arc; fix;
* remove 4096-512 for 4.43
* comment some models
* Small fix
* uncomment models (#11777 )
---------
Co-authored-by: Ch1y0q <qiyue2001@gmail.com>
2024-08-13 15:39:56 +08:00