-
5b83493b1a
Add ipex-llm npu option in setup.py (#11858)
SONG Ge
2024-08-20 17:29:49 +0800
-
ee6852c915
Fix typo (#11862)
Heyang Sun
2024-08-20 16:38:11 +0800
-
2946420e14
add minicpmv 2.6 load_low_bit workaround (#11856)
Yishuo Wang
2024-08-20 11:16:02 +0800
-
7380823f3f
Update Llama2 multi-processes example (#11852)
SONG Ge
2024-08-19 19:49:01 +0800
-
99b05ba1dc
separate prefill into a process (#11787)
Yang Wang
2024-08-19 02:53:36 -0700
-
da3d7a3a53
delete transformers version requirement (#11845)
Jinhe
2024-08-19 17:53:02 +0800
-
a0fbda5bc8
add MiniCPM-Llama3-V-2_5 into all-in-one benchmark (#11849)
Ruonan Wang
2024-08-19 02:51:16 -0700
-
9490781aec
optimize phi3 memory usage again (#11848)
Yishuo Wang
2024-08-19 17:26:59 +0800
-
3cd4e87168
Support compress KV with quantize KV (#11812)
Yina Chen
2024-08-19 10:32:32 +0300
-
6841a9ac8f
fix load low bit com dtype (#11832)
Zhao Changmin
2024-08-19 13:43:19 +0800
-
cfc959defa
Fixes regarding utf-8 in all-in-one benchmark (#11839)
Yuwen Hu
2024-08-19 10:38:00 +0800
-
46a1cbfa64
feat: add mixed_precision argument on ppl longbench evaluation (#11837)
Chu,Youcheng
2024-08-19 10:00:44 +0800
-
580c94d0e2
Remove gemma-2-9b-it 3k input from igpu-perf (#11834)
Yuwen Hu
2024-08-17 13:10:05 +0800
-
9f17234f3b
Add MiniCPM-V-2_6 to iGPU Perf (#11810)
Jin, Qiao
2024-08-16 18:41:21 +0800
-
96796f95cb
Update all-in-one benchmark prompts for continuation task & lookup update for minicpmv (#11827)
Yuwen Hu
2024-08-16 17:16:35 +0800
-
e966e85df8
force lm_head optimization in any model if set environment variable (#11830)
Yishuo Wang
2024-08-16 16:48:45 +0800
-
3b630fb9df
updated ppl README (#11807)
RyuKosei
2024-08-16 15:49:25 +0800
-
e07a55665c
Codegeex2 tokenization fix (#11831)
Jinhe
2024-08-16 15:48:47 +0800
-
a508b0a902
added link to minicpm-v-2_6 example (#11829)
Jinhe
2024-08-16 14:49:23 +0800
-
adfbb9124a
Reorganize MiniCPM-V-2_6 example & update other MiniCPM-V-2 examples (#11815)
Jinhe
2024-08-16 14:48:56 +0800
-
f463268e36
fix: add run oneAPI instruction for the example of codeshell (#11828)
Chu,Youcheng
2024-08-16 14:29:06 +0800
-
17a0beb21f
optimize qwen2-audio again (#11825)
Yishuo Wang
2024-08-16 11:11:35 +0800
-
6a8d07ddb4
Update README.md (#11824)
Jason Dai
2024-08-16 10:22:02 +0800
-
9e9086cc2a
Update IPEX_LLM_PERFORMANCE_MODE (#11823)
Yuwen Hu
2024-08-16 09:48:36 +0800
-
5a80fd2633
Fix lightweight-serving no streaming resp on mtl (#11822)
Wang, Jian4
2024-08-16 09:43:03 +0800
-
e70ae0638e
Fix vLLM not convert issues (#11817)
Guancheng Fu
2024-08-15 19:04:05 +0800
-
750d4ad5dc
fix minicpm-v-2 fp16 (#11819)
Yishuo Wang
2024-08-15 18:34:40 +0800
-
6543321f04
Remove 4k igpu perf on gemma-2-9b-it (#11820)
Yuwen Hu
2024-08-15 18:06:19 +0800
-
28d1c972da
add mixed_precision argument on ppl wikitext evaluation (#11813)
Chu,Youcheng
2024-08-15 17:58:53 +0800
-
828ab16537
fix phi3 and minicpmv cpu (#11818)
Yishuo Wang
2024-08-15 17:43:29 +0800
-
4e178f0c5d
rewrite minicpmv optimization (#11816)
Yishuo Wang
2024-08-15 17:27:12 +0800
-
447c8ed324
update transformers version for replit-code-v1-3b, `internlm2-chat-… (#11811)
Ch1y0q
2024-08-15 16:40:48 +0800
-
2fbbb51e71
transformers==4.37, yi & yuan2 & vicuna (#11805)
Jinhe
2024-08-15 15:39:24 +0800
-
f43da2d455
deletion of specification of transformers version (#11808)
Jinhe
2024-08-15 15:23:32 +0800
-
07b7f13982
support and optimize qwen2-audio (#11809)
Yishuo Wang
2024-08-15 14:59:04 +0800
-
3ac83f8396
fix: delete ipex extension import in ppl wikitext evaluation (#11806)
Chu,Youcheng
2024-08-15 13:40:01 +0800
-
016e840eed
Fix performance tests (#11802)
Yuwen Hu
2024-08-15 01:37:01 +0800
-
e3c1dae619
Fix Windows Unit Test (#11801)
Shaojun Liu
2024-08-14 19:16:48 +0800
-
9a93808fc5
fix and optimize minicpm v 2 (#11799)
Yishuo Wang
2024-08-14 17:27:23 +0800
-
d8d887edd2
added minicpm-v-2_6 (#11794)
Jinhe
2024-08-14 16:23:44 +0800
-
3d6cfa291d
optimize minicpm v 2.5 (#11793)
Yishuo Wang
2024-08-14 16:07:24 +0800
-
356281cb80
Further all-in-one benchmark update continuation task (#11784)
Yuwen Hu
2024-08-14 14:39:34 +0800
-
43cca3be27
fix gemma2 runtime error caused by sliding window (#11788)
Ruonan Wang
2024-08-14 05:43:33 +0300
-
dbd14251dd
Troubleshoot for sycl not found (#11774)
Jinhe
2024-08-14 10:26:01 +0800
-
51bcac1229
follow up on experimental support of fused decoder layer for llama2 (#11785)
Yang Wang
2024-08-13 18:53:55 -0700
-
cb79dcda93
refactor llama convert to fix minicpm-v 2.5 optimization (#11783)
Yishuo Wang
2024-08-14 09:29:57 +0800
-
7cd6ec9723
MiniCPM-V support compresskv (#11779)
Yina Chen
2024-08-13 14:03:40 +0300
-
3998de14f0
Fix mistral forward_qkv in q4_0 (#11781)
Qiyuan Gong
2024-08-13 16:48:19 +0800
-
70c828b87c
deepspeed zero3 QLoRA finetuning (#11625)
Heyang Sun
2024-08-13 16:15:29 +0800
-
a184b120c9
fix minicpm-v 2.5 (#11780)
Yishuo Wang
2024-08-13 16:14:00 +0800
-
ec184af243
Add gemma-2-2b-it and gemma-2-9b-it to igpu nightly performance test (#11778)
Yuwen Hu
2024-08-13 15:39:56 +0800
-
a88c132e54
Reduce Mistral softmax memory only in low memory mode (#11775)
Qiyuan Gong
2024-08-13 14:50:54 +0800
-
aa861df066
use new fp32 softmax kernel (#11776)
Yishuo Wang
2024-08-13 14:48:11 +0800
-
23d3acdc77
Add experimental support of fused decoder layer for llama2 (#11768)
binbin Deng
2024-08-13 14:41:36 +0800
-
c28b3389e6
Update npu multimodal example (#11773)
Jin, Qiao
2024-08-13 14:14:59 +0800
-
81824ff8c9
Fix stdout in all-in-one benchmark to utf-8 (#11772)
Yuwen Hu
2024-08-13 10:51:08 +0800
-
a1eb793f70
optimize minicpm v 2_6 first token perf (#11770)
Yishuo Wang
2024-08-13 09:51:18 +0800
-
841dbcdf3a
Fix compresskv with lookahead issue (#11767)
Yina Chen
2024-08-12 13:53:55 +0300
-
f97a77ea4e
Update all-in-one benchmark for continuation task input preparation (#11760)
Yuwen Hu
2024-08-12 17:49:45 +0800
-
1b05caba2b
Set mistral fuse rope to false except fp6 & fp16 (#11765)
Xu, Shuo
2024-08-12 17:25:07 +0800
-
8db34057b4
optimize lookahead init time (#11769)
Ruonan Wang
2024-08-12 12:19:12 +0300
-
05989ad0f9
Update npu example and all in one benchmark (#11766)
Jin, Qiao
2024-08-12 16:46:46 +0800
-
57d177738d
optimize minicpm-v-2_6 repetition penalty (#11763)
Yishuo Wang
2024-08-12 14:10:10 +0800
-
fac4c01a6e
Revert to use out-of-tree GPU driver (#11761)
Shaojun Liu
2024-08-12 13:41:47 +0800
-
245dba0abc
Fix lightweight-serving codegeex error (#11759)
Wang, Jian4
2024-08-12 10:35:37 +0800
-
66fe2ee464
initial support of IPEX_LLM_PERFORMANCE_MODE (#11754)
Ruonan Wang
2024-08-09 14:04:09 +0300
-
4b9c57cc60
Support compress kv with lookahead (#11752)
Yina Chen
2024-08-09 12:39:57 +0300
-
93455aac09
fix minicpm V 2.6 repeat output (#11753)
Yishuo Wang
2024-08-09 17:39:24 +0800
-
7e917d6cfb
fix gptq of llama (#11749)
Ruonan Wang
2024-08-09 11:39:25 +0300
-
dd46c141bd
Phi3 support compresskv (#11733)
Yina Chen
2024-08-09 10:43:43 +0300
-
d8808cc2e3
Mistral apply_rotary_pos_emb_no_cache_xpu use rope_theta from config (#11747)
Qiyuan Gong
2024-08-09 10:35:51 +0800
-
044e486480
Fix vLLM CPU /chat endpoint (#11748)
Xiangyu Tian
2024-08-09 10:33:52 +0800
-
27b4b104ed
Add qwen2-1.5b-instruct into igpu performance (#11735)
Jinhe
2024-08-08 16:42:18 +0800
-
107f7aafd0
enable inference mode for deepspeed tp serving (#11742)
Shaojun Liu
2024-08-08 14:38:30 +0800
-
9e65cf00b3
Add openai-whisper pytorch gpu (#11736)
Zijie Li
2024-08-08 12:32:59 +0800
-
7e61fa1af7
Revise GPU driver related guide for Windows users (#11740)
Yuwen Hu
2024-08-08 11:26:26 +0800
-
d0c89fb715
updated llama.cpp and ollama quickstart (#11732)
Jinhe
2024-08-08 11:04:01 +0800
-
54cc9353db
support and optimize minicpm-v-2_6 (#11738)
Yishuo Wang
2024-08-07 18:21:16 +0800
-
e956e71fc1
fix conflict with quant kv (#11737)
Yina Chen
2024-08-07 13:10:30 +0300
-
00a5574c8a
Use merge_qkv to replace fused_qkv for llama2 (#11727)
Ruonan Wang
2024-08-07 13:04:01 +0300
-
d2abc9711b
Fix MTL 4k input qwen2 compresskv error (#11734)
Yina Chen
2024-08-07 11:21:57 +0300
-
a71ae7c22b
Support minicpm compresskv & modify default compresskv config & default enable compresskv on mtl 2.5k~4.5k (#11726)
Yina Chen
2024-08-07 06:35:39 +0300
-
c093f7d980
fix phi3 (#11729)
Yishuo Wang
2024-08-07 09:39:46 +0800
-
e32d13d78c
Remove Out of tree Driver from GPU driver installation document (#11728)
Qiyuan Gong
2024-08-07 09:38:19 +0800
-
e7f7141781
Add benchmark util for transformers 4.42 (#11725)
Zijie Li
2024-08-07 08:48:07 +0800
-
4676af2054
add gemma2 example (#11724)
Ch1y0q
2024-08-06 21:17:50 +0800
-
985213614b
Removed no longer needed models for Arc nightly perf (#11722)
SichengStevenLi
2024-08-06 16:12:00 +0800
-
929675aa6b
support latest phi3 (#11721)
Yishuo Wang
2024-08-06 15:52:55 +0800
-
11650b6f81
upgrade glm-4v example transformers version (#11719)
Jin, Qiao
2024-08-06 14:55:09 +0800
-
bbdff6edeb
optimize internvl2 4b performance (#11720)
Yishuo Wang
2024-08-06 14:25:08 +0800
-
f44b732aa8
support internvl2-4b (#11718)
Yishuo Wang
2024-08-06 13:36:32 +0800
-
7f241133da
Add MiniCPM-Llama3-V-2_5 GPU example (#11693)
Jin, Qiao
2024-08-06 10:22:41 +0800
-
808d9a7bae
Add MiniCPM-V-2 GPU example (#11699)
Jin, Qiao
2024-08-06 10:22:33 +0800
-
8fb36b9f4a
add new benchmark_util.py (#11713)
Zijie Li
2024-08-05 16:18:48 +0800
-
493cbd9a36
Support lightweight-serving with internlm-xcomposer2-vl-7b multimodal input (#11703)
Wang, Jian4
2024-08-05 09:36:04 +0800
-
aa98ef96fe
change mixed_precision to q6_k (#11706)
Ruonan Wang
2024-08-02 10:55:16 +0300
-
1baa3efe0e
Optimizations for Pipeline Parallel Serving (#11702)
Xiangyu Tian
2024-08-02 12:06:59 +0800
-
8d1e0bd2f4
add sdp causal support in llama (#11705)
Yina Chen
2024-08-02 05:27:40 +0300
-
736a7ef72e
add sdp_causal for mistral 4.36 (#11686)
Ruonan Wang
2024-08-01 13:57:31 +0300
-
45c730ff39
Chatglm support compresskv (#11690)
Yina Chen
2024-08-01 13:20:20 +0300