-
5b83493b1a
Add ipex-llm npu option in setup.py (#11858)
SONG Ge
2024-08-20 17:29:49 +0800
-
ee6852c915
Fix typo (#11862)
Heyang Sun
2024-08-20 16:38:11 +0800
-
2946420e14
add minicpmv 2.6 load_low_bit workaround (#11856)
Yishuo Wang
2024-08-20 11:16:02 +0800
-
7380823f3f
Update Llama2 multi-processes example (#11852)
SONG Ge
2024-08-19 19:49:01 +0800
-
99b05ba1dc
separate prefill into a process (#11787)
Yang Wang
2024-08-19 02:53:36 -0700
-
da3d7a3a53
delete transformers version requirement (#11845)
Jinhe
2024-08-19 17:53:02 +0800
-
a0fbda5bc8
add MiniCPM-Llama3-V-2_5 into all-in-one benchmark (#11849)
Ruonan Wang
2024-08-19 02:51:16 -0700
-
9490781aec
optimize phi3 memory usage again (#11848)
Yishuo Wang
2024-08-19 17:26:59 +0800
-
3cd4e87168
Support compress KV with quantize KV (#11812)
Yina Chen
2024-08-19 10:32:32 +0300
-
6841a9ac8f
fix load low bit com dtype (#11832)
Zhao Changmin
2024-08-19 13:43:19 +0800
-
cfc959defa
Fixes regarding utf-8 in all-in-one benchmark (#11839)
Yuwen Hu
2024-08-19 10:38:00 +0800
-
46a1cbfa64
feat: add mixed_precision argument on ppl longbench evaluation (#11837)
Chu,Youcheng
2024-08-19 10:00:44 +0800
-
580c94d0e2
Remove gemma-2-9b-it 3k input from igpu-perf (#11834)
Yuwen Hu
2024-08-17 13:10:05 +0800
-
9f17234f3b
Add MiniCPM-V-2_6 to iGPU Perf (#11810)
Jin, Qiao
2024-08-16 18:41:21 +0800
-
96796f95cb
Update all-in-one benchmark prompts for continuation task & lookup update for minicpmv (#11827)
Yuwen Hu
2024-08-16 17:16:35 +0800
-
e966e85df8
force lm_head optimization in any model if set environment variable (#11830)
Yishuo Wang
2024-08-16 16:48:45 +0800
-
3b630fb9df
updated ppl README (#11807)
RyuKosei
2024-08-16 15:49:25 +0800
-
e07a55665c
Codegeex2 tokenization fix (#11831)
Jinhe
2024-08-16 15:48:47 +0800
-
a508b0a902
added link to minicpm-v-2_6 example (#11829)
Jinhe
2024-08-16 14:49:23 +0800
-
adfbb9124a
Reorganize MiniCPM-V-2_6 example & update other MiniCPM-V-2 examples (#11815)
Jinhe
2024-08-16 14:48:56 +0800
-
f463268e36
fix: add run oneAPI instruction for the example of codeshell (#11828)
Chu,Youcheng
2024-08-16 14:29:06 +0800
-
17a0beb21f
optimize qwen2-audio again (#11825)
Yishuo Wang
2024-08-16 11:11:35 +0800
-
6a8d07ddb4
Update README.md (#11824)
Jason Dai
2024-08-16 10:22:02 +0800
-
9e9086cc2a
Update IPEX_LLM_PERFORMANCE_MODE (#11823)
Yuwen Hu
2024-08-16 09:48:36 +0800
-
5a80fd2633
Fix lightweight-serving no streaming resp on mtl (#11822)
Wang, Jian4
2024-08-16 09:43:03 +0800
-
e70ae0638e
Fix vLLM not convert issues (#11817)
Guancheng Fu
2024-08-15 19:04:05 +0800
-
750d4ad5dc
fix minicpm-v-2 fp16 (#11819)
Yishuo Wang
2024-08-15 18:34:40 +0800
-
6543321f04
Remove 4k igpu perf on gemma-2-9b-it (#11820)
Yuwen Hu
2024-08-15 18:06:19 +0800
-
28d1c972da
add mixed_precision argument on ppl wikitext evaluation (#11813)
Chu,Youcheng
2024-08-15 17:58:53 +0800
-
828ab16537
fix phi3 and minicpmv cpu (#11818)
Yishuo Wang
2024-08-15 17:43:29 +0800
-
4e178f0c5d
rewrite minicpmv optimization (#11816)
Yishuo Wang
2024-08-15 17:27:12 +0800
-
447c8ed324
update transformers version for replit-code-v1-3b, `internlm2-chat-… (#11811)
Ch1y0q
2024-08-15 16:40:48 +0800
-
2fbbb51e71
transformers==4.37, yi & yuan2 & vicuna (#11805)
Jinhe
2024-08-15 15:39:24 +0800
-
f43da2d455
deletion of specification of transformers version (#11808)
Jinhe
2024-08-15 15:23:32 +0800
-
07b7f13982
support and optimize qwen2-audio (#11809)
Yishuo Wang
2024-08-15 14:59:04 +0800
-
3ac83f8396
fix: delete ipex extension import in ppl wikitext evaluation (#11806)
Chu,Youcheng
2024-08-15 13:40:01 +0800
-
016e840eed
Fix performance tests (#11802)
Yuwen Hu
2024-08-15 01:37:01 +0800
-
e3c1dae619
Fix Windows Unit Test (#11801)
Shaojun Liu
2024-08-14 19:16:48 +0800
-
9a93808fc5
fix and optimize minicpm v 2 (#11799)
Yishuo Wang
2024-08-14 17:27:23 +0800
-
d8d887edd2
added minicpm-v-2_6 (#11794)
Jinhe
2024-08-14 16:23:44 +0800
-
3d6cfa291d
optimize minicpm v 2.5 (#11793)
Yishuo Wang
2024-08-14 16:07:24 +0800
-
356281cb80
Further all-in-one benchmark update continuation task (#11784)
Yuwen Hu
2024-08-14 14:39:34 +0800
-
43cca3be27
fix gemma2 runtime error caused by sliding window (#11788)
Ruonan Wang
2024-08-14 05:43:33 +0300
-
dbd14251dd
Troubleshoot for sycl not found (#11774)
Jinhe
2024-08-14 10:26:01 +0800
-
51bcac1229
follow up on experimental support of fused decoder layer for llama2 (#11785)
Yang Wang
2024-08-13 18:53:55 -0700
-
cb79dcda93
refactor llama convert to fix minicpm-v 2.5 optimization (#11783)
Yishuo Wang
2024-08-14 09:29:57 +0800
-
7cd6ec9723
MiniCPM-V support compresskv (#11779)
Yina Chen
2024-08-13 14:03:40 +0300
-
3998de14f0
Fix mistral forward_qkv in q4_0 (#11781)
Qiyuan Gong
2024-08-13 16:48:19 +0800
-
70c828b87c
deepspeed zero3 QLoRA finetuning (#11625)
Heyang Sun
2024-08-13 16:15:29 +0800
-
a184b120c9
fix minicpm-v 2.5 (#11780)
Yishuo Wang
2024-08-13 16:14:00 +0800
-
ec184af243
Add gemma-2-2b-it and gemma-2-9b-it to igpu nightly performance test (#11778)
Yuwen Hu
2024-08-13 15:39:56 +0800
-
a88c132e54
Reduce Mistral softmax memory only in low memory mode (#11775)
Qiyuan Gong
2024-08-13 14:50:54 +0800
-
aa861df066
use new fp32 softmax kernel (#11776)
Yishuo Wang
2024-08-13 14:48:11 +0800
-
23d3acdc77
Add experimental support of fused decoder layer for llama2 (#11768)
binbin Deng
2024-08-13 14:41:36 +0800
-
c28b3389e6
Update npu multimodal example (#11773)
Jin, Qiao
2024-08-13 14:14:59 +0800
-
81824ff8c9
Fix stdout in all-in-one benchmark to utf-8 (#11772)
Yuwen Hu
2024-08-13 10:51:08 +0800
-
a1eb793f70
optimize minicpm v 2_6 first token perf (#11770)
Yishuo Wang
2024-08-13 09:51:18 +0800
-
841dbcdf3a
Fix compresskv with lookahead issue (#11767)
Yina Chen
2024-08-12 13:53:55 +0300
-
f97a77ea4e
Update all-in-one benchmark for continuation task input preparation (#11760)
Yuwen Hu
2024-08-12 17:49:45 +0800
-
1b05caba2b
Set mistral fuse rope to false except fp6 & fp16 (#11765)
Xu, Shuo
2024-08-12 17:25:07 +0800
-
8db34057b4
optimize lookahead init time (#11769)
Ruonan Wang
2024-08-12 12:19:12 +0300
-
05989ad0f9
Update npu example and all in one benchmark (#11766)
Jin, Qiao
2024-08-12 16:46:46 +0800
-
57d177738d
optimize minicpm-v-2_6 repetition penalty (#11763)
Yishuo Wang
2024-08-12 14:10:10 +0800
-
fac4c01a6e
Revert to use out-of-tree GPU driver (#11761)
Shaojun Liu
2024-08-12 13:41:47 +0800
-
245dba0abc
Fix lightweight-serving codegeex error (#11759)
Wang, Jian4
2024-08-12 10:35:37 +0800
-
66fe2ee464
initial support of IPEX_LLM_PERFORMANCE_MODE (#11754)
Ruonan Wang
2024-08-09 14:04:09 +0300
-
4b9c57cc60
Support compress kv with lookahead (#11752)
Yina Chen
2024-08-09 12:39:57 +0300
-
93455aac09
fix minicpm V 2.6 repeat output (#11753)
Yishuo Wang
2024-08-09 17:39:24 +0800
-
7e917d6cfb
fix gptq of llama (#11749)
Ruonan Wang
2024-08-09 11:39:25 +0300
-
dd46c141bd
Phi3 support compresskv (#11733)
Yina Chen
2024-08-09 10:43:43 +0300
-
d8808cc2e3
Mistral apply_rotary_pos_emb_no_cache_xpu use rope_theta from config (#11747)
Qiyuan Gong
2024-08-09 10:35:51 +0800
-
044e486480
Fix vLLM CPU /chat endpoint (#11748)
Xiangyu Tian
2024-08-09 10:33:52 +0800
-
27b4b104ed
Add qwen2-1.5b-instruct into igpu performance (#11735)
Jinhe
2024-08-08 16:42:18 +0800
-
107f7aafd0
enable inference mode for deepspeed tp serving (#11742)
Shaojun Liu
2024-08-08 14:38:30 +0800
-
9e65cf00b3
Add openai-whisper pytorch gpu (#11736)
Zijie Li
2024-08-08 12:32:59 +0800
-
7e61fa1af7
Revise GPU driver related guide for Windows users (#11740)
Yuwen Hu
2024-08-08 11:26:26 +0800
-
d0c89fb715
updated llama.cpp and ollama quickstart (#11732)
Jinhe
2024-08-08 11:04:01 +0800
-
54cc9353db
support and optimize minicpm-v-2_6 (#11738)
Yishuo Wang
2024-08-07 18:21:16 +0800
-
e956e71fc1
fix conflict with quant kv (#11737)
Yina Chen
2024-08-07 13:10:30 +0300
-
00a5574c8a
Use merge_qkv to replace fused_qkv for llama2 (#11727)
Ruonan Wang
2024-08-07 13:04:01 +0300
-
d2abc9711b
Fix MTL 4k input qwen2 compresskv error (#11734)
Yina Chen
2024-08-07 11:21:57 +0300
-
a71ae7c22b
Support minicpm compresskv & modify default compresskv config & default enable compresskv on mtl 2.5k~4.5k (#11726)
Yina Chen
2024-08-07 06:35:39 +0300
-
c093f7d980
fix phi3 (#11729)
Yishuo Wang
2024-08-07 09:39:46 +0800
-
e32d13d78c
Remove Out of tree Driver from GPU driver installation document (#11728)
Qiyuan Gong
2024-08-07 09:38:19 +0800
-
e7f7141781
Add benchmark util for transformers 4.42 (#11725)
Zijie Li
2024-08-07 08:48:07 +0800
-
4676af2054
add gemma2 example (#11724)
Ch1y0q
2024-08-06 21:17:50 +0800
-
985213614b
Removed no longer needed models for Arc nightly perf (#11722)
SichengStevenLi
2024-08-06 16:12:00 +0800
-
929675aa6b
support latest phi3 (#11721)
Yishuo Wang
2024-08-06 15:52:55 +0800
-
11650b6f81
upgrade glm-4v example transformers version (#11719)
Jin, Qiao
2024-08-06 14:55:09 +0800
-
bbdff6edeb
optimize internvl2 4b performance (#11720)
Yishuo Wang
2024-08-06 14:25:08 +0800
-
f44b732aa8
support internvl2-4b (#11718)
Yishuo Wang
2024-08-06 13:36:32 +0800
-
7f241133da
Add MiniCPM-Llama3-V-2_5 GPU example (#11693)
Jin, Qiao
2024-08-06 10:22:41 +0800
-
808d9a7bae
Add MiniCPM-V-2 GPU example (#11699)
Jin, Qiao
2024-08-06 10:22:33 +0800
-
8fb36b9f4a
add new benchmark_util.py (#11713)
Zijie Li
2024-08-05 16:18:48 +0800
-
493cbd9a36
Support lightweight-serving with internlm-xcomposer2-vl-7b multimodal input (#11703)
Wang, Jian4
2024-08-05 09:36:04 +0800
-
aa98ef96fe
change mixed_precision to q6_k (#11706)
Ruonan Wang
2024-08-02 10:55:16 +0300
-
1baa3efe0e
Optimizations for Pipeline Parallel Serving (#11702)
Xiangyu Tian
2024-08-02 12:06:59 +0800
-
8d1e0bd2f4
add sdp causal support in llama (#11705)
Yina Chen
2024-08-02 05:27:40 +0300
-
736a7ef72e
add sdp_causal for mistral 4.36 (#11686)
Ruonan Wang
2024-08-01 13:57:31 +0300
-
45c730ff39
Chatglm support compresskv (#11690)
Yina Chen
2024-08-01 13:20:20 +0300