ipex-llm

Yishuo Wang 17a0beb21f optimize qwen2-audio again (#11825)	2024-08-16 11:11:35 +08:00
__init__.py
aquila.py	LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)	2024-04-16 09:32:30 +08:00
baichuan.py	Refactor baichuan1 7B and 13B (#11258)	2024-06-07 14:29:20 +08:00
bert.py
bloom.py	Divide core-xe packages (#11131)	2024-05-28 12:00:18 +08:00
chatglm.py	add glm_sdpa back to fix chatglm-6b (#11313)	2024-06-14 10:31:43 +08:00
chatglm2.py	MiniCPM-V support compresskv (#11779)	2024-08-13 19:03:40 +08:00
chatglm4.py	MiniCPM-V support compresskv (#11779)	2024-08-13 19:03:40 +08:00
chatglm4v.py	Support pipeline parallel for glm-4v (#11545)	2024-07-11 16:06:06 +08:00
cohere.py	Fix cohere model on transformers>=4.41 (#11575)	2024-07-17 17:18:59 -07:00
common.py	fix phi3 and minicpmv cpu (#11818)	2024-08-15 17:43:29 +08:00
decilm.py	LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)	2024-04-16 09:32:30 +08:00
falcon.py	LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)	2024-04-16 09:32:30 +08:00
gemma.py	fix gemma for 4.41 (#11531)	2024-07-18 15:02:50 -07:00
gemma2.py	fix gemma2 runtime error caused by sliding window (#11788)	2024-08-14 10:43:33 +08:00
gptbigcode.py	Fix Starcoder issue on CPU on transformers 4.36+ (#11190)	2024-06-04 10:05:40 -07:00
gptj.py	LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)	2024-04-16 09:32:30 +08:00
gptneox.py	LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)	2024-04-16 09:32:30 +08:00
internlm.py	fix internlm xcomposser stream chat (#11564)	2024-07-11 18:21:17 +08:00
internvl.py	support internvl2-4b (#11718)	2024-08-06 13:36:32 +08:00
llama.py	MiniCPM-V support compresskv (#11779)	2024-08-13 19:03:40 +08:00
minicpm.py	rewrite minicpmv optimization (#11816)	2024-08-15 17:27:12 +08:00
minicpmv.py	fix minicpm-v-2 fp16 (#11819)	2024-08-15 18:34:40 +08:00
mistral.py	MiniCPM-V support compresskv (#11779)	2024-08-13 19:03:40 +08:00
mixtral.py	Divide core-xe packages (#11131)	2024-05-28 12:00:18 +08:00
mpt.py	LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)	2024-04-16 09:32:30 +08:00
phi.py	remove new_layout parameter (#10906)	2024-04-29 10:31:50 +08:00
phi3.py	fix phi3 and minicpmv cpu (#11818)	2024-08-15 17:43:29 +08:00
phixtral.py	Disable fast fused rope on UHD (#10780)	2024-04-18 10:03:53 +08:00
qwen.py	fix first token sdp with batch (#11153)	2024-05-28 15:03:06 +08:00
qwen2.py	optimize qwen2-audio again (#11825)	2024-08-16 11:11:35 +08:00
qwen2_moe.py	Add more qwen1.5 and qwen2 support for pipeline parallel inference (#11423)	2024-06-25 15:49:32 +08:00
qwen_vl.py	Support pipeline parallel for qwen-vl (#11503)	2024-07-04 18:03:57 +08:00
rwkv4.py	Divide core-xe packages (#11131)	2024-05-28 12:00:18 +08:00
rwkv5.py	Divide core-xe packages (#11131)	2024-05-28 12:00:18 +08:00
stablelm.py	use new fuse rope in stablelm family (#11497)	2024-07-03 11:08:26 +08:00
starcoder2.py	add latest optimization in starcoder2 (#11236)	2024-06-06 14:02:17 +08:00
utils.py	Update `IPEX_LLM_PERFORMANCE_MODE` (#11823)	2024-08-16 09:48:36 +08:00
yuan.py	refactor yuan2 (#11235)	2024-06-06 13:17:54 +08:00
