| Name | Last commit message | Last commit date |
|---|---|---|
| __init__.py | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00 |
| aquila.py | LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) | 2024-04-16 09:32:30 +08:00 |
| baichuan.py | add comment | 2024-08-22 15:14:47 +08:00 |
| bert.py | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00 |
| bloom.py | Divide core-xe packages (#11131) | 2024-05-28 12:00:18 +08:00 |
| chatglm.py | add glm_sdpa back to fix chatglm-6b (#11313) | 2024-06-14 10:31:43 +08:00 |
| chatglm2.py | Support compress KV with quantize KV (#11812) | 2024-08-19 15:32:32 +08:00 |
| chatglm4.py | Support compress KV with quantize KV (#11812) | 2024-08-19 15:32:32 +08:00 |
| chatglm4v.py | Support pipeline parallel for glm-4v (#11545) | 2024-07-11 16:06:06 +08:00 |
| cohere.py | Fix cohere model on transformers>=4.41 (#11575) | 2024-07-17 17:18:59 -07:00 |
| common.py | optimize minicpm3 again (#12047) | 2024-09-10 14:19:57 +08:00 |
| decilm.py | LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) | 2024-04-16 09:32:30 +08:00 |
| falcon.py | LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) | 2024-04-16 09:32:30 +08:00 |
| gemma.py | refactor gemma to reduce old fuse rope usage (#12215) | 2024-10-16 17:40:28 +08:00 |
| gemma2.py | optimize minicpm3 again (#12047) | 2024-09-10 14:19:57 +08:00 |
| gptbigcode.py | Fix Starcoder issue on CPU on transformers 4.36+ (#11190) | 2024-06-04 10:05:40 -07:00 |
| gptj.py | LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) | 2024-04-16 09:32:30 +08:00 |
| gptneox.py | LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) | 2024-04-16 09:32:30 +08:00 |
| internlm.py | refactor merge_qkv and attention_softmax (#12213) | 2024-10-16 15:58:14 +08:00 |
| internvl.py | optimzie qwen2-vl vision (#12203) | 2024-10-15 15:54:25 +08:00 |
| llama.py | remove some useless code (#12035) | 2024-09-06 17:51:08 +08:00 |
| llama32.py | Fix Llama 3.2 & 3.1 on LNL (#12196) | 2024-10-14 17:39:20 +08:00 |
| minicpm.py | Update compresskv model forward type logic (#11868) | 2024-08-20 18:11:37 +08:00 |
| minicpm3.py | optimize minicpm3 kv cache (#12052) | 2024-09-10 16:51:21 +08:00 |
| minicpmv.py | Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer (#11963) | 2024-08-29 19:22:09 +08:00 |
| mistral.py | remove some useless code (#12035) | 2024-09-06 17:51:08 +08:00 |
| mixtral.py | Divide core-xe packages (#11131) | 2024-05-28 12:00:18 +08:00 |
| mllama.py | optimize llama3.2 vision again (#12211) | 2024-10-16 14:29:48 +08:00 |
| mpt.py | LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) | 2024-04-16 09:32:30 +08:00 |
| phi.py | refactor phi-2 to reduce old fuse rope usage (#12214) | 2024-10-16 17:08:14 +08:00 |
| phi3.py | use sdp_causal to reduce internvl2-4b memory usage if set environment variable (#11953) | 2024-08-28 17:35:05 +08:00 |
| phixtral.py | Disable fast fused rope on UHD (#10780) | 2024-04-18 10:03:53 +08:00 |
| qwen.py | fix first token sdp with batch (#11153) | 2024-05-28 15:03:06 +08:00 |
| qwen2.py | Update compresskv model forward type logic (#11868) | 2024-08-20 18:11:37 +08:00 |
| qwen2_moe.py | refactor merge_qkv and attention_softmax (#12213) | 2024-10-16 15:58:14 +08:00 |
| qwen2_vl.py | optimzie qwen2-vl vision (#12203) | 2024-10-15 15:54:25 +08:00 |
| qwen_vl.py | Support pipeline parallel for qwen-vl (#11503) | 2024-07-04 18:03:57 +08:00 |
| rwkv4.py | Divide core-xe packages (#11131) | 2024-05-28 12:00:18 +08:00 |
| rwkv5.py | Divide core-xe packages (#11131) | 2024-05-28 12:00:18 +08:00 |
| sd15.py | fix sd1.5 (#12129) | 2024-09-26 17:15:16 +08:00 |
| stablelm.py | refactor merge_qkv and attention_softmax (#12213) | 2024-10-16 15:58:14 +08:00 |
| starcoder2.py | refactor merge_qkv and attention_softmax (#12213) | 2024-10-16 15:58:14 +08:00 |
| utils.py | optimzie qwen2-vl vision (#12203) | 2024-10-15 15:54:25 +08:00 |
| yuan.py | refactor merge_qkv and attention_softmax (#12213) | 2024-10-16 15:58:14 +08:00 |