ipex-llm/python/llm/src/ipex_llm/transformers/models
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| __init__.py | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00 |
| aquila.py | refactor attention_softmax (#12295) | 2024-10-30 13:20:50 +08:00 |
| baichuan.py | refactor baichuan, glm4 and minicpm3 (#12600) | 2024-12-24 14:16:30 +08:00 |
| bert.py | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00 |
| bloom.py | refactor qwen2 and llama3 (#12587) | 2024-12-20 13:25:25 +08:00 |
| chatglm.py | add glm_sdpa back to fix chatglm-6b (#11313) | 2024-06-14 10:31:43 +08:00 |
| chatglm2.py | refactor chatglm2, internlm, stablelm and qwen (#12604) | 2024-12-24 18:18:00 +08:00 |
| chatglm4.py | refactor baichuan, glm4 and minicpm3 (#12600) | 2024-12-24 14:16:30 +08:00 |
| chatglm4v.py | refactor baichuan, glm4 and minicpm3 (#12600) | 2024-12-24 14:16:30 +08:00 |
| cohere.py | fix llama related import (#12611) | 2024-12-25 16:23:52 +08:00 |
| common.py | refactor mistral and phi3 (#12605) | 2024-12-24 17:52:32 +08:00 |
| decilm.py | fix llama related import (#12611) | 2024-12-25 16:23:52 +08:00 |
| falcon.py | LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) | 2024-04-16 09:32:30 +08:00 |
| gemma.py | refactor attention_softmax (#12295) | 2024-10-30 13:20:50 +08:00 |
| gemma2.py | optimize minicpm3 again (#12047) | 2024-09-10 14:19:57 +08:00 |
| glm.py | refactor glm edge (#12588) | 2024-12-20 15:36:57 +08:00 |
| gpt2.py | refactor mllama, gpt2 and internvl (#12602) | 2024-12-24 14:18:31 +08:00 |
| gptbigcode.py | Fix Starcoder issue on CPU on transformers 4.36+ (#11190) | 2024-06-04 10:05:40 -07:00 |
| gptj.py | LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) | 2024-04-16 09:32:30 +08:00 |
| gptneox.py | refactor to remove old rope usage (#12224) | 2024-10-17 17:06:09 +08:00 |
| internlm.py | refactor chatglm2, internlm, stablelm and qwen (#12604) | 2024-12-24 18:18:00 +08:00 |
| internvl.py | refactor mllama, gpt2 and internvl (#12602) | 2024-12-24 14:18:31 +08:00 |
| llama.py | rewrite llama optimization (#12609) | 2024-12-25 17:04:32 +08:00 |
| minicpm.py | refactor yuan2 and starcoder2 and fix (#12589) | 2024-12-20 16:41:50 +08:00 |
| minicpm3.py | refactor baichuan, glm4 and minicpm3 (#12600) | 2024-12-24 14:16:30 +08:00 |
| minicpmv.py | refactor sd 1.5 and qwen2-vl and fix (#12590) | 2024-12-20 17:34:55 +08:00 |
| mistral.py | add compresskv back for mistral (#12607) | 2024-12-25 11:06:08 +08:00 |
| mixtral.py | add compresskv back for mistral (#12607) | 2024-12-25 11:06:08 +08:00 |
| mllama.py | refactor mllama, gpt2 and internvl (#12602) | 2024-12-24 14:18:31 +08:00 |
| mpt.py | LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) | 2024-04-16 09:32:30 +08:00 |
| phi.py | refactor attention_softmax (#12295) | 2024-10-30 13:20:50 +08:00 |
| phi3.py | refactor mistral and phi3 (#12605) | 2024-12-24 17:52:32 +08:00 |
| phixtral.py | refactor to reduce old rope usage (#12219) | 2024-10-17 14:45:09 +08:00 |
| qwen.py | refactor chatglm2, internlm, stablelm and qwen (#12604) | 2024-12-24 18:18:00 +08:00 |
| qwen2.py | refactor mistral and phi3 (#12605) | 2024-12-24 17:52:32 +08:00 |
| qwen2_moe.py | refactor merge_qkv and attention_softmax (#12213) | 2024-10-16 15:58:14 +08:00 |
| qwen2_vl.py | refactor sd 1.5 and qwen2-vl and fix (#12590) | 2024-12-20 17:34:55 +08:00 |
| qwen_vl.py | Support pipeline parallel for qwen-vl (#11503) | 2024-07-04 18:03:57 +08:00 |
| rwkv4.py | Divide core-xe packages (#11131) | 2024-05-28 12:00:18 +08:00 |
| rwkv5.py | Divide core-xe packages (#11131) | 2024-05-28 12:00:18 +08:00 |
| sd.py | refactor sd 1.5 and qwen2-vl and fix (#12590) | 2024-12-20 17:34:55 +08:00 |
| stablelm.py | refactor chatglm2, internlm, stablelm and qwen (#12604) | 2024-12-24 18:18:00 +08:00 |
| starcoder2.py | refactor yuan2 and starcoder2 and fix (#12589) | 2024-12-20 16:41:50 +08:00 |
| utils.py | refactor qwen2 and llama3 (#12587) | 2024-12-20 13:25:25 +08:00 |
| yuan.py | refactor yuan2 and starcoder2 and fix (#12589) | 2024-12-20 16:41:50 +08:00 |