ipex-llm

Xin Qiu dbc3c2d72d glm4 sdp (#11253) * glm4 sdp * fix style * update comment	2024-06-07 15:42:23 +08:00
__init__.py	Refactor bigdl.llm to ipex_llm (#24)	2024-03-22 15:41:21 +08:00
aquila.py	LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)	2024-04-16 09:32:30 +08:00
baichuan.py	Refactor baichuan1 7B and 13B (#11258)	2024-06-07 14:29:20 +08:00
bert.py	Refactor bigdl.llm to ipex_llm (#24)	2024-03-22 15:41:21 +08:00
bloom.py	Divide core-xe packages (#11131)	2024-05-28 12:00:18 +08:00
chatglm.py	fix chatglm run error (#11045)	2024-05-16 15:39:18 +08:00
chatglm2.py	Divide core-xe packages (#11131)	2024-05-28 12:00:18 +08:00
chatglm2_32k.py	LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)	2024-04-16 09:32:30 +08:00
chatglm4.py	glm4 sdp (#11253)	2024-06-07 15:42:23 +08:00
cohere.py	Fix `should_use_fuse_rope` error of Qwen1.5-MoE-A2.7B-Chat (#11216)	2024-06-05 15:56:10 +08:00
decilm.py	LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)	2024-04-16 09:32:30 +08:00
falcon.py	LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)	2024-04-16 09:32:30 +08:00
gemma.py	Divide core-xe packages (#11131)	2024-05-28 12:00:18 +08:00
gptbigcode.py	Fix Starcoder issue on CPU on transformers 4.36+ (#11190)	2024-06-04 10:05:40 -07:00
gptj.py	LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)	2024-04-16 09:32:30 +08:00
gptneox.py	LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)	2024-04-16 09:32:30 +08:00
internlm.py	fix first token sdp with batch (#11153)	2024-05-28 15:03:06 +08:00
llama.py	Divide core-xe packages (#11131)	2024-05-28 12:00:18 +08:00
minicpm.py	quantized attention forward for minicpm (#11200)	2024-06-05 09:15:25 +08:00
mistral.py	Divide core-xe packages (#11131)	2024-05-28 12:00:18 +08:00
mixtral.py	Divide core-xe packages (#11131)	2024-05-28 12:00:18 +08:00
mpt.py	LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)	2024-04-16 09:32:30 +08:00
phi.py	remove new_layout parameter (#10906)	2024-04-29 10:31:50 +08:00
phi3.py	disable sdp_causal in phi-3 to fix overflow (#11157)	2024-05-28 17:25:53 +08:00
phixtral.py	Disable fast fused rope on UHD (#10780)	2024-04-18 10:03:53 +08:00
qwen.py	fix first token sdp with batch (#11153)	2024-05-28 15:03:06 +08:00
qwen2.py	qwen2 sdpa small fix (#11261)	2024-06-07 14:42:18 +08:00
qwen2_moe.py	Refactor qwen2 moe (#11244)	2024-06-07 13:14:54 +08:00
qwen_vl.py	Divide core-xe packages (#11131)	2024-05-28 12:00:18 +08:00
rwkv4.py	Divide core-xe packages (#11131)	2024-05-28 12:00:18 +08:00
rwkv5.py	Divide core-xe packages (#11131)	2024-05-28 12:00:18 +08:00
stablelm.py	refactor stablelm (#11195)	2024-06-04 13:14:43 +08:00
starcoder2.py	add latest optimization in starcoder2 (#11236)	2024-06-06 14:02:17 +08:00
utils.py	check devie name in use_flash_attention (#11263)	2024-06-07 15:07:47 +08:00
yuan.py	refactor yuan2 (#11235)	2024-06-06 13:17:54 +08:00