ipex-llm/python/llm/src/ipex_llm/transformers/models
Xin Qiu dbc3c2d72d
glm4 sdp (#11253)
* glm4 sdp

* fix style

* update comment
2024-06-07 15:42:23 +08:00
__init__.py Refactor bigdl.llm to ipex_llm (#24) 2024-03-22 15:41:21 +08:00
aquila.py LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) 2024-04-16 09:32:30 +08:00
baichuan.py Refactor baichuan1 7B and 13B (#11258) 2024-06-07 14:29:20 +08:00
bert.py Refactor bigdl.llm to ipex_llm (#24) 2024-03-22 15:41:21 +08:00
bloom.py Divide core-xe packages (#11131) 2024-05-28 12:00:18 +08:00
chatglm.py fix chatglm run error (#11045) 2024-05-16 15:39:18 +08:00
chatglm2.py Divide core-xe packages (#11131) 2024-05-28 12:00:18 +08:00
chatglm2_32k.py LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) 2024-04-16 09:32:30 +08:00
chatglm4.py glm4 sdp (#11253) 2024-06-07 15:42:23 +08:00
cohere.py Fix should_use_fuse_rope error of Qwen1.5-MoE-A2.7B-Chat (#11216) 2024-06-05 15:56:10 +08:00
decilm.py LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) 2024-04-16 09:32:30 +08:00
falcon.py LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) 2024-04-16 09:32:30 +08:00
gemma.py Divide core-xe packages (#11131) 2024-05-28 12:00:18 +08:00
gptbigcode.py Fix Starcoder issue on CPU on transformers 4.36+ (#11190) 2024-06-04 10:05:40 -07:00
gptj.py LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) 2024-04-16 09:32:30 +08:00
gptneox.py LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) 2024-04-16 09:32:30 +08:00
internlm.py fix first token sdp with batch (#11153) 2024-05-28 15:03:06 +08:00
llama.py Divide core-xe packages (#11131) 2024-05-28 12:00:18 +08:00
minicpm.py quantized attention forward for minicpm (#11200) 2024-06-05 09:15:25 +08:00
mistral.py Divide core-xe packages (#11131) 2024-05-28 12:00:18 +08:00
mixtral.py Divide core-xe packages (#11131) 2024-05-28 12:00:18 +08:00
mpt.py LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) 2024-04-16 09:32:30 +08:00
phi.py remove new_layout parameter (#10906) 2024-04-29 10:31:50 +08:00
phi3.py disable sdp_causal in phi-3 to fix overflow (#11157) 2024-05-28 17:25:53 +08:00
phixtral.py Disable fast fused rope on UHD (#10780) 2024-04-18 10:03:53 +08:00
qwen.py fix first token sdp with batch (#11153) 2024-05-28 15:03:06 +08:00
qwen2.py qwen2 sdpa small fix (#11261) 2024-06-07 14:42:18 +08:00
qwen2_moe.py Refactor qwen2 moe (#11244) 2024-06-07 13:14:54 +08:00
qwen_vl.py Divide core-xe packages (#11131) 2024-05-28 12:00:18 +08:00
rwkv4.py Divide core-xe packages (#11131) 2024-05-28 12:00:18 +08:00
rwkv5.py Divide core-xe packages (#11131) 2024-05-28 12:00:18 +08:00
stablelm.py refactor stablelm (#11195) 2024-06-04 13:14:43 +08:00
starcoder2.py add latest optimization in starcoder2 (#11236) 2024-06-06 14:02:17 +08:00
utils.py check device name in use_flash_attention (#11263) 2024-06-07 15:07:47 +08:00
yuan.py refactor yuan2 (#11235) 2024-06-06 13:17:54 +08:00