ipex-llm/python/llm/src/ipex_llm/transformers/models
File           Last commit date            Last commit message
__init__.py    2024-03-22 15:41:21 +08:00  Refactor bigdl.llm to ipex_llm (#24)
aquila.py      2024-04-16 09:32:30 +08:00  LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)
baichuan.py    2024-08-22 15:14:47 +08:00  add comment
bert.py        2024-03-22 15:41:21 +08:00  Refactor bigdl.llm to ipex_llm (#24)
bloom.py       2024-05-28 12:00:18 +08:00  Divide core-xe packages (#11131)
chatglm.py     2024-06-14 10:31:43 +08:00  add glm_sdpa back to fix chatglm-6b (#11313)
chatglm2.py    2024-08-19 15:32:32 +08:00  Support compress KV with quantize KV (#11812)
chatglm4.py    2024-08-19 15:32:32 +08:00  Support compress KV with quantize KV (#11812)
chatglm4v.py   2024-07-11 16:06:06 +08:00  Support pipeline parallel for glm-4v (#11545)
cohere.py      2024-07-17 17:18:59 -07:00  Fix cohere model on transformers>=4.41 (#11575)
common.py      2024-09-10 14:19:57 +08:00  optimize minicpm3 again (#12047)
decilm.py      2024-04-16 09:32:30 +08:00  LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)
falcon.py      2024-04-16 09:32:30 +08:00  LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)
gemma.py       2024-07-18 15:02:50 -07:00  fix gemma for 4.41 (#11531)
gemma2.py      2024-09-10 14:19:57 +08:00  optimize minicpm3 again (#12047)
gptbigcode.py  2024-06-04 10:05:40 -07:00  Fix Starcoder issue on CPU on transformers 4.36+ (#11190)
gptj.py        2024-04-16 09:32:30 +08:00  LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)
gptneox.py     2024-04-16 09:32:30 +08:00  LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)
internlm.py    2024-07-11 18:21:17 +08:00  fix internlm xcomposser stream chat (#11564)
internvl.py    2024-08-06 13:36:32 +08:00  support internvl2-4b (#11718)
llama.py       2024-09-06 17:51:08 +08:00  remove some useless code (#12035)
llama32.py     2024-09-26 16:08:10 +08:00  optimize llama 3.2 rope (#12128)
minicpm.py     2024-08-20 18:11:37 +08:00  Update compresskv model forward type logic (#11868)
minicpm3.py    2024-09-10 16:51:21 +08:00  optimize minicpm3 kv cache (#12052)
minicpmv.py    2024-08-29 19:22:09 +08:00  Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer (#11963)
mistral.py     2024-09-06 17:51:08 +08:00  remove some useless code (#12035)
mixtral.py     2024-05-28 12:00:18 +08:00  Divide core-xe packages (#11131)
mllama.py      2024-10-08 10:46:48 +08:00  add basic llama 3.2 vision support (#12163)
mpt.py         2024-04-16 09:32:30 +08:00  LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)
phi.py         2024-04-29 10:31:50 +08:00  remove new_layout parameter (#10906)
phi3.py        2024-08-28 17:35:05 +08:00  use sdp_causal to reduce internvl2-4b memory usage if set environment variable (#11953)
phixtral.py    2024-04-18 10:03:53 +08:00  Disable fast fused rope on UHD (#10780)
qwen.py        2024-05-28 15:03:06 +08:00  fix first token sdp with batch (#11153)
qwen2.py       2024-08-20 18:11:37 +08:00  Update compresskv model forward type logic (#11868)
qwen2_moe.py   2024-06-25 15:49:32 +08:00  Add more qwen1.5 and qwen2 support for pipeline parallel inference (#11423)
qwen2_vl.py    2024-10-10 13:50:01 +08:00  fix qwen2 vl again (#12174)
qwen_vl.py     2024-07-04 18:03:57 +08:00  Support pipeline parallel for qwen-vl (#11503)
rwkv4.py       2024-05-28 12:00:18 +08:00  Divide core-xe packages (#11131)
rwkv5.py       2024-05-28 12:00:18 +08:00  Divide core-xe packages (#11131)
sd15.py        2024-09-26 17:15:16 +08:00  fix sd1.5 (#12129)
stablelm.py    2024-07-03 11:08:26 +08:00  use new fuse rope in stablelm family (#11497)
starcoder2.py  2024-06-06 14:02:17 +08:00  add latest optimization in starcoder2 (#11236)
utils.py       2024-09-06 17:51:08 +08:00  remove some useless code (#12035)
yuan.py        2024-06-06 13:17:54 +08:00  refactor yuan2 (#11235)