ipex-llm/python/llm/src/ipex_llm

Latest commit f4537798c1 by Qiyuan Gong, 2024-03-29 09:43:42 +08:00:
Enable kv cache quantization by default for flex when 1 < batch <= 8 (#10584)
* Enable kv cache quantization by default for flex when 1 < batch <= 8.
* Change the upper bound from < 8 to <= 8.
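The condition named in this commit can be read as a simple default heuristic: kv cache quantization is turned on for Flex GPUs when the batch size is between 2 and 8 inclusive, unless the user has set it explicitly. The sketch below illustrates that gate only; it is not the ipex_llm implementation, and the function name, the device-name substring check, and the user_setting override are illustrative assumptions.

```python
# Minimal sketch (not the actual ipex_llm code) of the batch-size gate described
# in the commit above: on Flex GPUs, quantize the kv cache by default only when
# 1 < batch_size <= 8. All names here are hypothetical.
from typing import Optional


def use_quantized_kv_cache(device_name: str, batch_size: int,
                           user_setting: Optional[bool] = None) -> bool:
    """Return True if kv cache quantization should be enabled by default."""
    # An explicit user setting always overrides the default heuristic.
    if user_setting is not None:
        return user_setting
    # Default from the commit message: only for "flex" devices, and only when
    # the batch size is greater than 1 and at most 8 (inclusive after #10584).
    return "flex" in device_name.lower() and 1 < batch_size <= 8


if __name__ == "__main__":
    assert use_quantized_kv_cache("Intel(R) Data Center GPU Flex 170", 4)
    assert use_quantized_kv_cache("Intel(R) Data Center GPU Flex 170", 8)      # upper bound now inclusive
    assert not use_quantized_kv_cache("Intel(R) Data Center GPU Flex 170", 1)  # single batch excluded
    assert not use_quantized_kv_cache("Intel(R) Arc(TM) A770", 4)              # not a Flex device
```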
Name             | Last commit message | Date
cli              | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
ggml             | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
gptq             | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
langchain        | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
llamaindex       | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
serving          | Replace ipex with ipex-llm (#10554) | 2024-03-28 13:54:40 +08:00
transformers     | Enable kv cache quantization by default for flex when 1 < batch <= 8 (#10584) | 2024-03-29 09:43:42 +08:00
utils            | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
vllm             | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
__init__.py      | Update setup.py and add new actions and add compatible mode (#25) | 2024-03-22 15:44:59 +08:00
convert_model.py | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
format.sh        | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
llm_patching.py  | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
models.py        | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
optimize.py      | LLM: fix torch_dtype setting of apply fp16 optimization through optimize_model (#10556) | 2024-03-27 14:18:45 +08:00