ipex-llm/python/llm/src/ipex_llm

Latest commit f4537798c1 by Qiyuan Gong, 2024-03-29 09:43:42 +08:00:
Enable kv cache quantization by default for flex when 1 < batch <= 8 (#10584)
* Enable kv cache quantization by default for flex when 1 < batch <= 8.
* Change the upper bound from < 8 to <= 8.
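The condition named in this commit can be read as a simple default heuristic: kv cache quantization is turned on for Flex GPUs when the batch size is between 2 and 8 inclusive, unless the user has set it explicitly. The sketch below illustrates that gate only; it is not the ipex_llm implementation, and the function name, the device-name substring check, and the user_setting override are illustrative assumptions.

```python
# Minimal sketch (not the actual ipex_llm code) of the batch-size gate described
# in the commit above: on Flex GPUs, quantize the kv cache by default only when
# 1 < batch_size <= 8. All names here are hypothetical.
from typing import Optional


def use_quantized_kv_cache(device_name: str, batch_size: int,
                           user_setting: Optional[bool] = None) -> bool:
    """Return True if kv cache quantization should be enabled by default."""
    # An explicit user setting always overrides the default heuristic.
    if user_setting is not None:
        return user_setting
    # Default from the commit message: only for "flex" devices, and only when
    # the batch size is greater than 1 and at most 8 (inclusive after #10584).
    return "flex" in device_name.lower() and 1 < batch_size <= 8


if __name__ == "__main__":
    assert use_quantized_kv_cache("Intel(R) Data Center GPU Flex 170", 4)
    assert use_quantized_kv_cache("Intel(R) Data Center GPU Flex 170", 8)      # upper bound now inclusive
    assert not use_quantized_kv_cache("Intel(R) Data Center GPU Flex 170", 1)  # single batch excluded
    assert not use_quantized_kv_cache("Intel(R) Arc(TM) A770", 4)              # not a Flex device
```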
Name             | Last commit message | Date
cli              | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
ggml             | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
gptq             | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
langchain        | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
llamaindex       | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
serving          | Replace ipex with ipex-llm (#10554) | 2024-03-28 13:54:40 +08:00
transformers     | Enable kv cache quantization by default for flex when 1 < batch <= 8 (#10584) | 2024-03-29 09:43:42 +08:00
utils            | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
vllm             | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
__init__.py      | Update setup.py and add new actions and add compatible mode (#25) | 2024-03-22 15:44:59 +08:00
convert_model.py | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
format.sh        | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
llm_patching.py  | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
models.py        | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
optimize.py      | LLM: fix torch_dtype setting of apply fp16 optimization through optimize_model (#10556) | 2024-03-27 14:18:45 +08:00