ipex-llm

Ruonan Wang 3fe2ea3081 [NPU] Reuse prefill of acc lib for pipeline (#12279)		2024-10-28 16:05:49 +08:00
* first commit
* update example
* fix style
* update example
* embedding as const
* fix generate
* code refactor
* meet code review
* fix style
* change max_output_len to max_context_len
* fix all-in-one
* fix example
* add check for new tokens
cli
ggml	Init NPU quantize method and support q8_0_rtn (#11452)	2024-07-01 13:45:07 +08:00
gptq
langchain	Remove chatglm_C Module to Eliminate LGPL Dependency (#11178)	2024-05-31 17:03:11 +08:00
llamaindex	Llamaindex: add tokenizer_id and support chat (#10590)	2024-04-07 13:51:34 +08:00
serving	Support lightweight-serving glm-4v-9b (#11994)	2024-09-05 09:25:08 +08:00
transformers	[NPU] Reuse prefill of acc lib for pipeline (#12279)	2024-10-28 16:05:49 +08:00
utils	Add benchmark_util for `transformers >= 4.44.0` (#12171)	2024-10-14 15:40:12 +08:00
vllm	Enable vllm multimodal minicpm-v-2-6 (#12074)	2024-09-13 13:28:35 +08:00
__init__.py	IPEX Duplicate importer V2 (#11310)	2024-06-19 16:29:19 +08:00
convert_model.py
format.sh
llm_patching.py	Upgrade Peft version to 0.10.0 for LLM finetune (#10886)	2024-05-07 15:09:14 +08:00
models.py	Remove chatglm_C Module to Eliminate LGPL Dependency (#11178)	2024-05-31 17:03:11 +08:00
optimize.py	support passing None to low_bit in optimize_model (#12121)	2024-09-26 11:09:35 +08:00