ipex-llm/python/llm/src/bigdl/llm
Latest commit: 30795bdfbc by Xin Qiu, 2024-02-23 10:07:24 +08:00
Gemma optimization: rms_norm, kv_cache, fused_rope, fused_rope+qkv (#10212)
* gemma optimization
* update
* update
* fix style
* meet code review
Name                Last commit                                                                    Date
cli                 [LLM] fix chatglm main choice (#9073)                                          2023-09-28 11:23:37 +08:00
ggml                LLM: add GGUF-IQ2 examples (#10207)                                            2024-02-22 14:18:45 +08:00
gptq                gptq2ggml: support loading safetensors model. (#8401)                          2023-06-27 11:19:33 +08:00
langchain           LLM: modify transformersembeddings.embed() in langchain (#10051)               2024-02-05 10:42:10 +08:00
serving             [Serving] Add vllm_worker to fastchat serving framework (#9934)                2024-01-18 21:33:36 +08:00
transformers        Gemma optimization: rms_norm, kv_cache, fused_rope, fused_rope+qkv (#10212)    2024-02-23 10:07:24 +08:00
utils               change xmx condition (#9896)                                                   2024-01-12 19:51:48 +08:00
vllm                add mistral and chatglm support to vllm (#9879)                                2024-01-10 15:38:42 +08:00
__init__.py         [LLM] IPEX auto importer set on by default (#9832)                             2024-01-04 13:33:29 +08:00
convert_model.py    LLM: add chatglm native int4 transformers API (#8695)                          2023-08-07 17:52:47 +08:00
format.sh           Integrate vllm (#9310)                                                         2023-11-23 16:46:45 +08:00
models.py           LLM: add chatglm native int4 transformers API (#8695)                          2023-08-07 17:52:47 +08:00
optimize.py         [LLM] Improve LLM doc regarding windows gpu related info (#9880)               2024-01-11 14:37:16 +08:00
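
For context on how these pieces fit together: the fused kernels tracked under transformers/ (rms_norm, kv_cache, fused_rope) and the helpers behind optimize.py are applied when a model is loaded through BigDL-LLM's documented loading API. The following is a minimal usage sketch, assuming the bigdl-llm package as of this snapshot; the model id google/gemma-7b-it is an illustrative example, not something taken from this listing.

# Minimal sketch, assuming the bigdl-llm package as of this snapshot.
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "google/gemma-7b-it"  # hypothetical example model id

# load_in_4bit=True quantizes linear layers to INT4 and swaps in the
# optimized forward paths provided by the transformers/ subpackage above.
model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("What does RMSNorm do?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Alternatively, an already-loaded Hugging Face model can be wrapped in place with the documented optimize_model helper (from bigdl.llm import optimize_model), which corresponds to the optimize.py entry above.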