Xiangyu Tian
|
b3f6faa038
|
LLM: Add CPU vLLM entrypoint (#11083)
Add CPU vLLM entrypoint and update CPU vLLM serving example.
|
2024-05-24 09:16:59 +08:00 |
|
Guancheng Fu
|
2c64754eb0
|
Add vLLM to ipex-llm serving image (#10807)
* add vllm
* done
* doc work
* fix done
* temp
* add docs
* format
* add start-fastchat-service.sh
* fix
|
2024-04-29 17:25:42 +08:00 |
|
Guancheng Fu
|
47bd5f504c
|
[vLLM]Remove vllm-v1, refactor v2 (#10842)
* remove vllm-v1
* fix format
|
2024-04-22 17:51:32 +08:00 |
|
Wang, Jian4
|
9df70d95eb
|
Refactor bigdl.llm to ipex_llm (#24)
* Rename bigdl/llm to ipex_llm
* rm python/llm/src/bigdl
* from bigdl.llm to from ipex_llm
|
2024-03-22 15:41:21 +08:00 |
|
Guancheng Fu
|
2d930bdca8
|
Add vLLM bf16 support (#10278)
* add argument load_in_low_bit
* add docs
* modify gpu doc
* done
---------
Co-authored-by: ivy-lv11 <lvzc@lamda.nju.edu.cn>
|
2024-02-29 16:33:42 +08:00 |
|
Guancheng Fu
|
963a5c8d79
|
Add vLLM-XPU version's README/examples (#9536)
* test
* test
* fix last kv cache
* add xpu readme
* remove numactl for xpu example
* fix link error
* update max_num_batched_tokens logic
* add explanation
* add xpu environment version requirement
* refine gpu memory
* fix
* fix style
|
2023-11-28 09:44:03 +08:00 |
|