
# BigDL-LLM Examples on Intel CPU

This folder contains examples of running BigDL-LLM on Intel CPU:

- `HF-Transformers-AutoModels`: running any Hugging Face Transformers model on BigDL-LLM (using the standard AutoModel APIs; see the first sketch below)
- `PyTorch-Models`: running any PyTorch model on BigDL-LLM (with a "one-line code change"; see the second sketch below)
- `Native-Models`: converting & running LLMs in the llama/chatglm/bloom/gptneox/starcoder model families using the native (cpp) implementation
- `LangChain`: running LangChain applications on BigDL-LLM
- `Applications`: running Transformers applications on BigDL-LLM
- `QLoRA-FineTuning`: running QLoRA fine-tuning using BigDL-LLM on Intel CPUs
- `vLLM-Serving`: running the vLLM serving framework on Intel Xeon platforms (with BigDL-LLM low-bit optimized models)
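
A minimal sketch of the AutoModel-style usage mentioned above, assuming a Llama 2 checkpoint purely for illustration (any Hugging Face Transformers model path works):

```python
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # illustrative model choice

# Load the model with BigDL-LLM low-bit (INT4) optimization applied
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("What is AI?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```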
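
A minimal sketch of the "one-line code change" for general PyTorch models, assuming BigDL-LLM's `optimize_model` API and using GPT-2 purely as an example:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from bigdl.llm import optimize_model

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# The one-line change: apply BigDL-LLM low-bit optimizations to the model
model = optimize_model(model)

inputs = tokenizer("Once upon a time", return_tensors="pt")
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```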

## System Support

Hardware:

- Intel® Core™ processors
- Intel® Xeon® processors

Operating System:

- Ubuntu 20.04 or later
- CentOS 7 or later
- Windows 10/11, with or without WSL