# BigDL-LLM Examples on Intel CPU
This folder contains examples of running BigDL-LLM on Intel CPUs:
- [HF-Transformers-AutoModels](HF-Transformers-AutoModels): running any Hugging Face Transformers model on BigDL-LLM (using the standard AutoModel APIs; see the sketch after this list)
- [PyTorch-Models](PyTorch-Models): running any PyTorch model on BigDL-LLM (with a "one-line code change"; see the sketch after this list)
- [Native-Models](Native-Models): converting & running LLMs from the `llama`/`chatglm`/`bloom`/`gptneox`/`starcoder` model families using the native (cpp) implementation
- [LangChain](LangChain): running LangChain applications on BigDL-LLM
- [Applications](Applications): running Transformers applications on BigDL-LLM
- [QLoRA-FineTuning](QLoRA-FineTuning): running QLoRA fine-tuning using BigDL-LLM on Intel CPUs
- [vLLM-Serving](vLLM-Serving): running the vLLM serving framework on Intel® Xeon® platforms (with BigDL-LLM low-bit optimized models)
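For a quick sense of the two loading styles referenced above, here is a minimal sketch (not a complete example): it assumes BigDL-LLM is installed and uses a placeholder model path; refer to the linked folders for full, tested code.

```python
# Minimal sketch of the two BigDL-LLM usage styles listed above.
# The model path below is a placeholder; substitute any supported checkpoint.
import torch
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
prompt_ids = tokenizer("What is AI?", return_tensors="pt").input_ids

# Style 1 (HF-Transformers-AutoModels): BigDL-LLM's drop-in AutoModel class,
# loading the checkpoint with low-bit (INT4) optimizations applied.
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
with torch.inference_mode():
    output = model.generate(prompt_ids, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

# Style 2 (PyTorch-Models): the "one-line code change" -- load a PyTorch model
# as usual, then wrap it with optimize_model() to apply the same optimizations.
import transformers
from bigdl.llm import optimize_model

vanilla_model = transformers.AutoModelForCausalLM.from_pretrained(model_path)
vanilla_model = optimize_model(vanilla_model)
```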
## System Support
**Hardware**:
- Intel® Core™ processors
- Intel® Xeon® processors

**Operating System**:
- Ubuntu 20.04 or later
- CentOS 7 or later
- Windows 10/11, with or without WSL