
BigDL-LLM Examples on Intel CPU

This folder contains examples of running BigDL-LLM on Intel CPUs:

  • HF-Transformers-AutoModels: running any Hugging Face Transformers model on BigDL-LLM (using the standard AutoModel APIs)
  • PyTorch-Models: running any PyTorch model on BigDL-LLM (with a "one-line code change")
  • Native-Models: converting and running LLMs in the llama/chatglm/bloom/gptneox/starcoder model families using the native (C++) implementation
  • LangChain: running LangChain applications on BigDL-LLM
  • Applications: running Transformers applications on BigDL-LLM
  • QLoRA-FineTuning: running QLoRA finetuning using BigDL-LLM on Intel CPUs
  • vLLM-Serving: running the vLLM serving framework on Intel Xeon platforms (with BigDL-LLM low-bit optimized models)
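The first two folders differ mainly in how the model is loaded. Below is a minimal sketch of both paths. It assumes `bigdl-llm` and `transformers` are installed and that `model_path` points to a local Hugging Face checkpoint. The helper names (`load_with_automodel`, `load_with_optimize_model`, `generate`) are hypothetical. The per-model scripts in each folder remain the authoritative examples.

```python
# Hedged sketch: assumes `bigdl-llm` and `transformers` are installed.
# Imports are kept inside the functions so this file can be imported
# without those packages present. Helper names are hypothetical.


def load_with_automodel(model_path: str):
    """HF-Transformers-AutoModels style: BigDL-LLM's drop-in AutoModel class."""
    from bigdl.llm.transformers import AutoModelForCausalLM

    # load_in_4bit=True asks BigDL-LLM to quantize the weights to 4-bit while loading.
    return AutoModelForCausalLM.from_pretrained(
        model_path, load_in_4bit=True, trust_remote_code=True
    )


def load_with_optimize_model(model_path: str):
    """PyTorch-Models style: load with plain transformers, then optimize."""
    from transformers import AutoModelForCausalLM
    from bigdl.llm import optimize_model

    model = AutoModelForCausalLM.from_pretrained(
        model_path, torch_dtype="auto", low_cpu_mem_usage=True, trust_remote_code=True
    )
    # The "one-line code change": wrap an existing PyTorch model.
    return optimize_model(model)


def generate(model_path: str, prompt: str, max_new_tokens: int = 32) -> str:
    """Run a short greedy generation with the AutoModel-loaded model."""
    import torch
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = load_with_automodel(model_path)
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    with torch.inference_mode():
        output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

For example, `print(generate("/path/to/model", "What is AI?"))` would load the checkpoint in 4-bit form and print a short completion. The AutoModel path suits Hugging Face checkpoints. The `optimize_model` path is meant for models you already construct as ordinary PyTorch modules.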

System Support

Hardware:

  • Intel® Core™ processors
  • Intel® Xeon® processors

Operating System:

  • Ubuntu 20.04 or later
  • CentOS 7 or later
  • Windows 10/11, with or without WSL
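Below is a minimal, standard-library-only sketch of checking a Linux host (including WSL) against the list above. It reads `/etc/os-release` and `/proc/cpuinfo`. The helper names and the version table are hypothetical and are not part of BigDL-LLM. A `False` result only means the distribution is not one of the two listed, not that it is known to fail.

```python
import platform
from pathlib import Path

# Minimum versions taken from the "Operating System" list above.
# Hypothetical helper table; not part of BigDL-LLM.
MIN_LINUX_VERSIONS = {"ubuntu": (20, 4), "centos": (7,)}


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines from /etc/os-release into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def linux_distro_supported(os_release_text: str) -> bool:
    """Return True if ID and VERSION_ID meet one of the listed minimums."""
    info = parse_os_release(os_release_text)
    minimum = MIN_LINUX_VERSIONS.get(info.get("ID", "").lower())
    if minimum is None:
        return False
    try:
        # "20.04" -> (20, 4); tuples compare element by element.
        version = tuple(int(part) for part in info.get("VERSION_ID", "").split("."))
    except ValueError:
        return False
    return version >= minimum


def is_intel_cpu(cpuinfo_text: str) -> bool:
    """Return True if /proc/cpuinfo reports the GenuineIntel vendor_id."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("vendor_id"):
            return line.split(":", 1)[1].strip() == "GenuineIntel"
    return False


def report_host() -> None:
    """Print what can be checked automatically on this host."""
    system = platform.system()
    print(f"OS: {system} {platform.release()}")
    if system != "Linux":
        print("Compare against the Windows entries in the list manually.")
        return
    os_release = Path("/etc/os-release")
    cpuinfo = Path("/proc/cpuinfo")
    if os_release.exists():
        print("Listed distro/version:", linux_distro_supported(os_release.read_text()))
    if cpuinfo.exists():
        print("Intel CPU:", is_intel_cpu(cpuinfo.read_text()))


if __name__ == "__main__":
    report_host()
```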