BigDL-LLM Examples on Intel CPU

This folder contains examples of running BigDL-LLM on Intel CPU:

HF-Transformers-AutoModels: running any Hugging Face Transformers model on BigDL-LLM (using the standard AutoModel APIs)
PyTorch-Models: running any PyTorch model on BigDL-LLM (with "one-line code change")
Native-Models: converting and running LLMs in the llama/chatglm/bloom/gptneox/starcoder model families using the native (C++) implementation
LangChain: running LangChain applications on BigDL-LLM
Applications: running Transformers applications on BigDL-LLM
QLoRA-FineTuning: running QLoRA fine-tuning using BigDL-LLM on Intel CPUs
vLLM-Serving: running the vLLM serving framework on Intel Xeon platforms (with BigDL-LLM low-bit optimized models)

System Support

Hardware:

Intel® Core™ processors
Intel® Xeon® processors

Operating System:

Ubuntu 20.04 or later
CentOS 7 or later
Windows 10/11, with or without WSL