Yuwen Hu
|
5f5ca38b74
|
[LLM Doc] Fix api doc rendering error (#9542)
* Fix api rendering error
* Fix python style
|
2023-11-29 09:17:09 +08:00 |
|
Yishuo Wang
|
a86c6e0b56
|
[LLM] support loading gguf model (#9544)
|
2023-11-28 15:51:15 +08:00 |
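Commit a86c6e0b56 (#9544) adds GGUF model loading. A minimal usage sketch, assuming the `from_gguf` entry point on bigdl-llm's transformers-style classes (treat the exact signature and return values as assumptions; check the bigdl-llm docs):

```python
# Hypothetical usage sketch for GGUF loading (#9544); the exact
# signature of from_gguf is an assumption.
from bigdl.llm.transformers import AutoModelForCausalLM

# Load a llama-family GGUF checkpoint; assumed to return the
# optimized model together with a matching tokenizer.
model, tokenizer = AutoModelForCausalLM.from_gguf("llama-2-7b-chat.q4_0.gguf")

inputs = tokenizer("What is GGUF?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```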
|
Xin Qiu
|
32b37f3af7
|
Update install_gpu.md (#9541)
* Update install_gpu.md
* Update install_gpu.md
|
2023-11-28 11:15:03 +08:00 |
|
Xiangyu Tian
|
916c338772
|
fix bugs in vllm length check (#9543)
|
2023-11-28 11:09:54 +08:00 |
|
WeiguangHan
|
5098bc3544
|
LLM: enable previous models (#9505)
* enable previous models
* test mistral model
* for test
* run models separately
* test all models
* for test
* revert the llm_performance_test.yaml
|
2023-11-28 10:21:07 +08:00 |
|
Zhao Changmin
|
e7e0cd3b5e
|
CPU Pinned Embedding Layer (#9538)
* CPU Pinned embedding
|
2023-11-28 09:46:31 +08:00 |
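Commit e7e0cd3b5e (#9538) keeps the embedding table in pinned CPU memory so it no longer occupies device memory. A rough illustration of the pattern for inference (names and details here are illustrative, not the actual bigdl-llm implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CPUPinnedEmbedding(nn.Module):
    """Illustrative sketch: keep the (large) embedding weight in pinned
    CPU memory and move only the looked-up rows to the device."""
    def __init__(self, num_embeddings: int, embedding_dim: int, device: str = "xpu"):
        super().__init__()
        self.device = device
        # pin_memory() enables fast async host-to-device copies;
        # plain tensor (not a Parameter), so this is inference-only.
        self.weight = torch.empty(num_embeddings, embedding_dim).pin_memory()

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # Gather on CPU, then transfer only the selected rows.
        rows = F.embedding(input_ids.cpu(), self.weight)
        return rows.to(self.device, non_blocking=True)
```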
|
Guancheng Fu
|
963a5c8d79
|
Add vLLM-XPU version's README/examples (#9536)
* test
* test
* fix last kv cache
* add xpu readme
* remove numactl for xpu example
* fix link error
* update max_num_batched_tokens logic
* add explanation
* add xpu environment version requirement
* refine gpu memory
* fix
* fix style
|
2023-11-28 09:44:03 +08:00 |
|
Guancheng Fu
|
b6c3520748
|
Remove xformers from vLLM-CPU (#9535)
|
2023-11-27 11:21:25 +08:00 |
|
binbin Deng
|
2b9c7d2a59
|
LLM: quick fix alpaca qlora finetuning script (#9534)
|
2023-11-27 11:04:27 +08:00 |
|
Yuwen Hu
|
11fa3de290
|
Add setup support of win GPU for bigdl-llm (#9512)
|
2023-11-24 17:49:21 +08:00 |
|
Chen, Zhentao
|
45820cf3b9
|
add optimize model option (#9530)
|
2023-11-24 17:10:49 +08:00 |
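Commit 45820cf3b9 (#9530) exposes an `optimize_model` option. Assuming it is a flag on `from_pretrained` (the commit title suggests this, but the exact semantics are an assumption):

```python
from bigdl.llm.transformers import AutoModelForCausalLM

# load_in_4bit handles the low-bit quantization; optimize_model is
# assumed to toggle bigdl-llm's extra model-level optimizations.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    load_in_4bit=True,
    optimize_model=True,
)
```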
|
binbin Deng
|
6bec0faea5
|
LLM: support Mistral AWQ models (#9520)
|
2023-11-24 16:20:22 +08:00 |
|
Ruonan Wang
|
914a5a5a27
|
LLM: fix abnormal Mistral GPU accuracy by updating rms_norm (#9529)
|
2023-11-24 15:37:50 +08:00 |
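The rms_norm fix in 914a5a5a27 (#9529) touches the numerics of RMSNorm. For reference, the standard Llama/Mistral-style formulation in plain PyTorch; accuracy problems in fused paths typically come from where the cast to lower precision happens relative to the variance computation:

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Compute the variance in float32 so fp16/bf16 inputs do not lose
    # accuracy, then cast back -- the usual fix for "abnormal accuracy"
    # issues in rms_norm paths.
    orig_dtype = x.dtype
    x = x.to(torch.float32)
    variance = x.pow(2).mean(-1, keepdim=True)
    x = x * torch.rsqrt(variance + eps)
    return weight * x.to(orig_dtype)
```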
|
SONG Ge
|
3d24823cda
|
hot-fix mistral kv_cache (#9528)
|
2023-11-24 14:33:04 +08:00 |
|
Zhao Changmin
|
42b7a16bc5
|
Replace torch.bmm with safe_bmm (#9519)
* replace bmm with safe one
* rename args and add deprecation warning
|
2023-11-24 12:16:48 +08:00 |
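Commit 42b7a16bc5 (#9519) swaps `torch.bmm` for a `safe_bmm` wrapper. The log does not show the real implementation; one common shape for such a wrapper (purely illustrative) is a drop-in function that falls back to `torch.matmul` on devices where the fused batched kernel misbehaves:

```python
import torch

def safe_bmm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Drop-in stand-in for torch.bmm (illustrative sketch only).

    Falls back to torch.matmul on non-CPU/CUDA devices (e.g. XPU),
    where the batched kernel may be the problematic path."""
    if a.device.type in ("cpu", "cuda"):
        return torch.bmm(a, b)
    return torch.matmul(a, b)
```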
|
Jason Dai
|
b3178d449f
|
Update README.md (#9525)
|
2023-11-23 21:45:20 +08:00 |
|
Jason Dai
|
82898a4203
|
Update GPU example README (#9524)
|
2023-11-23 21:20:26 +08:00 |
|
Jason Dai
|
064848028f
|
Update README.md (#9523)
|
2023-11-23 21:16:21 +08:00 |
|
Ruonan Wang
|
b63aae8a8e
|
LLM: add flash attention support for llama (#9518)
* add initial flash attention for llama
* accelerate fp32 first token by changing to fp16 in advance
* support fp32
|
2023-11-23 18:40:18 +08:00 |
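Commit b63aae8a8e (#9518) adds a flash-attention path for llama, casting fp32 to fp16 in advance to speed up the first token. bigdl-llm's exact kernel is not shown in the log; in stock PyTorch (>= 2.0) the equivalent building block is `scaled_dot_product_attention`, e.g.:

```python
import torch
import torch.nn.functional as F

def llama_attention(q, k, v, is_causal: bool = True) -> torch.Tensor:
    # q/k/v: (batch, num_heads, seq_len, head_dim)
    # Casting fp32 activations to fp16 up front mirrors the commit's
    # "accelerate fp32 first token by changing to fp16 in advance".
    orig_dtype = q.dtype
    if orig_dtype == torch.float32:
        q, k, v = q.half(), k.half(), v.half()
    out = F.scaled_dot_product_attention(q, k, v, is_causal=is_causal)
    return out.to(orig_dtype)
```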
|
Guancheng Fu
|
bf579507c2
|
Integrate vllm (#9310)
* done
* Rename structure
* add models
* Add structure/sampling_params,sequence
* add input_metadata
* add outputs
* Add policy,logger
* add and update
* add parallelconfig back
* core/scheduler.py
* Add llm_engine.py
* Add async_llm_engine.py
* Add tested entrypoint
* fix minor error
* Fix everything
* fix kv cache view
* fix
* fix
* fix
* format&refine
* remove logger from repo
* try to add token latency
* remove logger
* Refine config.py
* finish worker.py
* delete utils.py
* add license
* refine
* refine sequence.py
* remove sampling_params.py
* finish
* add license
* format
* add license
* refine
* refine
* Refine line too long
* remove exception
* so dumb style-check
* refine
* refine
* refine
* refine
* refine
* refine
* add README
* refine README
* add warning instead of error
* fix padding
* add license
* format
* format
* format fix
* Refine vllm dependency (#1)
vllm dependency clear
* fix license
* fix format
* fix format
* fix
* adapt LLM engine
* fix
* add license
* fix format
* fix
* Moving README.md to the correct position
* Fix readme.md
* done
* guide for adding models
* fix
* Fix README.md
* Add new model readme
* remove ray-logic
* refactor arg_utils.py
* remove distributed_init_method logic
* refactor entrypoints
* refactor input_metadata
* refactor model_loader
* refactor utils.py
* refactor models
* fix api server
* remove vllm.stucture
* revert by txy 1120
* remove utils
* format
* fix license
* add bigdl model
* Refer to a specific commit
* Change code base
* add comments
* add async_llm_engine comment
* refine
* formatted
* add worker comments
* add comments
* add comments
* fix style
* add changes
---------
Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
|
2023-11-23 16:46:45 +08:00 |
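Commit bf579507c2 (#9310) ports vLLM's engine (scheduler, llm_engine, async engine, worker, entrypoints) onto bigdl-llm models. Assuming the port keeps vLLM's public interface, usage would look like upstream vLLM; the import paths below are assumptions based on the commit description, not confirmed module names:

```python
# Sketch assuming the BigDL port mirrors vLLM's public API;
# both import paths are assumptions.
from bigdl.llm.vllm.entrypoints.llm import LLM
from bigdl.llm.vllm.sampling_params import SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
for output in llm.generate(["What is vLLM?"], params):
    print(output.outputs[0].text)
```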
|
Heyang Sun
|
48fbb1eb94
|
support ccl (MPI) distributed mode in alpaca_qlora_finetuning_cpu (#9507)
|
2023-11-23 10:58:09 +08:00 |
|
Qiyuan Gong
|
0f0c6bb631
|
[LLM] Fix Qwen registered_causal_mask is None (#9513)
* Add registered_causal_mask init based on 2abd8e5777.
|
2023-11-23 09:28:04 +08:00 |
|
Heyang Sun
|
11fa5a8a0e
|
Fix QLoRA CPU dispatch_model issue about accelerate (#9506)
|
2023-11-23 08:41:25 +08:00 |
|
Heyang Sun
|
1453046938
|
install bigdl-llm in deepspeed cpu inference example (#9508)
|
2023-11-23 08:39:21 +08:00 |
|
binbin Deng
|
86743fb57b
|
LLM: fix transformers version in CPU finetuning example (#9511)
|
2023-11-22 15:53:07 +08:00 |
|
binbin Deng
|
1a2129221d
|
LLM: support resume from checkpoint in Alpaca QLoRA (#9502)
|
2023-11-22 13:49:14 +08:00 |
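Commit 1a2129221d (#9502) wires checkpoint resumption into the Alpaca QLoRA script. With a HuggingFace `Trainer`, which the Alpaca scripts build on, the standard mechanism is:

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(output_dir="outputs", save_steps=200)
# model and train_dataset are built earlier in the finetuning script.
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)

# True -> resume from the newest checkpoint under output_dir;
# a path string such as "outputs/checkpoint-200" pins a specific one.
trainer.train(resume_from_checkpoint=True)
```

Whether the BigDL script exposes this as a CLI flag is not shown in the log.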
|
Ruonan Wang
|
139e98aa18
|
LLM: quick fix benchmark (#9509)
|
2023-11-22 10:19:57 +08:00 |
|
WeiguangHan
|
c2aeb4d1e8
|
del model after test (#9504)
|
2023-11-21 18:41:50 +08:00 |
|
Ruonan Wang
|
076d106ef5
|
LLM: GPU QLoRA update to bf16 to accelerate gradient checkpointing (#9499)
* update to bf16 to accelerate gradient checkpointing
* add utils and fix ut
|
2023-11-21 17:08:36 +08:00 |
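Commit 076d106ef5 (#9499) moves GPU QLoRA to bf16 to speed up gradient checkpointing: activations recomputed during backward are cheaper in bf16 than fp32. The generic recipe with transformers looks like the following (the actual BigDL utilities from the commit are not shown here):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,  # bf16 weights and activations
)
# With checkpointing on, recomputation during backward now happens
# in bf16, which is the speedup the commit targets.
model.gradient_checkpointing_enable()
```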
|
Cheen Hau, 俊豪
|
3e39828420
|
Update all in one benchmark readme (#9496)
* Add gperftools install to all in one benchmark readme
* Update readme
|
2023-11-21 14:57:16 +08:00 |
|
Guancheng Fu
|
2b200bf2f2
|
Add vllm_worker related arguments in docker serving image's entrypoint (#9500)
* fix entrypoint
* fix missing long mode argument
|
2023-11-21 14:41:06 +08:00 |
|
Wang, Jian4
|
40ec9f7ead
|
Add qlora cpu docker manually build (#9501)
|
2023-11-21 14:39:16 +08:00 |
|
binbin Deng
|
b7ae572ac3
|
LLM: update Alpaca QLoRA finetuning example on GPU (#9492)
|
2023-11-21 14:22:19 +08:00 |
|
Lilac09
|
566ec85113
|
add stream interval option to entrypoint (#9498)
|
2023-11-21 09:47:32 +08:00 |
|
Wang, Jian4
|
c5cb3ab82e
|
LLM: Add CPU alpaca qlora example (#9469)
* init
* update xpu to cpu
* update
* update readme
* update example
* update
* add reference
* add guide to train different datasets
* update readme
* update
|
2023-11-21 09:19:58 +08:00 |
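Commit c5cb3ab82e (#9469) adds a CPU variant of the Alpaca QLoRA example. The core of any such script is attaching LoRA adapters to a quantized base model; a minimal sketch using the generic peft pattern (the BigDL example may use its own qlora helpers, and the hyperparameters here are illustrative):

```python
from peft import LoraConfig, get_peft_model
from bigdl.llm.transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", load_in_4bit=True)

lora = LoraConfig(
    r=8, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()
```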
|
binbin Deng
|
96fd26759c
|
LLM: fix QLoRA finetuning example on CPU (#9489)
|
2023-11-20 14:31:24 +08:00 |
|
Xin Qiu
|
0f9a440b06
|
doc for multi-GPU selection (#9414)
|
2023-11-20 09:25:58 +08:00 |
|
Xin Qiu
|
50b01058f1
|
enable new q4_1 (#9479)
|
2023-11-17 14:58:57 +08:00 |
|
binbin Deng
|
3dac21ac7b
|
LLM: add more example usages about alpaca qlora on different hardware (#9458)
|
2023-11-17 09:56:43 +08:00 |
|
Heyang Sun
|
921b263d6a
|
update deepspeed install and run guide in README (#9441)
|
2023-11-17 09:11:39 +08:00 |
|
Zhao Changmin
|
30abd304a7
|
LLM: Fix baichuan pre-normalize model tensor assignment issue when loading (#9481)
* No need to normalize when loading
|
2023-11-16 21:57:28 +08:00 |
|
WeiguangHan
|
bc06bec90e
|
LLM: modify the script to generate html results more accurately (#9445)
* modify the script to generate html results more accurately
* resolve some comments
* revert some codes
|
2023-11-16 19:50:23 +08:00 |
|
Ruonan Wang
|
c0ef70df02
|
LLM: quick fix of fast_rms_norm (#9480)
|
2023-11-16 14:42:16 +08:00 |
|
Yina Chen
|
d5263e6681
|
Add awq load support (#9453)
* Support directly loading GPTQ models from huggingface
* fix style
* fix tests
* change example structure
* address comments
* fix style
* init
* address comments
* add examples
* fix style
* fix style
* fix style
* fix style
* update
* remove
* meet comments
* fix style
---------
Co-authored-by: Yang Wang <yang3.wang@intel.com>
|
2023-11-16 14:06:25 +08:00 |
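Commit d5263e6681 (#9453) lets bigdl-llm ingest AWQ checkpoints directly. Assuming it follows the same `from_pretrained` path as other quantized formats (the GPTQ bullet in the log suggests this pattern, but the conversion behavior is an assumption):

```python
from bigdl.llm.transformers import AutoModelForCausalLM

# Point from_pretrained at an AWQ checkpoint on the HF hub; bigdl-llm
# is assumed to convert it into its own low-bit format on load.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-v0.1-AWQ",
    load_in_4bit=True,
    trust_remote_code=True,
)
```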
|
Ruonan Wang
|
d2c064124a
|
LLM: update rms related usage to support ipex 2.1 new api (#9466)
* update rms related usage
* fix style
|
2023-11-16 11:21:50 +08:00 |
|
Yuwen Hu
|
731b0aaade
|
Empty cache after embedding to cpu (#9477)
|
2023-11-16 10:52:30 +08:00 |
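Commit 731b0aaade (#9477) frees device memory once the embedding has been moved off the GPU. With IPEX's XPU backend the generic pattern is as follows (requires an XPU build of intel_extension_for_pytorch):

```python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401, enables torch.xpu

emb = torch.nn.Embedding(32000, 4096).to("xpu")
# ... use it, then relocate it to host memory ...
emb.to("cpu")
# The caching allocator still holds the freed device blocks;
# empty_cache() returns them to the device, which is what #9477
# does after moving the embedding to CPU.
torch.xpu.empty_cache()
```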
|
WeiguangHan
|
c487b53f21
|
LLM: only run arc perf test nightly (#9448)
* LLM: only run arc perf test nightly
* deleted unused python scripts
* rebase main
|
2023-11-15 19:38:14 +08:00 |
|
WeiguangHan
|
0d55bbd9f1
|
LLM: adjust the order of some models (#9470)
|
2023-11-15 17:04:59 +08:00 |
|
Lilac09
|
13f6eb77b4
|
Add exec bash to entrypoint.sh to keep the container running after boot (#9471)
* add bigdl-llm-init
* boot bash
|
2023-11-15 16:09:16 +08:00 |
|
Xin Qiu
|
170e0072af
|
chatglm2 correctness test (#9450)
* chatglm2 ut
* some update
* chatglm2 path
* fix
* add print
|
2023-11-15 15:44:56 +08:00 |
|