binbin Deng | 2b9c7d2a59 | LLM: quick fix alpaca qlora finetuning script (#9534) | 2023-11-27 11:04:27 +08:00

Yuwen Hu | 11fa3de290 | Add setup support of Windows GPU for bigdl-llm (#9512) | 2023-11-24 17:49:21 +08:00

Chen, Zhentao | 45820cf3b9 | add optimize model option (#9530) | 2023-11-24 17:10:49 +08:00

binbin Deng | 6bec0faea5 | LLM: support Mistral AWQ models (#9520) | 2023-11-24 16:20:22 +08:00

Ruonan Wang | 914a5a5a27 | LLM: fix abnormal Mistral GPU accuracy by updating rms_norm (#9529) | 2023-11-24 15:37:50 +08:00

SONG Ge | 3d24823cda | hot-fix mistral kv_cache (#9528) | 2023-11-24 14:33:04 +08:00

Zhao Changmin | 42b7a16bc5 | Replace torch.bmm with safe_bmm (#9519) | 2023-11-24 12:16:48 +08:00
  * replace bmm with safe one
  * rename args and add deprecation warning

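As a reading aid, a minimal sketch of what a bmm wrapper of this kind might look like; the device check and fallback below are illustrative assumptions, not the actual bigdl-llm code:

```python
import torch

def safe_bmm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Hypothetical fallback: on devices where the fused batched kernel
    # is assumed problematic, do the matmul batch by batch instead.
    if a.device.type == "xpu":
        return torch.stack([x @ y for x, y in zip(a, b)])
    return torch.bmm(a, b)
```
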
Jason Dai | b3178d449f | Update README.md (#9525) | 2023-11-23 21:45:20 +08:00

Jason Dai | 82898a4203 | Update GPU example README (#9524) | 2023-11-23 21:20:26 +08:00

Jason Dai | 064848028f | Update README.md (#9523) | 2023-11-23 21:16:21 +08:00

Ruonan Wang | b63aae8a8e | LLM: add flash attention support for llama (#9518) | 2023-11-23 18:40:18 +08:00
  * add initial flash attention for llama
  * accelerate fp32 first token by changing to fp16 in advance
  * support fp32

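A rough sketch of the second bullet's idea, using PyTorch's scaled_dot_product_attention as a stand-in for the flash-attention path (the actual llama patch in #9518 is not reproduced here):

```python
import torch
import torch.nn.functional as F

def fast_first_token_attention(q, k, v, is_causal=True):
    # Cast fp32 activations to fp16 ahead of the kernel so the first
    # (prefill) token also runs in half precision, then cast back.
    orig_dtype = q.dtype
    q, k, v = (t.to(torch.float16) for t in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v, is_causal=is_causal)
    return out.to(orig_dtype)
```
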
Guancheng Fu | bf579507c2 | Integrate vllm (#9310) | 2023-11-23 16:46:45 +08:00
  * done
  * Rename structure
  * add models
  * Add structure/sampling_params, sequence
  * add input_metadata
  * add outputs
  * Add policy, logger
  * add and update
  * add parallelconfig back
  * core/scheduler.py
  * Add llm_engine.py
  * Add async_llm_engine.py
  * Add tested entrypoint
  * fix minor error
  * Fix everything
  * fix kv cache view
  * fix
  * fix
  * fix
  * format & refine
  * remove logger from repo
  * try to add token latency
  * remove logger
  * Refine config.py
  * finish worker.py
  * delete utils.py
  * add license
  * refine
  * refine sequence.py
  * remove sampling_params.py
  * finish
  * add license
  * format
  * add license
  * refine
  * refine
  * Refine line too long
  * remove exception
  * so dumb style-check
  * refine
  * refine
  * refine
  * refine
  * refine
  * refine
  * add README
  * refine README
  * add warning instead of error
  * fix padding
  * add license
  * format
  * format
  * format fix
  * Refine vllm dependency (#1): clear vllm dependency
  * fix license
  * fix format
  * fix format
  * fix
  * adapt LLM engine
  * fix
  * add license
  * fix format
  * fix
  * Moving README.md to the correct position
  * Fix readme.md
  * done
  * guide for adding models
  * fix
  * Fix README.md
  * Add new model readme
  * remove ray-logic
  * refactor arg_utils.py
  * remove distributed_init_method logic
  * refactor entrypoints
  * refactor input_metadata
  * refactor model_loader
  * refactor utils.py
  * refactor models
  * fix api server
  * remove vllm.structure
  * revert by txy 1120
  * remove utils
  * format
  * fix license
  * add bigdl model
  * Refer to a specific commit
  * Change code base
  * add comments
  * add async_llm_engine comment
  * refine
  * formatted
  * add worker comments
  * add comments
  * add comments
  * fix style
  * add changes
  Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
  Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
  Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>

Heyang Sun | 48fbb1eb94 | support ccl (MPI) distributed mode in alpaca_qlora_finetuning_cpu (#9507) | 2023-11-23 10:58:09 +08:00

Qiyuan Gong | 0f0c6bb631 | [LLM] Fix Qwen registered_causal_mask is None (#9513) | 2023-11-23 09:28:04 +08:00
  * Add registered_causal_mask init based on 2abd8e5777.

Heyang Sun | 11fa5a8a0e | Fix QLoRA CPU dispatch_model issue about accelerate (#9506) | 2023-11-23 08:41:25 +08:00

Heyang Sun | 1453046938 | install bigdl-llm in deepspeed cpu inference example (#9508) | 2023-11-23 08:39:21 +08:00

binbin Deng | 86743fb57b | LLM: fix transformers version in CPU finetuning example (#9511) | 2023-11-22 15:53:07 +08:00

binbin Deng | 1a2129221d | LLM: support resume from checkpoint in Alpaca QLoRA (#9502) | 2023-11-22 13:49:14 +08:00

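Resume support like this typically rides on the standard transformers Trainer API; a minimal sketch, with model and train_data as placeholders for whatever the example script builds:

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(output_dir="./outputs", save_steps=200)
trainer = Trainer(model=model, args=args, train_dataset=train_data)

# True resumes from the latest checkpoint found under output_dir;
# an explicit checkpoint path is also accepted.
trainer.train(resume_from_checkpoint=True)
```
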
Ruonan Wang | 139e98aa18 | LLM: quick fix benchmark (#9509) | 2023-11-22 10:19:57 +08:00

WeiguangHan | c2aeb4d1e8 | del model after test (#9504) | 2023-11-21 18:41:50 +08:00

Ruonan Wang | 076d106ef5 | LLM: GPU QLoRA update to bf16 to accelerate gradient checkpointing (#9499) | 2023-11-21 17:08:36 +08:00
  * update to bf16 to accelerate gradient checkpointing
  * add utils and fix ut

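The combination the title describes maps onto stock transformers arguments; a hedged sketch (the bigdl-llm utils from the second bullet are not shown):

```python
from transformers import TrainingArguments

# Gradient checkpointing recomputes activations in the backward pass;
# doing that recompute in bfloat16 is what makes it cheap here.
args = TrainingArguments(
    output_dir="./outputs",
    bf16=True,
    gradient_checkpointing=True,
)
```
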
Cheen Hau, 俊豪 | 3e39828420 | Update all in one benchmark readme (#9496) | 2023-11-21 14:57:16 +08:00
  * Add gperftools install to all in one benchmark readme
  * Update readme

Guancheng Fu | 2b200bf2f2 | Add vllm_worker related arguments in docker serving image's entrypoint (#9500) | 2023-11-21 14:41:06 +08:00
  * fix entrypoint
  * fix missing long mode argument

Wang, Jian4 | 40ec9f7ead | Add qlora cpu docker manual build (#9501) | 2023-11-21 14:39:16 +08:00

binbin Deng | b7ae572ac3 | LLM: update Alpaca QLoRA finetuning example on GPU (#9492) | 2023-11-21 14:22:19 +08:00

Lilac09 | 566ec85113 | add stream interval option to entrypoint (#9498) | 2023-11-21 09:47:32 +08:00

Wang, Jian4 | c5cb3ab82e | LLM: Add CPU alpaca qlora example (#9469) | 2023-11-21 09:19:58 +08:00
  * init
  * update xpu to cpu
  * update
  * update readme
  * update example
  * update
  * add reference
  * add guide to train different datasets
  * update readme
  * update

binbin Deng | 96fd26759c | LLM: fix QLoRA finetuning example on CPU (#9489) | 2023-11-20 14:31:24 +08:00

Xin Qiu | 0f9a440b06 | doc for multi-GPU selection (#9414) | 2023-11-20 09:25:58 +08:00

Xin Qiu | 50b01058f1 | enable new q4_1 (#9479) | 2023-11-17 14:58:57 +08:00

binbin Deng | 3dac21ac7b | LLM: add more example usages about alpaca qlora on different hardware (#9458) | 2023-11-17 09:56:43 +08:00

Heyang Sun | 921b263d6a | update deepspeed install and run guide in README (#9441) | 2023-11-17 09:11:39 +08:00

Zhao Changmin | 30abd304a7 | LLM: Fix baichuan pre-normalize model tensor assigning issue when loading (#9481) | 2023-11-16 21:57:28 +08:00
  * No need to normalize when loading

WeiguangHan | bc06bec90e | LLM: modify the script to generate html results more accurately (#9445) | 2023-11-16 19:50:23 +08:00
  * modify the script to generate html results more accurately
  * resolve some comments
  * revert some codes

Ruonan Wang | c0ef70df02 | LLM: quick fix of fast_rms_norm (#9480) | 2023-11-16 14:42:16 +08:00

Yina Chen | d5263e6681 | Add awq load support (#9453) | 2023-11-16 14:06:25 +08:00
  * Support directly loading GPTQ models from huggingface
  * fix style
  * fix tests
  * change example structure
  * address comments
  * fix style
  * init
  * address comments
  * add examples
  * fix style
  * fix style
  * fix style
  * fix style
  * update
  * remove
  * meet comments
  * fix style
  Co-authored-by: Yang Wang <yang3.wang@intel.com>

Ruonan Wang | d2c064124a | LLM: update rms related usage to support ipex 2.1 new API (#9466) | 2023-11-16 11:21:50 +08:00
  * update rms related usage
  * fix style

Yuwen Hu | 731b0aaade | Empty cache after embedding to cpu (#9477) | 2023-11-16 10:52:30 +08:00

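The pattern behind this one-liner, as a sketch; offload_embedding is a hypothetical name, and the xpu branch assumes an Intel-GPU build of PyTorch with IPEX loaded:

```python
import torch

def offload_embedding(model):
    # Park the (large) input embedding table on CPU...
    model.get_input_embeddings().to("cpu")
    # ...then ask the allocator to actually release the freed blocks.
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        torch.xpu.empty_cache()
```
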
WeiguangHan | c487b53f21 | LLM: only run arc perf test nightly (#9448) | 2023-11-15 19:38:14 +08:00
  * LLM: only run arc perf test nightly
  * deleted unused python scripts
  * rebase main

WeiguangHan | 0d55bbd9f1 | LLM: adjust the order of some models (#9470) | 2023-11-15 17:04:59 +08:00

Lilac09 | 13f6eb77b4 | Add exec bash to entrypoint.sh to keep container running after being booted (#9471) | 2023-11-15 16:09:16 +08:00
  * add bigdl-llm-init
  * boot bash

Xin Qiu | 170e0072af | chatglm2 correctness test (#9450) | 2023-11-15 15:44:56 +08:00
  * chatglm2 ut
  * some update
  * chatglm2 path
  * fix
  * add print

Lilac09 | 24146d108f | add bigdl-llm-init (#9468) | 2023-11-15 14:55:33 +08:00

Ruonan Wang | 0f82b8c3a0 | LLM: update qlora example (#9454) | 2023-11-15 09:24:15 +08:00
  * update qlora example
  * fix loss=0

Chen, Zhentao | dbbdb53a18 | fix multiple gpu usage (#9459) | 2023-11-14 17:06:27 +08:00

Chen, Zhentao | d19ca21957 | patch bigdl-llm model to harness by binding instead of patch file (#9420) | 2023-11-14 12:51:39 +08:00
  * add run_llb.py
  * fix args interpretation
  * modify outputs
  * update workflow
  * add license
  * test mixed 4 bit
  * update readme
  * use autotokenizer
  * add timeout
  * refactor workflow file
  * fix working directory
  * fix env
  * throw exception if some jobs failed
  * improve terminal outputs
  * disable var which causes the run to get stuck
  * fix unknown precision
  * fix key error
  * directly output config instead
  * rm harness submodule

Yang Wang | 51d07a9fd8 | Support directly loading gptq models from huggingface (#9391) | 2023-11-13 20:48:12 -08:00
  * Support directly loading GPTQ models from huggingface
  * fix style
  * fix tests
  * change example structure
  * address comments
  * fix style
  * address comments

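Usage-wise this goes through the familiar from_pretrained path; a sketch assuming bigdl-llm converts the GPTQ weights to its own 4-bit format on load (the model id below is only an example):

```python
from bigdl.llm.transformers import AutoModelForCausalLM

# Point from_pretrained at a huggingface repo that ships GPTQ weights.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",   # example GPTQ checkpoint
    load_in_4bit=True,            # convert to 4-bit on the fly
)
```
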
Lilac09 | b2b085550b | Remove bigdl-nano and add ipex into inference-cpu image (#9452) | 2023-11-14 10:50:52 +08:00
  * remove bigdl-nano and add ipex into inference-cpu image
  * remove bigdl-nano in docker
  * remove bigdl-nano in docker

Wang, Jian4 | 0f78ebe35e | LLM: Add qlora cpu finetune docker image (#9271) | 2023-11-14 10:36:53 +08:00
  * init qlora cpu docker image
  * update
  * remove ipex and update
  * update
  * update readme
  * update example and readme

WeiguangHan | d109275333 | temporarily disable the test of some models (#9434) | 2023-11-13 18:50:53 +08:00