Cengguang Zhang
26213a5829
LLM: Change benchmark bf16 load format. ( #9035 )
* LLM: Change benchmark bf16 load format.
* comment on bf16 chatglm.
* fix.
2023-09-22 17:38:38 +08:00
JinBridge
023555fb1f
LLM: Add one-click installer for Windows ( #8999 )
* LLM: init one-click installer for windows
* LLM: fix typo in one-click installer readme
* LLM: one-click installer try except logic
* LLM: one-click installer add dependency
* LLM: one-click installer adjust README.md
* LLM: one-click installer split README and add zip compress in setup.bat
* LLM: one-click installer verified internlm and llama2 and replace gif
* LLM: remove one-click installer images
* LLM: finetune the one-click installer README.md
* LLM: fix typo in one-click installer README.md
* LLM: rename one-click installer to portable executable
* LLM: rename other places to portable executable
* LLM: rename the zip filename to executable
* LLM: update .gitignore
* LLM: add colorama to setup.bat
2023-09-22 14:46:30 +08:00
Jiao Wang
028a6d9383
MPT model optimize for long sequence ( #9020 )
* mpt_long_seq
* update
* update
* update
* style
* style2
* update
2023-09-21 21:27:23 -07:00
Ruonan Wang
b943d73844
LLM: refactor kv cache ( #9030 )
* refactor utils
* meet code review; update all models
* small fix
2023-09-21 21:28:03 +08:00
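The kv cache work in this and the neighboring commits follows one recurring technique: pre-allocate the key/value buffers once and copy each step's new tokens into the reserved tail, instead of torch.cat-ing a growing tensor on every decoding step. A minimal, generic sketch of the idea (illustrative only; the helper names below are hypothetical, not the actual utils this PR refactors):

    import torch

    def init_kv_cache(batch, n_head, head_dim, max_len,
                      dtype=torch.float32, device="cpu"):
        # Reserve the full buffer up front; 0 tokens are in use so far.
        k = torch.empty(batch, n_head, max_len, head_dim,
                        dtype=dtype, device=device)
        v = torch.empty_like(k)
        return k, v, 0

    def append_kv(k_buf, v_buf, cur_len, new_k, new_v):
        # Copy the new tokens into the reserved tail instead of torch.cat,
        # avoiding a reallocation of the whole cache every step.
        n = new_k.size(2)
        k_buf[:, :, cur_len:cur_len + n, :] = new_k
        v_buf[:, :, cur_len:cur_len + n, :] = new_v
        cur_len += n
        return k_buf[:, :, :cur_len, :], v_buf[:, :, :cur_len, :], cur_len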
Cengguang Zhang
868511cf02
LLM: fix kv cache issue of bloom and falcon. ( #9029 )
2023-09-21 18:12:20 +08:00
Ruonan Wang
bf51ec40b2
LLM: Fix empty cache ( #9024 )
* fix
* fix
* update example
2023-09-21 17:16:07 +08:00
Yina Chen
714884414e
fix error ( #9025 )
2023-09-21 16:42:11 +08:00
binbin Deng
edb225530b
add bark ( #9016 )
2023-09-21 12:24:58 +08:00
SONG Ge
fa47967583
[LLM] Optimize kv_cache for gptj model family ( #9010 )
* optimize gptj model family attention
* add license and comment for dolly-model
* remove xpu mentioned
* remove useless info
* code style
* style fix
* code style in gptj fix
* remove gptj arch
* move apply_rotary_pos_emb into utils
* kv_seq_length update
* use hidden_states instead of query layer to get the batch size
2023-09-21 10:42:08 +08:00
Cengguang Zhang
b3cad7de57
LLM: add bloom kv cache support ( #9012 )
* LLM: add bloom kv cache support
* fix style.
2023-09-20 21:10:53 +08:00
Kai Huang
156af15d1e
Add NF3 ( #9008 )
* add nf3
* grammar
2023-09-20 20:03:07 +08:00
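NF3 joins the set of low-bit formats selectable at load time. A sketch of how such a format is typically requested through the bigdl-llm transformers-style API (the exact "nf3" identifier string is an assumption based on the commit title; model id is a placeholder):

    from bigdl.llm.transformers import AutoModelForCausalLM

    # load_in_low_bit selects the quantization format; "nf3" is assumed
    # here to be the identifier added by this commit.
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-chat-hf",
        load_in_low_bit="nf3",
        trust_remote_code=True,
    )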
Kai Huang
6981745fe4
Optimize kv_cache for gpt-neox model family ( #9015 )
* override gptneox
* style
* move to utils
* revert
2023-09-20 19:59:19 +08:00
JinBridge
48b503c630
LLM: add example of aquila ( #9006 )
* LLM: add example of aquila
* LLM: replace AquilaChat with Aquila
* LLM: shorten prompt of aquila example
2023-09-20 15:52:56 +08:00
Cengguang Zhang
735a17f7b4
LLM: add kv cache to falcon family. ( #8995 )
* add kv cache to falcon family.
* fix: import error.
* refactor
* update comments.
* add two versions of falcon attention forward.
* fix
* fix.
* fix.
* fix.
* fix style.
* fix style.
2023-09-20 15:36:30 +08:00
Ruonan Wang
94a7f8917b
LLM: fix optimized kv cache for baichuan-13b ( #9009 )
* fix baichuan 13b
* fix style
* fix
* fix style
2023-09-20 15:30:14 +08:00
Yang Wang
c88f6ec457
Experiment XPU QLora Finetuning ( #8937 )
* Support xpu finetuning
* support xpu finetuning
* fix style
* fix style
* fix style
* refine example
* add readme
* refine readme
* refine api
* fix fp16
* fix example
* refactor
* fix style
* fix compute type
* add qlora
* refine training args
* fix example
* fix style
* fast path for inference
* address comments
* refine readme
* revert lint
2023-09-19 10:15:44 -07:00
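For orientation, a rough sketch of what QLoRA finetuning on XPU looks like with this kind of API. Module paths (bigdl.llm.transformers.qlora) and helper names follow the example this PR adds as far as known, but treat them as assumptions:

    from bigdl.llm.transformers import AutoModelForCausalLM
    from bigdl.llm.transformers.qlora import (get_peft_model,
                                              prepare_model_for_kbit_training)
    from peft import LoraConfig

    # Load the base model with 4-bit NF4 weights and move it to the Intel GPU.
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf", load_in_low_bit="nf4")
    model = model.to("xpu")
    model = prepare_model_for_kbit_training(model)

    # Attach LoRA adapters; only these small matrices are trained.
    config = LoraConfig(r=8, lora_alpha=32,
                        target_modules=["q_proj", "k_proj", "v_proj"],
                        lora_dropout=0.05, bias="none", task_type="CAUSAL_LM")
    model = get_peft_model(model, config)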
Jason Dai
51518e029d
Update llm readme ( #9005 )
2023-09-19 20:01:33 +08:00
Ruonan Wang
249386261c
LLM: add Baichuan2 cpu example ( #9002 )
* add baichuan2 cpu examples
* add link
* update prompt
2023-09-19 18:08:30 +08:00
Ruonan Wang
004c45c2be
LLM: Support optimized kv_cache for baichuan family ( #8997 )
* add initial support for baichuan attention
* support baichuan1
* update based on comment
* update based on comment
* support baichuan2
* update link, change how to judge baichuan2
* fix style
* add model parameter for pos emb
* update based on comment
2023-09-19 15:38:54 +08:00
Xin Qiu
37bb0cbf8f
Speed up gpt-j in gpu benchmark ( #9000 )
* Speed up gpt-j in gpu benchmark
* meet code review
2023-09-19 14:22:28 +08:00
Zhao Changmin
2a05581da7
LLM: Apply low_cpu_mem_usage algorithm on optimize_model API ( #8987 )
* low_cpu_mem_usage
2023-09-18 21:41:42 +08:00
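As context for the optimize_model changes here: the API wraps a stock PyTorch/Hugging Face model and rewrites it for low-bit inference; the low_cpu_mem_usage work above reduces peak CPU memory during that conversion and is applied internally rather than as a user-facing flag, as far as this log shows. A minimal usage sketch (model id is a placeholder):

    from transformers import AutoModelForCausalLM
    from bigdl.llm import optimize_model

    # Load a stock Hugging Face model, then convert it for low-bit inference.
    model = AutoModelForCausalLM.from_pretrained(
        "openlm-research/open_llama_3b", trust_remote_code=True)
    model = optimize_model(model)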
Cengguang Zhang
8299b68fea
update readme. ( #8996 )
2023-09-18 17:06:15 +08:00
binbin Deng
c1d25a51a8
LLM: add optimize_model example for bert ( #8975 )
2023-09-18 16:18:35 +08:00
Cengguang Zhang
74338fd291
LLM: add auto torch dtype in benchmark. ( #8981 )
2023-09-18 15:48:25 +08:00
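The "auto torch dtype" change presumably leans on the standard Hugging Face mechanism, where torch_dtype="auto" takes the dtype from the checkpoint config instead of forcing float32. A sketch of the pattern:

    from transformers import AutoModelForCausalLM

    # "auto" reads torch_dtype from the checkpoint's config.json,
    # so bf16 checkpoints load as bf16 rather than being upcast.
    model = AutoModelForCausalLM.from_pretrained(
        "bigscience/bloom-560m", torch_dtype="auto")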
Ruonan Wang
cabe7c0358
LLM: add baichuan2 example for arc ( #8994 )
* add baichuan2 examples
* add link
* small fix
2023-09-18 14:32:27 +08:00
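A sketch of the shape such an Arc (XPU) example typically takes, assuming the standard bigdl-llm INT4 loading path (model id and prompt are placeholders):

    from bigdl.llm.transformers import AutoModelForCausalLM
    from transformers import AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        "baichuan-inc/Baichuan2-7B-Chat", load_in_4bit=True,
        trust_remote_code=True)
    model = model.to("xpu")  # Intel Arc GPU via intel_extension_for_pytorch

    tokenizer = AutoTokenizer.from_pretrained(
        "baichuan-inc/Baichuan2-7B-Chat", trust_remote_code=True)
    input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids.to("xpu")
    output = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))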
binbin Deng
0a552d5bdc
LLM: fix installation on windows ( #8989 )
2023-09-18 11:14:54 +08:00
Ruonan Wang
32716106e0
update use_cache=True ( #8986 )
2023-09-18 07:59:33 +08:00
Xin Qiu
64ee1d7689
update run_transformer_int4_gpu ( #8983 )
* xpuperf
* update run.py
* clean up
* update
* update
* meet code review
2023-09-15 15:10:04 +08:00
Zhao Changmin
16b9412e80
tie_word_embeddings ( #8977 )
tie_word_embeddings
2023-09-15 10:17:09 +08:00
JinBridge
c12b8f24b6
LLM: add use_cache=True for all gpu examples ( #8971 )
2023-09-15 09:54:38 +08:00
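use_cache=True makes generate() reuse cached key/value states, so each decoding step attends over the cached history instead of re-encoding the full sequence; combined with the kv cache optimizations elsewhere in this log, that is the fast path. The change to each example amounts to:

    # Before: past key/value states may be recomputed each step.
    output = model.generate(input_ids, max_new_tokens=32)

    # After: past key/value states are cached and reused per step.
    output = model.generate(input_ids, max_new_tokens=32, use_cache=True)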
Guancheng Fu
d1b62ef2f2
[bigdl-llm] Remove serving-dep from all_requires ( #8980 )
* Remove serving-dep from all_requires
* pin fastchat version
2023-09-14 16:59:24 +08:00
Yishuo Wang
bcf456070c
fix bloom-176b int overflow ( #8973 )
2023-09-14 14:37:57 +08:00
Ruonan Wang
dd57623650
LLM: reduce GPU memory for optimize_model=True ( #8965 )
* reduce gpu memory for llama & chatglm
* change to device type
2023-09-13 17:27:09 +08:00
binbin Deng
be29c75c18
LLM: refactor gpu examples ( #8963 )
* restructure
* change to hf-transformers-models/
2023-09-13 14:47:47 +08:00
Cengguang Zhang
cca84b0a64
LLM: update llm benchmark scripts. ( #8943 )
* update llm benchmark scripts.
* change transformer_bf16 to pytorch_autocast_bf16.
* add autocast in transformer int4.
* revert autocast.
* add "pytorch_autocast_bf16" to doc
* fix comments.
2023-09-13 12:23:28 +08:00
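The renamed pytorch_autocast_bf16 benchmark mode presumably corresponds to plain PyTorch autocast on CPU, as opposed to loading the weights themselves in bf16. A sketch of the distinction (model and input_ids loaded as in the other examples):

    import torch

    # pytorch_autocast_bf16: weights stay fp32, ops run under bf16 autocast.
    with torch.autocast("cpu", dtype=torch.bfloat16):
        output = model.generate(input_ids, max_new_tokens=32)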
SONG Ge
7132ef6081
[LLM Doc] Add optimize_model doc in transformers api ( #8957 )
* add optimize in from_pretrained
* add api doc for load_low_bit
* update api docs following comments
* update api docs
* update
* reword comments
2023-09-13 10:42:33 +08:00
Zhao Changmin
c32c260ce2
LLM: Add save/load API in optimize_model to support general pytorch model ( #8956 )
* support hf format save/load (SL)
2023-09-13 10:22:00 +08:00
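A sketch of the save/load round trip this adds for general PyTorch models going through optimize_model; the method and helper names (save_low_bit, load_low_bit) mirror the transformers-style API and are assumed to carry over here:

    from transformers import AutoModelForCausalLM
    from bigdl.llm import optimize_model
    from bigdl.llm.optimize import load_low_bit  # assumed import path

    model = AutoModelForCausalLM.from_pretrained("openlm-research/open_llama_3b")
    model = optimize_model(model)
    model.save_low_bit("./llama-low-bit")  # persist converted low-bit weights

    # Later: re-create the model skeleton, then load the low-bit weights
    # directly, skipping the full-precision load and re-conversion.
    model = load_low_bit(model, "./llama-low-bit")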
Ruonan Wang
4de73f592e
LLM: add gpu example of chinese-llama-2-7b ( #8960 )
* add gpu example of chinese-llama2
* update model name and link
* update name
2023-09-13 10:16:51 +08:00
Guancheng Fu
0bf5857908
[LLM] Integrate FastChat as a serving framework for BigDL-LLM ( #8821 )
* Finish changing
* format
* add licence
* Add licence
* fix
* fix
* Add xpu support for fschat
* Fix patch
* Also install webui dependencies
* change setup.py dependency installs
* fix
* format
* final test
2023-09-13 09:28:05 +08:00
Yuwen Hu
cb534ed5c4
[LLM] Add Arc demo gif to readme and readthedocs ( #8958 )
* Add arc demo in main readme
* Small style fix
* Realize using table
* Update based on comments
* Small update
* Try to solve the height problem
* Small fix
* Update demo for inner llm readme
* Update demo video for readthedocs
* Small fix
* Update based on comments
2023-09-13 09:23:52 +08:00
Zhao Changmin
dcaa4dc130
LLM: Support GQA on llama kvcache ( #8938 )
* support GQA
2023-09-12 12:18:40 +08:00
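With grouped-query attention (GQA), used by e.g. Llama-2-70B, the kv cache holds fewer key/value heads than there are query heads, and each kv head serves a group of query heads. The standard way to support it is to expand the kv heads before the attention matmul; a generic sketch of that expansion (not necessarily this PR's exact code):

    import torch

    def repeat_kv(hidden: torch.Tensor, n_rep: int) -> torch.Tensor:
        # Expand (batch, num_kv_heads, seq, head_dim) to
        # (batch, num_kv_heads * n_rep, seq, head_dim) so each kv head
        # is shared by its group of query heads.
        if n_rep == 1:
            return hidden
        b, kv_heads, seq, dim = hidden.shape
        hidden = hidden[:, :, None, :, :].expand(b, kv_heads, n_rep, seq, dim)
        return hidden.reshape(b, kv_heads * n_rep, seq, dim)

    # e.g. 64 query heads over 8 kv heads -> n_rep = 8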
binbin Deng
2d81521019
LLM: add optimize_model examples for llama2 and chatglm ( #8894 )
* add llama2 and chatglm optimize_model examples
* update default usage
* update command and some descriptions
* move folder and remove general_int4 descriptions
* change folder name
2023-09-12 10:36:29 +08:00
Zhao Changmin
f00c442d40
fix accelerate ( #8946 )
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-09-12 09:27:58 +08:00
Yang Wang
16761c58be
Make llama attention stateless ( #8928 )
...
* Make llama attention stateless
* fix style
* fix chatglm
* fix chatglm xpu
2023-09-11 18:21:50 -07:00
Zhao Changmin
e62eda74b8
refine ( #8912 )
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-09-11 16:40:33 +08:00
Yina Chen
df165ad165
init ( #8933 )
2023-09-11 14:30:55 +08:00
Ruonan Wang
b3f5dd5b5d
LLM: update q8 convert xpu&cpu ( #8930 )
2023-09-08 16:01:17 +08:00
Yina Chen
33d75adadf
[LLM]Support q5_0 on arc ( #8926 )
* support q5_0
* delete
* fix style
2023-09-08 15:52:36 +08:00
Yuwen Hu
ca35c93825
[LLM] Fix langchain UT ( #8929 )
* Change dependency version for langchain uts
* Downgrade pandas version instead; and update example readme accordingly
2023-09-08 13:51:04 +08:00
Xin Qiu
ea0853c0b5
update benchmark_utils readme ( #8925 )
* update readme
* meet code review
2023-09-08 10:30:26 +08:00