Jiao Wang
d5ca1f32b6
Aquila KV cache optimization ( #9080 )
* update
* update
* style
2023-10-05 11:10:57 -07:00
Jason Dai
7506100bd5
Update readme ( #9084 )
2023-10-05 16:54:09 +08:00
Yang Wang
88565c76f6
add export merged model example ( #9018 )
* add export merged model example
* add sources
* add script
* fix style
2023-10-04 21:18:52 -07:00
Yang Wang
0cd8f1c79c
Use ipex fused rms norm for llama ( #9081 )
* also apply rmsnorm
* fix cpu
2023-10-04 21:04:55 -07:00
Cengguang Zhang
fb883100e7
LLM: support converting chatglm-18b attention forward in benchmark scripts. ( #9072 )
* add chatglm-18b convert.
* fix if statement.
* fix
2023-09-28 14:04:52 +08:00
Yishuo Wang
6de2189e90
[LLM] fix chatglm main choice ( #9073 )
2023-09-28 11:23:37 +08:00
binbin Deng
760183bac6
LLM: update key feature and installation page of document ( #9068 )
2023-09-27 15:44:34 +08:00
Lilac09
c91b2bd574
fix: modify indentation ( #9070 )
* modify Dockerfile
* add README.md
* add README.md
* Modify Dockerfile
* Add bigdl inference cpu image build
* Add bigdl llm cpu image build
* Add bigdl llm cpu image build
* Add bigdl llm cpu image build
* Modify Dockerfile
* Add bigdl inference cpu image build
* Add bigdl inference cpu image build
* Add bigdl llm xpu image build
* manually build
* recover file
* manually build
* recover file
* modify indentation
2023-09-27 14:53:52 +08:00
Cengguang Zhang
ad62c58b33
LLM: Enable jemalloc in benchmark scripts. ( #9058 )
* enable jemalloc.
* fix readme.
2023-09-26 15:37:49 +08:00
Lilac09
ecee02b34d
Add bigdl llm xpu image build ( #9062 )
* modify Dockerfile
* add README.md
* add README.md
* Modify Dockerfile
* Add bigdl inference cpu image build
* Add bigdl llm cpu image build
* Add bigdl llm cpu image build
* Add bigdl llm cpu image build
* Modify Dockerfile
* Add bigdl inference cpu image build
* Add bigdl inference cpu image build
* Add bigdl llm xpu image build
2023-09-26 14:29:03 +08:00
Lilac09
9ac950fa52
Add bigdl llm cpu image build ( #9047 )
* modify Dockerfile
* add README.md
* add README.md
* Modify Dockerfile
* Add bigdl inference cpu image build
* Add bigdl llm cpu image build
* Add bigdl llm cpu image build
* Add bigdl llm cpu image build
2023-09-26 13:22:11 +08:00
Ziteng Zhang
a717352c59
Replace Llama 7b with Llama2-7b in README.md ( #9055 )
* Replace Llama 7b with Llama2-7b in README.md
Need to replace the base model with Llama2-7b, as we are operating on Llama2 here.
* Replace Llama 7b with Llama2-7b in README.md
A Llama 7b in the first line was missed.
* Update architecture graph
---------
Co-authored-by: Heyang Sun <60865256+Uxito-Ada@users.noreply.github.com>
2023-09-26 09:56:46 +08:00
Guancheng Fu
cc84ed70b3
Create serving images ( #9048 )
* Finished & Tested
* Install latest pip from base images
* Add blank line
* Delete unused comment
* fix typos
2023-09-25 15:51:45 +08:00
Cengguang Zhang
b4a1266ef0
[WIP] LLM: add kv cache support for internlm. ( #9036 )
* LLM: add kv cache support for internlm
* add internlm apply_rotary_pos_emb
* fix.
* fix style.
2023-09-25 14:16:59 +08:00
Ruonan Wang
975da86e00
LLM: fix gptneox kv cache ( #9044 )
2023-09-25 13:03:57 +08:00
Heyang Sun
4b843d1dbf
change lora-model output behavior on k8s ( #9038 )
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2023-09-25 09:28:44 +08:00
Cengguang Zhang
26213a5829
LLM: Change benchmark bf16 load format. ( #9035 )
* LLM: Change benchmark bf16 load format.
* comment on bf16 chatglm.
* fix.
2023-09-22 17:38:38 +08:00
JinBridge
023555fb1f
LLM: Add one-click installer for Windows ( #8999 )
* LLM: init one-click installer for windows
* LLM: fix typo in one-click installer readme
* LLM: one-click installer try except logic
* LLM: one-click installer add dependency
* LLM: one-click installer adjust README.md
* LLM: one-click installer split README and add zip compress in setup.bat
* LLM: one-click installer verified internlm and llama2 and replace gif
* LLM: remove one-click installer images
* LLM: finetune the one-click installer README.md
* LLM: fix typo in one-click installer README.md
* LLM: rename one-click installer to portable executable
* LLM: rename other places to portable executable
* LLM: rename the zip filename to executable
* LLM: update .gitignore
* LLM: add colorama to setup.bat
2023-09-22 14:46:30 +08:00
Jiao Wang
028a6d9383
MPT model optimize for long sequence ( #9020 )
* mpt_long_seq
* update
* update
* update
* style
* style2
* update
2023-09-21 21:27:23 -07:00
Lilac09
9126abdf9b
add README.md for bigdl-llm-cpu image ( #9026 )
* modify Dockerfile
* add README.md
* add README.md
2023-09-22 09:03:57 +08:00
Ruonan Wang
b943d73844
LLM: refactor kv cache ( #9030 )
* refactor utils
* meet code review; update all models
* small fix
2023-09-21 21:28:03 +08:00
Cengguang Zhang
868511cf02
LLM: fix kv cache issue of bloom and falcon. ( #9029 )
2023-09-21 18:12:20 +08:00
Ruonan Wang
bf51ec40b2
LLM: Fix empty cache ( #9024 )
* fix
* fix
* update example
2023-09-21 17:16:07 +08:00
Yina Chen
714884414e
fix error ( #9025 )
2023-09-21 16:42:11 +08:00
binbin Deng
edb225530b
add bark ( #9016 )
2023-09-21 12:24:58 +08:00
SONG Ge
fa47967583
[LLM] Optimize kv_cache for gptj model family ( #9010 )
* optimize gptj model family attention
* add license and comment for dolly-model
* remove xpu mentioned
* remove useless info
* code style
* style fix
* code style in gptj fix
* remove gptj arch
* move apply_rotary_pos_emb into utils
* kv_seq_length update
* use hidden_states instead of query layer to get batch size
2023-09-21 10:42:08 +08:00
Guancheng Fu
3913ba4577
add README.md ( #9004 )
2023-09-21 10:32:56 +08:00
Cengguang Zhang
b3cad7de57
LLM: add bloom kv cache support ( #9012 )
* LLM: add bloom kv cache support
* fix style.
2023-09-20 21:10:53 +08:00
Kai Huang
156af15d1e
Add NF3 ( #9008 )
* add nf3
* grammar
2023-09-20 20:03:07 +08:00
Kai Huang
6981745fe4
Optimize kv_cache for gpt-neox model family ( #9015 )
* override gptneox
* style
* move to utils
* revert
2023-09-20 19:59:19 +08:00
JinBridge
48b503c630
LLM: add example of aquila ( #9006 )
* LLM: add example of aquila
* LLM: replace AquilaChat with Aquila
* LLM: shorten prompt of aquila example
2023-09-20 15:52:56 +08:00
Cengguang Zhang
735a17f7b4
LLM: add kv cache to falcon family. ( #8995 )
* add kv cache to falcon family.
* fix: import error.
* refactor
* update comments.
* add two version falcon attention forward.
* fix
* fix.
* fix.
* fix.
* fix style.
* fix style.
2023-09-20 15:36:30 +08:00
Ruonan Wang
94a7f8917b
LLM: fix optimized kv cache for baichuan-13b ( #9009 )
* fix baichuan 13b
* fix style
* fix
* fix style
2023-09-20 15:30:14 +08:00
Yang Wang
c88f6ec457
Experimental XPU QLoRA Finetuning ( #8937 )
* Support xpu finetuning
* support xpu finetuning
* fix style
* fix style
* fix style
* refine example
* add readme
* refine readme
* refine api
* fix fp16
* fix example
* refactor
* fix style
* fix compute type
* add qlora
* refine training args
* fix example
* fix style
* fast path for inference
* address comments
* refine readme
* revert lint
2023-09-19 10:15:44 -07:00
Jason Dai
51518e029d
Update llm readme ( #9005 )
2023-09-19 20:01:33 +08:00
Ruonan Wang
249386261c
LLM: add Baichuan2 cpu example ( #9002 )
* add baichuan2 cpu examples
* add link
* update prompt
2023-09-19 18:08:30 +08:00
Yuwen Hu
c389e1323d
fix xpu performance tests by making sure that latest bigdl-core-xe is installed ( #9001 )
2023-09-19 17:33:30 +08:00
Guancheng Fu
b6c9198d47
Add xpu image for bigdl-llm ( #9003 )
* Add xpu image
* fix
* fix
* fix format
2023-09-19 16:56:22 +08:00
Ruonan Wang
004c45c2be
LLM: Support optimized kv_cache for baichuan family ( #8997 )
* add initial support for baichuan attention
* support baichuan1
* update based on comment
* update based on comment
* support baichuan2
* update link, change how to judge baichuan2
* fix style
* add model parameter for pos emb
* update based on comment
2023-09-19 15:38:54 +08:00
Xin Qiu
37bb0cbf8f
Speed up gpt-j in GPU benchmark ( #9000 )
* Speed up gpt-j in GPU benchmark
* meet code review
2023-09-19 14:22:28 +08:00
Zhao Changmin
2a05581da7
LLM: Apply low_cpu_mem_usage algorithm on optimize_model API ( #8987 )
* low_cpu_mem_usage
2023-09-18 21:41:42 +08:00
Cengguang Zhang
8299b68fea
update readme. ( #8996 )
2023-09-18 17:06:15 +08:00
binbin Deng
c1d25a51a8
LLM: add optimize_model example for bert ( #8975 )
2023-09-18 16:18:35 +08:00
Cengguang Zhang
74338fd291
LLM: add auto torch dtype in benchmark. ( #8981 )
2023-09-18 15:48:25 +08:00
Ruonan Wang
cabe7c0358
LLM: add baichuan2 example for arc ( #8994 )
* add baichuan2 examples
* add link
* small fix
2023-09-18 14:32:27 +08:00
Guancheng Fu
7353882732
add Dockerfile ( #8993 )
2023-09-18 13:25:37 +08:00
binbin Deng
0a552d5bdc
LLM: fix installation on windows ( #8989 )
2023-09-18 11:14:54 +08:00
Xiangyu Tian
52878d3e5f
[PPML] Enable TLS in Attestation API Serving for LLM finetuning ( #8945 )
Add enableTLS flag to enable TLS in Attestation API Serving for LLM finetuning.
2023-09-18 09:32:25 +08:00
Ruonan Wang
32716106e0
update use_cache=True ( #8986 )
2023-09-18 07:59:33 +08:00
Xin Qiu
64ee1d7689
update run_transformer_int4_gpu ( #8983 )
* xpuperf
* update run.py
* clean up
* update
* update
* meet code review
2023-09-15 15:10:04 +08:00