Commit graph

1454 commits

binbin Deng
5e9962b60e LLM: update example layout (#9046) 2023-10-09 15:36:39 +08:00
Yina Chen
4c4f8d1663 [LLM] Fix Arc falcon abnormal output issue (#9096)
* update

* update

* fix error & style

* fix style

* update train

* to input_seq_size
2023-10-09 15:09:37 +08:00
Zhao Changmin
548e4dd5fe LLM: Adapt transformers models for optimize model SL (#9022)
* LLM: Adapt transformers model for SL
2023-10-09 11:13:44 +08:00
Ruonan Wang
f64257a093 LLM: basic api support for esimd fp16 (#9067)
* basic api support for fp16

* fix style

* fix

* fix error and style

* fix style

* meet code review

* update based on comments
2023-10-09 11:05:17 +08:00
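For orientation, a user-side sketch of what the basic fp16 API added here might look like, assuming a `load_in_low_bit="fp16"` option and an Intel GPU ("xpu") device; the model name is a placeholder and the exact option spelling may differ from the PR:

```python
import torch
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

# Load weights in fp16 through the low-bit API (assumed option value for this PR).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",
                                             load_in_low_bit="fp16")
model = model.to("xpu")  # the esimd fp16 kernels target Intel GPUs

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
inputs = tokenizer("Once upon a time", return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```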
JIN Qiao
65373d2a8b LLM: adjust portable zip content (#9054)
* LLM: adjust portable zip content

* LLM: adjust portable zip README
2023-10-09 10:51:19 +08:00
Guancheng Fu
df8df751c4 Modify readme for bigdl-llm-serving-cpu (#9105) 2023-10-09 09:56:09 +08:00
Heyang Sun
2756f9c20d XPU QLoRA Container (#9082)
* XPU QLoRA Container

* fix apt issue

* refine
2023-10-08 11:04:20 +08:00
ZehuaCao
aad68100ae Add trusted-bigdl-llm-serving-tdx image. (#9093)
* add entrypoint in cpu serving

* kubernetes support for fastchat cpu serving

* Update Readme

* add image to manually_build action

* update manually_build.yml

* update README.md

* update manually_build.yaml

* update attestation_cli.py

* update manually_build.yml

* update Dockerfile

* rename

* update trusted-bigdl-llm-serving-tdx Dockerfile
2023-10-08 10:13:51 +08:00
Xin Qiu
b3e94a32d4 change log4error import (#9098) 2023-10-08 09:23:28 +08:00
Kai Huang
78ea7ddb1c Combine apply_rotary_pos_emb for gpt-neox (#9074) 2023-10-07 16:27:46 +08:00
Heyang Sun
0b40ef8261 separate trusted and native llm cpu finetune from lora (#9050)
* separate trusted-llm and bigdl from lora finetuning

* add k8s for trusted llm finetune

* refine

* refine

* rename cpu to tdx in trusted llm

* solve conflict

* fix typo

* resolving conflict

* Delete docker/llm/finetune/lora/README.md

* fix

---------

Co-authored-by: Uxito-Ada <seusunheyang@foxmail.com>
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2023-10-07 15:26:59 +08:00
ZehuaCao
b773d67dd4 Add Kubernetes support for BigDL-LLM-serving CPU. (#9071) 2023-10-07 09:37:48 +08:00
Yang Wang
36dd4afd61 Fix llama when rope scaling is not None (#9086)
* Fix llama when rope scaling is not None

* fix style

* fix style
2023-10-06 13:27:37 -07:00
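Background on the rope-scaling path this fixes: when `rope_scaling` is set in a llama config, positions are stretched before the cos/sin tables are built. A minimal sketch of the linear variant (dynamic NTK scaling differs):

```python
import torch

def build_rope_cache(seq_len, head_dim, base=10000.0, scaling_factor=1.0):
    # inv_freq_i = base^(-2i/d), one frequency per pair of channels
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Linear rope scaling divides positions by the factor (identity when 1.0),
    # compressing long inputs back into the position range seen in training.
    t = torch.arange(seq_len).float() / scaling_factor
    freqs = torch.outer(t, inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)  # [seq_len, head_dim]
    return emb.cos(), emb.sin()
```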
Yang Wang
fcb1c618a0 using bigdl-llm fused rope for llama (#9066)
* optimize llama xpu rope

* fix bug

* fix style

* refine append cache

* remove check

* do not cache cos sin

* remove unnecessary changes

* clean up

* fix style

* check for training
2023-10-06 09:57:29 -07:00
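The "fused rope" replaces the stock rotary-embedding math with a single kernel. For contrast, the unfused reference that transformers' llama used at the time looks roughly like this:

```python
import torch

def rotate_half(x):
    # split the head dim in two and rotate the halves: (x1, x2) -> (-x2, x1)
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    # q, k: [bs, heads, seq_len, head_dim]; cos/sin: [max_seq, head_dim]
    cos = cos[position_ids].unsqueeze(1)  # -> [bs, 1, seq_len, head_dim]
    sin = sin[position_ids].unsqueeze(1)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```

A fused kernel collapses the gather, multiplies, and adds above into one pass over q and k, which is what the XPU path swaps in.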
Jason Dai
50044640c0 Update README.md (#9085) 2023-10-06 21:54:18 +08:00
Jiao Wang
aefa5a5bfe Qwen kv cache (#9079)
* qwen and aquila

* update

* update

* style
2023-10-05 11:59:17 -07:00
Jiao Wang
d5ca1f32b6 Aquila KV cache optimization (#9080)
* update

* update

* style
2023-10-05 11:10:57 -07:00
Jason Dai
7506100bd5 Update readme (#9084) 2023-10-05 16:54:09 +08:00
Yang Wang
88565c76f6 add export merged model example (#9018)
* add export merged model example

* add sources

* add script

* fix style
2023-10-04 21:18:52 -07:00
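Exporting a merged model is a standard peft workflow: load the base weights, attach the trained adapter, fold the low-rank update into the dense weights, and save. A minimal sketch (model name and adapter path are placeholders; the actual example may differ):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",
                                            torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "./qlora-adapter")  # trained adapter (placeholder path)
merged = model.merge_and_unload()  # folds W + B @ A into a single weight per layer
merged.save_pretrained("./llama2-7b-merged")  # plain checkpoint, loads without peft
```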
Yang Wang
0cd8f1c79c Use ipex fused rms norm for llama (#9081)
* also apply rmsnorm

* fix cpu
2023-10-04 21:04:55 -07:00
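For reference, the eager RMSNorm that a fused IPEX kernel can replace computes y = x / sqrt(mean(x^2) + eps) * g in several separate ops, upcasting to fp32 for stability:

```python
import torch

class RMSNorm(torch.nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, hidden_states):
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.eps)
        return self.weight * hidden_states.to(input_dtype)
```

A fused implementation does the reduction and rescale in one kernel launch, avoiding the intermediate fp32 tensor traffic.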
Cengguang Zhang
fb883100e7 LLM: support converting chatglm-18b attention forward in benchmark scripts. (#9072)
* add chatglm-18b convert.

* fix if statement.

* fix
2023-09-28 14:04:52 +08:00
Yishuo Wang
6de2189e90 [LLM] fix chatglm main choice (#9073) 2023-09-28 11:23:37 +08:00
binbin Deng
760183bac6 LLM: update key feature and installation page of document (#9068) 2023-09-27 15:44:34 +08:00
Lilac09
c91b2bd574 fix: modify indentation (#9070)
* modify Dockerfile

* add README.md

* add README.md

* Modify Dockerfile

* Add bigdl inference cpu image build

* Add bigdl llm cpu image build

* Add bigdl llm cpu image build

* Add bigdl llm cpu image build

* Modify Dockerfile

* Add bigdl inference cpu image build

* Add bigdl inference cpu image build

* Add bigdl llm xpu image build

* manually build

* recover file

* manually build

* recover file

* modify indentation
2023-09-27 14:53:52 +08:00
Cengguang Zhang
ad62c58b33 LLM: Enable jemalloc in benchmark scripts. (#9058)
* enable jemalloc.

* fix readme.
2023-09-26 15:37:49 +08:00
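Enabling jemalloc typically means preloading the library and relaxing its page-return behavior before the benchmark process starts; a sketch of how a launcher could do this (the library path, MALLOC_CONF values, and script name are assumptions, not the PR's exact settings):

```python
import os
import subprocess

env = dict(os.environ)
env["LD_PRELOAD"] = "/usr/lib/x86_64-linux-gnu/libjemalloc.so"  # distro-dependent path (assumption)
# Keep freed pages around longer so hot allocation loops avoid repeated page faults.
env["MALLOC_CONF"] = ("oversize_threshold:1,background_thread:true,"
                      "dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000")
# LD_PRELOAD must be set before the interpreter starts, hence a subprocess
# rather than mutating os.environ in-process. Script name is hypothetical.
subprocess.run(["python", "run_benchmark.py"], env=env, check=True)
```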
Lilac09
ecee02b34d Add bigdl llm xpu image build (#9062)
* modify Dockerfile

* add README.md

* add README.md

* Modify Dockerfile

* Add bigdl inference cpu image build

* Add bigdl llm cpu image build

* Add bigdl llm cpu image build

* Add bigdl llm cpu image build

* Modify Dockerfile

* Add bigdl inference cpu image build

* Add bigdl inference cpu image build

* Add bigdl llm xpu image build
2023-09-26 14:29:03 +08:00
Lilac09
9ac950fa52 Add bigdl llm cpu image build (#9047)
* modify Dockerfile

* add README.md

* add README.md

* Modify Dockerfile

* Add bigdl inference cpu image build

* Add bigdl llm cpu image build

* Add bigdl llm cpu image build

* Add bigdl llm cpu image build
2023-09-26 13:22:11 +08:00
Ziteng Zhang
a717352c59 Replace Llama 7b with Llama2-7b in README.md (#9055)
* Replace Llama 7b with Llama2-7b in README.md

Need to replace the base model with Llama2-7b, as we are operating on Llama2 here.

* Replace Llama 7b with Llama2-7b in README.md

A Llama 7b in the first line was missed.

* Update architecture graph

---------

Co-authored-by: Heyang Sun <60865256+Uxito-Ada@users.noreply.github.com>
2023-09-26 09:56:46 +08:00
Guancheng Fu
cc84ed70b3 Create serving images (#9048)
* Finished & Tested

* Install latest pip from base images

* Add blank line

* Delete unused comment

* fix typos
2023-09-25 15:51:45 +08:00
Cengguang Zhang
b4a1266ef0 [WIP] LLM: add kv cache support for internlm. (#9036)
* LLM: add kv cache support for internlm

* add internlm apply_rotary_pos_emb

* fix.

* fix style.
2023-09-25 14:16:59 +08:00
Ruonan Wang
975da86e00 LLM: fix gptneox kv cache (#9044) 2023-09-25 13:03:57 +08:00
Heyang Sun
4b843d1dbf change lora-model output behavior on k8s (#9038)
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2023-09-25 09:28:44 +08:00
Cengguang Zhang
26213a5829 LLM: Change benchmark bf16 load format. (#9035)
* LLM: Change benchmark bf16 load format.

* comment on bf16 chatglm.

* fix.
2023-09-22 17:38:38 +08:00
JinBridge
023555fb1f LLM: Add one-click installer for Windows (#8999)
* LLM: init one-click installer for windows

* LLM: fix typo in one-click installer readme

* LLM: one-click installer try except logic

* LLM: one-click installer add dependency

* LLM: one-click installer adjust README.md

* LLM: one-click installer split README and add zip compression in setup.bat

* LLM: one-click installer verify internlm and llama2 and replace gif

* LLM: remove one-click installer images

* LLM: finetune the one-click installer README.md

* LLM: fix typo in one-click installer README.md

* LLM: rename one-click installer to portable executable

* LLM: rename other places to portable executable

* LLM: rename the zip filename to executable

* LLM: update .gitignore

* LLM: add colorama to setup.bat
2023-09-22 14:46:30 +08:00
Jiao Wang
028a6d9383 Optimize MPT model for long sequences (#9020)
* mpt_long_seq

* update

* update

* update

* style

* style2

* update
2023-09-21 21:27:23 -07:00
Lilac09
9126abdf9b add README.md for bigdl-llm-cpu image (#9026)
* modify Dockerfile

* add README.md

* add README.md
2023-09-22 09:03:57 +08:00
Ruonan Wang
b943d73844 LLM: refactor kv cache (#9030)
* refactor utils

* meet code review; update all models

* small fix
2023-09-21 21:28:03 +08:00
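The kv-cache work running through these commits shares one idea: instead of `torch.cat`-ing each new token's keys/values onto the cache every decoding step (which reallocates and copies the whole cache), pre-allocate capacity in blocks and write new tokens in place. A minimal sketch of the idea (block size and helper name are illustrative, not the refactored utils' actual API):

```python
import torch

KV_ALLOC_BLOCK = 256  # extra slots reserved per (re)allocation; block size is an assumption

def append_kv(cache_k, cache_v, used, new_k, new_v):
    """Append new_k/new_v ([bs, heads, n_new, head_dim]) to a pre-allocated cache."""
    n_new = new_k.size(2)
    if cache_k is None or used + n_new > cache_k.size(2):
        cap = used + n_new + KV_ALLOC_BLOCK
        bigger_k = new_k.new_empty(new_k.size(0), new_k.size(1), cap, new_k.size(3))
        bigger_v = new_v.new_empty(new_v.size(0), new_v.size(1), cap, new_v.size(3))
        if cache_k is not None:  # rare path: copy only when capacity is exhausted
            bigger_k[:, :, :used] = cache_k[:, :, :used]
            bigger_v[:, :, :used] = cache_v[:, :, :used]
        cache_k, cache_v = bigger_k, bigger_v
    cache_k[:, :, used:used + n_new] = new_k  # common path: in-place write, no torch.cat
    cache_v[:, :, used:used + n_new] = new_v
    return cache_k, cache_v, used + n_new
```

Attention then reads `cache_k[:, :, :used]`, so the unused tail never participates in the matmul.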
Cengguang Zhang
868511cf02 LLM: fix kv cache issue of bloom and falcon. (#9029) 2023-09-21 18:12:20 +08:00
Ruonan Wang
bf51ec40b2 LLM: Fix empty cache (#9024)
* fix

* fix

* update example
2023-09-21 17:16:07 +08:00
Yina Chen
714884414e fix error (#9025) 2023-09-21 16:42:11 +08:00
binbin Deng
edb225530b add bark (#9016) 2023-09-21 12:24:58 +08:00
SONG Ge
fa47967583 [LLM] Optimize kv_cache for gptj model family (#9010)
* optimize gptj model family attention

* add license and comment for dolly-model

* remove xpu mention

* remove useless info

* code style

* style fix

* code style in gptj fix

* remove gptj arch

* move apply_rotary_pos_emb into utils

* kv_seq_length update

* use hidden_states instead of query layer to get the batch size
2023-09-21 10:42:08 +08:00
Guancheng Fu
3913ba4577 add README.md (#9004) 2023-09-21 10:32:56 +08:00
Cengguang Zhang
b3cad7de57 LLM: add bloom kv cache support (#9012)
* LLM: add bloom kv cache support

* fix style.
2023-09-20 21:10:53 +08:00
Kai Huang
156af15d1e Add NF3 (#9008)
* add nf3

* grammar
2023-09-20 20:03:07 +08:00
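NF3 extends the "normal float" idea of QLoRA's NF4 to 3 bits: weights are absmax-scaled, then snapped to an 8-entry codebook whose levels follow quantiles of a standard normal. An illustrative sketch (the codebook values below are invented for illustration, not the real NF3 levels):

```python
import torch

# 8 levels for 3 bits; values here are illustrative placeholders --
# real normal-float levels are derived from Gaussian quantiles.
NF3_LEVELS = torch.tensor([-1.0, -0.54, -0.25, 0.0, 0.18, 0.38, 0.62, 1.0])

def nf3_quantize(w):
    scale = w.abs().max()            # per-tensor absmax (real code scales per group)
    dist = (w / scale).unsqueeze(-1) - NF3_LEVELS
    idx = dist.abs().argmin(dim=-1)  # nearest codebook entry, a 3-bit index
    return idx.to(torch.uint8), scale

def nf3_dequantize(idx, scale):
    return NF3_LEVELS[idx.long()] * scale

w = torch.randn(4, 8)
idx, scale = nf3_quantize(w)
print((w - nf3_dequantize(idx, scale)).abs().max())  # worst-case quantization error
```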
Kai Huang
6981745fe4 Optimize kv_cache for gpt-neox model family (#9015)
* override gptneox

* style

* move to utils

* revert
2023-09-20 19:59:19 +08:00
JinBridge
48b503c630 LLM: add example of aquila (#9006)
* LLM: add example of aquila

* LLM: replace AquilaChat with Aquila

* LLM: shorten prompt of aquila example
2023-09-20 15:52:56 +08:00
Cengguang Zhang
735a17f7b4 LLM: add kv cache to falcon family. (#8995)
* add kv cache to falcon family.

* fix: import error.

* refactor

* update comments.

* add two version falcon attention forward.

* fix

* fix.

* fix.

* fix.

* fix style.

* fix style.
2023-09-20 15:36:30 +08:00
Ruonan Wang
94a7f8917b LLM: fix optimized kv cache for baichuan-13b (#9009)
* fix baichuan 13b

* fix style

* fix

* fix style
2023-09-20 15:30:14 +08:00
Yang Wang
c88f6ec457 Experiment with XPU QLoRA finetuning (#8937)
* Support xpu finetuning

* support xpu finetuning

* fix style

* fix style

* fix style

* refine example

* add readme

* refine readme

* refine api

* fix fp16

* fix example

* refactor

* fix style

* fix compute type

* add qlora

* refine training args

* fix example

* fix style

* fast path for inference

* address comments

* refine readme

* revert lint
2023-09-19 10:15:44 -07:00
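To close the section: QLoRA on XPU follows the familiar peft recipe over a 4-bit normal-float base model. A sketch assuming bigdl-llm's qlora helpers mirror peft's names (imports and options follow the PR's example layout at a glance, not verified signatures):

```python
from bigdl.llm.transformers import AutoModelForCausalLM
from bigdl.llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training
from peft import LoraConfig

# Base weights quantized to 4-bit normal-float; LoRA adapters train in higher precision.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",
                                             load_in_low_bit="nf4",
                                             optimize_model=False)
model = model.to("xpu")
model = prepare_model_for_kbit_training(model)

config = LoraConfig(r=8, lora_alpha=32,
                    target_modules=["q_proj", "k_proj", "v_proj"],
                    lora_dropout=0.05, bias="none", task_type="CAUSAL_LM")
model = get_peft_model(model, config)  # only the small A/B matrices require grad
```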