Yang Wang
7160afd4d1
Support XPU DDP training and autocast for LowBitMatmul ( #9167 )
...
* support autocast in low bit matmul
* Support XPU DDP training
* fix amp
2023-10-16 20:47:19 -07:00
Ruonan Wang
77afb8796b
LLM: fix convert of chatglm ( #9190 )
2023-10-17 10:48:13 +08:00
dingbaorong
af3b575c7e
expose modules_to_not_convert in optimize_model ( #9180 )
...
* expose modules_to_not_convert in optimize_model
* some fixes
2023-10-17 09:50:26 +08:00
Cengguang Zhang
5ca8a851e9
LLM: add fuse optimization for Mistral. ( #9184 )
...
* add fuse optimization for mistral.
* fix.
* fix
* fix style.
* fix.
* fix error.
* fix style.
* fix style.
2023-10-16 16:50:31 +08:00
Jiao Wang
49e1381c7f
update rope ( #9155 )
2023-10-15 21:51:45 -07:00
Jason Dai
b192a8032c
Update llm-readme ( #9176 )
2023-10-16 10:54:52 +08:00
binbin Deng
a164c24746
LLM: add kv_cache optimization for chatglm2-6b-32k ( #9165 )
2023-10-16 10:43:15 +08:00
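Many commits in this log add kv_cache optimizations per model family (chatglm2, baichuan, falcon, bloom, gpt-neox, gptj, internlm, Qwen, Aquila). The common idea is to stop concatenating past keys/values at every decoding step and instead write into a preallocated buffer with spare room. A minimal pure-Python sketch of that idea — names and structure are illustrative, not the repository's actual implementation:

```python
# Illustrative sketch of the kv-cache optimization: instead of
# concatenating past keys/values each step (which reallocates the
# whole tensor), reserve room up front and write new entries in place.

def make_cache(capacity):
    # a preallocated buffer plus a fill pointer (a plain list stands
    # in for the real key/value tensor)
    return {"buf": [None] * capacity, "len": 0}

def append_kv(cache, kv):
    if cache["len"] == len(cache["buf"]):
        # out of room: grow geometrically so appends stay amortized O(1)
        cache["buf"].extend([None] * len(cache["buf"]))
    cache["buf"][cache["len"]] = kv
    cache["len"] += 1

def valid_kv(cache):
    # only the filled prefix is real cache content
    return cache["buf"][:cache["len"]]

cache = make_cache(4)
for step in range(6):
    append_kv(cache, ("k%d" % step, "v%d" % step))
```

The payoff is that each decoding step does an in-place write rather than an O(seq_len) copy.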
Yang Wang
7a2de00b48
Fixes for xpu Bf16 training ( #9156 )
...
* Support bf16 training
* Use a stable transformer version
* remove env
* fix style
2023-10-14 21:28:59 -07:00
Cengguang Zhang
51a133de56
LLM: add fuse rope and norm optimization for Baichuan. ( #9166 )
...
* add fuse rope optimization.
* add rms norm optimization.
2023-10-13 17:36:52 +08:00
Jin Qiao
db7f938fdc
LLM: add replit and starcoder to gpu pytorch model example ( #9154 )
2023-10-13 15:44:17 +08:00
Jin Qiao
797b156a0d
LLM: add dolly-v1 and dolly-v2 to gpu pytorch model example ( #9153 )
2023-10-13 15:43:35 +08:00
Yishuo Wang
259cbb4126
[LLM] add initial bigdl-llm-init ( #9150 )
2023-10-13 15:31:45 +08:00
Cengguang Zhang
433f408081
LLM: Add fuse rope and norm optimization for Aquila. ( #9161 )
...
* add fuse norm optimization.
* add fuse rope optimization
2023-10-13 14:18:37 +08:00
SONG Ge
e7aa67e141
[LLM] Add rope optimization for internlm ( #9159 )
...
* add rope and norm optimization for internlm and gptneox
* revert gptneox back, split out into PR #9155
* add norm_forward
* style fix
* update
* update
2023-10-13 14:18:28 +08:00
Jin Qiao
f754ab3e60
LLM: add baichuan and baichuan2 to gpu pytorch model example ( #9152 )
2023-10-13 13:44:31 +08:00
Ruonan Wang
b8aee7bb1b
LLM: Fix Qwen kv_cache optimization ( #9148 )
...
* first commit
* ut pass
* accelerate rotate half by using common util function
* fix style
2023-10-12 15:49:42 +08:00
binbin Deng
69942d3826
LLM: fix model check before attention optimization ( #9149 )
2023-10-12 15:21:51 +08:00
JIN Qiao
1a1ddc4144
LLM: Add Replit CPU and GPU example ( #9028 )
2023-10-12 13:42:14 +08:00
JIN Qiao
d74834ff4c
LLM: add gpu pytorch-models example llama2 and chatglm2 ( #9142 )
2023-10-12 13:41:48 +08:00
Ruonan Wang
4f34557224
LLM: support num_beams in all-in-one benchmark ( #9141 )
...
* support num_beams
* fix
2023-10-12 13:35:12 +08:00
Ruonan Wang
62ac7ae444
LLM: fix inaccurate input / output tokens of current all-in-one benchmark ( #9137 )
...
* first fix
* fix all apis
* fix
2023-10-11 17:13:34 +08:00
binbin Deng
eb3fb18eb4
LLM: improve PyTorch API doc ( #9128 )
2023-10-11 15:03:39 +08:00
binbin Deng
995b0f119f
LLM: update some gpu examples ( #9136 )
2023-10-11 14:23:56 +08:00
Ruonan Wang
1c8d5da362
LLM: fix llama tokenizer for all-in-one benchmark ( #9129 )
...
* fix tokenizer for gpu benchmark
* fix ipex fp16
* meet code review
* fix
2023-10-11 13:39:39 +08:00
binbin Deng
2ad67a18b1
LLM: add mistral examples ( #9121 )
2023-10-11 13:38:15 +08:00
Ruonan Wang
1363e666fc
LLM: update benchmark_util.py for beam search ( #9126 )
...
* update reorder_cache
* fix
2023-10-11 09:41:53 +08:00
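The `reorder_cache` update above relates to beam search: after each step only the surviving hypotheses keep their cached states, so the kv cache must be gathered along the beam dimension. A minimal sketch of that gather, with plain lists standing in for tensors and illustrative names:

```python
# After a beam-search step, beam_idx[i] names the previous beam that
# the i-th surviving hypothesis came from; its cache row follows it.
def reorder_cache(past, beam_idx):
    # past: one cache entry per beam; returns the reordered cache
    return [past[i] for i in beam_idx]

# beam 2 survived as the new beam 0; beam 0 forked into beams 1 and 2
reordered = reorder_cache(["h0", "h1", "h2"], [2, 0, 0])
```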
Guoqiong Song
e8c5645067
add LLM example of aquila on GPU ( #9056 )
...
* aquila, dolly-v1, dolly-v2, vicuna
2023-10-10 17:01:35 -07:00
Ruonan Wang
388f688ef3
LLM: update setup.py to add bigdl-core-xe package ( #9122 )
2023-10-10 15:02:48 +08:00
Zhao Changmin
1709beba5b
LLM: Explicitly close pickle file pointer before removing temporary directory ( #9120 )
...
* fp close
2023-10-10 14:57:23 +08:00
Yuwen Hu
0e09dd926b
[LLM] Fix example test ( #9118 )
...
* Update llm example test link due to example layout change
* Add better change detect
2023-10-10 13:24:18 +08:00
Ruonan Wang
ad7d9231f5
LLM: add benchmark script for Max gpu and ipex fp16 gpu ( #9112 )
...
* add pvc bash
* meet code review
* rename to run-max-gpu.sh
2023-10-10 10:18:41 +08:00
binbin Deng
e4d1457a70
LLM: improve transformers style API doc ( #9113 )
2023-10-10 09:31:00 +08:00
Yuwen Hu
65212451cc
[LLM] Small update to performance tests ( #9106 )
...
* small updates to llm performance tests regarding model handling
* Small fix
2023-10-09 16:55:25 +08:00
Zhao Changmin
edccfb2ed3
LLM: Check model device type ( #9092 )
...
* check model device
2023-10-09 15:49:15 +08:00
binbin Deng
5e9962b60e
LLM: update example layout ( #9046 )
2023-10-09 15:36:39 +08:00
Yina Chen
4c4f8d1663
[LLM] Fix Arc falcon abnormal output issue ( #9096 )
...
* update
* update
* fix error & style
* fix style
* update train
* to input_seq_size
2023-10-09 15:09:37 +08:00
Zhao Changmin
548e4dd5fe
LLM: Adapt transformers models for optimize model SL ( #9022 )
...
* LLM: Adapt transformers model for SL
2023-10-09 11:13:44 +08:00
Ruonan Wang
f64257a093
LLM: basic api support for esimd fp16 ( #9067 )
...
* basic api support for fp16
* fix style
* fix
* fix error and style
* fix style
* meet code review
* update based on comments
2023-10-09 11:05:17 +08:00
JIN Qiao
65373d2a8b
LLM: adjust portable zip content ( #9054 )
...
* LLM: adjust portable zip content
* LLM: adjust portable zip README
2023-10-09 10:51:19 +08:00
Xin Qiu
b3e94a32d4
change log4error import ( #9098 )
2023-10-08 09:23:28 +08:00
Kai Huang
78ea7ddb1c
Combine apply_rotary_pos_emb for gpt-neox ( #9074 )
2023-10-07 16:27:46 +08:00
Yang Wang
36dd4afd61
Fix llama when rope scaling is not None ( #9086 )
...
* Fix llama when rope scaling is not None
* fix style
* fix style
2023-10-06 13:27:37 -07:00
Yang Wang
fcb1c618a0
using bigdl-llm fused rope for llama ( #9066 )
...
* optimize llama xpu rope
* fix bug
* fix style
* refine append cache
* remove check
* do not cache cos sin
* remove unnecessary changes
* clean up
* fix style
* check for training
2023-10-06 09:57:29 -07:00
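The "fused rope" commits above optimize rotary position embedding (RoPE), where each (even, odd) channel pair of a query/key vector is rotated by a position-dependent angle. The real kernels fuse this into the attention path; the underlying math can be sketched in pure Python (names illustrative):

```python
import math

def rope_rotate(x, position, base=10000.0):
    # x: flat list of channels, treated as consecutive (even, odd) pairs
    dim = len(x)
    out = list(x)
    for i in range(0, dim, 2):
        # lower channel pairs rotate faster; higher pairs rotate slower
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

rotated = rope_rotate([1.0, 0.0, 0.5, 0.5], position=3)
```

Each pair rotation preserves its norm, so position 0 leaves the vector unchanged.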
Jiao Wang
aefa5a5bfe
Qwen kv cache ( #9079 )
...
* qwen and aquila
* update
* update
* style
2023-10-05 11:59:17 -07:00
Jiao Wang
d5ca1f32b6
Aquila KV cache optimization ( #9080 )
...
* update
* update
* style
2023-10-05 11:10:57 -07:00
Yang Wang
88565c76f6
add export merged model example ( #9018 )
...
* add export merged model example
* add sources
* add script
* fix style
2023-10-04 21:18:52 -07:00
Yang Wang
0cd8f1c79c
Use ipex fused rms norm for llama ( #9081 )
...
* also apply rmsnorm
* fix cpu
2023-10-04 21:04:55 -07:00
Cengguang Zhang
fb883100e7
LLM: support chatglm-18b convert attention forward in benchmark scripts. ( #9072 )
...
* add chatglm-18b convert.
* fix if statement.
* fix
2023-09-28 14:04:52 +08:00
Yishuo Wang
6de2189e90
[LLM] fix chatglm main choice ( #9073 )
2023-09-28 11:23:37 +08:00
Cengguang Zhang
ad62c58b33
LLM: Enable jemalloc in benchmark scripts. ( #9058 )
...
* enable jemalloc.
* fix readme.
2023-09-26 15:37:49 +08:00
Cengguang Zhang
b4a1266ef0
[WIP] LLM: add kv cache support for internlm. ( #9036 )
...
* LLM: add kv cache support for internlm
* add internlm apply_rotary_pos_emb
* fix.
* fix style.
2023-09-25 14:16:59 +08:00
Ruonan Wang
975da86e00
LLM: fix gptneox kv cache ( #9044 )
2023-09-25 13:03:57 +08:00
Cengguang Zhang
26213a5829
LLM: Change benchmark bf16 load format. ( #9035 )
...
* LLM: Change benchmark bf16 load format.
* comment on bf16 chatglm.
* fix.
2023-09-22 17:38:38 +08:00
JinBridge
023555fb1f
LLM: Add one-click installer for Windows ( #8999 )
...
* LLM: init one-click installer for windows
* LLM: fix typo in one-click installer readme
* LLM: one-click installer try except logic
* LLM: one-click installer add dependency
* LLM: one-click installer adjust README.md
* LLM: one-click installer split README and add zip compress in setup.bat
* LLM: one-click installer verified internlm and llama2 and replace gif
* LLM: remove one-click installer images
* LLM: finetune the one-click installer README.md
* LLM: fix typo in one-click installer README.md
* LLM: rename one-click installer to portable executable
* LLM: rename other places to portable executable
* LLM: rename the zip filename to executable
* LLM: update .gitignore
* LLM: add colorama to setup.bat
2023-09-22 14:46:30 +08:00
Jiao Wang
028a6d9383
MPT model optimize for long sequence ( #9020 )
...
* mpt_long_seq
* update
* update
* update
* style
* style2
* update
2023-09-21 21:27:23 -07:00
Ruonan Wang
b943d73844
LLM: refactor kv cache ( #9030 )
...
* refactor utils
* meet code review; update all models
* small fix
2023-09-21 21:28:03 +08:00
Cengguang Zhang
868511cf02
LLM: fix kv cache issue of bloom and falcon. ( #9029 )
2023-09-21 18:12:20 +08:00
Ruonan Wang
bf51ec40b2
LLM: Fix empty cache ( #9024 )
...
* fix
* fix
* update example
2023-09-21 17:16:07 +08:00
Yina Chen
714884414e
fix error ( #9025 )
2023-09-21 16:42:11 +08:00
binbin Deng
edb225530b
add bark ( #9016 )
2023-09-21 12:24:58 +08:00
SONG Ge
fa47967583
[LLM] Optimize kv_cache for gptj model family ( #9010 )
...
* optimize gptj model family attention
* add license and comment for dolly-model
* remove xpu mentioned
* remove useless info
* code style
* style fix
* code style in gptj fix
* remove gptj arch
* move apply_rotary_pos_emb into utils
* kv_seq_length update
* use hidden_states instead of query layer to reach batch size
2023-09-21 10:42:08 +08:00
Cengguang Zhang
b3cad7de57
LLM: add bloom kv cache support ( #9012 )
...
* LLM: add bloom kv cache support
* fix style.
2023-09-20 21:10:53 +08:00
Kai Huang
156af15d1e
Add NF3 ( #9008 )
...
* add nf3
* grammar
2023-09-20 20:03:07 +08:00
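NF3 joins the other low-bit formats in this log (q4_0, q5_0, NF4). They all follow block quantization: each block of weights stores one float scale plus a small integer code per value. An illustrative sketch — the codebook below is a toy 3-bit table, not the actual NF3 levels:

```python
# Toy blockwise quantizer: normalize a block by its max magnitude,
# then snap each value to the nearest codebook level.
CODEBOOK3 = [-1.0, -0.66, -0.4, -0.2, 0.0, 0.25, 0.55, 1.0]  # 2**3 codes

def quantize_block(values, codebook=CODEBOOK3):
    scale = max(abs(v) for v in values) or 1.0
    codes = [min(range(len(codebook)),
                 key=lambda i: abs(codebook[i] - v / scale))
             for v in values]
    return scale, codes

def dequantize_block(scale, codes, codebook=CODEBOOK3):
    return [scale * codebook[c] for c in codes]

scale, codes = quantize_block([0.9, -0.4, 0.1, 0.0])
recon = dequantize_block(scale, codes)
```

Storage per value drops to 3 bits plus a shared scale, at the cost of the small round-trip error visible in `recon`.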
Kai Huang
6981745fe4
Optimize kv_cache for gpt-neox model family ( #9015 )
...
* override gptneox
* style
* move to utils
* revert
2023-09-20 19:59:19 +08:00
JinBridge
48b503c630
LLM: add example of aquila ( #9006 )
...
* LLM: add example of aquila
* LLM: replace AquilaChat with Aquila
* LLM: shorten prompt of aquila example
2023-09-20 15:52:56 +08:00
Cengguang Zhang
735a17f7b4
LLM: add kv cache to falcon family. ( #8995 )
...
* add kv cache to falcon family.
* fix: import error.
* refactor
* update comments.
* add two version falcon attention forward.
* fix
* fix.
* fix.
* fix.
* fix style.
* fix style.
2023-09-20 15:36:30 +08:00
Ruonan Wang
94a7f8917b
LLM: fix optimized kv cache for baichuan-13b ( #9009 )
...
* fix baichuan 13b
* fix style
* fix
* fix style
2023-09-20 15:30:14 +08:00
Yang Wang
c88f6ec457
Experiment XPU QLora Finetuning ( #8937 )
...
* Support xpu finetuning
* support xpu finetuning
* fix style
* fix style
* fix style
* refine example
* add readme
* refine readme
* refine api
* fix fp16
* fix example
* refactor
* fix style
* fix compute type
* add qlora
* refine training args
* fix example
* fix style
* fast path for inference
* address comments
* refine readme
* revert lint
2023-09-19 10:15:44 -07:00
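The QLoRA finetuning commit above builds on the LoRA decomposition: the frozen base weight W gains a trainable low-rank update, W' = W + (alpha / r) · B A, with A of shape r × d_in and B of shape d_out × r. A pure-Python sketch of that math — lists stand in for tensors, and names are hypothetical, not the repository's API:

```python
# Naive dense matmul over nested lists, enough to show the LoRA merge.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_merge(W, A, B, alpha, r):
    delta = matmul(B, A)          # d_out x d_in, rank <= r
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

# rank-1 example: only A and B (r * (d_in + d_out) values) are trained
merged = lora_merge(W=[[1.0, 0.0], [0.0, 1.0]],
                    A=[[0.0, 2.0]], B=[[1.0], [0.0]],
                    alpha=4, r=1)
```

QLoRA's twist is that W stays quantized (e.g. NF4) while A and B train in higher precision.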
Jason Dai
51518e029d
Update llm readme ( #9005 )
2023-09-19 20:01:33 +08:00
Ruonan Wang
249386261c
LLM: add Baichuan2 cpu example ( #9002 )
...
* add baichuan2 cpu examples
* add link
* update prompt
2023-09-19 18:08:30 +08:00
Ruonan Wang
004c45c2be
LLM: Support optimized kv_cache for baichuan family ( #8997 )
...
* add initial support for baichuan attention
* support baichuan1
* update based on comment
* update based on comment
* support baichuan2
* update link, change how to judge baichuan2
* fix style
* add model parameter for pos emb
* update based on comment
2023-09-19 15:38:54 +08:00
Xin Qiu
37bb0cbf8f
Speed up gpt-j in gpu benchmark ( #9000 )
...
* Speed up gpt-j in gpu benchmark
* meet code review
2023-09-19 14:22:28 +08:00
Zhao Changmin
2a05581da7
LLM: Apply low_cpu_mem_usage algorithm on optimize_model API ( #8987 )
...
* low_cpu_mem_usage
2023-09-18 21:41:42 +08:00
Cengguang Zhang
8299b68fea
update readme. ( #8996 )
2023-09-18 17:06:15 +08:00
binbin Deng
c1d25a51a8
LLM: add optimize_model example for bert ( #8975 )
2023-09-18 16:18:35 +08:00
Cengguang Zhang
74338fd291
LLM: add auto torch dtype in benchmark. ( #8981 )
2023-09-18 15:48:25 +08:00
Ruonan Wang
cabe7c0358
LLM: add baichuan2 example for arc ( #8994 )
...
* add baichuan2 examples
* add link
* small fix
2023-09-18 14:32:27 +08:00
binbin Deng
0a552d5bdc
LLM: fix installation on windows ( #8989 )
2023-09-18 11:14:54 +08:00
Ruonan Wang
32716106e0
update use_cache=True ( #8986 )
2023-09-18 07:59:33 +08:00
Xin Qiu
64ee1d7689
update run_transformer_int4_gpu ( #8983 )
...
* xpuperf
* update run.py
* clean up
* update
* update
* meet code review
2023-09-15 15:10:04 +08:00
Zhao Changmin
16b9412e80
tie_word_embeddings ( #8977 )
...
tie_word_embeddings
2023-09-15 10:17:09 +08:00
JinBridge
c12b8f24b6
LLM: add use_cache=True for all gpu examples ( #8971 )
2023-09-15 09:54:38 +08:00
Guancheng Fu
d1b62ef2f2
[bigdl-llm] Remove serving-dep from all_requires ( #8980 )
...
* Remove serving-dep from all_requires
* pin fastchat version
2023-09-14 16:59:24 +08:00
Yishuo Wang
bcf456070c
fix bloom-176b int overflow ( #8973 )
2023-09-14 14:37:57 +08:00
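A sketch of why a 176B-parameter model can hit the int overflow fixed above: kernels that index with 32-bit signed offsets wrap once a tensor exceeds 2**31 - 1 elements. The shapes below are approximate bloom-176b dimensions, used only for illustration:

```python
# Emulate C int32 wraparound in pure Python to show the failure mode.
def to_int32(x):
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

hidden, vocab = 14336, 250880     # approximate bloom-176b shapes
elems = hidden * vocab            # elements in the embedding matrix
wrapped = to_int32(elems)         # what a 32-bit signed index would see
```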
Ruonan Wang
dd57623650
LLM: reduce GPU memory for optimize_model=True ( #8965 )
...
* reduce gpu memory for llama & chatglm
* change to device type
2023-09-13 17:27:09 +08:00
binbin Deng
be29c75c18
LLM: refactor gpu examples ( #8963 )
...
* restructure
* change to hf-transformers-models/
2023-09-13 14:47:47 +08:00
Cengguang Zhang
cca84b0a64
LLM: update llm benchmark scripts. ( #8943 )
...
* update llm benchmark scripts.
* change transformer_bf16 to pytorch_autocast_bf16.
* add autocast in transformer int4.
* revert autocast.
* add "pytorch_autocast_bf16" to doc
* fix comments.
2023-09-13 12:23:28 +08:00
SONG Ge
7132ef6081
[LLM Doc] Add optimize_model doc in transformers api ( #8957 )
...
* add optimize in from_pretrained
* add api doc for load_low_bit
* update api docs following comments
* update api docs
* update
* reword comments
2023-09-13 10:42:33 +08:00
Zhao Changmin
c32c260ce2
LLM: Add save/load API in optimize_model to support general pytorch model ( #8956 )
...
* support hf format SL
2023-09-13 10:22:00 +08:00
Ruonan Wang
4de73f592e
LLM: add gpu example of chinese-llama-2-7b ( #8960 )
...
* add gpu example of chinese-llama2
* update model name and link
* update name
2023-09-13 10:16:51 +08:00
Guancheng Fu
0bf5857908
[LLM] Integrate FastChat as a serving framework for BigDL-LLM ( #8821 )
...
* Finish changing
* format
* add licence
* Add licence
* fix
* fix
* Add xpu support for fschat
* Fix patch
* Also install webui dependencies
* change setup.py dependency installs
* fix
* format
* final test
2023-09-13 09:28:05 +08:00
Yuwen Hu
cb534ed5c4
[LLM] Add Arc demo gif to readme and readthedocs ( #8958 )
...
* Add arc demo in main readme
* Small style fix
* Realize using table
* Update based on comments
* Small update
* Try to solve with height problem
* Small fix
* Update demo for inner llm readme
* Update demo video for readthedocs
* Small fix
* Update based on comments
2023-09-13 09:23:52 +08:00
Zhao Changmin
dcaa4dc130
LLM: Support GQA on llama kvcache ( #8938 )
...
* support GQA
2023-09-12 12:18:40 +08:00
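The GQA commit above concerns grouped-query attention (used by e.g. llama-2-70b): the kv cache holds fewer heads than the query projection, and each kv head serves n_q // n_kv query heads. Supporting it in a kv cache amounts to a repeat along the head dimension, sketched here with strings standing in for head tensors:

```python
# Expand grouped kv heads to the query-head layout by repeating each
# kv head for its group of query heads.
def expand_kv(kv_heads, n_q_heads):
    group = n_q_heads // len(kv_heads)
    return [h for h in kv_heads for _ in range(group)]

expanded = expand_kv(["kv0", "kv1"], n_q_heads=8)
```

The cache itself only ever stores the two kv heads, which is where the memory saving comes from.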
binbin Deng
2d81521019
LLM: add optimize_model examples for llama2 and chatglm ( #8894 )
...
* add llama2 and chatglm optimize_model examples
* update default usage
* update command and some descriptions
* move folder and remove general_int4 descriptions
* change folder name
2023-09-12 10:36:29 +08:00
Zhao Changmin
f00c442d40
fix accelerate ( #8946 )
...
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-09-12 09:27:58 +08:00
Yang Wang
16761c58be
Make llama attention stateless ( #8928 )
...
* Make llama attention stateless
* fix style
* fix chatglm
* fix chatglm xpu
2023-09-11 18:21:50 -07:00
Zhao Changmin
e62eda74b8
refine ( #8912 )
...
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-09-11 16:40:33 +08:00
Yina Chen
df165ad165
init ( #8933 )
2023-09-11 14:30:55 +08:00
Ruonan Wang
b3f5dd5b5d
LLM: update q8 convert xpu&cpu ( #8930 )
2023-09-08 16:01:17 +08:00
Yina Chen
33d75adadf
[LLM] Support q5_0 on arc ( #8926 )
...
* support q5_0
* delete
* fix style
2023-09-08 15:52:36 +08:00
Yuwen Hu
ca35c93825
[LLM] Fix langchain UT ( #8929 )
...
* Change dependency version for langchain uts
* Downgrade pandas version instead; and update example readme accordingly
2023-09-08 13:51:04 +08:00
Xin Qiu
ea0853c0b5
update benchmark_utils readme ( #8925 )
...
* update readme
* meet code review
2023-09-08 10:30:26 +08:00
Yang Wang
ee98cdd85c
Support latest transformer version ( #8923 )
...
* Support latest transformer version
* fix style
2023-09-07 19:01:32 -07:00
Yang Wang
25428b22b4
Fix chatglm2 attention and kv cache ( #8924 )
...
* fix chatglm2 attention
* fix bf16 bug
* make model stateless
* add utils
* cleanup
* fix style
2023-09-07 18:54:29 -07:00
Yina Chen
b209b8f7b6
[LLM] Fix arc qtype != q4_0 generate issue ( #8920 )
...
* Fix arc precision!=q4_0 generate issue
* meet comments
2023-09-07 08:56:36 -07:00
Cengguang Zhang
3d2efe9608
LLM: update llm latency benchmark. ( #8922 )
2023-09-07 19:00:19 +08:00
binbin Deng
7897eb4b51
LLM: add benchmark scripts on GPU ( #8916 )
2023-09-07 18:08:17 +08:00
Xin Qiu
d8a01d7c4f
fix chatglm in run.py ( #8919 )
2023-09-07 16:44:10 +08:00
Xin Qiu
e9de9d9950
benchmark for native int4 ( #8918 )
...
* native4
* update
* update
* update
2023-09-07 15:56:15 +08:00
Ruonan Wang
c0797ea232
LLM: update setup to specify bigdl-core-xe version ( #8913 )
2023-09-07 15:11:55 +08:00
Ruonan Wang
057e77e229
LLM: update benchmark_utils.py to handle do_sample=True ( #8903 )
2023-09-07 14:20:47 +08:00
Yang Wang
c34400e6b0
Use new layout for xpu qlinear ( #8896 )
...
* use new layout for xpu qlinear
* fix style
2023-09-06 21:55:33 -07:00
Zhao Changmin
8bc1d8a17c
LLM: Fix discards in optimize_model with non-hf models and add openai whisper example ( #8877 )
...
* openai-whisper
2023-09-07 10:35:59 +08:00
Xin Qiu
5d9942a3ca
transformer int4 and native int4's benchmark script for 32 256 1k 2k input ( #8871 )
...
* transformer
* move
* update
* add header
* update all-in-one
* clean up
2023-09-07 09:49:55 +08:00
Yina Chen
bfc71fbc15
Add known issue in arc voice assistant example ( #8902 )
...
* add known issue in voice assistant example
* update cpu
2023-09-07 09:28:26 +08:00
Yuwen Hu
db26c7b84d
[LLM] Update readme gif & image url to the ones hosted on readthedocs ( #8900 )
2023-09-06 20:04:17 +08:00
SONG Ge
7a71ced78f
[LLM Docs] Remain API Docs Issues Solution ( #8780 )
...
* langchain readthedocs update
* solve langchain.llms.transformersllm issues
* langchain.embeddings.transformersembeddings/transformersllms issues
* update docs for get_num_tokens
* add low_bit api doc
* add optimizer model api doc
* update rst index
* fix comments style
* update docs following the comments
* update api doc
2023-09-06 16:29:34 +08:00
Xin Qiu
49a39452c6
update benchmark ( #8899 )
2023-09-06 15:11:43 +08:00
Kai Huang
4a9ff050a1
Add qlora nf4 ( #8782 )
...
* add nf4
* dequant nf4
* style
2023-09-06 09:39:22 +08:00
xingyuan li
704a896e90
[LLM] Add perf test on xpu for bigdl-llm ( #8866 )
...
* add xpu latency job
* update install way
* remove duplicated workflow
* add perf upload
2023-09-05 17:36:24 +09:00
Zhao Changmin
95271f10e0
LLM: Rename low bit layer ( #8875 )
...
* rename lowbit
---------
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-09-05 13:21:12 +08:00
Yina Chen
74a2c2ddf5
Update optimize_model=True in llama2 chatglm2 arc examples ( #8878 )
...
* add optimize_model=True in llama2 chatglm2 examples
* add ipex optimize in gpt-j example
2023-09-05 10:35:37 +08:00
Jason Dai
5e58f698cd
Update readthedocs ( #8882 )
2023-09-04 15:42:16 +08:00
Song Jiaming
7b3ac66e17
[LLM] auto performance test fix specific settings to template ( #8876 )
2023-09-01 15:49:04 +08:00
Yang Wang
242c9d6036
Fix chatglm2 multi-turn streamchat ( #8867 )
2023-08-31 22:13:49 -07:00
Song Jiaming
c06f1ca93e
[LLM] auto perf test to output to csv ( #8846 )
2023-09-01 10:48:00 +08:00
Zhao Changmin
9c652fbe95
LLM: Whisper long segment recognize example ( #8826 )
...
* LLM: Long segment recognize example
2023-08-31 16:41:25 +08:00
Yishuo Wang
a232c5aa21
[LLM] add protobuf in bigdl-llm dependency ( #8861 )
2023-08-31 15:23:31 +08:00
xingyuan li
de6c6bb17f
[LLM] Downgrade amx build gcc version and remove avx flag display ( #8856 )
...
* downgrade to gcc 11
* remove avx display
2023-08-31 14:08:13 +09:00
Yang Wang
3b4f4e1c3d
Fix llama attention optimization for XPU ( #8855 )
...
* Fix llama attention optimization for XPU
* fix chatglm2
* fix typo
2023-08-30 21:30:49 -07:00
Shengsheng Huang
7b566bf686
[LLM] add new API for optimize any pytorch models ( #8827 )
...
* add new API for optimize any pytorch models
* change test util name
* revise API and update UT
* fix python style
* update ut config, change default value
* change defaults, disable ut transcribe
2023-08-30 19:41:53 +08:00
Xin Qiu
8eca982301
windows add env ( #8852 )
2023-08-30 15:54:52 +08:00
Zhao Changmin
731916c639
LLM: Enable attempting loading method automatically ( #8841 )
...
* enable auto load method
* warning error
* logger info
---------
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-08-30 15:41:55 +08:00
Yishuo Wang
bba73ec9d2
[LLM] change chatglm native int4 checkpoint name ( #8851 )
2023-08-30 15:05:19 +08:00
Yina Chen
55e705a84c
[LLM] Support the rest of AutoXXX classes in Transformers API ( #8815 )
...
* add transformers auto models
* fix
2023-08-30 11:16:14 +08:00
Zhao Changmin
887018b0f2
Update ut save&load ( #8847 )
...
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-08-30 10:32:57 +08:00
Yina Chen
3462fd5c96
Add arc gpt-j example ( #8840 )
2023-08-30 10:31:24 +08:00
Ruonan Wang
f42c0bad1b
LLM: update GPU doc ( #8845 )
2023-08-30 09:24:19 +08:00
Jason Dai
aab7deab1f
Reorganize GPU examples ( #8844 )
2023-08-30 08:32:08 +08:00
Yang Wang
a386ad984e
Add Data Center GPU Flex Series to Readme ( #8835 )
...
* Add Data Center GPU Flex Series to Readme
* remove
* update starcoder
2023-08-29 11:19:09 -07:00
Yishuo Wang
7429ea0606
[LLM] support transformer int4 + amx int4 ( #8838 )
2023-08-29 17:27:18 +08:00
Ruonan Wang
ddff7a6f05
Update readme of GPU to specify oneapi version ( #8820 )
2023-08-29 13:14:22 +08:00
Zhao Changmin
bb31d4fe80
LLM: Implement hf low_cpu_mem_usage with 1xbinary file peak memory on transformer int4 ( #8731 )
...
* 1x peak memory
2023-08-29 09:33:17 +08:00
Yina Chen
35fdf94031
[LLM] Arc starcoder example ( #8814 )
...
* arc starcoder example init
* add log
* meet comments
2023-08-28 16:48:00 +08:00
xingyuan li
6a902b892e
[LLM] Add amx build step ( #8822 )
...
* add amx build step
2023-08-28 17:41:18 +09:00
Ruonan Wang
eae92bc7da
llm: quick fix path ( #8810 )
2023-08-25 16:02:31 +08:00
Ruonan Wang
0186f3ab2f
llm: update all ARC int4 examples ( #8809 )
...
* update GPU examples
* update other examples
* fix
* update based on comment
2023-08-25 15:26:10 +08:00
Song Jiaming
b8b1b6888b
[LLM] Performance test ( #8796 )
2023-08-25 14:31:45 +08:00
Yang Wang
9d0f6a8cce
rename math.py in example to avoid conflict ( #8805 )
2023-08-24 21:06:31 -07:00
SONG Ge
d2926c7672
[LLM] Unify Langchain Native and Transformers LLM API ( #8752 )
...
* deprecate BigDLNativeTransformers and add specific LMEmbedding method
* deprecate and add LM methods for langchain llms
* add native params to native langchain
* new imple for embedding
* move ut from bigdlnative to casual llm
* rename embeddings api and examples update align with usage updating
* docqa example hot-fix
* add more api docs
* add langchain ut for starcoder
* support model_kwargs for transformer methods when calling causalLM and add ut
* ut fix for transformers embedding
* update for langchain causal supporting transformers
* remove model_family in readme doc
* add model_families params to support more models
* update api docs and remove chatglm embeddings for now
* remove chatglm embeddings in examples
* new refactor for ut to add bloom and transformers llama ut
* disable llama transformers embedding ut
2023-08-25 11:14:21 +08:00