Ruonan Wang
139e98aa18
LLM: quick fix benchmark (#9509)
2023-11-22 10:19:57 +08:00
WeiguangHan
c2aeb4d1e8
del model after test (#9504)
2023-11-21 18:41:50 +08:00
Cheen Hau, 俊豪
3e39828420
Update all in one benchmark readme (#9496)
* Add gperftools install to all in one benchmark readme
* Update readme
2023-11-21 14:57:16 +08:00
WeiguangHan
c487b53f21
LLM: only run arc perf test nightly (#9448)
* LLM: only run arc perf test nightly
* deleted unused python scripts
* rebase main
2023-11-15 19:38:14 +08:00
Chen, Zhentao
dbbdb53a18
fix multiple gpu usage (#9459)
2023-11-14 17:06:27 +08:00
Chen, Zhentao
d19ca21957
patch bigdl-llm model to harness by binding instead of patch file (#9420)
* add run_llb.py
* fix args interpret
* modify outputs
* update workflow
* add license
* test mixed 4 bit
* update readme
* use autotokenizer
* add timeout
* refactor workflow file
* fix working directory
* fix env
* throw exception if some jobs failed
* improve terminal outputs
* Disable var which causes the run to get stuck
* fix unknown precision
* fix key error
* directly output config instead
* rm harness submodule
2023-11-14 12:51:39 +08:00
Chen, Zhentao
0ecb9efb05
use AutoTokenizer to enable more models (#9446)
2023-11-13 17:47:43 +08:00
Cengguang Zhang
ece5805572
LLM: add chatglm3-6b to latency benchmark test. (#9442)
2023-11-13 17:24:37 +08:00
Chen, Zhentao
5747e2fe69
fix multiple gpu usage of harness (#9444)
2023-11-13 16:53:23 +08:00
Heyang Sun
b23b91407c
fix llm-init on deepspeed missing lib (#9419)
2023-11-10 13:51:24 +08:00
Chen, Zhentao
298b64217e
add auto triggered acc test (#9364)
* add auto triggered acc test
* use llama 7b instead
* fix env
* debug download
* fix download prefix
* add cut dirs
* fix env of model path
* fix dataset download
* full job
* source xpu env vars
* use matrix to trigger model run
* reset batch=1
* remove redirect
* remove some trigger
* add task matrix
* add precision list
* test llama-7b-chat
* use /mnt/disk1 to store model and datasets
* remove installation test
* correct downloading path
* fix HF vars
* add bigdl-llm env vars
* rename file
* fix hf_home
* fix script path
* rename as harness evaluation
* rerun
2023-11-08 10:22:27 +08:00
WeiguangHan
84ab614aab
LLM: add more models and skip runtime error (#9349)
* add more models and skip runtime error
* upgrade transformers
* temporarily removed Mistral-7B-v0.1
* temporarily disable the upload of arc perf result
2023-11-08 09:45:53 +08:00
Heyang Sun
af94058203
[LLM] Support CPU deepspeed distributed inference (#9259)
* [LLM] Support CPU Deepspeed distributed inference
* Update run_deepspeed.py
* Rename
* fix style
* add new codes
* refine
* remove annotated codes
* refine
* Update README.md
* refine doc and example code
2023-11-06 17:56:42 +08:00
Chen, Zhentao
d4dffbdb62
Merge harness (#9319)
* add harness patch and llb script
* add readme
* add license
* use patch instead
* update readme
* rename tests to evaluation
* fix typo
* remove nano dependency
* add original harness link
* rename title of usage
* rename BigDLGPULM as BigDLLM
* empty commit to rerun job
2023-11-02 15:14:19 +08:00
Ruonan Wang
7e73c354a6
LLM: decoupling bigdl-llm and bigdl-nano (#9306)
2023-11-01 11:00:54 +08:00
binbin Deng
770ac70b00
LLM: add low_bit option in benchmark scripts (#9257)
2023-10-25 10:27:48 +08:00
WeiguangHan
ec9195da42
LLM: using html to visualize the perf result for Arc (#9228)
* LLM: using html to visualize the perf result for Arc
* deploy the html file
* add python license
* resolve some comments
2023-10-24 18:05:25 +08:00
Ruonan Wang
b15656229e
LLM: fix benchmark issue (#9255)
2023-10-24 14:15:05 +08:00
WeiguangHan
b9194c5786
LLM: skip some model tests using certain api (#9163)
* LLM: Skip some model tests using certain api
* initialize variable named result
2023-10-18 09:39:27 +08:00
Ruonan Wang
4f34557224
LLM: support num_beams in all-in-one benchmark (#9141)
* support num_beams
* fix
2023-10-12 13:35:12 +08:00
Ruonan Wang
62ac7ae444
LLM: fix inaccurate input / output tokens of current all-in-one benchmark (#9137)
* first fix
* fix all apis
* fix
2023-10-11 17:13:34 +08:00
Ruonan Wang
1c8d5da362
LLM: fix llama tokenizer for all-in-one benchmark (#9129)
* fix tokenizer for gpu benchmark
* fix ipex fp16
* meet code review
* fix
2023-10-11 13:39:39 +08:00
Ruonan Wang
1363e666fc
LLM: update benchmark_util.py for beam search (#9126)
* update reorder_cache
* fix
2023-10-11 09:41:53 +08:00
Yuwen Hu
0e09dd926b
[LLM] Fix example test (#9118)
* Update llm example test link due to example layout change
* Add better change detect
2023-10-10 13:24:18 +08:00
Ruonan Wang
ad7d9231f5
LLM: add benchmark script for Max gpu and ipex fp16 gpu (#9112)
* add pvc bash
* meet code review
* rename to run-max-gpu.sh
2023-10-10 10:18:41 +08:00
Yuwen Hu
65212451cc
[LLM] Small update to performance tests (#9106)
* small updates to llm performance tests regarding model handling
* Small fix
2023-10-09 16:55:25 +08:00
Kai Huang
78ea7ddb1c
Combine apply_rotary_pos_emb for gpt-neox (#9074)
2023-10-07 16:27:46 +08:00
Cengguang Zhang
ad62c58b33
LLM: Enable jemalloc in benchmark scripts. (#9058)
* enable jemalloc.
* fix readme.
2023-09-26 15:37:49 +08:00
Cengguang Zhang
26213a5829
LLM: Change benchmark bf16 load format. (#9035)
* LLM: Change benchmark bf16 load format.
* comment on bf16 chatglm.
* fix.
2023-09-22 17:38:38 +08:00
Kai Huang
6981745fe4
Optimize kv_cache for gpt-neox model family (#9015)
* override gptneox
* style
* move to utils
* revert
2023-09-20 19:59:19 +08:00
Xin Qiu
37bb0cbf8f
Speed up gpt-j in gpubenchmark (#9000)
* Speedup gpt-j in gpubenchmark
* meet code review
2023-09-19 14:22:28 +08:00
Cengguang Zhang
8299b68fea
update readme. (#8996)
2023-09-18 17:06:15 +08:00
Cengguang Zhang
74338fd291
LLM: add auto torch dtype in benchmark. (#8981)
2023-09-18 15:48:25 +08:00
Ruonan Wang
32716106e0
update use_cache=True (#8986)
2023-09-18 07:59:33 +08:00
Xin Qiu
64ee1d7689
update run_transformer_int4_gpu (#8983)
* xpuperf
* update run.py
* clean up
* update
* update
* meet code review
2023-09-15 15:10:04 +08:00
Cengguang Zhang
cca84b0a64
LLM: update llm benchmark scripts. (#8943)
* update llm benchmark scripts.
* change tranformer_bf16 to pytorch_autocast_bf16.
* add autocast in transformer int4.
* revert autocast.
* add "pytorch_autocast_bf16" to doc
* fix comments.
2023-09-13 12:23:28 +08:00
Xin Qiu
ea0853c0b5
update benchmark_utils readme (#8925)
* update readme
* meet code review
2023-09-08 10:30:26 +08:00
Cengguang Zhang
3d2efe9608
LLM: update llm latency benchmark. (#8922)
2023-09-07 19:00:19 +08:00
binbin Deng
7897eb4b51
LLM: add benchmark scripts on GPU (#8916)
2023-09-07 18:08:17 +08:00
Xin Qiu
d8a01d7c4f
fix chatglm in run.py (#8919)
2023-09-07 16:44:10 +08:00
Xin Qiu
e9de9d9950
benchmark for native int4 (#8918)
* native4
* update
* update
* update
2023-09-07 15:56:15 +08:00
Ruonan Wang
057e77e229
LLM: update benchmark_utils.py to handle do_sample=True (#8903)
2023-09-07 14:20:47 +08:00
Xin Qiu
5d9942a3ca
transformer int4 and native int4's benchmark script for 32 256 1k 2k input (#8871)
* transformer
* move
* update
* add header
* update all-in-one
* clean up
2023-09-07 09:49:55 +08:00
Xin Qiu
49a39452c6
update benchmark (#8899)
2023-09-06 15:11:43 +08:00
Song Jiaming
7b3ac66e17
[LLM] auto performance test fix specific settings to template (#8876)
2023-09-01 15:49:04 +08:00
Song Jiaming
c06f1ca93e
[LLM] auto perf test to output to csv (#8846)
2023-09-01 10:48:00 +08:00
Song Jiaming
b8b1b6888b
[LLM] Performance test (#8796)
2023-08-25 14:31:45 +08:00
Ruonan Wang
e9aa2bd890
LLM: reduce GPU 1st token latency and update example (#8763)
* reduce 1st token latency
* update example
* fix
* fix style
* update readme of gpu benchmark
2023-08-16 18:01:23 +08:00
Song Jiaming
c1f9af6d97
[LLM] chatglm example and transformers low-bit examples (#8751)
2023-08-16 11:41:44 +08:00
Ruonan Wang
8805186f2f
LLM: add benchmark tool for gpu (#8760)
* add benchmark tool for gpu
* update
2023-08-16 11:22:10 +08:00