Commit graph

60 commits

Author SHA1 Message Date
Ruonan Wang
139e98aa18 LLM: quick fix benchmark (#9509) 2023-11-22 10:19:57 +08:00
WeiguangHan
c2aeb4d1e8 del model after test (#9504) 2023-11-21 18:41:50 +08:00
Cheen Hau, 俊豪
3e39828420 Update all in one benchmark readme (#9496)
* Add gperftools install to all in one benchmark readme

* Update readme
2023-11-21 14:57:16 +08:00
WeiguangHan
c487b53f21 LLM: only run arc perf test nightly (#9448)
* LLM: only run arc perf test nightly

* deleted unused python scripts

* rebase main
2023-11-15 19:38:14 +08:00
Chen, Zhentao
dbbdb53a18 fix multiple gpu usage (#9459) 2023-11-14 17:06:27 +08:00
Chen, Zhentao
d19ca21957 patch bigdl-llm model to harness by binding instead of patch file (#9420)
* add run_llb.py

* fix args interpret

* modify outputs

* update workflow

* add license

* test mixed 4 bit

* update readme

* use autotokenizer

* add timeout

* refactor workflow file

* fix working directory

* fix env

* throw exception if some jobs failed

* improve terminal outputs

* Disable var which cause the run stuck

* fix unknown precision

* fix key error

* directly output config instead

* rm harness submodule
2023-11-14 12:51:39 +08:00
Chen, Zhentao
0ecb9efb05 use AutoTokenizer to enable more models (#9446) 2023-11-13 17:47:43 +08:00
Cengguang Zhang
ece5805572 LLM: add chatglm3-6b to latency benchmark test. (#9442) 2023-11-13 17:24:37 +08:00
Chen, Zhentao
5747e2fe69 fix multiple gpu usage of harness (#9444) 2023-11-13 16:53:23 +08:00
Heyang Sun
b23b91407c fix llm-init on deepspeed missing lib (#9419) 2023-11-10 13:51:24 +08:00
Chen, Zhentao
298b64217e add auto triggered acc test (#9364)
* add auto triggered acc test

* use llama 7b instead

* fix env

* debug download

* fix download prefix

* add cut dirs

* fix env of model path

* fix dataset download

* full job

* source xpu env vars

* use matrix to trigger model run

* reset batch=1

* remove redirect

* remove some trigger

* add task matrix

* add precision list

* test llama-7b-chat

* use /mnt/disk1 to store model and datasets

* remove installation test

* correct downloading path

* fix HF vars

* add bigdl-llm env vars

* rename file

* fix hf_home

* fix script path

* rename as harness evaluation

* rerun
2023-11-08 10:22:27 +08:00
WeiguangHan
84ab614aab LLM: add more models and skip runtime error (#9349)
* add more models and skip runtime error

* upgrade transformers

* temporarily removed Mistral-7B-v0.1

* temporarily disable the upload of arc perf result
2023-11-08 09:45:53 +08:00
Heyang Sun
af94058203 [LLM] Support CPU deepspeed distributed inference (#9259)
* [LLM] Support CPU Deepspeed distributed inference

* Update run_deepspeed.py

* Rename

* fix style

* add new codes

* refine

* remove annotated codes

* refine

* Update README.md

* refine doc and example code
2023-11-06 17:56:42 +08:00
Chen, Zhentao
d4dffbdb62 Merge harness (#9319)
* add harness patch and llb script

* add readme

* add license

* use patch instead

* update readme

* rename tests to evaluation

* fix typo

* remove nano dependency

* add original harness link

* rename title of usage

* rename BigDLGPULM as BigDLLM

* empty commit to rerun job
2023-11-02 15:14:19 +08:00
Ruonan Wang
7e73c354a6 LLM: decoupling bigdl-llm and bigdl-nano (#9306) 2023-11-01 11:00:54 +08:00
binbin Deng
770ac70b00 LLM: add low_bit option in benchmark scripts (#9257) 2023-10-25 10:27:48 +08:00
WeiguangHan
ec9195da42 LLM: using html to visualize the perf result for Arc (#9228)
* LLM: using html to visualize the perf result for Arc

* deploy the html file

* add python license

* resolve some comments
2023-10-24 18:05:25 +08:00
Ruonan Wang
b15656229e LLM: fix benchmark issue (#9255) 2023-10-24 14:15:05 +08:00
WeiguangHan
b9194c5786 LLM: skip some model tests using certain api (#9163)
* LLM: Skip some model tests using certain api

* initialize variable named result
2023-10-18 09:39:27 +08:00
Ruonan Wang
4f34557224 LLM: support num_beams in all-in-one benchmark (#9141)
* support num_beams

* fix
2023-10-12 13:35:12 +08:00
Ruonan Wang
62ac7ae444 LLM: fix inaccurate input / output tokens of current all-in-one benchmark (#9137)
* first fix

* fix all apis

* fix
2023-10-11 17:13:34 +08:00
Ruonan Wang
1c8d5da362 LLM: fix llama tokenizer for all-in-one benchmark (#9129)
* fix tokenizer for gpu benchmark

* fix ipex fp16

* meet code review

* fix
2023-10-11 13:39:39 +08:00
Ruonan Wang
1363e666fc LLM: update benchmark_util.py for beam search (#9126)
* update reorder_cache

* fix
2023-10-11 09:41:53 +08:00
Yuwen Hu
0e09dd926b [LLM] Fix example test (#9118)
* Update llm example test link due to example layout change

* Add better change detect
2023-10-10 13:24:18 +08:00
Ruonan Wang
ad7d9231f5 LLM: add benchmark script for Max gpu and ipex fp16 gpu (#9112)
* add pvc bash

* meet code review

* rename to run-max-gpu.sh
2023-10-10 10:18:41 +08:00
Yuwen Hu
65212451cc [LLM] Small update to performance tests (#9106)
* small updates to llm performance tests regarding model handling

* Small fix
2023-10-09 16:55:25 +08:00
Kai Huang
78ea7ddb1c Combine apply_rotary_pos_emb for gpt-neox (#9074) 2023-10-07 16:27:46 +08:00
Cengguang Zhang
ad62c58b33 LLM: Enable jemalloc in benchmark scripts. (#9058)
* enable jemalloc.

* fix readme.
2023-09-26 15:37:49 +08:00
Cengguang Zhang
26213a5829 LLM: Change benchmark bf16 load format. (#9035)
* LLM: Change benchmark bf16 load format.

* comment on bf16 chatglm.

* fix.
2023-09-22 17:38:38 +08:00
Kai Huang
6981745fe4 Optimize kv_cache for gpt-neox model family (#9015)
* override gptneox

* style

* move to utils

* revert
2023-09-20 19:59:19 +08:00
Xin Qiu
37bb0cbf8f Speed up gpt-j in gpubenchmark (#9000)
* Speedup gpt-j in gpubenchmark

* meet code review
2023-09-19 14:22:28 +08:00
Cengguang Zhang
8299b68fea update readme. (#8996) 2023-09-18 17:06:15 +08:00
Cengguang Zhang
74338fd291 LLM: add auto torch dtype in benchmark. (#8981) 2023-09-18 15:48:25 +08:00
Ruonan Wang
32716106e0 update use_cache=True (#8986) 2023-09-18 07:59:33 +08:00
Xin Qiu
64ee1d7689 update run_transformer_int4_gpu (#8983)
* xpuperf

* update run.py

* clean up

* update

* update

* meet code review
2023-09-15 15:10:04 +08:00
Cengguang Zhang
cca84b0a64 LLM: update llm benchmark scripts. (#8943)
* update llm benchmark scripts.

* change tranformer_bf16 to pytorch_autocast_bf16.

* add autocast in transformer int4.

* revert autocast.

* add "pytorch_autocast_bf16" to doc

* fix comments.
2023-09-13 12:23:28 +08:00
Xin Qiu
ea0853c0b5 update benchmark_utils readme (#8925)
* update readme

* meet code review
2023-09-08 10:30:26 +08:00
Cengguang Zhang
3d2efe9608 LLM: update llm latency benchmark. (#8922) 2023-09-07 19:00:19 +08:00
binbin Deng
7897eb4b51 LLM: add benchmark scripts on GPU (#8916) 2023-09-07 18:08:17 +08:00
Xin Qiu
d8a01d7c4f fix chatglm in run.py (#8919) 2023-09-07 16:44:10 +08:00
Xin Qiu
e9de9d9950 benchmark for native int4 (#8918)
* native4

* update

* update

* update
2023-09-07 15:56:15 +08:00
Ruonan Wang
057e77e229 LLM: update benchmark_utils.py to handle do_sample=True (#8903) 2023-09-07 14:20:47 +08:00
Xin Qiu
5d9942a3ca transformer int4 and native int4's benchmark script for 32 256 1k 2k input (#8871)
* transformer

* move

* update

* add header

* update all-in-one

* clean up
2023-09-07 09:49:55 +08:00
Xin Qiu
49a39452c6 update benchmark (#8899) 2023-09-06 15:11:43 +08:00
Song Jiaming
7b3ac66e17 [LLM] auto performance test fix specific settings to template (#8876) 2023-09-01 15:49:04 +08:00
Song Jiaming
c06f1ca93e [LLM] auto perf test to output to csv (#8846) 2023-09-01 10:48:00 +08:00
Song Jiaming
b8b1b6888b [LLM] Performance test (#8796) 2023-08-25 14:31:45 +08:00
Ruonan Wang
e9aa2bd890 LLM: reduce GPU 1st token latency and update example (#8763)
* reduce 1st token latency

* update example

* fix

* fix style

* update readme of gpu benchmark
2023-08-16 18:01:23 +08:00
Song Jiaming
c1f9af6d97 [LLM] chatglm example and transformers low-bit examples (#8751) 2023-08-16 11:41:44 +08:00
Ruonan Wang
8805186f2f LLM: add benchmark tool for gpu (#8760)
* add benchmark tool for gpu

* update
2023-08-16 11:22:10 +08:00