Heyang Sun
|
af94058203
|
[LLM] Support CPU deepspeed distributed inference (#9259)
* [LLM] Support CPU Deepspeed distributed inference
* Update run_deepspeed.py
* Rename
* fix style
* add new codes
* refine
* remove annotated codes
* refine
* Update README.md
* refine doc and example code
|
2023-11-06 17:56:42 +08:00 |
|
Chen, Zhentao
|
d4dffbdb62
|
Merge harness (#9319)
* add harness patch and llb script
* add readme
* add license
* use patch instead
* update readme
* rename tests to evaluation
* fix typo
* remove nano dependency
* add original harness link
* rename title of usage
* rename BigDLGPULM as BigDLLM
* empty commit to rerun job
|
2023-11-02 15:14:19 +08:00 |
|
Ruonan Wang
|
7e73c354a6
|
LLM: decoupling bigdl-llm and bigdl-nano (#9306)
|
2023-11-01 11:00:54 +08:00 |
|
binbin Deng
|
770ac70b00
|
LLM: add low_bit option in benchmark scripts (#9257)
|
2023-10-25 10:27:48 +08:00 |
|
WeiguangHan
|
ec9195da42
|
LLM: using html to visualize the perf result for Arc (#9228)
* LLM: using html to visualize the perf result for Arc
* deploy the html file
* add python license
* resolve some comments
|
2023-10-24 18:05:25 +08:00 |
|
Ruonan Wang
|
b15656229e
|
LLM: fix benchmark issue (#9255)
|
2023-10-24 14:15:05 +08:00 |
|
WeiguangHan
|
b9194c5786
|
LLM: skip some model tests using certain api (#9163)
* LLM: Skip some model tests using certain api
* initialize variable named result
|
2023-10-18 09:39:27 +08:00 |
|
Ruonan Wang
|
4f34557224
|
LLM: support num_beams in all-in-one benchmark (#9141)
* support num_beams
* fix
|
2023-10-12 13:35:12 +08:00 |
|
Ruonan Wang
|
62ac7ae444
|
LLM: fix inaccurate input / output tokens of current all-in-one benchmark (#9137)
* first fix
* fix all apis
* fix
|
2023-10-11 17:13:34 +08:00 |
|
Ruonan Wang
|
1c8d5da362
|
LLM: fix llama tokenizer for all-in-one benchmark (#9129)
* fix tokenizer for gpu benchmark
* fix ipex fp16
* meet code review
* fix
|
2023-10-11 13:39:39 +08:00 |
|
Ruonan Wang
|
1363e666fc
|
LLM: update benchmark_util.py for beam search (#9126)
* update reorder_cache
* fix
|
2023-10-11 09:41:53 +08:00 |
|
Ruonan Wang
|
ad7d9231f5
|
LLM: add benchmark script for Max gpu and ipex fp16 gpu (#9112)
* add pvc bash
* meet code review
* rename to run-max-gpu.sh
|
2023-10-10 10:18:41 +08:00 |
|
Yuwen Hu
|
65212451cc
|
[LLM] Small update to performance tests (#9106)
* small updates to llm performance tests regarding model handling
* Small fix
|
2023-10-09 16:55:25 +08:00 |
|
Kai Huang
|
78ea7ddb1c
|
Combine apply_rotary_pos_emb for gpt-neox (#9074)
|
2023-10-07 16:27:46 +08:00 |
|
Cengguang Zhang
|
ad62c58b33
|
LLM: Enable jemalloc in benchmark scripts. (#9058)
* enable jemalloc.
* fix readme.
|
2023-09-26 15:37:49 +08:00 |
|
Cengguang Zhang
|
26213a5829
|
LLM: Change benchmark bf16 load format. (#9035)
* LLM: Change benchmark bf16 load format.
* comment on bf16 chatglm.
* fix.
|
2023-09-22 17:38:38 +08:00 |
|
Kai Huang
|
6981745fe4
|
Optimize kv_cache for gpt-neox model family (#9015)
* override gptneox
* style
* move to utils
* revert
|
2023-09-20 19:59:19 +08:00 |
|
Xin Qiu
|
37bb0cbf8f
|
Speed up gpt-j in gpubenchmark (#9000)
* Speedup gpt-j in gpubenchmark
* meet code review
|
2023-09-19 14:22:28 +08:00 |
|
Cengguang Zhang
|
8299b68fea
|
update readme. (#8996)
|
2023-09-18 17:06:15 +08:00 |
|
Cengguang Zhang
|
74338fd291
|
LLM: add auto torch dtype in benchmark. (#8981)
|
2023-09-18 15:48:25 +08:00 |
|
Ruonan Wang
|
32716106e0
|
update use_cache=True (#8986)
|
2023-09-18 07:59:33 +08:00 |
|
Xin Qiu
|
64ee1d7689
|
update run_transformer_int4_gpu (#8983)
* xpuperf
* update run.py
* clean up
* update
* update
* meet code review
|
2023-09-15 15:10:04 +08:00 |
|
Cengguang Zhang
|
cca84b0a64
|
LLM: update llm benchmark scripts. (#8943)
* update llm benchmark scripts.
* change transformer_bf16 to pytorch_autocast_bf16.
* add autocast in transformer int4.
* revert autocast.
* add "pytorch_autocast_bf16" to doc
* fix comments.
|
2023-09-13 12:23:28 +08:00 |
|
Xin Qiu
|
ea0853c0b5
|
update benchmark_utils readme (#8925)
* update readme
* meet code review
|
2023-09-08 10:30:26 +08:00 |
|
Cengguang Zhang
|
3d2efe9608
|
LLM: update llm latency benchmark. (#8922)
|
2023-09-07 19:00:19 +08:00 |
|
binbin Deng
|
7897eb4b51
|
LLM: add benchmark scripts on GPU (#8916)
|
2023-09-07 18:08:17 +08:00 |
|
Xin Qiu
|
d8a01d7c4f
|
fix chatglm in run.py (#8919)
|
2023-09-07 16:44:10 +08:00 |
|
Xin Qiu
|
e9de9d9950
|
benchmark for native int4 (#8918)
* native4
* update
* update
* update
|
2023-09-07 15:56:15 +08:00 |
|
Ruonan Wang
|
057e77e229
|
LLM: update benchmark_utils.py to handle do_sample=True (#8903)
|
2023-09-07 14:20:47 +08:00 |
|
Xin Qiu
|
5d9942a3ca
|
transformer int4 and native int4's benchmark script for 32 256 1k 2k input (#8871)
* transformer
* move
* update
* add header
* update all-in-one
* clean up
|
2023-09-07 09:49:55 +08:00 |
|
Xin Qiu
|
49a39452c6
|
update benchmark (#8899)
|
2023-09-06 15:11:43 +08:00 |
|
Song Jiaming
|
7b3ac66e17
|
[LLM] auto performance test fix specific settings to template (#8876)
|
2023-09-01 15:49:04 +08:00 |
|
Song Jiaming
|
c06f1ca93e
|
[LLM] auto perf test to output to csv (#8846)
|
2023-09-01 10:48:00 +08:00 |
|
Song Jiaming
|
b8b1b6888b
|
[LLM] Performance test (#8796)
|
2023-08-25 14:31:45 +08:00 |
|
Ruonan Wang
|
e9aa2bd890
|
LLM: reduce GPU 1st token latency and update example (#8763)
* reduce 1st token latency
* update example
* fix
* fix style
* update readme of gpu benchmark
|
2023-08-16 18:01:23 +08:00 |
|
Ruonan Wang
|
8805186f2f
|
LLM: add benchmark tool for gpu (#8760)
* add benchmark tool for gpu
* update
|
2023-08-16 11:22:10 +08:00 |
|
Ruonan Wang
|
64b38e1dc8
|
llm: benchmark tool for transformers int4 (separate 1st token and rest) (#8460)
* add benchmark utils
* fix
* fix bug and add readme
* hidden latency data
|
2023-07-06 09:49:52 +08:00 |
|