94 commits

Author SHA1 Message Date
binbin Deng
4053a6ef94 Update environment variable setting in AutoTP with arc (#11018) 2024-05-15 10:23:58 +08:00
Shaojun Liu
7f8c5b410b Quickstart: Run PyTorch Inference on Intel GPU using Docker (on Linux or WSL) (#10970)
* add entrypoint.sh

* add quickstart

* remove entrypoint

* update

* Install related library of benchmarking

* update

* print out results

* update docs

* minor update

* update

* update quickstart

* update

* update

* update

* update

* update

* update

* add chat & example section

* add more details

* minor update

* rename quickstart

* update

* minor update

* update

* update config.yaml

* update readme

* use --gpu

* add tips

* minor update

* update
2024-05-14 12:58:31 +08:00
Xin Qiu
dfa3147278 update (#10944) 2024-05-08 14:28:05 +08:00
Cengguang Zhang
0edef1f94c LLM: add min_new_tokens to all in one benchmark. (#10911) 2024-05-06 09:32:59 +08:00
Yuwen Hu
1a8a93d5e0 Further fix nightly perf (#10901) 2024-04-28 10:18:58 +08:00
Yuwen Hu
ddfdaec137 Fix nightly perf (#10899)
* Fix nightly perf by adding default value in benchmark for use_fp16_torch_dtype

* further fixes
2024-04-28 09:39:29 +08:00
binbin Deng
f51bf018eb Add benchmark script for pipeline parallel inference (#10873) 2024-04-26 15:28:11 +08:00
Cengguang Zhang
eb39c61607 LLM: add min new token to perf test. (#10869) 2024-04-24 14:32:02 +08:00
yb-peng
c9dee6cd0e Update 8192.txt (#10824)
* Update 8192.txt

* Update 8192.txt with original text
2024-04-23 14:02:09 +08:00
Wang, Jian4
23c6a52fb0 LLM: Fix ipex torchscript=True error (#10832)
* remove

* update

* remove torchscript
2024-04-22 15:53:09 +08:00
yb-peng
2685c41318 Modify all-in-one benchmark (#10726)
* Update 8192 prompt in all-in-one

* Add cpu_embedding param for linux api

* Update run.py

* Update README.md
2024-04-11 13:38:50 +08:00
yb-peng
2d88bb9b4b add test api transformer_int4_fp16_gpu (#10627)
* add test api transformer_int4_fp16_gpu

* update config.yaml and README.md in all-in-one

* modify run.py in all-in-one

* re-order test-api

* re-order test-api in config

* modify README.md in all-in-one

* modify README.md in all-in-one

* modify config.yaml

---------

Co-authored-by: pengyb2001 <arda@arda-arc21.sh.intel.com>
Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>
2024-04-07 15:47:17 +08:00
Wang, Jian4
9ad4b29697 LLM: CPU benchmark using tcmalloc (#10675) 2024-04-07 14:17:01 +08:00
binbin Deng
d9a1153b4e LLM: upgrade deepspeed in AutoTP on GPU (#10647) 2024-04-07 14:05:19 +08:00
binbin Deng
27be448920 LLM: add cpu_embedding and peak memory record for deepspeed autotp script (#10621) 2024-04-02 17:32:50 +08:00
Ruonan Wang
d6af4877dd LLM: remove ipex.optimize for gpt-j (#10606)
* remove ipex.optimize

* fix

* fix
2024-04-01 12:21:49 +08:00
WeiguangHan
fbeb10c796 LLM: Set different env based on different Linux kernels (#10566) 2024-03-27 17:56:33 +08:00
Ruonan Wang
ea4bc450c4 LLM: add esimd sdp for pvc (#10543)
* add esimd sdp for pvc

* update

* fix

* fix batch
2024-03-26 19:04:40 +08:00
Wang, Jian4
16b2ef49c6 Update_document by heyang (#30) 2024-03-25 10:06:02 +08:00
Wang, Jian4
9df70d95eb Refactor bigdl.llm to ipex_llm (#24)
* Rename bigdl/llm to ipex_llm

* rm python/llm/src/bigdl

* from bigdl.llm to from ipex_llm
2024-03-22 15:41:21 +08:00
binbin Deng
85ef3f1d99 LLM: add empty cache in deepspeed autotp benchmark script (#10488) 2024-03-21 10:51:23 +08:00
Xiangyu Tian
5a5fd5af5b LLM: Add speculative benchmark on CPU/XPU (#10464)
Add speculative benchmark on CPU/XPU.
2024-03-21 09:51:06 +08:00
Xiangyu Tian
cbe24cc7e6 LLM: Enable BigDL IPEX Int8 (#10480)
Enable BigDL IPEX Int8
2024-03-20 15:59:54 +08:00
Jin Qiao
e41d556436 LLM: change fp16 benchmark to model.half (#10477)
* LLM: change fp16 benchmark to model.half

* fix
2024-03-20 13:38:39 +08:00
Jin Qiao
e9055c32f9 LLM: fix fp16 mem record in benchmark (#10461)
* LLM: fix fp16 mem record in benchmark

* change style
2024-03-19 16:17:23 +08:00
Jin Qiao
0451103a43 LLM: add int4+fp16 benchmark script for windows benchmarking (#10449)
* LLM: add fp16 for benchmark script

* remove transformer_int4_fp16_loadlowbit_gpu_win
2024-03-19 11:11:25 +08:00
Xiangyu Tian
0ded0b4b13 LLM: Enable BigDL IPEX optimization for int4 (#10319)
Enable BigDL IPEX optimization for int4
2024-03-12 17:08:50 +08:00
Lilac09
5809a3f5fe Add run-hbm.sh & add user guide for spr and hbm (#10357)
* add run-hbm.sh

* add spr and hbm guide

* only support quad mode

* only support quad mode

* update special cases

* update special cases
2024-03-12 16:15:27 +08:00
binbin Deng
5d996a5caf LLM: add benchmark script for deepspeed autotp on gpu (#10380) 2024-03-12 15:19:57 +08:00
WeiguangHan
fd81d66047 LLM: Compress some models to save space (#10315)
* LLM: compress some models to save space

* add deleted comments
2024-03-04 17:53:03 +08:00
Yuwen Hu
27d9a14989 [LLM] all-in-one update: memory optimize and streaming output (#10302)
* Memory saving for continuous in-out pair run and add support for streaming output on MTL iGPU

* Small fix

* Small fix

* Add things back
2024-03-01 18:02:30 +08:00
Keyan (Kyrie) Zhang
59861f73e5 Add Deepseek-6.7B (#9991)
* Add new example Deepseek

* Add new example Deepseek

* Add new example Deepseek

* Add new example Deepseek

* Add new example Deepseek

* modify deepseek

* modify deepseek

* Add verified model in README

* Turn cpu_embedding=True in Deepseek example

---------

Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>
2024-02-28 11:36:39 +08:00
Yuwen Hu
21de2613ce [LLM] Add model loading time record for all-in-one benchmark (#10201)
* Add model loading time record in csv for all-in-one benchmark

* Small fix

* Small fix to number after .
2024-02-22 13:57:18 +08:00
Yuwen Hu
001c13243e [LLM] Add support for low_low_bit benchmark on Windows GPU (#10167)
* Add support for low_low_bit performance test on Windows GPU

* Small fix

* Small fix

* Save memory during converting model process

* Drop the results for first time when loading in low bit on mtl igpu for better performance

* Small fix
2024-02-21 10:51:52 +08:00
dingbaorong
36c9442c6d Arc Stable version test (#10087)
* add batch_size in stable version test

* add batch_size in excludes

* add excludes for batch_size

* fix ci

* trigger regression test

* fix xpu version

* disable ci

* address kai's comment

---------

Co-authored-by: Ariadne <wyn2000330@126.com>
2024-02-06 10:23:50 +08:00
WeiguangHan
c2e562d037 LLM: add batch_size to the csv and html (#10080)
* LLM: add batch_size to the csv and html

* small fix
2024-02-04 16:35:44 +08:00
WeiguangHan
d2d3f6b091 LLM: ensure the result of daily arc perf test (#10016)
* ensure the result of daily arc perf test

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* concat more csvs

* small fix

* revert some files
2024-01-31 18:26:21 +08:00
Xin Qiu
7952bbc919 add conf batch_size to run_model (#10010) 2024-01-26 15:48:48 +08:00
Ziteng Zhang
8b08ad408b Add batch_size in all_in_one (#9999)
Add batch_size in all_in_one, except run_native_int4
2024-01-25 17:43:49 +08:00
Xin Qiu
610b5226be move reserved memory to benchmark_utils.py (#9907)
* move reserved memory to benchmark_utils.py

* meet code review
2024-01-19 09:44:30 +08:00
WeiguangHan
100e0a87e5 LLM: add compressed chatglm3 model (#9892)
* LLM: add compressed chatglm3 model

* small fix

* revert github action
2024-01-18 17:48:15 +08:00
Ruonan Wang
b059a32fff LLM: add benchmark api for bigdl-llm fp16 on GPU (#9919)
* add bmk for bigdl fp16

* fix
2024-01-17 14:24:35 +08:00
WeiguangHan
0e69bfe6b0 LLM: fix the performance drop of starcoder (#9889)
* LLM: fix the performance drop of starcoder

* small fix

* small fix
2024-01-12 09:14:15 +08:00
Ziteng Zhang
4f4ce73f31 [LLM] Add transformer_autocast_bf16 into all-in-one (#9890)
* Add transformer_autocast_bf16 into all-in-one
2024-01-11 17:51:07 +08:00
WeiguangHan
33fd1f9c76 LLM: fix input length logic for run_transformer_int4_gpu (#9864)
* LLM: fix input length logic for run_transformer_int4_gpu

* small fix

* small fix

* small fix
2024-01-10 18:20:14 +08:00
Cheen Hau, 俊豪
b2aa267f50 Enhance LLM GPU installation document (#9828)
* Improve gpu install doc

* Add troubleshooting - setvars.sh not done properly.

* Further improvements

* 2024.x.x -> 2024.0

* Fixes

* Fix Install BigDL-LLM From Wheel : bigdl-llm[xpu_2.0]

* Remove "export USE_XETLA=OFF" for Max GPU
2024-01-09 16:30:50 +08:00
dingbaorong
f6bb4ab313 Arc stress test (#9795)
* add arc stress test

* trigger ci

* trigger CI

* trigger ci

* disable ci
2023-12-27 21:02:41 +08:00
Shaojun Liu
6c75c689ea bigdl-llm stress test for stable version (#9781)
* 1k-512 2k-512 baseline

* add cpu stress test

* update yaml name

* update

* update

* clean up

* test

* update

* update

* update

* test

* update
2023-12-27 15:40:53 +08:00
dingbaorong
5cfb4c4f5b Arc stable version performance regression test (#9785)
* add arc stable version regression test

* empty gpu mem between different models

* trigger ci

* comment spr test

* trigger ci

* address kai's comments and disable ci

* merge fp8 and int4

* disable ci
2023-12-27 11:01:56 +08:00
WeiguangHan
c05d7e1532 LLM: add star_corder_15.5b model (#9772)
* LLM: add star_corder_15.5b model

* revert llm_performance_tests.yml
2023-12-26 18:55:56 +08:00