Commit graph

86 commits

Author SHA1 Message Date
yb-peng
c9dee6cd0e
Update 8192.txt (#10824)
* Update 8192.txt

* Update 8192.txt with original text
2024-04-23 14:02:09 +08:00
Wang, Jian4
23c6a52fb0
LLM: Fix ipex torchscript=True error (#10832)
* remove

* update

* remove torchscript
2024-04-22 15:53:09 +08:00
yb-peng
2685c41318
Modify all-in-one benchmark (#10726)
* Update 8192 prompt in all-in-one

* Add cpu_embedding param for linux api

* Update run.py

* Update README.md
2024-04-11 13:38:50 +08:00
yb-peng
2d88bb9b4b
add test api transformer_int4_fp16_gpu (#10627)
* add test api transformer_int4_fp16_gpu

* update config.yaml and README.md in all-in-one

* modify run.py in all-in-one

* re-order test-api

* re-order test-api in config

* modify README.md in all-in-one

* modify README.md in all-in-one

* modify config.yaml

---------

Co-authored-by: pengyb2001 <arda@arda-arc21.sh.intel.com>
Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>
2024-04-07 15:47:17 +08:00
Wang, Jian4
9ad4b29697
LLM: CPU benchmark using tcmalloc (#10675) 2024-04-07 14:17:01 +08:00
binbin Deng
d9a1153b4e
LLM: upgrade deepspeed in AutoTP on GPU (#10647) 2024-04-07 14:05:19 +08:00
binbin Deng
27be448920
LLM: add cpu_embedding and peak memory record for deepspeed autotp script (#10621) 2024-04-02 17:32:50 +08:00
Ruonan Wang
d6af4877dd
LLM: remove ipex.optimize for gpt-j (#10606)
* remove ipex.optimize

* fix

* fix
2024-04-01 12:21:49 +08:00
WeiguangHan
fbeb10c796
LLM: Set different env based on different Linux kernels (#10566) 2024-03-27 17:56:33 +08:00
Ruonan Wang
ea4bc450c4
LLM: add esimd sdp for pvc (#10543)
* add esimd sdp for pvc

* update

* fix

* fix batch
2024-03-26 19:04:40 +08:00
Wang, Jian4
16b2ef49c6
Update_document by heyang (#30) 2024-03-25 10:06:02 +08:00
Wang, Jian4
9df70d95eb
Refactor bigdl.llm to ipex_llm (#24)
* Rename bigdl/llm to ipex_llm

* rm python/llm/src/bigdl

* from bigdl.llm to from ipex_llm
2024-03-22 15:41:21 +08:00
binbin Deng
85ef3f1d99 LLM: add empty cache in deepspeed autotp benchmark script (#10488) 2024-03-21 10:51:23 +08:00
Xiangyu Tian
5a5fd5af5b LLM: Add speculative benchmark on CPU/XPU (#10464)
Add speculative benchmark on CPU/XPU.
2024-03-21 09:51:06 +08:00
Xiangyu Tian
cbe24cc7e6 LLM: Enable BigDL IPEX Int8 (#10480)
Enable BigDL IPEX Int8
2024-03-20 15:59:54 +08:00
Jin Qiao
e41d556436 LLM: change fp16 benchmark to model.half (#10477)
* LLM: change fp16 benchmark to model.half

* fix
2024-03-20 13:38:39 +08:00
Jin Qiao
e9055c32f9 LLM: fix fp16 mem record in benchmark (#10461)
* LLM: fix fp16 mem record in benchmark

* change style
2024-03-19 16:17:23 +08:00
Jin Qiao
0451103a43 LLM: add int4+fp16 benchmark script for windows benchmarking (#10449)
* LLM: add fp16 for benchmark script

* remove transformer_int4_fp16_loadlowbit_gpu_win
2024-03-19 11:11:25 +08:00
Xiangyu Tian
0ded0b4b13 LLM: Enable BigDL IPEX optimization for int4 (#10319)
Enable BigDL IPEX optimization for int4
2024-03-12 17:08:50 +08:00
Lilac09
5809a3f5fe Add run-hbm.sh & add user guide for spr and hbm (#10357)
* add run-hbm.sh

* add spr and hbm guide

* only support quad mode

* only support quad mode

* update special cases

* update special cases
2024-03-12 16:15:27 +08:00
binbin Deng
5d996a5caf LLM: add benchmark script for deepspeed autotp on gpu (#10380) 2024-03-12 15:19:57 +08:00
WeiguangHan
fd81d66047 LLM: Compress some models to save space (#10315)
* LLM: compress some models to save space

* add deleted comments
2024-03-04 17:53:03 +08:00
Yuwen Hu
27d9a14989 [LLM] all-on-one update: memory optimize and streaming output (#10302)
* Memory saving for continous in-out pair run and add support for streaming output on MTL iGPU

* Small fix

* Small fix

* Add things back
2024-03-01 18:02:30 +08:00
Keyan (Kyrie) Zhang
59861f73e5 Add Deepseek-6.7B (#9991)
* Add new example Deepseek

* Add new example Deepseek

* Add new example Deepseek

* Add new example Deepseek

* Add new example Deepseek

* modify deepseek

* modify deepseek

* Add verified model in README

* Turn cpu_embedding=True in Deepseek example

---------

Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>
2024-02-28 11:36:39 +08:00
Yuwen Hu
21de2613ce [LLM] Add model loading time record for all-in-one benchmark (#10201)
* Add model loading time record in csv for all-in-one benchmark

* Small fix

* Small fix to number after .
2024-02-22 13:57:18 +08:00
Yuwen Hu
001c13243e [LLM] Add support for low_low_bit benchmark on Windows GPU (#10167)
* Add support for low_low_bit performance test on Windows GPU

* Small fix

* Small fix

* Save memory during converting model process

* Drop the results for first time when loading in low bit on mtl igpu for better performance

* Small fix
2024-02-21 10:51:52 +08:00
dingbaorong
36c9442c6d Arc Stable version test (#10087)
* add batch_size in stable version test

* add batch_size in excludes

* add excludes for batch_size

* fix ci

* triger regression test

* fix xpu version

* disable ci

* address kai's comment

---------

Co-authored-by: Ariadne <wyn2000330@126.com>
2024-02-06 10:23:50 +08:00
WeiguangHan
c2e562d037 LLM: add batch_size to the csv and html (#10080)
* LLM: add batch_size to the csv and html

* small fix
2024-02-04 16:35:44 +08:00
WeiguangHan
d2d3f6b091 LLM: ensure the result of daily arc perf test (#10016)
* ensure the result of daily arc perf test

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* concat more csvs

* small fix

* revert some files
2024-01-31 18:26:21 +08:00
Xin Qiu
7952bbc919 add conf batch_size to run_model (#10010) 2024-01-26 15:48:48 +08:00
Ziteng Zhang
8b08ad408b Add batch_size in all_in_one (#9999)
Add batch_size in all_in_one, except run_native_int4
2024-01-25 17:43:49 +08:00
Xin Qiu
610b5226be move reserved memory to benchmark_utils.py (#9907)
* move reserved memory to benchmark_utils.py

* meet code review
2024-01-19 09:44:30 +08:00
WeiguangHan
100e0a87e5 LLM: add compressed chatglm3 model (#9892)
* LLM: add compressed chatglm3 model

* small fix

* revert github action
2024-01-18 17:48:15 +08:00
Ruonan Wang
b059a32fff LLM: add benchmark api for bigdl-llm fp16 on GPU (#9919)
* add bmk for bigdl fp16

* fix
2024-01-17 14:24:35 +08:00
WeiguangHan
0e69bfe6b0 LLM: fix the performance drop of starcoder (#9889)
* LLM: fix the performance drop of starcoder

* small fix

* small fix
2024-01-12 09:14:15 +08:00
Ziteng Zhang
4f4ce73f31 [LLM] Add transformer_autocast_bf16 into all-in-one (#9890)
* Add transformer_autocast_bf16 into all-in-one
2024-01-11 17:51:07 +08:00
WeiguangHan
33fd1f9c76 LLM: fix input length logic for run_transformer_int4_gpu (#9864)
* LLM: fix input length logic for run_transformer_int4_gpu

* small fix

* small fix

* small fix
2024-01-10 18:20:14 +08:00
Cheen Hau, 俊豪
b2aa267f50 Enhance LLM GPU installation document (#9828)
* Improve gpu install doc

* Add troubleshooting - setvars.sh not done properly.

* Further improvements

* 2024.x.x -> 2024.0

* Fixes

* Fix Install BigDL-LLM From Wheel : bigdl-llm[xpu_2.0]

* Remove "export USE_XETLA=OFF" for Max GPU
2024-01-09 16:30:50 +08:00
dingbaorong
f6bb4ab313 Arc stress test (#9795)
* add arc stress test

* triger ci

* triger CI

* triger ci

* disable ci
2023-12-27 21:02:41 +08:00
Shaojun Liu
6c75c689ea bigdl-llm stress test for stable version (#9781)
* 1k-512 2k-512 baseline

* add cpu stress test

* update yaml name

* update

* update

* clean up

* test

* update

* update

* update

* test

* update
2023-12-27 15:40:53 +08:00
dingbaorong
5cfb4c4f5b Arc stable version performance regression test (#9785)
* add arc stable version regression test

* empty gpu mem between different models

* triger ci

* comment spr test

* triger ci

* address kai's comments and disable ci

* merge fp8 and int4

* disable ci
2023-12-27 11:01:56 +08:00
WeiguangHan
c05d7e1532 LLM: add star_corder_15.5b model (#9772)
* LLM: add star_corder_15.5b model

* revert llm_performance_tests.yml
2023-12-26 18:55:56 +08:00
dingbaorong
64d05e581c add peak gpu mem stats in transformer_int4_gpu (#9766)
* add peak gpu mem stats in transformer_int4_gpu

* address weiguang's comments
2023-12-26 15:38:28 +08:00
WeiguangHan
474c099559 LLM: using separate threads to do inference (#9727)
* using separate threads to do inference

* resolve some comments

* resolve some comments

* revert llm_performance_tests.yml file
2023-12-21 17:56:43 +08:00
WeiguangHan
3e8d198b57 LLM: add eval func (#9662)
* Add eval func

* add left eval
2023-12-14 14:59:02 +08:00
Yuwen Hu
cbdd49f229 [LLM] win igpu performance for ipex 2.1 and oneapi 2024.0 (#9679)
* Change igpu win tests for ipex 2.1 and oneapi 2024.0

* Qwen model repo id updates; updates model list for 512-64

* Add .eval for win igpu all-in-one benchmark for best performance
2023-12-13 18:52:29 +08:00
Mingyu Wei
16febc949c [LLM] Add exclude option in all-in-one performance test (#9632)
* add exclude option in all-in-one perf test

* update arc-perf-test.yaml

* Exclude in_out_pairs in main function

* fix some bugs

* address Kai's comments

* define excludes at the beginning

* add bloomz:2048 to exclude
2023-12-13 18:13:06 +08:00
Yuwen Hu
968d99e6f5 Remove empty cache between each iteration of generation (#9660) 2023-12-12 17:24:06 +08:00
WeiguangHan
e9299adb3b LLM: Highlight some values in the html (#9635)
* highlight some values in the html

* revert the llm_performance_tests.yml
2023-12-07 19:02:41 +08:00
Yuwen Hu
48b85593b3 Update all-in-one benchmark readme (#9618) 2023-12-07 10:32:09 +08:00