165 commits

Author SHA1 Message Date
Chen, Zhentao
cb228c70ea Add harness nightly (#9552)
* modify output_path as a directory

* schedule nightly at 21 on Friday

* add tasks and models for nightly

* add accuracy regression

* comment out if to test

* mixed fp4

* for test

* add missing delimiter

* remove comma

* fixed golden results

* add mixed 4 golden result

* add more options

* add mistral results

* get golden result of stable lm

* move nightly scripts and results to test folder

* add license

* add fp8 stable lm golden

* run on all available devices

* trigger only when ready for review

* fix new line

* update golden

* add mistral
2023-12-01 14:16:35 +08:00
Chen, Zhentao
4d7d5d4c59 Add 3 leaderboard tasks (#9566)
* update leaderboard map

* download model and dataset without overwriting

* fix task drop

* run on all available devices
2023-12-01 14:01:14 +08:00
Chen, Zhentao
c8e0c2ed48 Fixed dumped logs in harness (#9549)
* install transformers==4.34.0

* modify output_path as a directory

* add device and task to output dir parents
2023-11-30 12:47:56 +08:00
Chen, Zhentao
45820cf3b9 add optimize model option (#9530) 2023-11-24 17:10:49 +08:00
Guancheng Fu
bf579507c2 Integrate vllm (#9310)
* done

* Rename structure

* add models

* Add structure/sampling_params,sequence

* add input_metadata

* add outputs

* Add policy,logger

* add and update

* add parallelconfig back

* core/scheduler.py

* Add llm_engine.py

* Add async_llm_engine.py

* Add tested entrypoint

* fix minor error

* Fix everything

* fix kv cache view

* fix

* fix

* fix

* format&refine

* remove logger from repo

* try to add token latency

* remove logger

* Refine config.py

* finish worker.py

* delete utils.py

* add license

* refine

* refine sequence.py

* remove sampling_params.py

* finish

* add license

* format

* add license

* refine

* refine

* Refine line too long

* remove exception

* so dumb style-check

* refine

* refine

* refine

* refine

* refine

* refine

* add README

* refine README

* add warning instead error

* fix padding

* add license

* format

* format

* format fix

* Refine vllm dependency (#1)

vllm dependency clear

* fix licence

* fix format

* fix format

* fix

* adapt LLM engine

* fix

* add license

* fix format

* fix

* Moving README.md to the correct position

* Fix readme.md

* done

* guide for adding models

* fix

* Fix README.md

* Add new model readme

* remove ray-logic

* refactor arg_utils.py

* remove distributed_init_method logic

* refactor entrypoints

* refactor input_metadata

* refactor model_loader

* refactor utils.py

* refactor models

* fix api server

* remove vllm.structure

* revert by txy 1120

* remove utils

* format

* fix license

* add bigdl model

* Refer to a specific commit

* Change code base

* add comments

* add async_llm_engine comment

* refine

* formatted

* add worker comments

* add comments

* add comments

* fix style

* add changes

---------

Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2023-11-23 16:46:45 +08:00
Ruonan Wang
139e98aa18 LLM: quick fix benchmark (#9509) 2023-11-22 10:19:57 +08:00
WeiguangHan
c2aeb4d1e8 del model after test (#9504) 2023-11-21 18:41:50 +08:00
Cheen Hau, 俊豪
3e39828420 Update all in one benchmark readme (#9496)
* Add gperftools install to all in one benchmark readme

* Update readme
2023-11-21 14:57:16 +08:00
WeiguangHan
c487b53f21 LLM: only run arc perf test nightly (#9448)
* LLM: only run arc perf test nightly

* deleted unused python scripts

* rebase main
2023-11-15 19:38:14 +08:00
Chen, Zhentao
dbbdb53a18 fix multiple gpu usage (#9459) 2023-11-14 17:06:27 +08:00
Chen, Zhentao
d19ca21957 patch bigdl-llm model to harness by binding instead of patch file (#9420)
* add run_llb.py

* fix args interpret

* modify outputs

* update workflow

* add license

* test mixed 4 bit

* update readme

* use autotokenizer

* add timeout

* refactor workflow file

* fix working directory

* fix env

* throw exception if some jobs failed

* improve terminal outputs

* Disable var which causes the run to get stuck

* fix unknown precision

* fix key error

* directly output config instead

* rm harness submodule
2023-11-14 12:51:39 +08:00
Chen, Zhentao
0ecb9efb05 use AutoTokenizer to enable more models (#9446) 2023-11-13 17:47:43 +08:00
Cengguang Zhang
ece5805572 LLM: add chatglm3-6b to latency benchmark test. (#9442) 2023-11-13 17:24:37 +08:00
Chen, Zhentao
5747e2fe69 fix multiple gpu usage of harness (#9444) 2023-11-13 16:53:23 +08:00
Heyang Sun
b23b91407c fix llm-init on deepspeed missing lib (#9419) 2023-11-10 13:51:24 +08:00
Chen, Zhentao
298b64217e add auto triggered acc test (#9364)
* add auto triggered acc test

* use llama 7b instead

* fix env

* debug download

* fix download prefix

* add cut dirs

* fix env of model path

* fix dataset download

* full job

* source xpu env vars

* use matrix to trigger model run

* reset batch=1

* remove redirect

* remove some trigger

* add task matrix

* add precision list

* test llama-7b-chat

* use /mnt/disk1 to store model and datasets

* remove installation test

* correct downloading path

* fix HF vars

* add bigdl-llm env vars

* rename file

* fix hf_home

* fix script path

* rename as harness evaluation

* rerun
2023-11-08 10:22:27 +08:00
WeiguangHan
84ab614aab LLM: add more models and skip runtime error (#9349)
* add more models and skip runtime error

* upgrade transformers

* temporarily removed Mistral-7B-v0.1

* temporarily disable the upload of arc perf result
2023-11-08 09:45:53 +08:00
Heyang Sun
af94058203 [LLM] Support CPU deepspeed distributed inference (#9259)
* [LLM] Support CPU Deepspeed distributed inference

* Update run_deepspeed.py

* Rename

* fix style

* add new codes

* refine

* remove annotated codes

* refine

* Update README.md

* refine doc and example code
2023-11-06 17:56:42 +08:00
Chen, Zhentao
d4dffbdb62 Merge harness (#9319)
* add harness patch and llb script

* add readme

* add license

* use patch instead

* update readme

* rename tests to evaluation

* fix typo

* remove nano dependency

* add original harness link

* rename title of usage

* rename BigDLGPULM as BigDLLM

* empty commit to rerun job
2023-11-02 15:14:19 +08:00
Ruonan Wang
7e73c354a6 LLM: decoupling bigdl-llm and bigdl-nano (#9306) 2023-11-01 11:00:54 +08:00
binbin Deng
770ac70b00 LLM: add low_bit option in benchmark scripts (#9257) 2023-10-25 10:27:48 +08:00
WeiguangHan
ec9195da42 LLM: using html to visualize the perf result for Arc (#9228)
* LLM: using html to visualize the perf result for Arc

* deploy the html file

* add python license

* resolve some comments
2023-10-24 18:05:25 +08:00
Ruonan Wang
b15656229e LLM: fix benchmark issue (#9255) 2023-10-24 14:15:05 +08:00
WeiguangHan
b9194c5786 LLM: skip some model tests using certain api (#9163)
* LLM: Skip some model tests using certain api

* initialize variable named result
2023-10-18 09:39:27 +08:00
Ruonan Wang
4f34557224 LLM: support num_beams in all-in-one benchmark (#9141)
* support num_beams

* fix
2023-10-12 13:35:12 +08:00
Ruonan Wang
62ac7ae444 LLM: fix inaccurate input / output tokens of current all-in-one benchmark (#9137)
* first fix

* fix all apis

* fix
2023-10-11 17:13:34 +08:00
Ruonan Wang
1c8d5da362 LLM: fix llama tokenizer for all-in-one benchmark (#9129)
* fix tokenizer for gpu benchmark

* fix ipex fp16

* meet code review

* fix
2023-10-11 13:39:39 +08:00
Ruonan Wang
1363e666fc LLM: update benchmark_util.py for beam search (#9126)
* update reorder_cache

* fix
2023-10-11 09:41:53 +08:00
Yuwen Hu
0e09dd926b [LLM] Fix example test (#9118)
* Update llm example test link due to example layout change

* Add better change detect
2023-10-10 13:24:18 +08:00
Ruonan Wang
ad7d9231f5 LLM: add benchmark script for Max gpu and ipex fp16 gpu (#9112)
* add pvc bash

* meet code review

* rename to run-max-gpu.sh
2023-10-10 10:18:41 +08:00
Yuwen Hu
65212451cc [LLM] Small update to performance tests (#9106)
* small updates to llm performance tests regarding model handling

* Small fix
2023-10-09 16:55:25 +08:00
Kai Huang
78ea7ddb1c Combine apply_rotary_pos_emb for gpt-neox (#9074) 2023-10-07 16:27:46 +08:00
Cengguang Zhang
ad62c58b33 LLM: Enable jemalloc in benchmark scripts. (#9058)
* enable jemalloc.

* fix readme.
2023-09-26 15:37:49 +08:00
Cengguang Zhang
26213a5829 LLM: Change benchmark bf16 load format. (#9035)
* LLM: Change benchmark bf16 load format.

* comment on bf16 chatglm.

* fix.
2023-09-22 17:38:38 +08:00
Kai Huang
6981745fe4 Optimize kv_cache for gpt-neox model family (#9015)
* override gptneox

* style

* move to utils

* revert
2023-09-20 19:59:19 +08:00
Xin Qiu
37bb0cbf8f Speed up gpt-j in gpubenchmark (#9000)
* Speedup gpt-j in gpubenchmark

* meet code review
2023-09-19 14:22:28 +08:00
Cengguang Zhang
8299b68fea update readme. (#8996) 2023-09-18 17:06:15 +08:00
Cengguang Zhang
74338fd291 LLM: add auto torch dtype in benchmark. (#8981) 2023-09-18 15:48:25 +08:00
Ruonan Wang
32716106e0 update use_cache=True (#8986) 2023-09-18 07:59:33 +08:00
Xin Qiu
64ee1d7689 update run_transformer_int4_gpu (#8983)
* xpuperf

* update run.py

* clean up

* update

* update

* meet code review
2023-09-15 15:10:04 +08:00
Cengguang Zhang
cca84b0a64 LLM: update llm benchmark scripts. (#8943)
* update llm benchmark scripts.

* change tranformer_bf16 to pytorch_autocast_bf16.

* add autocast in transformer int4.

* revert autocast.

* add "pytorch_autocast_bf16" to doc

* fix comments.
2023-09-13 12:23:28 +08:00
Xin Qiu
ea0853c0b5 update benchmark_utils readme (#8925)
* update readme

* meet code review
2023-09-08 10:30:26 +08:00
Cengguang Zhang
3d2efe9608 LLM: update llm latency benchmark. (#8922) 2023-09-07 19:00:19 +08:00
binbin Deng
7897eb4b51 LLM: add benchmark scripts on GPU (#8916) 2023-09-07 18:08:17 +08:00
Xin Qiu
d8a01d7c4f fix chatglm in run.py (#8919) 2023-09-07 16:44:10 +08:00
Xin Qiu
e9de9d9950 benchmark for native int4 (#8918)
* native4

* update

* update

* update
2023-09-07 15:56:15 +08:00
Ruonan Wang
057e77e229 LLM: update benchmark_utils.py to handle do_sample=True (#8903) 2023-09-07 14:20:47 +08:00
Xin Qiu
5d9942a3ca transformer int4 and native int4's benchmark script for 32 256 1k 2k input (#8871)
* transformer

* move

* update

* add header

* update all-in-one

* clean up
2023-09-07 09:49:55 +08:00
Xin Qiu
49a39452c6 update benchmark (#8899) 2023-09-06 15:11:43 +08:00
Song Jiaming
7b3ac66e17 [LLM] auto performance test fix specific settings to template (#8876) 2023-09-01 15:49:04 +08:00
Song Jiaming
c06f1ca93e [LLM] auto perf test to output to csv (#8846) 2023-09-01 10:48:00 +08:00
Song Jiaming
b8b1b6888b [LLM] Performance test (#8796) 2023-08-25 14:31:45 +08:00
Ruonan Wang
e9aa2bd890 LLM: reduce GPU 1st token latency and update example (#8763)
* reduce 1st token latency

* update example

* fix

* fix style

* update readme of gpu benchmark
2023-08-16 18:01:23 +08:00
Song Jiaming
c1f9af6d97 [LLM] chatglm example and transformers low-bit examples (#8751) 2023-08-16 11:41:44 +08:00
Ruonan Wang
8805186f2f LLM: add benchmark tool for gpu (#8760)
* add benchmark tool for gpu

* update
2023-08-16 11:22:10 +08:00
Song Jiaming
e717e304a6 LLM first example test and template (#8658) 2023-08-10 10:03:11 +08:00
Ruonan Wang
64b38e1dc8 llm: benchmark tool for transformers int4 (separate 1st token and rest) (#8460)
* add benchmark utils

* fix

* fix bug and add readme

* hidden latency data
2023-07-06 09:49:52 +08:00
Junwei Deng
2fd751de7a LLM: add a dev tool for getting glibc/glibcxx requirement (#8399)
* add a dev tool

* pep8 change
2023-06-30 11:09:50 +08:00
Shengsheng Huang
02c583144c [LLM] langchain integrations and examples (#8256)
* langchain integrations and examples

* add licences and rename

* add licences

* fix license issues and change backbone to model_family

* update examples to use model_family param

* fix linting

* fix code style

* exclude langchain integration from stylecheck

* update langchain examples and update integrations based on latest changes

* update simple llama-cpp-python style API example

* remove bloom in README

* change default n_threads to 2 and remove redundant code

---------

Co-authored-by: leonardozcm <changmin.zhao@intel.com>
2023-06-12 19:22:07 +08:00
Pingchuan Ma (Henry)
773255e009 [LLM] Add dev wheel building and basic UT script for LLM package on Linux (#8264)
* add wheel build for linux

* test fix

* test self-hosted runner

* test fix

* update runner

* update runner

* update fix

* init cicd

* init cicd

* test conda

* update fix

* update no need manual python deps

* test fix bugs

* test fix bugs

* test fix bugs

* fix bugs
2023-06-08 00:49:57 +08:00
Pingchuan Ma (Henry)
2ed5842448 [LLM] add convert's python deps for LLM (#8260)
* add python deps for LLM

* update release.sh

* change deps group name

* update all

* fix update

* test fix

* update
2023-06-06 16:01:17 +08:00
Pingchuan Ma (Henry)
c48d5f7cff [LLM] Enable UT workflow logics for LLM (#8243)
* check push connection

* enable UT workflow logics for LLM

* test fix

* add licenses

* test fix according to suggestions

* test fix

* update changes
2023-06-02 17:06:35 +08:00
Pingchuan Ma (Henry)
141febec1f Add dev wheel building script for LLM package on Windows (#8238)
* Add dev wheel building script for LLM package on Windows

* delete conda

* delete python version check

* minor adjust

* wheel name fixed

* test check

* test fix

* change wheel name
2023-06-01 11:55:26 +08:00
binbin Deng
8421af51ae LLM: support converting to ggml format (#8235)
* add convert

* fix

* fix

* fix

* try

* test

* update check

* fix

* fix
2023-05-31 15:20:06 +08:00
Pingchuan Ma (Henry)
1f913a6941 [LLM] Add LLM pep8 coding style checking (#8233)
* add LLM pep8 coding checking

* resolve bugs in testing scripts and code style revision
2023-05-30 15:58:14 +08:00