Commit graph

670 commits

Jason Dai
361781bcd0 Update readme (#9788) 2023-12-26 19:46:11 +08:00
Yuwen Hu
c38e18f2ff [LLM] Migrate iGPU perf tests to new machine (#9784)
* Move 1024 test just after 32-32 test; and enable all models for 1024-128

* Make sure Python output encoding is UTF-8 so that redirecting output to a txt file always succeeds

* Upload results to ftp

* Small fix
2023-12-26 19:15:57 +08:00
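The UTF-8 note in #9784 above addresses a common failure when redirecting console output on Windows runners; a minimal sketch of the usual workaround (not necessarily the exact change in that commit):

```python
import sys

# Force UTF-8 on stdout/stderr so `python run.py > result.txt` cannot fail with
# UnicodeEncodeError when the console's default codec (e.g. cp936) is in effect.
sys.stdout.reconfigure(encoding="utf-8")
sys.stderr.reconfigure(encoding="utf-8")

print("模型输出 with non-ASCII tokens")  # now safe to redirect into a txt file
```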
WeiguangHan
c05d7e1532 LLM: add star_corder_15.5b model (#9772)
* LLM: add star_corder_15.5b model

* revert llm_performance_tests.yml
2023-12-26 18:55:56 +08:00
Ziteng Zhang
44b4a0c9c5 [LLM] Correct prompt format of Yi, Llama2 and Qwen in generate.py (#9786)
* correct prompt format of Yi

* correct prompt format of llama2 in cpu generate.py

* correct prompt format of Qwen in GPU example
2023-12-26 16:57:55 +08:00
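For reference, the chat templates these examples follow are publicly documented; the strings below are illustrative and may differ slightly from what generate.py and chat.py actually use:

```python
# Llama-2 chat format (official template):
LLAMA2_PROMPT = (
    "[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\n{question} [/INST]"
)

# Yi and Qwen chat models follow the ChatML convention:
CHATML_PROMPT = (
    "<|im_start|>user\n{question}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

prompt = LLAMA2_PROMPT.format(question="What is AI?")
```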
Xiangyu Tian
0ea842231e [LLM] vLLM: Add api_server entrypoint (#9783)
Add vllm.entrypoints.api_server for benchmark_serving.py in vllm.
2023-12-26 16:03:57 +08:00
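Assuming the new entrypoint mirrors upstream vLLM's api_server (default port 8000, a POST /generate route that accepts a prompt plus sampling parameters), a hedged usage sketch looks like this; the `bigdl.llm.vllm` module path is inferred from the commit title, not confirmed:

```python
# Launch (assumed): python -m bigdl.llm.vllm.entrypoints.api_server --model <model_path> --port 8000
import requests

payload = {"prompt": "What is AI?", "max_tokens": 64, "temperature": 0.0}
resp = requests.post("http://localhost:8000/generate", json=payload, timeout=60)
print(resp.json())  # upstream vLLM's api_server returns {"text": [...]}
```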
dingbaorong
64d05e581c add peak gpu mem stats in transformer_int4_gpu (#9766)
* add peak gpu mem stats in transformer_int4_gpu

* address weiguang's comments
2023-12-26 15:38:28 +08:00
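Peak-memory accounting on XPU typically mirrors the CUDA API that IPEX re-exposes under `torch.xpu`; a minimal sketch, assuming those CUDA-style functions are available (this is not the exact code added to the benchmark):

```python
import torch
import intel_extension_for_pytorch as ipex  # provides the torch.xpu namespace

def generate_with_peak_mem(model, input_ids, max_new_tokens=32):
    """Run generation and report peak XPU memory (CUDA-style API assumed)."""
    torch.xpu.reset_peak_memory_stats()
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    torch.xpu.synchronize()
    peak_gb = torch.xpu.max_memory_allocated() / 1024 ** 3
    print(f"peak GPU memory: {peak_gb:.2f} GB")
    return output
```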
Ziteng Zhang
87b4100054 [LLM] Support Yi model in chat.py (#9778)
* Support Yi model

* code style & add reference link
2023-12-26 10:03:39 +08:00
Ruonan Wang
11d883301b LLM: fix wrong batch output caused by flash attention (#9780)
* fix

* meet code review

* move batch size check to the beginning

* move qlen check inside function

* meet code review
2023-12-26 09:41:27 +08:00
Heyang Sun
66e286a73d Support for Mixtral AWQ (#9775)
* Support for Mixtral AWQ

* Update README.md

* Update README.md

* Update awq_config.py

* Update README.md

* Update README.md
2023-12-25 16:08:09 +08:00
Ruonan Wang
1917bbe626 LLM: fix BF16Linear related training & inference issue (#9755)
* fix bf16 related issue

* fix

* update based on comment & add arc lora script

* update readme

* update based on comment

* update based on comment

* update

* force to bf16

* fix style

* move check input dtype into function

* update convert

* meet code review

* meet code review

* update merged model to support new training_mode api

* fix typo
2023-12-25 14:49:30 +08:00
Xiangyu Tian
30dab36f76 [LLM] vLLM: Fix kv cache init (#9771)
Fix kv cache init
2023-12-25 14:17:06 +08:00
Yina Chen
449b387125 Support relora in bigdl-llm (#9687)
* init

* fix style

* update

* support resume & update readme

* update

* update

* remove important

* add training mode

* meet comments
2023-12-25 14:04:28 +08:00
Shaojun Liu
b6222404b8 bigdl-llm stable version: let the perf test fail if the difference between perf and baseline is greater than 5% (#9750)
* test

* test

* test

* update

* revert
2023-12-25 13:47:11 +08:00
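The 5% gate itself is simple arithmetic; a hypothetical helper (names and metric direction are assumptions, written with a lower-is-better latency metric in mind):

```python
def check_against_baseline(perf: float, baseline: float, tolerance: float = 0.05) -> None:
    """Fail the stable-version perf test if the regression exceeds `tolerance`."""
    regression = (perf - baseline) / baseline
    if regression > tolerance:
        raise AssertionError(
            f"perf {perf:.3f} is {regression:.1%} worse than baseline {baseline:.3f}"
        )

check_against_baseline(perf=1.08, baseline=1.00)  # raises: 8% regression > 5% tolerance
```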
Ziteng Zhang
986f65cea9 [LLM] Add trust_remote_code for local renamed model in bigdl_llm_model.py (#9762) 2023-12-25 11:31:14 +08:00
Yishuo Wang
be13b162fe add codeshell example (#9743) 2023-12-25 10:54:01 +08:00
Guancheng Fu
daf536fb2d vLLM: Apply attention optimizations for selective batching (#9758)
* fuse_rope for prefill

* apply kv_cache optimizations

* apply fast_decoding_path

* Re-enable kv_cache optimizations for prefill

* reduce KV_CACHE_ALLOC_BLOCK for selective_batching
2023-12-25 10:29:31 +08:00
binbin Deng
ed8ed76d4f LLM: update deepspeed autotp usage (#9733) 2023-12-25 09:41:14 +08:00
Yuwen Hu
02436c6cce [LLM] Enable more long context in-out pairs for iGPU perf tests (#9765)
* Add test for 1024-128 and enable more tests for 512-64

* Fix date in results csv name to the time when the performance test is triggered

* Small fix

* Small fix

* further fixes
2023-12-22 18:18:23 +08:00
Chen, Zhentao
7fd7c37e1b Enable fp8e5 harness (#9761)
* fix precision format like fp8e5

* match fp8_e5m2
2023-12-22 16:59:48 +08:00
Qiyuan Gong
4c487313f2 Revert "[LLM] IPEX auto importer turn on by default for XPU (#9730)" (#9759)
This reverts commit 0284801fbd.
2023-12-22 16:38:24 +08:00
Qiyuan Gong
0284801fbd [LLM] IPEX auto importer turn on by default for XPU (#9730)
* Set BIGDL_IMPORT_IPEX default to true, i.e., auto import IPEX for XPU.
* Remove import intel_extension_for_pytorch as ipex from GPU example.
* Add support for bigdl-core-xe-21.
2023-12-22 16:20:32 +08:00
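The auto-import is controlled by the BIGDL_IMPORT_IPEX environment variable; a simplified sketch of how such a gate typically works (not the actual importer code, and note this default change was reverted in #9759 above):

```python
import os

# #9730 flipped the default to "true"; #9759 reverted it.
if os.environ.get("BIGDL_IMPORT_IPEX", "true").lower() == "true":
    try:
        import intel_extension_for_pytorch as ipex  # noqa: F401
    except ImportError:
        pass  # CPU-only environments simply skip the XPU extension
```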
Chen, Zhentao
86a69e289c fix harness runner label of manual trigger (#9754)
* fix runner

* update golden
2023-12-22 15:09:22 +08:00
Guancheng Fu
fdf93c9267 Implement selective batching for vLLM (#9659)
* add control to load hf model

* finish initial version of selective_batching

* temp

* finish

* Remove print statement

* fix error

* Apply yang's optimization

* a version that works

* We need to check the kv_cache passed in; this could be an error. TODO: add fast decoding path

* format

* temp solution: not batching prefill requests

* a version that works for prefill batching

* format

* a solid version: works normally

* a temp version

* Solid version: remove redundant functions

* fix format

* format

* solid: add option to enable selective_batching

* remove logic for using transformer models

* format

* format

* solid: enable argument VLLM_ENABLE_SELECTIVE_BATCHING

* format

* finish

* format
2023-12-22 13:45:46 +08:00
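Per the commit messages, the feature is gated by the VLLM_ENABLE_SELECTIVE_BATCHING environment variable; a minimal sketch of setting and reading such a flag (the variable name comes from the commit, the surrounding code is illustrative):

```python
import os

# Enable selective batching before the vLLM engine is created.
os.environ["VLLM_ENABLE_SELECTIVE_BATCHING"] = "true"

# Inside the engine the flag would typically be read like this:
selective_batching = (
    os.environ.get("VLLM_ENABLE_SELECTIVE_BATCHING", "false").lower() == "true"
)
print(f"selective batching enabled: {selective_batching}")
```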
Ruonan Wang
2f36769208 LLM: bigdl-llm lora support & lora example (#9740)
* lora support and single card example

* support multi-card, refactor code

* fix model id and style

* remove torch patch, add two new class for bf16, update example

* fix style

* change to training_mode

* small fix

* add more info in help

* fix style, update readme

* fix ut

* fix ut

* Handling compatibility issues with default LoraConfig
2023-12-22 11:05:39 +08:00
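bigdl-llm's LoRA support follows the Hugging Face PEFT pattern; a generic PEFT sketch is shown below (not bigdl-llm's exact wrapper, whose training_mode argument is only mentioned in the commit messages above, and the model id is illustrative):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative model id
config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```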
SONG Ge
ba0b939579 [LLM] Support transformers-v4.36.0 on mistral model (#9744)
* add support for transformers-v4.36.0 on mistral model

* python/llm/src/bigdl/llm/transformers/models/mistral.py

* make the redundant implementation as utils

* fix code style

* fix

* fix style

* update with utils enough_kv_room
2023-12-22 09:59:27 +08:00
Xin Qiu
e36111e713 mixtral fused qkv and rope (#9724)
* mixtral fused qkv and rope

* fix and clean

* fix style

* update

* update

* fix

* update

* fix
2023-12-22 09:26:35 +08:00
Jiao Wang
e4f6e43675 safetensor to false (#9728) 2023-12-21 14:41:51 -08:00
Shaojun Liu
bb52239e0a bigdl-llm stable version release & test (#9732)
* stable version test

* trigger spr test

* update

* trigger

* test

* test

* test

* test

* test

* refine

* release linux first
2023-12-21 22:55:33 +08:00
WeiguangHan
d4d2ccdd9d LLM: remove starcoder-15.5b (#9748) 2023-12-21 18:52:52 +08:00
WeiguangHan
474c099559 LLM: using separate threads to do inference (#9727)
* using separate threads to do inference

* resolve some comments

* resolve some comments

* revert llm_performance_tests.yml file
2023-12-21 17:56:43 +08:00
Yishuo Wang
426660b88e simplify qwen attention (#9747) 2023-12-21 17:53:29 +08:00
Wang, Jian4
984697afe2 LLM: Add bloom gguf support (#9734)
* init

* update bloom, add merges

* update

* update readme

* update for llama error

* update
2023-12-21 14:06:25 +08:00
Heyang Sun
df775cf316 fix python style (#9742)
* fix python style

* fix

* fix
2023-12-21 11:25:05 +08:00
Chen, Zhentao
b06a3146c8 Fix 70b oom (#9738)
* add default value to bigdl llm

* fix model oom
2023-12-21 10:40:52 +08:00
Xin Qiu
6c3e698bf1 mistral decoding_fast_path and fused mlp (#9714)
* mistral decoding_fast_path and fused mlp

* meet code review
2023-12-21 10:11:37 +08:00
Heyang Sun
d157f623b6 Load Mixtral gguf in a block-wise way (#9725)
* Load Mixtral gguf in a block-wise way

* refine
2023-12-21 10:03:23 +08:00
WeiguangHan
34bb804189 LLM: check csv and its corresponding yaml file (#9702)
* LLM: check csv and its corresponding yaml file

* run PR arc perf test

* modify the name of some variables

* execute the check results script in right place

* use cp to replace mv command

* resolve some comments

* resolve more comments

* revert the llm_performance_test.yaml file
2023-12-21 09:54:33 +08:00
Zhao Changmin
4bda975a3e LLM: Align lowbit model config (#9735)
* align lowbit model config
2023-12-21 09:48:58 +08:00
Wang, Jian4
e1e921f425 LLM: gguf other model using dtype (#9729) 2023-12-21 09:33:40 +08:00
Yishuo Wang
13ea6330bd optimize qwen rope (#9737) 2023-12-20 17:34:34 +08:00
Ziteng Zhang
4c032a433e [LLM] Add glibc checker (#9624)
* Add glibc checker
* Add env BIGDL_GLIBC_CHECK to control glibc checker. The default is false, i.e., don't check.
2023-12-20 16:52:43 +08:00
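A glibc check of this kind usually compares the runtime libc version against the minimum required by the prebuilt binaries; a sketch gated by BIGDL_GLIBC_CHECK (the 2.17 threshold here is illustrative, not the project's real minimum):

```python
import os
import platform

if os.environ.get("BIGDL_GLIBC_CHECK", "false").lower() == "true":
    libc, version = platform.libc_ver()
    if libc == "glibc" and tuple(map(int, version.split("."))) < (2, 17):
        raise RuntimeError(f"glibc {version} is too old for the prebuilt binaries")
```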
Yina Chen
cd652a1710 Support fp8 e5m2 on arc (#9711)
* init

* fix style

* update

* fix style

* update
2023-12-20 16:26:17 +08:00
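The "fp8_e5m2" string matches the precision name added to the harness in #9761 above; whether it is passed through load_in_low_bit exactly as below is an assumption about the public API:

```python
import intel_extension_for_pytorch as ipex  # needed for the "xpu" device
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",       # illustrative model id
    load_in_low_bit="fp8_e5m2",       # assumed low-bit key for FP8 E5M2 on Arc
    trust_remote_code=True,
).to("xpu")
```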
Yishuo Wang
e54c428d30 add bf16/fp16 fuse mlp support (#9726) 2023-12-20 10:40:45 +08:00
Heyang Sun
612651cb5d fix typo (#9723) 2023-12-20 09:41:59 +08:00
WeiguangHan
3aa8b66bc3 LLM: remove starcoder-15.5b model temporarily (#9720) 2023-12-19 20:14:46 +08:00
Yishuo Wang
522cf5ed82 [LLM] Improve chatglm2/3 rest token performance with long context (#9716) 2023-12-19 17:29:38 +08:00
Yishuo Wang
f2e6abb563 fix mlp batch size check (#9718) 2023-12-19 14:22:22 +08:00
Heyang Sun
1fa7793fc0 Load Mixtral GGUF Model (#9690)
* Load Mixtral GGUF Model

* refactor

* fix empty tensor when moving to cpu

* update gpu and cpu readmes

* add dtype when setting tensor into module
2023-12-19 13:54:38 +08:00
Qiyuan Gong
d0a3095b97 [LLM] IPEX auto importer (#9706)
* IPEX auto importer and get_ipex_version.
* Add BIGDL_IMPORT_IPEX to control auto import, default is false.
2023-12-19 13:39:38 +08:00
Yang Wang
f4fb58d99c fusing qkv project and rope (#9612)
* Try fusing qkv project and rope

* add fused mlp

* fuse append cache

* fix style and clean up code

* clean up
2023-12-18 16:45:00 -08:00
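Fused QKV means computing the three attention projections with a single matmul and splitting the result, which cuts kernel launches before RoPE is applied; a generic PyTorch sketch of the idea (not the repository's implementation):

```python
import torch
import torch.nn as nn

class FusedQKV(nn.Module):
    """Single projection producing Q, K and V in one matmul, then split."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.qkv = nn.Linear(hidden_size, 3 * hidden_size, bias=False)

    def forward(self, x: torch.Tensor):
        bsz, seq_len, _ = x.shape
        qkv = self.qkv(x).view(bsz, seq_len, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.unbind(dim=2)  # each: (bsz, seq_len, heads, head_dim)
        return q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
```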