Jinyi Wan
3147ebe63d
Add cpu and gpu examples for SOLAR-10.7B ( #9821 )
2024-01-05 09:50:28 +08:00
WeiguangHan
ad6b182916
LLM: change the color of peak diff ( #9836 )
2024-01-04 19:30:32 +08:00
Xiangyu Tian
38c05be1c0
[LLM] Fix dtype mismatch in Baichuan2-13b ( #9834 )
2024-01-04 15:34:42 +08:00
Ruonan Wang
8504a2bbca
LLM: update qlora alpaca example to change lora usage ( #9835 )
...
* update example
* fix style
2024-01-04 15:22:20 +08:00
Ziteng Zhang
05b681fa85
[LLM] IPEX auto importer set on by default ( #9832 )
...
* Set BIGDL_IMPORT_IPEX default to True
* Remove import intel_extension_for_pytorch as ipex from GPU example
2024-01-04 13:33:29 +08:00
Wang, Jian4
4ceefc9b18
LLM: Support bitsandbytes config on qlora finetune ( #9715 )
...
* test support bitsandbytesconfig
* update style
* update cpu example
* update example
* update readme
* update unit test
* use bfloat16
* update logic
* use int4
* set default bnb_4bit_use_double_quant
* update
* update example
* update model.py
* update
* support lora example
2024-01-04 11:23:16 +08:00
WeiguangHan
9a14465560
LLM: add peak diff ( #9789 )
...
* add peak diff
* small fix
* revert yml file
2024-01-03 18:18:19 +08:00
Mingyu Wei
f4eb5da42d
disable arc ut ( #9825 )
2024-01-03 18:10:34 +08:00
Ruonan Wang
20e9742fa0
LLM: fix chatglm3 issue ( #9820 )
...
* fix chatglm3 issue
* small update
2024-01-03 16:15:55 +08:00
Wang, Jian4
a54cd767b1
LLM: Add gguf falcon ( #9801 )
...
* init falcon
* update convert.py
* update style
2024-01-03 14:49:02 +08:00
Guancheng Fu
0396fafed1
Update BigDL-LLM-inference image ( #9805 )
...
* upgrade to oneapi 2024
* Pin level-zero-gpu version
* add flag
2024-01-03 14:00:09 +08:00
Yishuo Wang
5c6543e070
Reorganize LLM GPU installation document ( #9777 )
2024-01-03 13:53:05 +08:00
Jason Dai
3ab3105bab
Update readme ( #9816 )
2024-01-03 12:07:00 +08:00
Yuwen Hu
668c2095b1
Remove unnecessary warning when installing llm ( #9815 )
2024-01-03 10:30:05 +08:00
dingbaorong
f5752ead36
Add whisper test ( #9808 )
...
* add whisper benchmark code
* add librispeech_asr.py
* add bigdl license
2024-01-02 16:36:05 +08:00
binbin Deng
6584539c91
LLM: fix installation of codellama ( #9813 )
2024-01-02 14:32:50 +08:00
Kai Huang
4d01069302
Temp remove baichuan2-13b 1k from arc perf test ( #9810 )
2023-12-29 12:54:13 +08:00
dingbaorong
a2e668a61d
fix arc ut test ( #9736 )
2023-12-28 16:55:34 +08:00
Qiyuan Gong
f0f9d45eac
[LLM] IPEX import support bigdl-core-xe-21 ( #9769 )
...
Add support for bigdl-core-xe-21.
2023-12-28 15:23:58 +08:00
dingbaorong
a8baf68865
fix csv_to_html ( #9802 )
2023-12-28 14:58:51 +08:00
Guancheng Fu
5857a38321
[vLLM] Add option to adjust KV_CACHE_ALLOC_BLOCK_LENGTH ( #9782 )
...
* add option kv_cache_block
* change var name
2023-12-28 14:41:47 +08:00
Ruonan Wang
99bddd3ab4
LLM: better FP16 support for Intel GPUs ( #9791 )
...
* initial support
* fix
* fix style
* fix
* limit esimd usage condition
* refactor code
* fix style
* small fix
* meet code review
* small fix
2023-12-28 13:30:13 +08:00
Yishuo Wang
7d9f6c6efc
fix cpuinfo error ( #9793 )
2023-12-28 09:23:44 +08:00
Wang, Jian4
7ed9538b9f
LLM: support gguf mpt ( #9773 )
...
* add gguf mpt
* update
2023-12-28 09:22:39 +08:00
Cengguang Zhang
d299f108d0
update falcon attention forward. ( #9796 )
2023-12-28 09:11:59 +08:00
Shaojun Liu
a5e5c3daec
set warm_up: 3 num_trials: 50 for cpu stress test ( #9799 )
2023-12-28 08:55:43 +08:00
dingbaorong
f6bb4ab313
Arc stress test ( #9795 )
...
* add arc stress test
* trigger ci
* trigger CI
* trigger ci
* disable ci
2023-12-27 21:02:41 +08:00
Kai Huang
40eaf76ae3
Add baichuan2-13b to Arc perf ( #9794 )
...
* add baichuan2-13b
* fix indent
* revert
2023-12-27 19:38:53 +08:00
Yuwen Hu
dfe28c58bb
Small upload fix for igpu-perf test ( #9792 )
2023-12-27 15:50:58 +08:00
Shaojun Liu
6c75c689ea
bigdl-llm stress test for stable version ( #9781 )
...
* 1k-512 2k-512 baseline
* add cpu stress test
* update yaml name
* update
* update
* clean up
* test
* update
* update
* update
* test
* update
2023-12-27 15:40:53 +08:00
dingbaorong
5cfb4c4f5b
Arc stable version performance regression test ( #9785 )
...
* add arc stable version regression test
* empty gpu mem between different models
* trigger ci
* comment spr test
* trigger ci
* address kai's comments and disable ci
* merge fp8 and int4
* disable ci
2023-12-27 11:01:56 +08:00
binbin Deng
40edb7b5d7
LLM: fix get environment variables setting ( #9787 )
2023-12-27 09:11:37 +08:00
Kai Huang
689889482c
Reduce max_cache_pos to reduce Baichuan2-13B memory ( #9694 )
...
* optimize baichuan2 memory
* fix
* style
* fp16 mask
* disable fp16
* fix style
* empty cache
* revert empty cache
2023-12-26 19:51:25 +08:00
Jason Dai
361781bcd0
Update readme ( #9788 )
2023-12-26 19:46:11 +08:00
Yuwen Hu
c38e18f2ff
[LLM] Migrate iGPU perf tests to new machine ( #9784 )
...
* Move 1024 test just after 32-32 test; and enable all model for 1024-128
* Make sure Python output encoding is utf-8 so that redirecting to txt always succeeds
* Upload results to ftp
* Small fix
2023-12-26 19:15:57 +08:00
WeiguangHan
c05d7e1532
LLM: add star_corder_15.5b model ( #9772 )
...
* LLM: add star_corder_15.5b model
* revert llm_performance_tests.yml
2023-12-26 18:55:56 +08:00
Ziteng Zhang
44b4a0c9c5
[LLM] Correct prompt format of Yi, Llama2 and Qwen in generate.py ( #9786 )
...
* correct prompt format of Yi
* correct prompt format of llama2 in cpu generate.py
* correct prompt format of Qwen in GPU example
2023-12-26 16:57:55 +08:00
Xiangyu Tian
0ea842231e
[LLM] vLLM: Add api_server entrypoint ( #9783 )
...
Add vllm.entrypoints.api_server for benchmark_serving.py in vllm.
2023-12-26 16:03:57 +08:00
dingbaorong
64d05e581c
add peak gpu mem stats in transformer_int4_gpu ( #9766 )
...
* add peak gpu mem stats in transformer_int4_gpu
* address weiguang's comments
2023-12-26 15:38:28 +08:00
Ziteng Zhang
87b4100054
[LLM] Support Yi model in chat.py ( #9778 )
...
* Support Yi model
* code style & add reference link
2023-12-26 10:03:39 +08:00
Ruonan Wang
11d883301b
LLM: fix wrong batch output caused by flash attention ( #9780 )
...
* fix
* meet code review
* move batch size check to the beginning
* move qlen check inside function
* meet code review
2023-12-26 09:41:27 +08:00
Heyang Sun
66e286a73d
Support for Mixtral AWQ ( #9775 )
...
* Support for Mixtral AWQ
* Update README.md
* Update README.md
* Update awq_config.py
* Update README.md
* Update README.md
2023-12-25 16:08:09 +08:00
Ruonan Wang
1917bbe626
LLM: fix BF16Linear related training & inference issue ( #9755 )
...
* fix bf16 related issue
* fix
* update based on comment & add arc lora script
* update readme
* update based on comment
* update based on comment
* update
* force to bf16
* fix style
* move check input dtype into function
* update convert
* meet code review
* meet code review
* update merged model to support new training_mode api
* fix typo
2023-12-25 14:49:30 +08:00
Xiangyu Tian
30dab36f76
[LLM] vLLM: Fix kv cache init ( #9771 )
...
Fix kv cache init
2023-12-25 14:17:06 +08:00
Yina Chen
449b387125
Support relora in bigdl-llm ( #9687 )
...
* init
* fix style
* update
* support resume & update readme
* update
* update
* remove important
* add training mode
* meet comments
2023-12-25 14:04:28 +08:00
Shaojun Liu
b6222404b8
bigdl-llm stable version: let the perf test fail if the difference between perf and baseline is greater than 5% ( #9750 )
...
* test
* test
* test
* update
* revert
2023-12-25 13:47:11 +08:00
Ziteng Zhang
986f65cea9
[LLM] Add trust_remote_code for local renamed model in bigdl_llm_model.py ( #9762 )
2023-12-25 11:31:14 +08:00
Yishuo Wang
be13b162fe
add codeshell example ( #9743 )
2023-12-25 10:54:01 +08:00
Guancheng Fu
daf536fb2d
vLLM: Apply attention optimizations for selective batching ( #9758 )
...
* fuse_rope for prefill
* apply kv_cache optimizations
* apply fast_decoding_path
* Re-enable kv_cache optimizations for prefill
* reduce KV_CACHE_ALLOC_BLOCK for selective_batching
2023-12-25 10:29:31 +08:00
binbin Deng
ed8ed76d4f
LLM: update deepspeed autotp usage ( #9733 )
2023-12-25 09:41:14 +08:00