Xin Qiu
dacf680294
add fused rotary pos emb for qwen ( #9956 )
...
* add fused rotary pos emb for qwen
* update
2024-01-23 10:37:56 +08:00
Ruonan Wang
7b1d9ad7c0
LLM: limit esimd sdp usage for k_len < 8 ( #9959 )
...
* update
* fix
2024-01-23 09:28:23 +08:00
Ruonan Wang
3e601f9a5d
LLM: Support speculative decoding in bigdl-llm ( #9951 )
...
* first commit
* fix error, add llama example
* hidden print
* update api usage
* change to api v3
* update
* meet code review
* meet code review, fix style
* add reference, fix style
* fix style
* fix first token time
2024-01-22 19:14:56 +08:00
Jinyi Wan
6341c498b3
Fix the links of BlueLM and SOLAR ( #9954 )
2024-01-22 15:58:10 +08:00
Cheen Hau, 俊豪
947b1e27b7
Add readme for Whisper Test ( #9944 )
...
* Fix local data path
* Remove non-essential files
* Add readme
* Minor fixes to script
* Bugfix, refactor
* Add references to original source. Bugfixes.
* Reviewer comments
* Properly print and explain output
* Move files to dev/benchmark
* Fixes
2024-01-22 15:11:33 +08:00
Xin Qiu
6fb3f40f7e
fix error for benchmark_util.py running on cpu ( #9949 )
2024-01-22 10:14:40 +08:00
Heyang Sun
fb91c97fe8
support for Baichuan/Baichuan2 13B Chat running speculative decoding ( #9921 )
...
* support for Baichuan/Baichuan2 13B Chat running speculative decoding
* fix style
2024-01-22 09:11:44 +08:00
Xin Qiu
97f0cd8975
optimize Decilm 7b ( #9922 )
...
* optimize deci
* update
* decilm attention forward
2024-01-19 17:31:13 +08:00
Wang, Jian4
bcaeb05272
Update optimize qwen ( #9943 )
...
* update for n tokens input
* fix dtype
* update
2024-01-19 16:54:59 +08:00
binbin Deng
db8e90796a
LLM: add avg token latency information and benchmark guide of autotp ( #9940 )
2024-01-19 15:09:57 +08:00
Ruonan Wang
bf37b3a670
LLM: optimize CPU speculative decoding of chatglm3 ( #9928 )
...
* update
* fix style
* meet code review
2024-01-19 14:10:22 +08:00
Shaojun Liu
967714bac8
gguf memory optimization for mixtral ( #9939 )
2024-01-19 11:13:15 +08:00
Xin Qiu
610b5226be
move reserved memory to benchmark_utils.py ( #9907 )
...
* move reserved memory to benchmark_utils.py
* meet code review
2024-01-19 09:44:30 +08:00
Lilac09
7032a2ad73
Optimize gguf load memory for mistral ( #9923 )
...
* optimize gguf load for mistral
* fix output of gguf mistral
* reset
2024-01-19 09:14:39 +08:00
Shaojun Liu
9a46f019d7
gguf memory optimization for baichuan ( #9937 )
2024-01-19 09:11:02 +08:00
Guancheng Fu
2e1448f08e
[Serving] Add vllm_worker to fastchat serving framework ( #9934 )
...
* add worker
* finish
* finish
* add license
* add more comments
2024-01-18 21:33:36 +08:00
Chen, Zhentao
a8c866c32b
add ppl benchmark ( #9914 )
...
* add ppl benchmark
* add license
* add readme
* add dataset argument
* add dataset usage
* fixed low bit args
* correct result
* fix terminal display
* fix ppl update
* enable fp16 fp32 bf16
* format the desc
* fix model_kwargs
* add more readme
2024-01-18 17:54:28 +08:00
WeiguangHan
100e0a87e5
LLM: add compressed chatglm3 model ( #9892 )
...
* LLM: add compressed chatglm3 model
* small fix
* revert github action
2024-01-18 17:48:15 +08:00
Yuwen Hu
9e2ac5291b
Add rwkv v4 back for igpu perf test 32-512 ( #9938 )
2024-01-18 17:15:28 +08:00
Yishuo Wang
7bbb98abb6
Disable fused layer norm when using XMX to fix mpt UT ( #9933 )
2024-01-18 16:22:12 +08:00
Wang, Jian4
1fc9dfa265
LLM: Update for Qwen n tokens inputs ( #9931 )
...
* update for n tokens inputs
* update style
* update
2024-01-18 15:56:29 +08:00
Heyang Sun
5184f400f9
Fix Mixtral GGUF Wrong Output Issue ( #9930 )
...
* Fix Mixtral GGUF Wrong Output Issue
* fix style
* fix style
2024-01-18 14:11:27 +08:00
Yishuo Wang
453df868c9
add rwkv v5 attention kernel ( #9927 )
2024-01-18 10:16:29 +08:00
Ruonan Wang
054952f82f
LLM: Fix rope of chatglm3 to support speculative decoding on CPU ( #9926 )
2024-01-18 09:28:10 +08:00
Ziteng Zhang
18cd1f1432
[LLM] Solve the problem of calling bmm operator in BF16Linear ( #9924 )
...
* Solve the problem of calling bmm operator in BF16Linear
2024-01-17 18:08:35 +08:00
Cheen Hau, 俊豪
e403e4a8b7
Add APT install instructions for oneAPI ( #9875 )
...
* Add APT installer
* Include reviewer suggestions
* Add note - ensure matching version of oneAPI and pytorch/ipex
* Fix 'command line installer'
* Fix formatting. Address review comments
* Append ':' to '..by running the following commands'
* Fix formatting
* achieve -> archive
2024-01-17 17:30:30 +08:00
Yina Chen
98b86f83d4
Support fast rope for training ( #9745 )
...
* init
* init
* fix style
* add test and fix
* address comment
* update
* merge upstream main
2024-01-17 15:51:38 +08:00
Yuwen Hu
0c498a7b64
Add llama2-13b to igpu perf test ( #9920 )
2024-01-17 14:58:45 +08:00
Ruonan Wang
b059a32fff
LLM: add benchmark api for bigdl-llm fp16 on GPU ( #9919 )
...
* add bmk for bigdl fp16
* fix
2024-01-17 14:24:35 +08:00
Ruonan Wang
427f75000b
LLM: fix sdp of chatglm3 ( #9917 )
...
* fix
* fix
* fix
2024-01-17 13:37:28 +08:00
Yuwen Hu
68d78fb57e
[LLM] Small improvement to iGPU perf test ( #9915 )
...
* Avoid deleting the csv if something goes wrong while concatenating csv files
2024-01-17 11:21:58 +08:00
Shaojun Liu
32c56ffc71
pip install deps ( #9916 )
2024-01-17 11:03:57 +08:00
Yishuo Wang
94767da7cf
optimize rwkv v4 first token performance ( #9912 )
2024-01-17 09:27:41 +08:00
Cengguang Zhang
511cbcf773
LLM: add Ceval benchmark test. ( #9872 )
...
* init ceval benchmark test.
* upload dataset.
* add other tests.
* add qwen evaluator.
* fix qwen evaluator style.
* fix qwen evaluator style.
* update qwen evaluator.
* add llama evaluator.
* update eval
* fix typo.
* fix
* fix typo.
* fix llama evaluator.
* fix bug.
* fix style.
* delete dataset.
* fix style.
* fix style.
* add README.md and fix typo.
* fix comments.
* remove run scripts
2024-01-16 19:14:26 +08:00
Shaojun Liu
b909c5c9c2
GGUF load memory optimization ( #9913 )
...
* block-wise
* convert linear for module
* revert
* Fix PEP8 checks Error
2024-01-16 18:54:39 +08:00
Yuwen Hu
8643b62521
[LLM] Support longer context in iGPU perf tests (2048-256) ( #9910 )
2024-01-16 17:48:37 +08:00
Xin Qiu
dee32f7d15
copy fused rms norm's result to avoid <unk> ( #9909 )
2024-01-16 16:54:08 +08:00
ZehuaCao
05ea0ecd70
add pv for llm-serving k8s deployment ( #9906 )
2024-01-16 11:32:54 +08:00
Ruonan Wang
8d7326ae03
LLM: fix chatglm3 sdp to support speculative decoding ( #9900 )
...
* fix chatglm3
* fix
* update
* meet code review
* fix
2024-01-16 11:29:13 +08:00
Guancheng Fu
9f34da7cdb
Update PVC XMX condition ( #9901 )
...
* update pvc xmx condition
* update condition
* update condition
2024-01-15 15:42:15 +08:00
Yishuo Wang
6637860ddf
change xmx condition ( #9896 )
2024-01-12 19:51:48 +08:00
WeiguangHan
0e69bfe6b0
LLM: fix the performance drop of starcoder ( #9889 )
...
* LLM: fix the performance drop of starcoder
* small fix
* small fix
2024-01-12 09:14:15 +08:00
Ruonan Wang
d9cf55bce9
LLM: fix MLP check of mixtral ( #9891 )
2024-01-11 18:01:59 +08:00
Ziteng Zhang
4f4ce73f31
[LLM] Add transformer_autocast_bf16 into all-in-one ( #9890 )
...
* Add transformer_autocast_bf16 into all-in-one
2024-01-11 17:51:07 +08:00
Ziteng Zhang
4af88a67b9
support chatglm3 with bf16 ( #9888 )
...
* support chatglm3 with bigdl-bf16
2024-01-11 16:45:21 +08:00
Yuwen Hu
0aef35a965
[LLM] Improve LLM doc regarding windows gpu related info ( #9880 )
...
* Improve runtime configuration for windows
* Add python 310/311 supports for wheel downloading
* Add troubleshooting for windows gpu
* Remove manually import ipex due to auto importer
* Add info regarding cpu_embedding=True on iGPU
* More info for Windows users
* Small updates to API docs
* Python style fix
* Remove tip for loading from saved optimize_model for now
* Updated based on comments
* Update win info for multi-intel gpus selection
* Small fix
* Small fix
2024-01-11 14:37:16 +08:00
Jinyi Wan
07485eff5a
Add SOLAR-10.7B to README ( #9869 )
2024-01-11 14:28:41 +08:00
Kai Huang
5e766e8105
Fix Mixtral typo ( #9882 )
2024-01-10 19:51:24 +08:00
Kai Huang
b53a5cb6c9
Fix Mixtral typo ( #9881 )
...
* fix typo
* fix doc page
2024-01-10 19:40:52 +08:00
WeiguangHan
33fd1f9c76
LLM: fix input length logic for run_transformer_int4_gpu ( #9864 )
...
* LLM: fix input length logic for run_transformer_int4_gpu
* small fix
* small fix
* small fix
2024-01-10 18:20:14 +08:00