WeiguangHan | be5836bee1 | 2024-01-23 17:04:13 +08:00
LLM: fix outlier value (#9945)
* fix outlier value
* small fix

Yishuo Wang | 2c8a9aaf0d | 2024-01-23 16:34:05 +08:00
fix qwen causal mask when quantize_kv_cache=True (#9968)

Yina Chen | 5aa4b32c1b | 2024-01-23 15:59:43 +08:00
LLM: Add qwen spec gpu example (#9965)
* add qwen spec gpu example
* update readme
Co-authored-by: rnwang04 <ruonan1.wang@intel.com>

Yina Chen | 36c665667d | 2024-01-23 15:57:28 +08:00
Add logits processor & qwen eos stop in speculative decoding (#9963)
* add logits processor & qwen eos
* fix style
* fix
* fix
* fix style
* fix style
* support transformers 4.31
* fix style
* fix style
Co-authored-by: rnwang04 <ruonan1.wang@intel.com>

Ruonan Wang | 60b35db1f1 | 2024-01-23 15:54:12 +08:00
LLM: add chatglm3 speculative decoding example (#9966)
* add chatglm3 example
* update
* fix

Xin Qiu | da4687c917 | 2024-01-23 15:53:32 +08:00
fix fp16 (#9970)

Lilac09 | 052962dfa5 | 2024-01-23 14:17:05 +08:00
Using original fastchat and add bigdl worker in docker image (#9967)
* add vllm worker
* add options in entrypoint

Chen, Zhentao | 301425e377 | 2024-01-23 13:20:37 +08:00
harness tests on pvc multiple xpus (#9908)
* add run_multi_llb.py
* update readme
* add job hint

Ruonan Wang | 27b19106f3 | 2024-01-23 12:54:19 +08:00
LLM: add readme for speculative decoding gpu examples (#9961)
* add readme
* add readme
* meet code review

Chen, Zhentao | 39219b7e9a | 2024-01-23 11:00:49 +08:00
add default device meta when lcmu enabled (#9941)

Xin Qiu | dacf680294 | 2024-01-23 10:37:56 +08:00
add fused rotary pos emb for qwen (#9956)
* add fused rotary pos emb for qwen
* update

Ruonan Wang | 7b1d9ad7c0 | 2024-01-23 09:28:23 +08:00
LLM: limit esimd sdp usage for k_len < 8 (#9959)
* update
* fix

Ruonan Wang | 3e601f9a5d | 2024-01-22 19:14:56 +08:00
LLM: Support speculative decoding in bigdl-llm (#9951)
* first commit
* fix error, add llama example
* hidden print
* update api usage
* change to api v3
* update
* meet code review
* meet code review, fix style
* add reference, fix style
* fix style
* fix first token time

Jinyi Wan | 6341c498b3 | 2024-01-22 15:58:10 +08:00
Fix the links of BlueLM and SOLAR (#9954)

Cheen Hau, 俊豪 | 947b1e27b7 | 2024-01-22 15:11:33 +08:00
Add readme for Whisper Test (#9944)
* Fix local data path
* Remove non-essential files
* Add readme
* Minor fixes to script
* Bugfix, refactor
* Add references to original source. Bugfixes.
* Reviewer comments
* Properly print and explain output
* Move files to dev/benchmark
* Fixes

Xin Qiu | 6fb3f40f7e | 2024-01-22 10:14:40 +08:00
fix error for benchmark_util.py running on cpu (#9949)

Heyang Sun | fb91c97fe8 | 2024-01-22 09:11:44 +08:00
support for Baichuan/Baichuan2 13B Chat running speculative decoding (#9921)
* support for Baichuan/Baichuan2 13B Chat running speculative decoding
* fix stype

Xin Qiu | 97f0cd8975 | 2024-01-19 17:31:13 +08:00
optimize Decilm 7b (#9922)
* optimize deci
* update
* decilm attension forward

Wang, Jian4 | bcaeb05272 | 2024-01-19 16:54:59 +08:00
Update optimize qwen (#9943)
* update for n tokens input
* fix dtype
* update

binbin Deng | db8e90796a | 2024-01-19 15:09:57 +08:00
LLM: add avg token latency information and benchmark guide of autotp (#9940)

Ruonan Wang | bf37b3a670 | 2024-01-19 14:10:22 +08:00
LLM: optimize CPU speculative decoding of chatglm3 (#9928)
* update
* fix style
* meet code review

Shaojun Liu | 967714bac8 | 2024-01-19 11:13:15 +08:00
gguf memory optimization for mixtral (#9939)

Xin Qiu | 610b5226be | 2024-01-19 09:44:30 +08:00
move reserved memory to benchmark_utils.py (#9907)
* move reserved memory to benchmark_utils.py
* meet code review

Lilac09 | 7032a2ad73 | 2024-01-19 09:14:39 +08:00
Optimize gguf load memory for mistral (#9923)
* optimize gguf load for mistral
* fix output of gguf mistral
* reset

Shaojun Liu | 9a46f019d7 | 2024-01-19 09:11:02 +08:00
gguf memory optimization for baichuan (#9937)

Guancheng Fu | 2e1448f08e | 2024-01-18 21:33:36 +08:00
[Serving] Add vllm_worker to fastchat serving framework (#9934)
* add worker
* finish
* finish
* add license
* add more comments

Chen, Zhentao | a8c866c32b | 2024-01-18 17:54:28 +08:00
add ppl benchmark (#9914)
* add ppl benchmark
* add license
* add readme
* add dataset argument
* add dataset usage
* fixed low bit args
* correct result
* fix terminal display
* fix ppl update
* enable fp16 fp32 bf16
* format the desc
* fix model_kwargs
* add more readme

WeiguangHan | 100e0a87e5 | 2024-01-18 17:48:15 +08:00
LLM: add compressed chatglm3 model (#9892)
* LLM: add compressed chatglm3 model
* small fix
* revert github action

Yuwen Hu | 9e2ac5291b | 2024-01-18 17:15:28 +08:00
Add rwkv v4 back for igpu perf test 32-512 (#9938)

Yishuo Wang | 7bbb98abb6 | 2024-01-18 16:22:12 +08:00
Disable fused layer norm when using XMX to fix mpt UT (#9933)

Wang, Jian4 | 1fc9dfa265 | 2024-01-18 15:56:29 +08:00
LLM: Update for Qwen n tokens inputs (#9931)
* update for n tokens inputs
* update style
* update

Heyang Sun | 5184f400f9 | 2024-01-18 14:11:27 +08:00
Fix Mixtral GGUF Wrong Output Issue (#9930)
* Fix Mixtral GGUF Wrong Output Issue
* fix style
* fix style

Yishuo Wang | 453df868c9 | 2024-01-18 10:16:29 +08:00
add rwkv v5 attention kernel (#9927)

Ruonan Wang | 054952f82f | 2024-01-18 09:28:10 +08:00
LLM: Fix rope of chatglm3 to support speculative decoding on CPU (#9926)

Ziteng Zhang | 18cd1f1432 | 2024-01-17 18:08:35 +08:00
[LLM]Solve the problem of calling bmm operator in BF16Linear (#9924)
* Solve the problem of calling bmm operator in BF16Linear

Cheen Hau, 俊豪 | e403e4a8b7 | 2024-01-17 17:30:30 +08:00
Add APT install instructions for oneAPI (#9875)
* Add APT installer
* Include reviewer suggestions
* Add note - ensure matching version of oneAPI and pytorch/ipex
* Fix 'command line installer'
* Fix formatting. Address review comments
* Append ':' to '..by running the following commands'
* Fix formatting
* achieve -> archive

Yina Chen | 98b86f83d4 | 2024-01-17 15:51:38 +08:00
Support fast rope for training (#9745)
* init
* init
* fix style
* add test and fix
* address comment
* update
* merge upstream main

Yuwen Hu | 0c498a7b64 | 2024-01-17 14:58:45 +08:00
Add llama2-13b to igpu perf test (#9920)

Ruonan Wang | b059a32fff | 2024-01-17 14:24:35 +08:00
LLM: add benchmark api for bigdl-llm fp16 on GPU (#9919)
* add bmk for bigdl fp16
* fix

Ruonan Wang | 427f75000b | 2024-01-17 13:37:28 +08:00
LLM: fix sdp of chatglm3 (#9917)
* fix
* fix
* fix

Yuwen Hu | 68d78fb57e | 2024-01-17 11:21:58 +08:00
[LLM] Small improvement to iGPU perf test (#9915)
- Avoid delete csv if there is something wrong with concating csv

Shaojun Liu | 32c56ffc71 | 2024-01-17 11:03:57 +08:00
pip install deps (#9916)

Yishuo Wang | 94767da7cf | 2024-01-17 09:27:41 +08:00
optimize rwkv v4 first token performance (#9912)

Cengguang Zhang | 511cbcf773 | 2024-01-16 19:14:26 +08:00
LLM: add Ceval benchmark test. (#9872)
* init ceval benchmark test.
* upload dataset.
* add other tests.
* add qwen evaluator.
* fix qwen evaluator style.
* fix qwen evaluator style.
* update qwen evaluator.
* add llama evaluator.
* update eval
* fix typo.
* fix
* fix typo.
* fix llama evaluator.
* fix bug.
* fix style.
* delete dataset.
* fix style.
* fix style.
* add README.md and fix typo.
* fix comments.
* remove run scripts

Shaojun Liu | b909c5c9c2 | 2024-01-16 18:54:39 +08:00
GGUF load memory optimization (#9913)
* block-wise
* convert linear for module
* revert
* Fix PEP8 checks Error

Yuwen Hu | 8643b62521 | 2024-01-16 17:48:37 +08:00
[LLM] Support longer context in iGPU perf tests (2048-256) (#9910)

Xin Qiu | dee32f7d15 | 2024-01-16 16:54:08 +08:00
copy fused rms norm's reuslt to avoid <unk> (#9909)

ZehuaCao | 05ea0ecd70 | 2024-01-16 11:32:54 +08:00
add pv for llm-serving k8s deployment (#9906)

Ruonan Wang | 8d7326ae03 | 2024-01-16 11:29:13 +08:00
LLM: fix chatglm3 sdp to support speculative decoding (#9900)
* fix chatglm3
* fix
* update
* meet code review
* fix

Guancheng Fu | 9f34da7cdb | 2024-01-15 15:42:15 +08:00
Update PVC XMX condition (#9901)
* update pvc xmx condition
* update condition
* update conditon