Commit graph

2235 commits

Author SHA1 Message Date
hxsz1997
6e10d98a8d Fix some typos (#10175)
* add llm-ppl workflow

* update the DATASET_DIR

* test multiple precisions

* modify nightly test

* match the updated ppl code

* add matrix.include

* fix the include error

* update the include

* add more model

* update the precision of include

* update nightly time and add more models

* fix the workflow_dispatch description, change default model of pr and modify the env

* modify workflow_dispatch language options

* modify options

* modify language options

* modify workflow_dispatch type

* modify type

* modify the type of language

* change seq_len type

* fix some typos

* revert changes to stress_test.txt
2024-02-20 14:14:53 +08:00
Zhicun
add3899311 Add ziya CPU example (#10114)
* ziya on CPU

* add README for ziya

* specify use_cache

* add arc CPU

* update prompt format

* update link

* add comments to emphasize use_cache

* update pip cmd
2024-02-20 13:59:52 +08:00
Yuxuan Xia
71875ebc24 Fix the C-Eval nightly test trigger time (#10172)
* Add c-eval workflow and modify running files

* Modify the chatglm evaluator file

* Modify the ceval workflow for triggering test

* Modify the ceval workflow file

* Modify the ceval workflow file

* Modify ceval workflow

* Adjust the ceval dataset download

* Add ceval workflow dependencies

* Modify ceval workflow dataset download

* Add ceval test dependencies

* Add ceval test dependencies

* Correct the result print

* Fix the nightly test trigger time
2024-02-20 09:53:59 +08:00
binbin Deng
2bb96c775c LLM: fix device setting during saving optimized model (#10154) 2024-02-20 09:52:59 +08:00
Xin Qiu
1f6d5b9f30 enable fused rmsnorm and rope qwen2 (#10163)
* qwen2

* change convert

* cleanup
2024-02-20 08:33:09 +08:00
yb-peng
e31210ba00 Modify html table style and add fp16.csv in harness (#10169)
* Specify the version of pandas in harness evaluation workflow

* Specify the version of pandas in harness evaluation workflow

* Modify html table style and add fp16.csv in harness

* Modify comments
2024-02-19 18:13:40 +08:00
WeiguangHan
6c09aed90d LLM: add qwen_1.5_7b model for arc perf test (#10166)
* LLM: add qwen_1.5_7b model for arc perf test

* small fix

* revert some codes
2024-02-19 17:21:00 +08:00
Yuxuan Xia
209122559a Add Ceval workflow and modify the result printing (#10140)
* Add c-eval workflow and modify running files

* Modify the chatglm evaluator file

* Modify the ceval workflow for triggering test

* Modify the ceval workflow file

* Modify the ceval workflow file

* Modify ceval workflow

* Adjust the ceval dataset download

* Add ceval workflow dependencies

* Modify ceval workflow dataset download

* Add ceval test dependencies

* Add ceval test dependencies

* Correct the result print
2024-02-19 17:06:53 +08:00
yb-peng
50fa004ba5 Specify the version of pandas in harness evaluation workflow (#10159)
* Specify the version of pandas in harness evaluation workflow

* Specify the version of pandas in harness evaluation workflow
2024-02-19 16:27:08 +08:00
Zhao Changmin
f8730e8dc1 Skip rescale rwkv linear when load_low_bit (#10164)
* rwkv_ld
2024-02-19 15:56:42 +08:00
Heyang Sun
3e2af5ec0a Fix IPEX Baichuan Speculative (#10162)
* Fix IPEX Baichuan Speculative

* compatible with 13B

* Update speculative.py
2024-02-19 15:27:34 +08:00
Cheen Hau, 俊豪
6952847f68 GPU install doc - add pip install oneAPI for windows (#10157)
* Add instructions for pip install oneAPI for windows

* Improve clarity

* Format fix

* Fix

* Fix in runtime configuration
2024-02-19 14:46:08 +08:00
Yina Chen
23c91cdce6 [LLM] Add min_step_draft in speculative decoding (#10142)
* Fix gptj kvcache & position id

* Add min_draft_tokens in speculative decoding

* fix style

* update
2024-02-19 14:31:41 +08:00
Chen, Zhentao
14ba2c5135 Harness: remove deprecated files (#10165) 2024-02-19 14:27:49 +08:00
Wang, Jian4
d3591383d5 LLM: Add CPU chatglm3 speculative example (#10004)
* init chatglm

* update

* update
2024-02-19 13:38:52 +08:00
Wang, Jian4
f2417e083c LLM: enable chatglm3-6b target_model ipex (#10085)
* init

* always make casual_mask

* not return last tensor

* update

* optimize_model = False

* enable optimized=False

* enable optimized_model=true

* speed_up ipex target_model

* remove if True

* use group_size

* update python style

* update

* update
2024-02-19 13:38:32 +08:00
Heyang Sun
177273c1a4 IPEX Speculative Support for Baichuan2 7B (#10112)
* IPEX Speculative Support for Baichuan2 7B

* fix license problems

* refine
2024-02-19 09:12:57 +08:00
Jason Dai
6f38e604de Fix README.md (#10156) 2024-02-18 21:51:40 +08:00
Shaojun Liu
7a3a20cf5b Fix: GitHub-owned GitHubAction not pinned by hash (#10152) 2024-02-18 16:49:28 +08:00
Shaojun Liu
c3daacec6d Fix Token Permission issues (#10151)
Co-authored-by: Your Name <Your Email>
2024-02-18 13:23:54 +08:00
Yina Chen
1508d6b089 Fix gptj kvcache & position id (#10141) 2024-02-18 10:02:49 +08:00
Kai Huang
7400401706 Update gpu pip install oneapi doc (#10137)
* fix link

* fix

* fix

* minor
2024-02-09 11:27:40 +08:00
yb-peng
b7c5104d98 remove limit in harness run (#10139) 2024-02-09 11:20:53 +08:00
yb-peng
b4dc33def6 In harness-evaluation workflow, add statistical tables (#10118)
* change storage

* fix typo

* change label

* change label to arc03

* change needs in the last step

* add generate csv in harness/make_table_results.py

* modify needs in the last job

* add csv to html

* fix path issue in llm-harness-summary-nightly

* modify output_path

* modify args in make_table_results.py

* modify make table command in summary

* change pr env label

* remove irrelevant code in summary; add set output path step; add limit in harness run

* re-organize code structure

* modify limit in run harness

* modify csv_to_html input path

* modify needs in summary-nightly
2024-02-08 19:01:05 +08:00
Shaojun Liu
c2378a9546 Fix code scanning issues (#10129)
* Fix code scanning issues

* update oneccl_bind_pt link

* update

* update

---------

Co-authored-by: Your Name <Your Email>
2024-02-08 17:19:44 +08:00
Yishuo Wang
4d33aac7f9 quick fix qwen2 fp8 kv cache (#10135) 2024-02-08 17:04:59 +08:00
Cengguang Zhang
39d90839aa LLM: add quantize kv cache for llama. (#10086)
* feat: add quantize kv cache for llama.

* fix style.

* add quantized attention forward function.

* revert style.

* fix style.

* fix style.

* update quantized kv cache and add quantize_qkv

* fix style.

* fix style.

* optimize quantize kv cache.

* fix style.
2024-02-08 16:49:22 +08:00
Yishuo Wang
d848efe17c add quantize kv cache support for qwen2 (#10134) 2024-02-08 16:17:21 +08:00
SONG Ge
3f79128ed7 [LLM] Enable kv_cache optimization for Qwen2 on transformers-v4.37.0 (#10131)
* add support for kv_cache optimization on transformers-v4.37.0

* enable attention forward

* style fix

* disable rotary for now
2024-02-08 14:20:26 +08:00
Ruonan Wang
063dc145ac LLM: basic support for q2k (#10132)
* basic support for q2k

* fix style
2024-02-08 13:52:01 +08:00
binbin Deng
11fe5a87ec LLM: add Modelscope model example (#10126) 2024-02-08 11:18:07 +08:00
Cengguang Zhang
0cf6a12691 LLM: add default torch_dtype for fp16. (#10124)
* set default torch_dtype for fp16.

* fix style.

* bug fix.

* update bug fix.
2024-02-08 10:24:16 +08:00
Yishuo Wang
1aa0c623ce disable fused layer norm on UHD (#10130) 2024-02-08 10:20:01 +08:00
Yuwen Hu
a8450fc300 [LLM] Support MLP optimization for Qwen1.5 (#10123) 2024-02-08 09:15:34 +08:00
Yuwen Hu
81ed65fbe7 [LLM] Add qwen1.5-7B in iGPU perf (#10127)
* Add qwen1.5 test config yaml with transformers 4.37.0

* Update for yaml file
2024-02-07 22:31:20 +08:00
Cheen Hau, 俊豪
a7f9a13f6e Enhance gpu doc with PIP install oneAPI (#10109)
* Add pip install oneapi instructions

* Fixes

* Add instruction for oneapi2023

* Runtime config

* Fixes

* Remove "Currently, oneAPI installed with .. "

* Add pip package version for oneAPI 2024

* Reviewer comments

* Fix errors
2024-02-07 21:14:15 +08:00
hxsz1997
b4c327ea78 Llm ppl workflow bug fix (#10128)
* add llm-ppl workflow

* update the DATASET_DIR

* test multiple precisions

* modify nightly test

* match the updated ppl code

* add matrix.include

* fix the include error

* update the include

* add more model

* update the precision of include

* update nightly time and add more models

* fix the workflow_dispatch description, change default model of pr and modify the env

* modify workflow_dispatch language options

* modify options

* modify language options

* modify workflow_dispatch type

* modify type

* modify the type of language

* change seq_len type
2024-02-07 18:48:14 +08:00
hxsz1997
76bd792ff1 Fix llm ppl workflow workflow_dispatch bugs (#10125)
* add llm-ppl workflow

* update the DATASET_DIR

* test multiple precisions

* modify nightly test

* match the updated ppl code

* add matrix.include

* fix the include error

* update the include

* add more model

* update the precision of include

* update nightly time and add more models

* fix the workflow_dispatch description, change default model of pr and modify the env

* modify workflow_dispatch language options

* modify options

* modify language options
2024-02-07 17:41:44 +08:00
Jin Qiao
0fcfbfaf6f LLM: add rwkv5 eagle GPU HF example (#10122)
* LLM: add rwkv5 eagle example

* fix

* fix link
2024-02-07 16:58:29 +08:00
Shaojun Liu
9f5a86f9db fix OpenSSF Token-Permissions issues (#10121)
Co-authored-by: Your Name <Your Email>
2024-02-07 16:51:10 +08:00
binbin Deng
925f82107e LLM: support models hosted by modelscope (#10106) 2024-02-07 16:46:36 +08:00
hxsz1997
1710ecb990 Add llm-ppl workflow (#10074)
* add llm-ppl workflow

* update the DATASET_DIR

* test multiple precisions

* modify nightly test

* match the updated ppl code

* add matrix.include

* fix the include error

* update the include

* add more model

* update the precision of include

* update nightly time and add more models

* fix the workflow_dispatch description, change default model of pr and modify the env
2024-02-07 16:29:57 +08:00
binbin Deng
c1ec3d8921 LLM: update FAQ about too many open files (#10119) 2024-02-07 15:02:24 +08:00
Keyan (Kyrie) Zhang
2e80701f58 Unit test on final logits and the logits of the last attention layer (#10093)
* Add unit test on final logits and attention

* Add unit test on final logits and attention

* Modify unit test on final logits and attention
2024-02-07 14:25:36 +08:00
Yuxuan Xia
3832eb0ce0 Add ChatGLM C-Eval Evaluator (#10095)
* Add ChatGLM ceval evaluator

* Modify ChatGLM Evaluator Reference
2024-02-07 11:27:06 +08:00
Shaojun Liu
5e9710cec4 Update threshold for cpu stable version tests (#10108)
* update threshold

* update

* test

* update

* update

* revert

* revert

---------

Co-authored-by: Your Name <Your Email>
2024-02-07 11:21:23 +08:00
Jin Qiao
63050c954d fix (#10117) 2024-02-07 11:05:11 +08:00
Jin Qiao
d3d2ee1b63 LLM: add speech T5 GPU example (#10090)
* add speech t5 example

* fix

* fix
2024-02-07 10:50:02 +08:00
Jin Qiao
2f4c754759 LLM: add bark gpu example (#10091)
* add bark gpu example

* fix

* fix license

* add bark

* add example

* fix

* another way
2024-02-07 10:47:11 +08:00
Xiangyu Tian
8953acd7d6 [LLM] Fix log condition for BIGDL_OPT_IPEX (#10115)
Fix log condition for BIGDL_OPT_IPEX
2024-02-07 10:27:10 +08:00