Commit graph

939 commits

Author SHA1 Message Date
Zhicun
6d60982746 Env script: add license (#10257)
* env script

* update README.md

* modify README

* modify cpu info output

* add env-check.sh

* add env-check.bat

* add windows

* modify bat

* add license
2024-02-27 15:29:20 +08:00
Yishuo Wang
b4fa4ab46f optimize yuan 2.0 again (#10252) 2024-02-27 14:51:42 +08:00
Zhicun
03b9c4930a UX: Script to print env info (#10088)
* env script

* update README.md

* modify README

* modify cpu info output

* add env-check.sh

* add env-check.bat

* add windows

* modify bat
2024-02-27 14:45:36 +08:00
Keyan (Kyrie) Zhang
843fe546b0 Add CPU and GPU examples for DeciLM-7B (#9867)
* Add cpu and gpu examples for DeciLM-7B

* Add cpu and gpu examples for DeciLM-7B

* Add DeciLM-7B to README table

* modify deciLM

* modify deciLM

* modify deciLM

* Add verified model in README

* Add cpu_embedding=True
2024-02-27 13:15:49 +08:00
Yuwen Hu
38ae4b372f Add yuan2-2b to win igpu perf test (#10250) 2024-02-27 11:08:33 +08:00
Heyang Sun
36a9e88104 Speculative Starcoder on CPU (#10138)
* Speculative Starcoder on CPU

* enable kv-cache pre-allocation

* refine codes

* refine

* fix style

* fix style

* fix style

* refine

* refine

* Update speculative.py

* Update gptbigcode.py

* fix style

* Update speculative.py

* enable mixed-datatype layernorm on top of torch API

* adaptive dtype

* Update README.md
2024-02-27 09:57:29 +08:00
Yishuo Wang
a47989c860 optimize yuan 2.0 performance (#10244) 2024-02-26 17:20:10 +08:00
Wang, Jian4
6c74b99a28 LLM: Update qwen readme (#10245) 2024-02-26 17:03:09 +08:00
hxsz1997
15ad2fd72e Merge pull request #10226 from zhentaocc/fix_harness
Fix harness
2024-02-26 16:49:27 +08:00
Wang, Jian4
f9b75f900b LLM: Enable qwen target_model ipex (#10232)
* change order

* enable qwen ipex

* update qwen example

* update

* fix style

* update
2024-02-26 16:41:12 +08:00
Jin Qiao
3e6d188553 LLM: add baichuan2-13b to mtl perf (#10238) 2024-02-26 15:55:56 +08:00
Yuwen Hu
e38e29511c [LLM] Yuan2 MLP and Rotary optimization (#10231)
* Add optimization for rotary embedding

* Add mlp fused optimizatgion

* Python style fix

* Fix rotary embedding due to logits difference

* Small fix
2024-02-26 15:10:08 +08:00
Ziteng Zhang
ea23afc8ec [LLM]update ipex part in mistral example readme (#10239)
* update ipex part in mistral example readme
2024-02-26 14:35:20 +08:00
SONG Ge
df2f3885ba [LLM] Enable kv_cache and forward_qkv optimizations for yuan2 (#10225)
* add init kv_cache support for yuan2

* add forward qkv in yuan
2024-02-26 11:29:48 +08:00
Xiangyu Tian
85a99e13e8 LLM: Fix ChatGLM3 Speculative Example (#10236)
Fix ChatGLM3 Speculative Example.
2024-02-26 10:57:28 +08:00
Chen, Zhentao
213ef06691 fix readme 2024-02-24 00:38:08 +08:00
Ruonan Wang
28513f3978 LLM: support fp16 embedding & add mlp fusion for iq2_xxs (#10219)
* add fp16 embed

* small fixes

* fix style

* fix style

* fix comment
2024-02-23 17:26:24 +08:00
Yuwen Hu
eeecd9fc08 Python style fix (#10230) 2024-02-23 17:21:23 +08:00
Yuwen Hu
e511bbd8f1 [LLM] Add basic optimization framework for Yuan2 (#10227)
* Add basic optimization framework for Yuan2

* Small fix

* Python style fix

* Small fix

* Small fix
2024-02-23 17:05:00 +08:00
Xin Qiu
8ef5482da2 update Gemma readme (#10229)
* Update README.md

* Update README.md

* Update README.md

* Update README.md
2024-02-23 16:57:08 +08:00
Chen, Zhentao
6fe5344fa6 separate make_csv from the file 2024-02-23 16:33:38 +08:00
Chen, Zhentao
bfa98666a6 fall back to make_table.py 2024-02-23 16:33:38 +08:00
Ruonan Wang
19260492c7 LLM: fix action/installation error of mpmath (#10223)
* fix

* test

* fix

* update
2024-02-23 16:14:53 +08:00
Xin Qiu
aabfc06977 add gemma example (#10224)
* add gemma gpu example

* Update README.md

* add cpu example

* Update README.md

* Update README.md

* Update generate.py

* Update generate.py
2024-02-23 15:20:57 +08:00
yb-peng
a2c1675546 Add CPU and GPU examples for Yuan2-2B-hf (#9946)
* Add a new CPU example of Yuan2-2B-hf

* Add a new CPU generate.py of Yuan2-2B-hf example

* Add a new GPU example of Yuan2-2B-hf

* Add Yuan2 to README table

* In CPU example:1.Use English as default prompt; 2.Provide modified files in yuan2-2B-instruct

* In GPU example:1.Use English as default prompt;2.Provide modified files

* GPU example:update README

* update Yuan2-2B-hf in README table

* Add CPU example for Yuan2-2B in Pytorch-Models

* Add GPU example for Yuan2-2B in Pytorch-Models

* Add license in generate.py; Modify README

* In GPU Add license in generate.py; Modify README

* In CPU yuan2 modify README

* In GPU yuan2 modify README

* In CPU yuan2 modify README

* In GPU example, updated the readme for Windows GPU supports

* In GPU torch example, updated the readme for Windows GPU supports

* GPU hf example README modified

* GPU example README modified
2024-02-23 14:09:30 +08:00
yb-peng
f1f4094a09 Add CPU and GPU examples of phi-2 (#10014)
* Add CPU and GPU examples of phi-2

* In GPU hf example, updated the readme for Windows GPU supports

* In GPU torch example, updated the readme for Windows GPU supports

* update the table in BigDL/README.md

* update the table in BigDL/python/llm/README.md
2024-02-23 14:05:53 +08:00
Chen, Zhentao
f315c7f93a Move harness nightly related files to llm/test folder (#10209)
* move harness nightly files to test folder

* change workflow file path accordingly

* use arc01 when pr

* fix path

* fix fp16 csv path
2024-02-23 11:12:36 +08:00
Xin Qiu
30795bdfbc Gemma optimization: rms_norm, kv_cache, fused_rope, fused_rope+qkv (#10212)
* gemma optimization

* update

* update

* fix style

* meet code review
2024-02-23 10:07:24 +08:00
Guoqiong Song
63681af97e falcon for transformers 4.36 (#9960)
* falcon for transformers 4.36
2024-02-22 17:04:40 -08:00
Jason Dai
84d5f40936 Update README.md (#10213) 2024-02-22 17:22:59 +08:00
Yina Chen
ce5840a8b7 GPT-J rope optimization on xpu (#10182)
* optimize

* update

* fix style & move use_fuse_rope

* add ipex version check

* fix style

* update

* fix style

* meet comments

* address comments

* fix style
2024-02-22 16:25:12 +08:00
Xiangyu Tian
f445217d02 LLM: Update IPEX to 2.2.0+cpu and Refactor for _ipex_optimize (#10189)
Update IPEX to 2.2.0+cpu and refactor for _ipex_optimize.
2024-02-22 16:01:11 +08:00
Heyang Sun
c876d9b5ca Support for MPT rotary embedding (#10208) 2024-02-22 15:16:31 +08:00
Ruonan Wang
5e1fee5e05 LLM: add GGUF-IQ2 examples (#10207)
* add iq2 examples

* small fix

* meet code review

* fix

* meet review

* small fix
2024-02-22 14:18:45 +08:00
Yuwen Hu
21de2613ce [LLM] Add model loading time record for all-in-one benchmark (#10201)
* Add model loading time record in csv for all-in-one benchmark

* Small fix

* Small fix to number after .
2024-02-22 13:57:18 +08:00
Ovo233
60e11b6739 LLM: Add mlp layer unit tests (#10200)
* add mlp layer unit tests

* add download baichuan-13b

* exclude llama for now

* install additional packages

* rename bash file

* switch to Baichuan2

* delete attention related code

* fix name errors in yml file
2024-02-22 13:44:45 +08:00
SONG Ge
ca1166a0e5 [LLM] Add quantize kv_cache for Baichuan2-13B (#10203)
* add quantize kv_cache for baichuan2-13b

* style fix
2024-02-22 13:43:35 +08:00
Ruonan Wang
34ee1aa91f LLM: add esimd sdp support for chatglm3 (#10205)
* add esimd sdp support

* fix style
2024-02-22 13:37:16 +08:00
Yuxuan Xia
7cbc2429a6 Fix C-Eval ChatGLM loading issue (#10206)
* Add c-eval workflow and modify running files

* Modify the chatglm evaluator file

* Modify the ceval workflow for triggering test

* Modify the ceval workflow file

* Modify the ceval workflow file

* Modify ceval workflow

* Adjust the ceval dataset download

* Add ceval workflow dependencies

* Modify ceval workflow dataset download

* Add ceval test dependencies

* Add ceval test dependencies

* Correct the result print

* Fix the nightly test trigger time

* Fix ChatGLM loading issue
2024-02-22 10:00:43 +08:00
Yuwen Hu
94cb16fe40 [LLM] Small updates to Win GPU Install Doc (#10199)
* Make Offline installer as default for win gpu doc for oneAPI

* Small other fixes
2024-02-21 17:58:40 +08:00
binbin Deng
9975b029c5 LLM: add qlora finetuning example using trl.SFTTrainer (#10183) 2024-02-21 16:40:04 +08:00
Ruonan Wang
f7c96b19ef LLM: support iq2 for mixtral (#10191)
* support name mapping for mixtral

* support mixtral mixed quantization

* fix style

* fix
2024-02-21 16:00:29 +08:00
yb-peng
b1a97b71a9 Harness eval: Add is_last parameter and fix logical operator in highlight_vals (#10192)
* Add is_last parameter and fix logical operator in highlight_vals

* Add script to update HTML files in parent folder

* Add running update_html_in_parent_folder.py in summarize step

* Add licence info

* Remove update_html_in_parent_folder.py in Summarize the results for pull request
2024-02-21 14:45:32 +08:00
Zhicun
c7e839e66c Add Qwen1.5-7B-Chat (#10113)
* add Qwen1.5-7B-Chat

* modify Qwen1.5 example

* update README

* update prompt format

* update folder name and example README

* add Chinese prompt sample output

* update link in README

* correct the link

* update transformer version
2024-02-21 13:29:29 +08:00
Xin Qiu
56ad781f2f qwen2 cpu fix (#10187) 2024-02-21 11:23:51 +08:00
Chen, Zhentao
39d37bd042 upgrade harness package version in workflow (#10188)
* upgrade harness

* update readme
2024-02-21 11:21:30 +08:00
Yuwen Hu
001c13243e [LLM] Add support for low_low_bit benchmark on Windows GPU (#10167)
* Add support for low_low_bit performance test on Windows GPU

* Small fix

* Small fix

* Save memory during converting model process

* Drop the results for first time when loading in low bit on mtl igpu for better performance

* Small fix
2024-02-21 10:51:52 +08:00
Ziteng Zhang
276ef0e885 Speculative Ziya on CPU (#10160)
* Speculative Ziya on CPU

* Without part of Accelerate with BIGDL_OPT_IPEX
2024-02-21 10:30:39 +08:00
Zhao Changmin
4fbf449c2d for rwkv4 (#10179) 2024-02-21 10:11:10 +08:00
yb-peng
de3dc609ee Modify harness evaluation workflow (#10174)
* Modify table head in harness

* Specify the file path of fp16.csv

* change run to run nightly and run pr to debug

* Modify the way to get fp16.csv to downloading from github

* Change the method to calculate diff in html table

* Change the method to calculate diff in html table

* Re-arrange job order

* Re-arrange job order

* Change limit

* Change fp16.csv  path

* Change highlight rules

* Change limit
2024-02-20 18:55:43 +08:00