Commit graph

1116 commits

Author SHA1 Message Date
Heyang Sun
d72c0fad0d Qwen2 SDPA forward on CPU (#10395)
* Fix Qwen1.5 CPU forward

* Update convert.py

* Update qwen2.py
2024-03-13 13:10:03 +08:00
Yishuo Wang
ca58a69b97 fix arc rms norm UT (#10394) 2024-03-13 13:09:15 +08:00
Wang, Jian4
0193f29411 LLM : Enable gguf float16 and Yuan2 model (#10372)
* enable float16

* add yuan files

* enable yuan

* enable set low_bit on yuan2

* update

* update license

* update generate

* update readme

* update python style

* update
2024-03-13 10:19:18 +08:00
Yina Chen
f5d65203c0 First token lm_head optimization (#10318)
* add lm head linear

* update

* address comments and fix style

* address comment
2024-03-13 10:11:32 +08:00
Keyan (Kyrie) Zhang
7cf01e6ec8 Add LangChain upstream ut test (#10349)
* Add LangChain upstream ut test

* Add LangChain upstream ut test

* Specify version numbers in yml script

* Correct langchain-community version
2024-03-13 09:52:45 +08:00
Xin Qiu
28c4a8cf5c Qwen fused qkv (#10368)
* fused qkv + rope for qwen

* quantized kv cache

* fix

* update qwen

* fixed quantized qkv

* fix

* meet code review

* update split

* convert.py

* extend when not enough kv

* fix
2024-03-12 17:39:00 +08:00
Yishuo Wang
741c2bf1df use new rms norm (#10384) 2024-03-12 17:29:51 +08:00
Xiangyu Tian
0ded0b4b13 LLM: Enable BigDL IPEX optimization for int4 (#10319)
Enable BigDL IPEX optimization for int4
2024-03-12 17:08:50 +08:00
binbin Deng
5d7e044dbc LLM: add low bit option in deepspeed autotp example (#10382) 2024-03-12 17:07:09 +08:00
binbin Deng
df3bcc0e65 LLM: remove english_quotes dataset (#10370) 2024-03-12 16:57:40 +08:00
Zhao Changmin
df2b84f7de Enable kv cache on arc batch (#10308) 2024-03-12 16:46:04 +08:00
Lilac09
5809a3f5fe Add run-hbm.sh & add user guide for spr and hbm (#10357)
* add run-hbm.sh

* add spr and hbm guide

* only support quad mode

* only support quad mode

* update special cases

* update special cases
2024-03-12 16:15:27 +08:00
binbin Deng
5d996a5caf LLM: add benchmark script for deepspeed autotp on gpu (#10380) 2024-03-12 15:19:57 +08:00
Keyan (Kyrie) Zhang
f9c144dc4c Fix final logits ut failure (#10377)
* Fix final logits ut failure

* Fix final logits ut failure

* Remove Falcon from completion test for now

* Remove Falcon from unit test for now
2024-03-12 14:34:01 +08:00
Guancheng Fu
cc4148636d [FastChat-integration] Add initial implementation for loader (#10323)
* add initial implementation for loader

* add test method for model_loader

* data

* Refine
2024-03-12 10:54:59 +08:00
WeiguangHan
17bdb1a60b LLM: add whisper models into nightly test (#10193)
* LLM: add whisper models into nightly test

* small fix

* small fix

* add more whisper models

* test all cases

* test specific cases

* collect the csv

* store the result

* to html

* small fix

* small test

* test all cases

* modify whisper_csv_to_html
2024-03-11 20:00:47 +08:00
binbin Deng
dbcfc5c2fa LLM: fix error of 'AI-ModelScope/phi-2' hosted by ModelScope hub (#10364) 2024-03-11 16:19:17 +08:00
binbin Deng
fe27a6971c LLM: update modelscope version (#10367) 2024-03-11 16:18:27 +08:00
Chen, Zhentao
a425eaabfc fix from_pretrained when device_map=None (#10361)
* pr trigger

* fix error when device_map=None

* fix device_map=None
2024-03-11 16:06:12 +08:00
Yina Chen
d7b765fd3f serving xpu memory opt (#10358) 2024-03-11 15:21:22 +08:00
Ruonan Wang
be29833b2b LLM: fix qwen2 (#10356) 2024-03-11 09:29:08 +08:00
Zhicun
9026c08633 Fix llamaindex AutoTokenizer bug (#10345)
* fix tokenizer

* fix AutoTokenizer bug

* modify code style
2024-03-08 16:24:50 +08:00
Zhicun
2a10b53d73 rename docqa.py->rag.py (#10353) 2024-03-08 16:07:09 +08:00
Keyan (Kyrie) Zhang
f1825d7408 Add RMSNorm unit test (#10190) 2024-03-08 15:51:03 +08:00
Shengsheng Huang
370c52090c Langchain readme (#10348)
* update langchain readme

* update readme

* create new README

* Update README_nativeint4.md
2024-03-08 14:57:24 +08:00
Keyan (Kyrie) Zhang
7a621a4db0 Fix device_map bug by raising an error when using device_map=xpu (#10340)
* Fix device_map bug by raising an error when using device_map=xpu

* Fix sync error

* Fix python style

* Use invalidInputError instead of invalidOperationError
2024-03-08 13:38:52 +08:00
Yishuo Wang
1ac193ba02 add rope theta argument (#10343) 2024-03-07 17:27:19 +08:00
Yuxuan Xia
0c8d3c9830 Add C-Eval HTML report (#10294)
* Add C-Eval HTML report

* Fix C-Eval workflow pr trigger path

* Fix C-Eval workflow typos

* Add permissions to C-Eval workflow

* Fix C-Eval workflow typo

* Add pandas dependency

* Fix C-Eval workflow typo
2024-03-07 16:44:49 +08:00
Cengguang Zhang
496d18ab6d LLM: add quantize kv cache support for baichuan 7b and 13b. (#10330)
* add quantize kv cache for baichuan 7b and 13b.

* fix typo.

* fix.

* fix style.

* fix style.
2024-03-07 16:17:38 +08:00
hxsz1997
b7db21414e Update llamaindex ut (#10338)
* add test_llamaindex of gpu

* add llamaindex gpu tests bash

* add llamaindex cpu tests bash

* update name of Run LLM langchain GPU test

* import llama_index in llamaindex gpu ut

* update the dependency of test_llamaindex

* add Run LLM llamaindex GPU test

* modify import dependency of llamaindex cpu test

* add Run LLM llamaindex test

* update llama_model_path

* delete unused model path

* add LLAMA2_7B_ORIGIN_PATH in llamaindex cpu test
2024-03-07 10:06:16 +08:00
ZehuaCao
267de7abc3 fix fschat DEP version error (#10325) 2024-03-06 16:15:27 +08:00
Yina Chen
9ea499ca68 Optimize speculative decoding PVC memory usage (#10329)
* optimize memory

* update

* update

* update

* support other models

* update

* fix style
2024-03-06 09:54:21 +08:00
dingbaorong
cc796848ea fix typos (#10274)
Co-authored-by: Ariadne <wyn2000330@126.com>
2024-03-05 18:38:22 +08:00
hxsz1997
af11c53473 Add the installation step of postgresql and pgvector on windows in LlamaIndex GPU support (#10328)
* add the installation of postgresql and pgvector of windows

* fix some format
2024-03-05 18:31:19 +08:00
Yishuo Wang
0011ff9f64 optimize bge large performance (#10324) 2024-03-05 17:06:03 +08:00
Shaojun Liu
178eea5009 upload bigdl-llm wheel to sourceforge for backup (#10321)
* test: upload to sourceforge

* update scripts

* revert
2024-03-05 16:36:01 +08:00
Cengguang Zhang
30d009bca7 LLM: support quantized kv cache for Mistral in transformers >=4.36.0 (#10326)
* support quantize kv for mistral in transformers 4.36

* update mistral support.

* fix style.
2024-03-05 16:23:50 +08:00
dingbaorong
1e6f0c6f1a Add llamaindex gpu example (#10314)
* add llamaindex example

* fix core dump

* refine readme

* add troubleshooting

* refine readme

---------

Co-authored-by: Ariadne <wyn2000330@126.com>
2024-03-05 13:36:00 +08:00
dingbaorong
fc7f10cd12 add langchain gpu example (#10277)
* first draft

* fix

* add readme for transformer_int4_gpu

* fix doc

* check device_map

* add arc ut test

* fix ut test

* fix langchain ut

* Refine README

* fix gpu mem too high

* fix ut test

---------

Co-authored-by: Ariadne <wyn2000330@126.com>
2024-03-05 13:33:57 +08:00
Yuwen Hu
5dbbe1a826 [LLM] Support for new arc ut runner (#10311)
* Support for new arc ut runner

* Comment unnecessary OMP_NUM_THREADS related settings for arc uts
2024-03-04 18:42:02 +08:00
Yuwen Hu
d45e577d8c [LLM] Test load_low_bit in iGPU perf test on Windows (#10313) 2024-03-04 18:03:57 +08:00
WeiguangHan
fd81d66047 LLM: Compress some models to save space (#10315)
* LLM: compress some models to save space

* add deleted comments
2024-03-04 17:53:03 +08:00
Shaojun Liu
bab2ee5f9e update nightly spr perf test (#10178)
* update nightly spr perf test

* update

* update runner label

* update

* update

* update folder

* revert
2024-03-04 13:46:33 +08:00
Cengguang Zhang
ab9fc2485f LLM: add quantize kv support for llama transformer 4.36 (#10298)
* add quantize kv support for llama transformer 4.36

* fix style.

* fix style.
2024-03-04 10:33:35 +08:00
Xin Qiu
58208a5883 Update FAQ document. (#10300)
* Update install_gpu.md

* Update resolve_error.md

* Update README.md

* Update resolve_error.md

* Update README.md

* Update resolve_error.md
2024-03-04 08:35:11 +08:00
Yuwen Hu
27d9a14989 [LLM] all-on-one update: memory optimize and streaming output (#10302)
* Memory saving for continuous in-out pair run and add support for streaming output on MTL iGPU

* Small fix

* Small fix

* Add things back
2024-03-01 18:02:30 +08:00
SONG Ge
0ab40917fb [LLM] Split merged_qk to separated q/k linear (#10299)
* modify merge_qk_linear to separated q/k linear

* update
2024-03-01 16:48:55 +08:00
Yang Wang
f4d7dbcde2 use fused qkv forward in qwen2 (#10185)
* use fused qkv forward in qwen2

* support both

* fix style

* fix rope

* remove print

* fix style

* clean up
2024-03-01 16:46:35 +08:00
Xin Qiu
509e206de0 update doc about gemma random and unreadable output. (#10297)
* Update install_gpu.md

* Update README.md

* Update README.md
2024-03-01 15:41:16 +08:00
Wang, Jian4
beb9433cec LLM: Reduce speculative _ipex_optimize_model memory use (#10281)
* use tpp

* update ipex
2024-03-01 13:48:23 +08:00
Yuwen Hu
f0ff0eebe1 [LLM] Support quantize kv cache for Baichuan2 7B (#10280)
* Add quantized kv cache framework for Baichuan2 7B

* Support quantize kv cache for baichuan2

* Small fix

* Fix python style
2024-03-01 13:35:42 +08:00
SONG Ge
273de341d7 hot-fix silu error import (#10292) 2024-03-01 10:11:37 +08:00
Shengsheng Huang
bcfad555df revise llamaindex readme (#10283) 2024-02-29 17:19:23 +08:00
Xin Qiu
232273a1b5 Enable Gemma fused mlp + Gelu (#10276)
* update llama mlp forward

* add all

* fix style check

* split

* update

* update

* update

* fix style
2024-02-29 16:53:24 +08:00
Guancheng Fu
2d930bdca8 Add vLLM bf16 support (#10278)
* add argument load_in_low_bit

* add docs

* modify gpu doc

* done

---------

Co-authored-by: ivy-lv11 <lvzc@lamda.nju.edu.cn>
2024-02-29 16:33:42 +08:00
SONG Ge
13b0bc9075 [LLM] Add quantize_kv optimization for yuan2 model (#10243)
* add initial quantize_kv support for yuan2 model

* fix yuan2 quantize_kv generation

* apply fp16 conv layer optimizations

* disable mlp for quantize_kv
2024-02-29 16:33:26 +08:00
Zhicun
4e6cc424f1 Add LlamaIndex RAG (#10263)
* run demo

* format code

* add llamaindex

* add custom LLM with bigdl

* update

* add readme

* begin ut

* add unit test

* add license

* add license

* revised

* update

* modify docs

* remove data folder

* update

* modify prompt

* fixed

* fixed

* fixed
2024-02-29 15:21:19 +08:00
Jin Qiao
5d7243067c LLM: add Baichuan2-13B-Chat 2048-256 to MTL perf (#10273) 2024-02-29 13:48:55 +08:00
Ruonan Wang
a9fd20b6ba LLM: Update qkv fusion for GGUF-IQ2 (#10271)
* first commit

* update mistral

* fix transformers==4.36.0

* fix

* disable qk for mixtral now

* fix style
2024-02-29 12:49:53 +08:00
Jiao Wang
6fb65bb9d2 fix in transformers 4.36 (#10150) 2024-02-28 18:43:01 -08:00
Shengsheng Huang
43dac97e03 Update README.md (#10260) 2024-02-29 10:41:14 +08:00
Ruonan Wang
4b08bc1417 LLM: relax batch check of flash attention by double check attention mask (#10270)
* relax batch check

* fix

* fix style
2024-02-29 09:39:55 +08:00
Yina Chen
07f36fbfcc Fix gptj failed to extend (#10269) 2024-02-29 09:39:27 +08:00
Yishuo Wang
cccb02dad1 fix baichuan2 13b 2k input (#10267) 2024-02-28 17:20:20 +08:00
Heyang Sun
7244fd1ba5 Fix Arc StarCoder wrong query_shape when input is long (#10268)
* Fix Arc StarCoder wrong query_shape when input is long

* Update gptbigcode.py
2024-02-28 17:07:08 +08:00
Cengguang Zhang
a4de3095f3 LLM: Support quantize kv cache in mistral. (#10261)
* init

* update quantize kv.
2024-02-28 14:08:08 +08:00
Shengsheng Huang
db0d129226 Revert "Add rwkv example (#9432)" (#10264)
This reverts commit 6930422b42.
2024-02-28 11:48:31 +08:00
Yining Wang
6930422b42 Add rwkv example (#9432)
* codeshell fix wrong urls

* restart runner

* add RWKV CPU & GPU example (rwkv-4-world-7b)

* restart runner

* update submodule

* fix runner

* runner-test

---------

Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>
2024-02-28 11:41:00 +08:00
Keyan (Kyrie) Zhang
59861f73e5 Add Deepseek-6.7B (#9991)
* Add new example Deepseek

* Add new example Deepseek

* Add new example Deepseek

* Add new example Deepseek

* Add new example Deepseek

* modify deepseek

* modify deepseek

* Add verified model in README

* Turn cpu_embedding=True in Deepseek example

---------

Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>
2024-02-28 11:36:39 +08:00
Yuxuan Xia
2524273198 Update AutoGen README (#10255)
* Update AutoGen README

* Fix AutoGen README typos

* Update AutoGen README

* Update AutoGen README
2024-02-28 11:34:45 +08:00
Zheng, Yi
2347f611cf Add cpu and gpu examples of Mamba (#9797)
* Add mamba cpu example

* Add mamba gpu example

* Use a smaller model as the example

* minor fixes

---------

Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>
2024-02-28 11:33:29 +08:00
Zhao Changmin
937e1f7c74 rebase (#9104)
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2024-02-28 11:18:21 +08:00
JunX
4833067489 fix GPU example link in README.md (#9533)
* fix GPU example link in README.md

* fix GPU links in llm README.md
2024-02-28 11:13:18 +08:00
Zhicun
308e637d0d Add DeepSeek-MoE-16B-Chat (#10155)
* dsmoe-hf add

* add dsmoe pytorch

* update README

* modify comment

* remove GPU example

* update model name

* format code
2024-02-28 10:12:09 +08:00
Guoqiong Song
f4a2e32106 Stream llm example for both GPU and CPU (#9390) 2024-02-27 15:54:47 -08:00
Yang Wang
c581c6db30 draft mmint4 (#10031)
change to llm.cpp

support transposed format

revert

implement qkv fuse

fix style

change to vertically pack

change to enable_xetla

fix mlp_fusion_check

remove comments

address comments

add some comments

fix style
2024-02-27 14:55:16 -08:00
hxsz1997
cba61a2909 Add html report of ppl (#10218)
* remove include and language option, select the corresponding dataset based on the model name in Run

* change the nightly test time

* change the nightly test time of harness and ppl

* save the ppl result to json file

* generate csv file and print table result

* generate html

* modify the way to get parent folder

* update html in parent folder

* add llm-ppl-summary and llm-ppl-summary-html

* modify echo single result

* remove download fp16.csv

* change model name of PR

* move ppl nightly related files to llm/test folder

* reformat

* separate make_table from make_table_and_csv.py

* separate make_csv from make_table_and_csv.py

* update llm-ppl-html

* remove comment

* add Download fp16.results
2024-02-27 17:37:08 +08:00
Zhicun
6d60982746 Env script: add license (#10257)
* env script

* update README.md

* modify README

* modify cpu info output

* add env-check.sh

* add env-check.bat

* add windows

* modify bat

* add license
2024-02-27 15:29:20 +08:00
Yishuo Wang
b4fa4ab46f optimize yuan 2.0 again (#10252) 2024-02-27 14:51:42 +08:00
Zhicun
03b9c4930a UX: Script to print env info (#10088)
* env script

* update README.md

* modify README

* modify cpu info output

* add env-check.sh

* add env-check.bat

* add windows

* modify bat
2024-02-27 14:45:36 +08:00
Keyan (Kyrie) Zhang
843fe546b0 Add CPU and GPU examples for DeciLM-7B (#9867)
* Add cpu and gpu examples for DeciLM-7B

* Add cpu and gpu examples for DeciLM-7B

* Add DeciLM-7B to README table

* modify deciLM

* modify deciLM

* modify deciLM

* Add verified model in README

* Add cpu_embedding=True
2024-02-27 13:15:49 +08:00
Yuwen Hu
38ae4b372f Add yuan2-2b to win igpu perf test (#10250) 2024-02-27 11:08:33 +08:00
Heyang Sun
36a9e88104 Speculative Starcoder on CPU (#10138)
* Speculative Starcoder on CPU

* enable kv-cache pre-allocation

* refine codes

* refine

* fix style

* fix style

* fix style

* refine

* refine

* Update speculative.py

* Update gptbigcode.py

* fix style

* Update speculative.py

* enable mixed-datatype layernorm on top of torch API

* adaptive dtype

* Update README.md
2024-02-27 09:57:29 +08:00
Yishuo Wang
a47989c860 optimize yuan 2.0 performance (#10244) 2024-02-26 17:20:10 +08:00
Wang, Jian4
6c74b99a28 LLM: Update qwen readme (#10245) 2024-02-26 17:03:09 +08:00
hxsz1997
15ad2fd72e Merge pull request #10226 from zhentaocc/fix_harness
Fix harness
2024-02-26 16:49:27 +08:00
Wang, Jian4
f9b75f900b LLM: Enable qwen target_model ipex (#10232)
* change order

* enable qwen ipex

* update qwen example

* update

* fix style

* update
2024-02-26 16:41:12 +08:00
Jin Qiao
3e6d188553 LLM: add baichuan2-13b to mtl perf (#10238) 2024-02-26 15:55:56 +08:00
Yuwen Hu
e38e29511c [LLM] Yuan2 MLP and Rotary optimization (#10231)
* Add optimization for rotary embedding

* Add mlp fused optimization

* Python style fix

* Fix rotary embedding due to logits difference

* Small fix
2024-02-26 15:10:08 +08:00
Ziteng Zhang
ea23afc8ec [LLM]update ipex part in mistral example readme (#10239)
* update ipex part in mistral example readme
2024-02-26 14:35:20 +08:00
SONG Ge
df2f3885ba [LLM] Enable kv_cache and forward_qkv optimizations for yuan2 (#10225)
* add init kv_cache support for yuan2

* add forward qkv in yuan
2024-02-26 11:29:48 +08:00
Xiangyu Tian
85a99e13e8 LLM: Fix ChatGLM3 Speculative Example (#10236)
Fix ChatGLM3 Speculative Example.
2024-02-26 10:57:28 +08:00
Chen, Zhentao
213ef06691 fix readme 2024-02-24 00:38:08 +08:00
Ruonan Wang
28513f3978 LLM: support fp16 embedding & add mlp fusion for iq2_xxs (#10219)
* add fp16 embed

* small fixes

* fix style

* fix style

* fix comment
2024-02-23 17:26:24 +08:00
Yuwen Hu
eeecd9fc08 Python style fix (#10230) 2024-02-23 17:21:23 +08:00
Yuwen Hu
e511bbd8f1 [LLM] Add basic optimization framework for Yuan2 (#10227)
* Add basic optimization framework for Yuan2

* Small fix

* Python style fix

* Small fix

* Small fix
2024-02-23 17:05:00 +08:00
Xin Qiu
8ef5482da2 update Gemma readme (#10229)
* Update README.md

* Update README.md

* Update README.md

* Update README.md
2024-02-23 16:57:08 +08:00
Chen, Zhentao
6fe5344fa6 separate make_csv from the file 2024-02-23 16:33:38 +08:00
Chen, Zhentao
bfa98666a6 fall back to make_table.py 2024-02-23 16:33:38 +08:00
Ruonan Wang
19260492c7 LLM: fix action/installation error of mpmath (#10223)
* fix

* test

* fix

* update
2024-02-23 16:14:53 +08:00
Xin Qiu
aabfc06977 add gemma example (#10224)
* add gemma gpu example

* Update README.md

* add cpu example

* Update README.md

* Update README.md

* Update generate.py

* Update generate.py
2024-02-23 15:20:57 +08:00
yb-peng
a2c1675546 Add CPU and GPU examples for Yuan2-2B-hf (#9946)
* Add a new CPU example of Yuan2-2B-hf

* Add a new CPU generate.py of Yuan2-2B-hf example

* Add a new GPU example of Yuan2-2B-hf

* Add Yuan2 to README table

* In CPU example:1.Use English as default prompt; 2.Provide modified files in yuan2-2B-instruct

* In GPU example:1.Use English as default prompt;2.Provide modified files

* GPU example:update README

* update Yuan2-2B-hf in README table

* Add CPU example for Yuan2-2B in Pytorch-Models

* Add GPU example for Yuan2-2B in Pytorch-Models

* Add license in generate.py; Modify README

* In GPU Add license in generate.py; Modify README

* In CPU yuan2 modify README

* In GPU yuan2 modify README

* In CPU yuan2 modify README

* In GPU example, updated the readme for Windows GPU supports

* In GPU torch example, updated the readme for Windows GPU supports

* GPU hf example README modified

* GPU example README modified
2024-02-23 14:09:30 +08:00
yb-peng
f1f4094a09 Add CPU and GPU examples of phi-2 (#10014)
* Add CPU and GPU examples of phi-2

* In GPU hf example, updated the readme for Windows GPU supports

* In GPU torch example, updated the readme for Windows GPU supports

* update the table in BigDL/README.md

* update the table in BigDL/python/llm/README.md
2024-02-23 14:05:53 +08:00
Chen, Zhentao
f315c7f93a Move harness nightly related files to llm/test folder (#10209)
* move harness nightly files to test folder

* change workflow file path accordingly

* use arc01 when pr

* fix path

* fix fp16 csv path
2024-02-23 11:12:36 +08:00
Xin Qiu
30795bdfbc Gemma optimization: rms_norm, kv_cache, fused_rope, fused_rope+qkv (#10212)
* gemma optimization

* update

* update

* fix style

* meet code review
2024-02-23 10:07:24 +08:00
Guoqiong Song
63681af97e falcon for transformers 4.36 (#9960)
* falcon for transformers 4.36
2024-02-22 17:04:40 -08:00
Jason Dai
84d5f40936 Update README.md (#10213) 2024-02-22 17:22:59 +08:00
Yina Chen
ce5840a8b7 GPT-J rope optimization on xpu (#10182)
* optimize

* update

* fix style & move use_fuse_rope

* add ipex version check

* fix style

* update

* fix style

* meet comments

* address comments

* fix style
2024-02-22 16:25:12 +08:00
Xiangyu Tian
f445217d02 LLM: Update IPEX to 2.2.0+cpu and Refactor for _ipex_optimize (#10189)
Update IPEX to 2.2.0+cpu and refactor for _ipex_optimize.
2024-02-22 16:01:11 +08:00
Heyang Sun
c876d9b5ca Support for MPT rotary embedding (#10208) 2024-02-22 15:16:31 +08:00
Ruonan Wang
5e1fee5e05 LLM: add GGUF-IQ2 examples (#10207)
* add iq2 examples

* small fix

* meet code review

* fix

* meet review

* small fix
2024-02-22 14:18:45 +08:00
Yuwen Hu
21de2613ce [LLM] Add model loading time record for all-in-one benchmark (#10201)
* Add model loading time record in csv for all-in-one benchmark

* Small fix

* Small fix to number after .
2024-02-22 13:57:18 +08:00
Ovo233
60e11b6739 LLM: Add mlp layer unit tests (#10200)
* add mlp layer unit tests

* add download baichuan-13b

* exclude llama for now

* install additional packages

* rename bash file

* switch to Baichuan2

* delete attention related code

* fix name errors in yml file
2024-02-22 13:44:45 +08:00
SONG Ge
ca1166a0e5 [LLM] Add quantize kv_cache for Baichuan2-13B (#10203)
* add quantize kv_cache for baichuan2-13b

* style fix
2024-02-22 13:43:35 +08:00
Ruonan Wang
34ee1aa91f LLM: add esimd sdp support for chatglm3 (#10205)
* add esimd sdp support

* fix style
2024-02-22 13:37:16 +08:00
Yuxuan Xia
7cbc2429a6 Fix C-Eval ChatGLM loading issue (#10206)
* Add c-eval workflow and modify running files

* Modify the chatglm evaluator file

* Modify the ceval workflow for triggering test

* Modify the ceval workflow file

* Modify the ceval workflow file

* Modify ceval workflow

* Adjust the ceval dataset download

* Add ceval workflow dependencies

* Modify ceval workflow dataset download

* Add ceval test dependencies

* Add ceval test dependencies

* Correct the result print

* Fix the nightly test trigger time

* Fix ChatGLM loading issue
2024-02-22 10:00:43 +08:00
Yuwen Hu
94cb16fe40 [LLM] Small updates to Win GPU Install Doc (#10199)
* Make Offline installer as default for win gpu doc for oneAPI

* Small other fixes
2024-02-21 17:58:40 +08:00
binbin Deng
9975b029c5 LLM: add qlora finetuning example using trl.SFTTrainer (#10183) 2024-02-21 16:40:04 +08:00
Ruonan Wang
f7c96b19ef LLM: support iq2 for mixtral (#10191)
* support name mapping for mixtral

* support mixtral mixed quantization

* fix style

* fix
2024-02-21 16:00:29 +08:00
yb-peng
b1a97b71a9 Harness eval: Add is_last parameter and fix logical operator in highlight_vals (#10192)
* Add is_last parameter and fix logical operator in highlight_vals

* Add script to update HTML files in parent folder

* Add running update_html_in_parent_folder.py in summarize step

* Add licence info

* Remove update_html_in_parent_folder.py in Summarize the results for pull request
2024-02-21 14:45:32 +08:00
Zhicun
c7e839e66c Add Qwen1.5-7B-Chat (#10113)
* add Qwen1.5-7B-Chat

* modify Qwen1.5 example

* update README

* update prompt format

* update folder name and example README

* add Chinese prompt sample output

* update link in README

* correct the link

* update transformer version
2024-02-21 13:29:29 +08:00
Xin Qiu
56ad781f2f qwen2 cpu fix (#10187) 2024-02-21 11:23:51 +08:00
Chen, Zhentao
39d37bd042 upgrade harness package version in workflow (#10188)
* upgrade harness

* update readme
2024-02-21 11:21:30 +08:00
Yuwen Hu
001c13243e [LLM] Add support for low_low_bit benchmark on Windows GPU (#10167)
* Add support for low_low_bit performance test on Windows GPU

* Small fix

* Small fix

* Save memory during converting model process

* Drop the results for first time when loading in low bit on mtl igpu for better performance

* Small fix
2024-02-21 10:51:52 +08:00
Ziteng Zhang
276ef0e885 Speculative Ziya on CPU (#10160)
* Speculative Ziya on CPU

* Without part of Accelerate with BIGDL_OPT_IPEX
2024-02-21 10:30:39 +08:00
Zhao Changmin
4fbf449c2d for rwkv4 (#10179) 2024-02-21 10:11:10 +08:00
yb-peng
de3dc609ee Modify harness evaluation workflow (#10174)
* Modify table head in harness

* Specify the file path of fp16.csv

* change run to run nightly and run pr to debug

* Modify the way to get fp16.csv to downloading from github

* Change the method to calculate diff in html table

* Change the method to calculate diff in html table

* Re-arrange job order

* Re-arrange job order

* Change limit

* Change fp16.csv path

* Change highlight rules

* Change limit
2024-02-20 18:55:43 +08:00
Ruonan Wang
3288acb8de LLM : Support embedding quantization (only q2k now) (#10170)
* basic logic added

* basic support

* support save&load, update mixed strategy

* fix style

* use int8 for lm_head

* add check for xpu
2024-02-20 16:56:57 +08:00
hxsz1997
6e10d98a8d Fix some typos (#10175)
* add llm-ppl workflow

* update the DATASET_DIR

* test multiple precisions

* modify nightly test

* match the updated ppl code

* add matrix.include

* fix the include error

* update the include

* add more model

* update the precision of include

* update nightly time and add more models

* fix the workflow_dispatch description, change default model of pr and modify the env

* modify workflow_dispatch language options

* modify options

* modify language options

* modify workflow_dispatch type

* modify type

* modify the type of language

* change seq_len type

* fix some typos

* revert changes to stress_test.txt
2024-02-20 14:14:53 +08:00
Zhicun
add3899311 Add ziya CPU example (#10114)
* ziya on CPU

* add README for ziya

* specify use_cache

* add arc CPU

* update prompt format

* update link

* add comments to emphasize use_cache

* update pip cmd
2024-02-20 13:59:52 +08:00
binbin Deng
2bb96c775c LLM: fix device setting during saving optimized model (#10154) 2024-02-20 09:52:59 +08:00
Xin Qiu
1f6d5b9f30 enable fused rmsnorm and rope qwen2 (#10163)
* qwen2

* change convert

* cleanup
2024-02-20 08:33:09 +08:00
yb-peng
e31210ba00 Modify html table style and add fp16.csv in harness (#10169)
* Specify the version of pandas in harness evaluation workflow

* Specify the version of pandas in harness evaluation workflow

* Modify html table style and add fp16.csv in harness

* Modify comments
2024-02-19 18:13:40 +08:00
WeiguangHan
6c09aed90d LLM: add qwen_1.5_7b model for arc perf test (#10166)
* LLM: add qwen_1.5_7b model for arc perf test

* small fix

* revert some codes
2024-02-19 17:21:00 +08:00
Yuxuan Xia
209122559a Add Ceval workflow and modify the result printing (#10140)
* Add c-eval workflow and modify running files

* Modify the chatglm evaluator file

* Modify the ceval workflow for triggering test

* Modify the ceval workflow file

* Modify the ceval workflow file

* Modify ceval workflow

* Adjust the ceval dataset download

* Add ceval workflow dependencies

* Modify ceval workflow dataset download

* Add ceval test dependencies

* Add ceval test dependencies

* Correct the result print
2024-02-19 17:06:53 +08:00
Zhao Changmin
f8730e8dc1 Skip rescale rwkv linear when load_low_bit (#10164)
* rwkv_ld
2024-02-19 15:56:42 +08:00
Heyang Sun
3e2af5ec0a Fix IPEX Baichuan Speculative (#10162)
* Fix IPEX Baichuan Speculative

* compatible with 13B

* Update speculative.py
2024-02-19 15:27:34 +08:00
Yina Chen
23c91cdce6 [LLM] Add min_step_draft in speculative decoding (#10142)
* Fix gptj kvcache & position id

* Add min_draft_tokens in speculative decoding

* fix style

* update
2024-02-19 14:31:41 +08:00
Chen, Zhentao
14ba2c5135 Harness: remove deprecated files (#10165) 2024-02-19 14:27:49 +08:00
Wang, Jian4
d3591383d5 LLM : Add CPU chatglm3 speculative example (#10004)
* init chatglm

* update

* update
2024-02-19 13:38:52 +08:00
Wang, Jian4
f2417e083c LLM: enable chatglm3-6b target_model ipex (#10085)
* init

* always make casual_mask

* not return last tensor

* update

* optimize_model = False

* enable optimized=False

* enable optimized_model=true

* speed_up ipex target_model

* remove if True

* use group_size

* update python style

* update

* update
2024-02-19 13:38:32 +08:00
Heyang Sun
177273c1a4 IPEX Speculative Support for Baichuan2 7B (#10112)
* IPEX Speculative Support for Baichuan2 7B

* fix license problems

* refine
2024-02-19 09:12:57 +08:00
Yina Chen
1508d6b089 Fix gptj kvcache & position id (#10141) 2024-02-18 10:02:49 +08:00
yb-peng
b4dc33def6 In harness-evaluation workflow, add statistical tables (#10118)
* change storage

* fix typo

* change label

* change label to arc03

* change needs in the last step

* add generate csv in harness/make_table_results.py

* modify needs in the last job

* add csv to html

* fix path issue in llm-harness-summary-nightly

* modify output_path

* modify args in make_table_results.py

* modify make table command in summary

* change pr env label

* remove irrelevant code in summary; add set output path step; add limit in harness run

* re-organize code structure

* modify limit in run harness

* modify csv_to_html input path

* modify needs in summary-nightly
2024-02-08 19:01:05 +08:00
Yishuo Wang
4d33aac7f9 quick fix qwen2 fp8 kv cache (#10135) 2024-02-08 17:04:59 +08:00
Cengguang Zhang
39d90839aa LLM: add quantize kv cache for llama. (#10086)
* feat: add quantize kv cache for llama.

* fix style.

* add quantized attention forward function.

* revert style.

* fix style.

* fix style.

* update quantized kv cache and add quantize_qkv

* fix style.

* fix style.

* optimize quantize kv cache.

* fix style.
2024-02-08 16:49:22 +08:00
Yishuo Wang
d848efe17c add quantize kv cache support for qwen2 (#10134) 2024-02-08 16:17:21 +08:00
SONG Ge
3f79128ed7 [LLM] Enable kv_cache optimization for Qwen2 on transformers-v4.37.0 (#10131)
* add support for kv_cache optimization on transformers-v4.37.0

* enable attention forward

* style fix

* disable rotary for now
2024-02-08 14:20:26 +08:00
Ruonan Wang
063dc145ac LLM: basic support for q2k (#10132)
* basic support for q2k

* fix style
2024-02-08 13:52:01 +08:00
binbin Deng
11fe5a87ec LLM: add Modelscope model example (#10126) 2024-02-08 11:18:07 +08:00