Heyang Sun
d72c0fad0d
Qwen2 SDPA forward on CPU ( #10395 )
...
* Fix Qwen1.5 CPU forward
* Update convert.py
* Update qwen2.py
2024-03-13 13:10:03 +08:00
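(The Qwen2 CPU fix above centers on PyTorch's scaled-dot-product-attention path. As an illustrative sketch only, assuming PyTorch >= 2.0 and made-up tensor shapes, not the actual patch:)

```python
# Minimal SDPA call on CPU; shapes and names are assumptions for illustration.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 1, 16, 32, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# is_causal=True applies the autoregressive mask inside the fused kernel.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 16, 32, 64])
```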
Yishuo Wang
ca58a69b97
fix arc rms norm UT ( #10394 )
2024-03-13 13:09:15 +08:00
Wang, Jian4
0193f29411
LLM: Enable gguf float16 and Yuan2 model ( #10372 )
...
* enable float16
* add yuan files
* enable yuan
* enable setting low_bit on yuan2
* update
* update license
* update generate
* update readme
* update python style
* update
2024-03-13 10:19:18 +08:00
Yina Chen
f5d65203c0
First token lm_head optimization ( #10318 )
...
* add lm head linear
* update
* address comments and fix style
* address comment
2024-03-13 10:11:32 +08:00
Keyan (Kyrie) Zhang
7cf01e6ec8
Add LangChain upstream ut test ( #10349 )
...
* Add LangChain upstream ut test
* Add LangChain upstream ut test
* Specify version numbers in yml script
* Correct langchain-community version
2024-03-13 09:52:45 +08:00
Xin Qiu
28c4a8cf5c
Qwen fused qkv ( #10368 )
...
* fused qkv + rope for qwen
* quantized kv cache
* fix
* update qwen
* fixed quantized qkv
* fix
* meet code review
* update split
* convert.py
* extend when not enough kv
* fix
2024-03-12 17:39:00 +08:00
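(Several entries above mention fused qkv. A generic sketch of the idea, with hypothetical layer names and sizes, not the repository's implementation: the three q/k/v projection weights are stacked so one GEMM produces all three tensors.)

```python
# Illustrative fusion of separate q/k/v projections into a single linear layer.
import torch
import torch.nn as nn

hidden = 128
q_proj = nn.Linear(hidden, hidden, bias=False)
k_proj = nn.Linear(hidden, hidden, bias=False)
v_proj = nn.Linear(hidden, hidden, bias=False)

# Stack the three weight matrices so one matmul yields q, k and v together.
qkv_proj = nn.Linear(hidden, 3 * hidden, bias=False)
with torch.no_grad():
    qkv_proj.weight.copy_(
        torch.cat([q_proj.weight, k_proj.weight, v_proj.weight], dim=0))

x = torch.randn(2, 16, hidden)
q, k, v = qkv_proj(x).chunk(3, dim=-1)
assert torch.allclose(q, q_proj(x), atol=1e-5)
```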
Yishuo Wang
741c2bf1df
use new rms norm ( #10384 )
2024-03-12 17:29:51 +08:00
Xiangyu Tian
0ded0b4b13
LLM: Enable BigDL IPEX optimization for int4 ( #10319 )
...
Enable BigDL IPEX optimization for int4
2024-03-12 17:08:50 +08:00
binbin Deng
5d7e044dbc
LLM: add low bit option in deepspeed autotp example ( #10382 )
2024-03-12 17:07:09 +08:00
binbin Deng
df3bcc0e65
LLM: remove english_quotes dataset ( #10370 )
2024-03-12 16:57:40 +08:00
Zhao Changmin
df2b84f7de
Enable kv cache on arc batch ( #10308 )
2024-03-12 16:46:04 +08:00
Lilac09
5809a3f5fe
Add run-hbm.sh & add user guide for spr and hbm ( #10357 )
...
* add run-hbm.sh
* add spr and hbm guide
* only support quad mode
* only support quad mode
* update special cases
* update special cases
2024-03-12 16:15:27 +08:00
binbin Deng
5d996a5caf
LLM: add benchmark script for deepspeed autotp on gpu ( #10380 )
2024-03-12 15:19:57 +08:00
Keyan (Kyrie) Zhang
f9c144dc4c
Fix final logits ut failure ( #10377 )
...
* Fix final logits ut failure
* Fix final logits ut failure
* Remove Falcon from completion test for now
* Remove Falcon from unit test for now
2024-03-12 14:34:01 +08:00
Guancheng Fu
cc4148636d
[FastChat-integration] Add initial implementation for loader ( #10323 )
...
* add initial implementation for loader
* add test method for model_loader
* data
* Refine
2024-03-12 10:54:59 +08:00
WeiguangHan
17bdb1a60b
LLM: add whisper models into nightly test ( #10193 )
...
* LLM: add whisper models into nightly test
* small fix
* small fix
* add more whisper models
* test all cases
* test specific cases
* collect the csv
* store the result
* to html
* small fix
* small test
* test all cases
* modify whisper_csv_to_html
2024-03-11 20:00:47 +08:00
binbin Deng
dbcfc5c2fa
LLM: fix error of 'AI-ModelScope/phi-2' hosted by ModelScope hub ( #10364 )
2024-03-11 16:19:17 +08:00
binbin Deng
fe27a6971c
LLM: update modelscope version ( #10367 )
2024-03-11 16:18:27 +08:00
Chen, Zhentao
a425eaabfc
fix from_pretrained when device_map=None ( #10361 )
...
* pr trigger
* fix error when device_map=None
* fix device_map=None
2024-03-11 16:06:12 +08:00
Yina Chen
d7b765fd3f
serving xpu memory opt ( #10358 )
2024-03-11 15:21:22 +08:00
Ruonan Wang
be29833b2b
LLM: fix qwen2 ( #10356 )
2024-03-11 09:29:08 +08:00
Zhicun
9026c08633
Fix llamaindex AutoTokenizer bug ( #10345 )
...
* fix tokenizer
* fix AutoTokenizer bug
* modify code style
2024-03-08 16:24:50 +08:00
Zhicun
2a10b53d73
rename docqa.py->rag.py ( #10353 )
2024-03-08 16:07:09 +08:00
Keyan (Kyrie) Zhang
f1825d7408
Add RMSNorm unit test ( #10190 )
2024-03-08 15:51:03 +08:00
Shengsheng Huang
370c52090c
Langchain readme ( #10348 )
...
* update langchain readme
* update readme
* create new README
* Update README_nativeint4.md
2024-03-08 14:57:24 +08:00
Keyan (Kyrie) Zhang
7a621a4db0
Fix device_map bug by raising an error when using device_map=xpu ( #10340 )
...
* Fix device_map bug by raising an error when using device_map=xpu
* Fix sync error
* Fix python style
* Use invalidInputError instead of invalidOperationError
2024-03-08 13:38:52 +08:00
Yishuo Wang
1ac193ba02
add rope theta argument ( #10343 )
2024-03-07 17:27:19 +08:00
Yuxuan Xia
0c8d3c9830
Add C-Eval HTML report ( #10294 )
...
* Add C-Eval HTML report
* Fix C-Eval workflow pr trigger path
* Fix C-Eval workflow typos
* Add permissions to C-Eval workflow
* Fix C-Eval workflow typo
* Add pandas dependency
* Fix C-Eval workflow typo
2024-03-07 16:44:49 +08:00
Cengguang Zhang
496d18ab6d
LLM: add quantize kv cache support for baichuan 7b and 13b. ( #10330 )
...
* add quantize kv cache for baichuan 7b and 13b.
* fix typo.
* fix.
* fix style.
* fix style.
2024-03-07 16:17:38 +08:00
hxsz1997
b7db21414e
Update llamaindex ut ( #10338 )
...
* add test_llamaindex of gpu
* add llamaindex gpu tests bash
* add llamaindex cpu tests bash
* update name of Run LLM langchain GPU test
* import llama_index in llamaindex gpu ut
* update the dependency of test_llamaindex
* add Run LLM llamaindex GPU test
* modify import dependency of llamaindex cpu test
* add Run LLM llamaindex test
* update llama_model_path
* delete unused model path
* add LLAMA2_7B_ORIGIN_PATH in llamaindex cpu test
2024-03-07 10:06:16 +08:00
ZehuaCao
267de7abc3
fix fschat DEP version error ( #10325 )
2024-03-06 16:15:27 +08:00
Yina Chen
9ea499ca68
Optimize speculative decoding PVC memory usage ( #10329 )
...
* optimize memory
* update
* update
* update
* support other models
* update
* fix style
2024-03-06 09:54:21 +08:00
dingbaorong
cc796848ea
fix typos ( #10274 )
...
Co-authored-by: Ariadne <wyn2000330@126.com>
2024-03-05 18:38:22 +08:00
hxsz1997
af11c53473
Add the installation step of postgresql and pgvector on windows in LlamaIndex GPU support ( #10328 )
...
* add the installation of postgresql and pgvector of windows
* fix some format
2024-03-05 18:31:19 +08:00
Yishuo Wang
0011ff9f64
optimize bge large performance ( #10324 )
2024-03-05 17:06:03 +08:00
Shaojun Liu
178eea5009
upload bigdl-llm wheel to sourceforge for backup ( #10321 )
...
* test: upload to sourceforge
* update scripts
* revert
2024-03-05 16:36:01 +08:00
Cengguang Zhang
30d009bca7
LLM: support quantized kv cache for Mistral in transformers >=4.36.0 ( #10326 )
...
* support quantize kv for mistral in transformers 4.36
* update mistral support.
* fix style.
2024-03-05 16:23:50 +08:00
dingbaorong
1e6f0c6f1a
Add llamaindex gpu example ( #10314 )
...
* add llamaindex example
* fix core dump
* refine readme
* add trouble shooting
* refine readme
---------
Co-authored-by: Ariadne <wyn2000330@126.com>
2024-03-05 13:36:00 +08:00
dingbaorong
fc7f10cd12
add langchain gpu example ( #10277 )
...
* first draft
* fix
* add readme for transformer_int4_gpu
* fix doc
* check device_map
* add arc ut test
* fix ut test
* fix langchain ut
* Refine README
* fix gpu mem too high
* fix ut test
---------
Co-authored-by: Ariadne <wyn2000330@126.com>
2024-03-05 13:33:57 +08:00
Yuwen Hu
5dbbe1a826
[LLM] Support for new arc ut runner ( #10311 )
...
* Support for new arc ut runner
* Comment unnecessary OMP_NUM_THREADS related settings for arc uts
2024-03-04 18:42:02 +08:00
Yuwen Hu
d45e577d8c
[LLM] Test load_low_bit in iGPU perf test on Windows ( #10313 )
2024-03-04 18:03:57 +08:00
WeiguangHan
fd81d66047
LLM: Compress some models to save space ( #10315 )
...
* LLM: compress some models to save space
* add deleted comments
2024-03-04 17:53:03 +08:00
Shaojun Liu
bab2ee5f9e
update nightly spr perf test ( #10178 )
...
* update nightly spr perf test
* update
* update runner label
* update
* update
* update folder
* revert
2024-03-04 13:46:33 +08:00
Cengguang Zhang
ab9fc2485f
LLM: add quantize kv support for llama transformer 4.36 ( #10298 )
...
* add quantize kv support for llama transformer 4.36
* fix style.
* fix style.
2024-03-04 10:33:35 +08:00
Xin Qiu
58208a5883
Update FAQ document. ( #10300 )
...
* Update install_gpu.md
* Update resolve_error.md
* Update README.md
* Update resolve_error.md
* Update README.md
* Update resolve_error.md
2024-03-04 08:35:11 +08:00
Yuwen Hu
27d9a14989
[LLM] all-on-one update: memory optimize and streaming output ( #10302 )
...
* Memory saving for continuous in-out pair run and add support for streaming output on MTL iGPU
* Small fix
* Small fix
* Add things back
2024-03-01 18:02:30 +08:00
SONG Ge
0ab40917fb
[LLM] Split merged_qk to separated q/k linear ( #10299 )
...
* modify merge_qk_linear to separated q/k linear
* update
2024-03-01 16:48:55 +08:00
Yang Wang
f4d7dbcde2
use fused qkv forward in qwen2 ( #10185 )
...
* use fused qkv forward in qwen2
* support both
* fix style
* fix rope
* remove print
* fix style
* clean up
2024-03-01 16:46:35 +08:00
Xin Qiu
509e206de0
update doc about gemma random and unreadable output. ( #10297 )
...
* Update install_gpu.md
* Update README.md
* Update README.md
2024-03-01 15:41:16 +08:00
Wang, Jian4
beb9433cec
LLM: Reduce speculative _ipex_optimize_model memory use ( #10281 )
...
* use tpp
* update ipex
2024-03-01 13:48:23 +08:00
Yuwen Hu
f0ff0eebe1
[LLM] Support quantize kv cache for Baichuan2 7B ( #10280 )
...
* Add quantized kv cache framework for Baichuan2 7B
* Support quantize kv cache for baichuan2
* Small fix
* Fix python style
2024-03-01 13:35:42 +08:00
SONG Ge
273de341d7
hot-fix silu error import ( #10292 )
2024-03-01 10:11:37 +08:00
Shengsheng Huang
bcfad555df
revise llamaindex readme ( #10283 )
2024-02-29 17:19:23 +08:00
Xin Qiu
232273a1b5
Enable Gemma fused mlp + Gelu ( #10276 )
...
* update llama mlp forward
* add all
* fix style check
* split
* update
* update
* update
* fix style
2024-02-29 16:53:24 +08:00
Guancheng Fu
2d930bdca8
Add vLLM bf16 support ( #10278 )
...
* add argument load_in_low_bit
* add docs
* modify gpu doc
* done
---------
Co-authored-by: ivy-lv11 <lvzc@lamda.nju.edu.cn>
2024-02-29 16:33:42 +08:00
SONG Ge
13b0bc9075
[LLM] Add quantize_kv optimization for yuan2 model ( #10243 )
...
* add initial quantize_kv support for yuan2 model
* fix yuan2 quantize_kv generation
* apply fp16 conv layer optimizations
* disable mlp for quantize_kv
2024-02-29 16:33:26 +08:00
Zhicun
4e6cc424f1
Add LlamaIndex RAG ( #10263 )
...
* run demo
* format code
* add llamaindex
* add custom LLM with bigdl
* update
* add readme
* begin ut
* add unit test
* add license
* add license
* revised
* update
* modify docs
* remove data folder
* update
* modify prompt
* fixed
* fixed
* fixed
2024-02-29 15:21:19 +08:00
Jin Qiao
5d7243067c
LLM: add Baichuan2-13B-Chat 2048-256 to MTL perf ( #10273 )
2024-02-29 13:48:55 +08:00
Ruonan Wang
a9fd20b6ba
LLM: Update qkv fusion for GGUF-IQ2 ( #10271 )
...
* first commit
* update mistral
* fix transformers==4.36.0
* fix
* disable qk for mixtral now
* fix style
2024-02-29 12:49:53 +08:00
Jiao Wang
6fb65bb9d2
fix in transformers 4.36 ( #10150 )
2024-02-28 18:43:01 -08:00
Shengsheng Huang
43dac97e03
Update README.md ( #10260 )
2024-02-29 10:41:14 +08:00
Ruonan Wang
4b08bc1417
LLM: relax batch check of flash attention by double-checking attention mask ( #10270 )
...
* relax batch check
* fix
* fix style
2024-02-29 09:39:55 +08:00
Yina Chen
07f36fbfcc
Fix gptj failed to extend ( #10269 )
2024-02-29 09:39:27 +08:00
Yishuo Wang
cccb02dad1
fix baichuan2 13b 2k input ( #10267 )
2024-02-28 17:20:20 +08:00
Heyang Sun
7244fd1ba5
Fix Arc StarCoder wrong query_shape when input is long ( #10268 )
...
* Fix Arc StarCoder wrong query_shape when input is long
* Update gptbigcode.py
2024-02-28 17:07:08 +08:00
Cengguang Zhang
a4de3095f3
LLM: Support quantize kv cache in mistral. ( #10261 )
...
* init
* update quantize kv.
2024-02-28 14:08:08 +08:00
Shengsheng Huang
db0d129226
Revert "Add rwkv example ( #9432 )" ( #10264 )
...
This reverts commit 6930422b42.
2024-02-28 11:48:31 +08:00
Yining Wang
6930422b42
Add rwkv example ( #9432 )
...
* codeshell fix wrong urls
* restart runner
* add RWKV CPU & GPU example (rwkv-4-world-7b)
* restart runner
* update submodule
* fix runner
* runner-test
---------
Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>
2024-02-28 11:41:00 +08:00
Keyan (Kyrie) Zhang
59861f73e5
Add Deepseek-6.7B ( #9991 )
...
* Add new example Deepseek
* Add new example Deepseek
* Add new example Deepseek
* Add new example Deepseek
* Add new example Deepseek
* modify deepseek
* modify deepseek
* Add verified model in README
* Turn cpu_embedding=True in Deepseek example
---------
Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>
2024-02-28 11:36:39 +08:00
Yuxuan Xia
2524273198
Update AutoGen README ( #10255 )
...
* Update AutoGen README
* Fix AutoGen README typos
* Update AutoGen README
* Update AutoGen README
2024-02-28 11:34:45 +08:00
Zheng, Yi
2347f611cf
Add cpu and gpu examples of Mamba ( #9797 )
...
* Add mamba cpu example
* Add mamba gpu example
* Use a smaller model as the example
* minor fixes
---------
Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>
2024-02-28 11:33:29 +08:00
Zhao Changmin
937e1f7c74
rebase ( #9104 )
...
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2024-02-28 11:18:21 +08:00
JunX
4833067489
fix GPU example link in README.md ( #9533 )
...
* fix GPU example link in README.md
* fix GPU links in llm README.md
2024-02-28 11:13:18 +08:00
Zhicun
308e637d0d
Add DeepSeek-MoE-16B-Chat ( #10155 )
...
* dsmoe-hf add
* add dsmoe pytorch
* update README
* modify comment
* remove GPU example
* update model name
* format code
2024-02-28 10:12:09 +08:00
Guoqiong Song
f4a2e32106
Stream llm example for both GPU and CPU ( #9390 )
2024-02-27 15:54:47 -08:00
Yang Wang
c581c6db30
draft mmint4 ( #10031 )
...
change to llm.cpp
support transposed format
revert
implement qkv fuse
fix style
change to vertically pack
change to enable_xetla
fix mlp_fusion_check
remove comments
address comments
add some comments
fix style
2024-02-27 14:55:16 -08:00
hxsz1997
cba61a2909
Add html report of ppl ( #10218 )
...
* remove include and language option, select the corresponding dataset based on the model name in Run
* change the nightly test time
* change the nightly test time of harness and ppl
* save the ppl result to json file
* generate csv file and print table result
* generate html
* modify the way to get parent folder
* update html in parent folder
* add llm-ppl-summary and llm-ppl-summary-html
* modify echo single result
* remove download fp16.csv
* change model name of PR
* move ppl nightly related files to llm/test folder
* reformat
* separate make_table from make_table_and_csv.py
* separate make_csv from make_table_and_csv.py
* update llm-ppl-html
* remove comment
* add Download fp16.results
2024-02-27 17:37:08 +08:00
Zhicun
6d60982746
Env script: add license ( #10257 )
...
* env script
* update README.md
* modify README
* modify cpu info output
* add env-check.sh
* add env-check.bat
* add windows
* modify bat
* add license
2024-02-27 15:29:20 +08:00
Yishuo Wang
b4fa4ab46f
optimize yuan 2.0 again ( #10252 )
2024-02-27 14:51:42 +08:00
Zhicun
03b9c4930a
UX: Script to print env info ( #10088 )
...
* env script
* update README.md
* modify README
* modify cpu info output
* add env-check.sh
* add env-check.bat
* add windows
* modify bat
2024-02-27 14:45:36 +08:00
Keyan (Kyrie) Zhang
843fe546b0
Add CPU and GPU examples for DeciLM-7B ( #9867 )
...
* Add cpu and gpu examples for DeciLM-7B
* Add cpu and gpu examples for DeciLM-7B
* Add DeciLM-7B to README table
* modify deciLM
* modify deciLM
* modify deciLM
* Add verified model in README
* Add cpu_embedding=True
2024-02-27 13:15:49 +08:00
Yuwen Hu
38ae4b372f
Add yuan2-2b to win igpu perf test ( #10250 )
2024-02-27 11:08:33 +08:00
Heyang Sun
36a9e88104
Speculative Starcoder on CPU ( #10138 )
...
* Speculative Starcoder on CPU
* enable kv-cache pre-allocation
* refine codes
* refine
* fix style
* fix style
* fix style
* refine
* refine
* Update speculative.py
* Update gptbigcode.py
* fix style
* Update speculative.py
* enable mixed-datatype layernorm on top of torch API
* adaptive dtype
* Update README.md
2024-02-27 09:57:29 +08:00
Yishuo Wang
a47989c860
optimize yuan 2.0 performance ( #10244 )
2024-02-26 17:20:10 +08:00
Wang, Jian4
6c74b99a28
LLM: Update qwen readme ( #10245 )
2024-02-26 17:03:09 +08:00
hxsz1997
15ad2fd72e
Merge pull request #10226 from zhentaocc/fix_harness
...
Fix harness
2024-02-26 16:49:27 +08:00
Wang, Jian4
f9b75f900b
LLM: Enable qwen target_model ipex ( #10232 )
...
* change order
* enable qwen ipex
* update qwen example
* update
* fix style
* update
2024-02-26 16:41:12 +08:00
Jin Qiao
3e6d188553
LLM: add baichuan2-13b to mtl perf ( #10238 )
2024-02-26 15:55:56 +08:00
Yuwen Hu
e38e29511c
[LLM] Yuan2 MLP and Rotary optimization ( #10231 )
...
* Add optimization for rotary embedding
* Add mlp fused optimization
* Python style fix
* Fix rotary embedding due to logits difference
* Small fix
2024-02-26 15:10:08 +08:00
Ziteng Zhang
ea23afc8ec
[LLM]update ipex part in mistral example readme ( #10239 )
...
* update ipex part in mistral example readme
2024-02-26 14:35:20 +08:00
SONG Ge
df2f3885ba
[LLM] Enable kv_cache and forward_qkv optimizations for yuan2 ( #10225 )
...
* add init kv_cache support for yuan2
* add forward qkv in yuan
2024-02-26 11:29:48 +08:00
Xiangyu Tian
85a99e13e8
LLM: Fix ChatGLM3 Speculative Example ( #10236 )
...
Fix ChatGLM3 Speculative Example.
2024-02-26 10:57:28 +08:00
Chen, Zhentao
213ef06691
fix readme
2024-02-24 00:38:08 +08:00
Ruonan Wang
28513f3978
LLM: support fp16 embedding & add mlp fusion for iq2_xxs ( #10219 )
...
* add fp16 embed
* small fixes
* fix style
* fix style
* fix comment
2024-02-23 17:26:24 +08:00
Yuwen Hu
eeecd9fc08
Python style fix ( #10230 )
2024-02-23 17:21:23 +08:00
Yuwen Hu
e511bbd8f1
[LLM] Add basic optimization framework for Yuan2 ( #10227 )
...
* Add basic optimization framework for Yuan2
* Small fix
* Python style fix
* Small fix
* Small fix
2024-02-23 17:05:00 +08:00
Xin Qiu
8ef5482da2
update Gemma readme ( #10229 )
...
* Update README.md
* Update README.md
* Update README.md
* Update README.md
2024-02-23 16:57:08 +08:00
Chen, Zhentao
6fe5344fa6
separate make_csv from the file
2024-02-23 16:33:38 +08:00
Chen, Zhentao
bfa98666a6
fall back to make_table.py
2024-02-23 16:33:38 +08:00
Ruonan Wang
19260492c7
LLM: fix action/installation error of mpmath ( #10223 )
...
* fix
* test
* fix
* update
2024-02-23 16:14:53 +08:00
Xin Qiu
aabfc06977
add gemma example ( #10224 )
...
* add gemma gpu example
* Update README.md
* add cpu example
* Update README.md
* Update README.md
* Update generate.py
* Update generate.py
2024-02-23 15:20:57 +08:00
yb-peng
a2c1675546
Add CPU and GPU examples for Yuan2-2B-hf ( #9946 )
...
* Add a new CPU example of Yuan2-2B-hf
* Add a new CPU generate.py of Yuan2-2B-hf example
* Add a new GPU example of Yuan2-2B-hf
* Add Yuan2 to README table
* In CPU example: 1. Use English as default prompt; 2. Provide modified files in yuan2-2B-instruct
* In GPU example: 1. Use English as default prompt; 2. Provide modified files
* GPU example:update README
* update Yuan2-2B-hf in README table
* Add CPU example for Yuan2-2B in Pytorch-Models
* Add GPU example for Yuan2-2B in Pytorch-Models
* Add license in generate.py; Modify README
* In GPU Add license in generate.py; Modify README
* In CPU yuan2 modify README
* In GPU yuan2 modify README
* In CPU yuan2 modify README
* In GPU example, updated the readme for Windows GPU supports
* In GPU torch example, updated the readme for Windows GPU supports
* GPU hf example README modified
* GPU example README modified
2024-02-23 14:09:30 +08:00
yb-peng
f1f4094a09
Add CPU and GPU examples of phi-2 ( #10014 )
...
* Add CPU and GPU examples of phi-2
* In GPU hf example, updated the readme for Windows GPU supports
* In GPU torch example, updated the readme for Windows GPU supports
* update the table in BigDL/README.md
* update the table in BigDL/python/llm/README.md
2024-02-23 14:05:53 +08:00
Chen, Zhentao
f315c7f93a
Move harness nightly related files to llm/test folder ( #10209 )
...
* move harness nightly files to test folder
* change workflow file path accordingly
* use arc01 when pr
* fix path
* fix fp16 csv path
2024-02-23 11:12:36 +08:00
Xin Qiu
30795bdfbc
Gemma optimization: rms_norm, kv_cache, fused_rope, fused_rope+qkv ( #10212 )
...
* gemma optimization
* update
* update
* fix style
* meet code review
2024-02-23 10:07:24 +08:00
Guoqiong Song
63681af97e
falcon for transformers 4.36 ( #9960 )
...
* falcon for transformers 4.36
2024-02-22 17:04:40 -08:00
Jason Dai
84d5f40936
Update README.md ( #10213 )
2024-02-22 17:22:59 +08:00
Yina Chen
ce5840a8b7
GPT-J rope optimization on xpu ( #10182 )
...
* optimize
* update
* fix style & move use_fuse_rope
* add ipex version check
* fix style
* update
* fix style
* meet comments
* address comments
* fix style
2024-02-22 16:25:12 +08:00
Xiangyu Tian
f445217d02
LLM: Update IPEX to 2.2.0+cpu and Refactor for _ipex_optimize ( #10189 )
...
Update IPEX to 2.2.0+cpu and refactor for _ipex_optimize.
2024-02-22 16:01:11 +08:00
Heyang Sun
c876d9b5ca
Support for MPT rotary embedding ( #10208 )
2024-02-22 15:16:31 +08:00
Ruonan Wang
5e1fee5e05
LLM: add GGUF-IQ2 examples ( #10207 )
...
* add iq2 examples
* small fix
* meet code review
* fix
* meet review
* small fix
2024-02-22 14:18:45 +08:00
Yuwen Hu
21de2613ce
[LLM] Add model loading time record for all-in-one benchmark ( #10201 )
...
* Add model loading time record in csv for all-in-one benchmark
* Small fix
* Small fix to number after .
2024-02-22 13:57:18 +08:00
Ovo233
60e11b6739
LLM: Add mlp layer unit tests ( #10200 )
...
* add mlp layer unit tests
* add download baichuan-13b
* exclude llama for now
* install additional packages
* rename bash file
* switch to Baichuan2
* delete attention related code
* fix name errors in yml file
2024-02-22 13:44:45 +08:00
SONG Ge
ca1166a0e5
[LLM] Add quantize kv_cache for Baichuan2-13B ( #10203 )
...
* add quantize kv_cache for baichuan2-13b
* style fix
2024-02-22 13:43:35 +08:00
Ruonan Wang
34ee1aa91f
LLM: add esimd sdp support for chatglm3 ( #10205 )
...
* add esimd sdp support
* fix style
2024-02-22 13:37:16 +08:00
Yuxuan Xia
7cbc2429a6
Fix C-Eval ChatGLM loading issue ( #10206 )
...
* Add c-eval workflow and modify running files
* Modify the chatglm evaluator file
* Modify the ceval workflow for triggering test
* Modify the ceval workflow file
* Modify the ceval workflow file
* Modify ceval workflow
* Adjust the ceval dataset download
* Add ceval workflow dependencies
* Modify ceval workflow dataset download
* Add ceval test dependencies
* Add ceval test dependencies
* Correct the result print
* Fix the nightly test trigger time
* Fix ChatGLM loading issue
2024-02-22 10:00:43 +08:00
Yuwen Hu
94cb16fe40
[LLM] Small updates to Win GPU Install Doc ( #10199 )
...
* Make offline installer the default for win gpu doc for oneAPI
* Small other fixes
2024-02-21 17:58:40 +08:00
binbin Deng
9975b029c5
LLM: add qlora finetuning example using trl.SFTTrainer ( #10183 )
2024-02-21 16:40:04 +08:00
Ruonan Wang
f7c96b19ef
LLM: support iq2 for mixtral ( #10191 )
...
* support name mapping for mixtral
* support mixtral mixed quantization
* fix style
* fix
2024-02-21 16:00:29 +08:00
yb-peng
b1a97b71a9
Harness eval: Add is_last parameter and fix logical operator in highlight_vals ( #10192 )
...
* Add is_last parameter and fix logical operator in highlight_vals
* Add script to update HTML files in parent folder
* Add running update_html_in_parent_folder.py in summarize step
* Add licence info
* Remove update_html_in_parent_folder.py in Summarize the results for pull request
2024-02-21 14:45:32 +08:00
Zhicun
c7e839e66c
Add Qwen1.5-7B-Chat ( #10113 )
...
* add Qwen1.5-7B-Chat
* modify Qwen1.5 example
* update README
* update prompt format
* update folder name and example README
* add Chinese prompt sample output
* update link in README
* correct the link
* update transformer version
2024-02-21 13:29:29 +08:00
Xin Qiu
56ad781f2f
qwen2 cpu fix ( #10187 )
2024-02-21 11:23:51 +08:00
Chen, Zhentao
39d37bd042
upgrade harness package version in workflow ( #10188 )
...
* upgrade harness
* update readme
2024-02-21 11:21:30 +08:00
Yuwen Hu
001c13243e
[LLM] Add support for load_low_bit benchmark on Windows GPU ( #10167 )
...
* Add support for load_low_bit performance test on Windows GPU
* Small fix
* Small fix
* Save memory during converting model process
* Drop the results for the first time when loading in low bit on mtl igpu for better performance
* Small fix
2024-02-21 10:51:52 +08:00
Ziteng Zhang
276ef0e885
Speculative Ziya on CPU ( #10160 )
...
* Speculative Ziya on CPU
* Without part of Accelerate with BIGDL_OPT_IPEX
2024-02-21 10:30:39 +08:00
Zhao Changmin
4fbf449c2d
for rwkv4 ( #10179 )
2024-02-21 10:11:10 +08:00
yb-peng
de3dc609ee
Modify harness evaluation workflow ( #10174 )
...
* Modify table head in harness
* Specify the file path of fp16.csv
* change run to run nightly and run pr to debug
* Modify the way to get fp16.csv to downloading from github
* Change the method to calculate diff in html table
* Change the method to calculate diff in html table
* Re-arrange job order
* Re-arrange job order
* Change limit
* Change fp16.csv path
* Change highlight rules
* Change limit
2024-02-20 18:55:43 +08:00
Ruonan Wang
3288acb8de
LLM: Support embedding quantization (only q2k now) ( #10170 )
...
* basic logic added
* basic support
* support save&load, update mixed strategy
* fix style
* use int8 for lm_head
* add check for xpu
2024-02-20 16:56:57 +08:00
hxsz1997
6e10d98a8d
Fix some typos ( #10175 )
...
* add llm-ppl workflow
* update the DATASET_DIR
* test multiple precisions
* modify nightly test
* match the updated ppl code
* add matrix.include
* fix the include error
* update the include
* add more model
* update the precision of include
* update nightly time and add more models
* fix the workflow_dispatch description, change default model of pr and modify the env
* modify workflow_dispatch language options
* modify options
* modify language options
* modify workflow_dispatch type
* modify type
* modify the type of language
* change seq_len type
* fix some typos
* revert changes to stress_test.txt
2024-02-20 14:14:53 +08:00
Zhicun
add3899311
Add ziya CPU example ( #10114 )
...
* ziya on CPU
* add README for ziya
* specify use_cache
* add arc CPU
* update prompt format
* update link
* add comments to emphasize use_cache
* update pip cmd
2024-02-20 13:59:52 +08:00
binbin Deng
2bb96c775c
LLM: fix device setting during saving optimized model ( #10154 )
2024-02-20 09:52:59 +08:00
Xin Qiu
1f6d5b9f30
enable fused rmsnorm and rope qwen2 ( #10163 )
...
* qwen2
* change convert
* cleanup
2024-02-20 08:33:09 +08:00
yb-peng
e31210ba00
Modify html table style and add fp16.csv in harness ( #10169 )
...
* Specify the version of pandas in harness evaluation workflow
* Specify the version of pandas in harness evaluation workflow
* Modify html table style and add fp16.csv in harness
* Modify comments
2024-02-19 18:13:40 +08:00
WeiguangHan
6c09aed90d
LLM: add qwen_1.5_7b model for arc perf test ( #10166 )
...
* LLM: add qwen_1.5_7b model for arc perf test
* small fix
* revert some codes
2024-02-19 17:21:00 +08:00
Yuxuan Xia
209122559a
Add Ceval workflow and modify the result printing ( #10140 )
...
* Add c-eval workflow and modify running files
* Modify the chatglm evaluator file
* Modify the ceval workflow for triggering test
* Modify the ceval workflow file
* Modify the ceval workflow file
* Modify ceval workflow
* Adjust the ceval dataset download
* Add ceval workflow dependencies
* Modify ceval workflow dataset download
* Add ceval test dependencies
* Add ceval test dependencies
* Correct the result print
2024-02-19 17:06:53 +08:00
Zhao Changmin
f8730e8dc1
Skip rescale rwkv linear when load_low_bit ( #10164 )
...
* rwkv_ld
2024-02-19 15:56:42 +08:00
Heyang Sun
3e2af5ec0a
Fix IPEX Baichuan Speculative ( #10162 )
...
* Fix IPEX Baichuan Speculative
* compatible with 13B
* Update speculative.py
2024-02-19 15:27:34 +08:00
Yina Chen
23c91cdce6
[LLM] Add min_step_draft in speculative decoding ( #10142 )
...
* Fix gptj kvcache & position id
* Add min_draft_tokens in speculative decoding
* fix style
* update
2024-02-19 14:31:41 +08:00
Chen, Zhentao
14ba2c5135
Harness: remove deprecated files ( #10165 )
2024-02-19 14:27:49 +08:00
Wang, Jian4
d3591383d5
LLM: Add CPU chatglm3 speculative example ( #10004 )
...
* init chatglm
* update
* update
2024-02-19 13:38:52 +08:00
Wang, Jian4
f2417e083c
LLM: enable chatglm3-6b target_model ipex ( #10085 )
...
* init
* always make causal_mask
* not return last tensor
* update
* optimize_model = False
* enable optimized=False
* enable optimized_model=true
* speed_up ipex target_model
* remove if True
* use group_size
* update python style
* update
* update
2024-02-19 13:38:32 +08:00
Heyang Sun
177273c1a4
IPEX Speculative Support for Baichuan2 7B ( #10112 )
...
* IPEX Speculative Support for Baichuan2 7B
* fix license problems
* refine
2024-02-19 09:12:57 +08:00
Yina Chen
1508d6b089
Fix gptj kvcache & position id ( #10141 )
2024-02-18 10:02:49 +08:00
yb-peng
b4dc33def6
In harness-evaluation workflow, add statistical tables ( #10118 )
...
* change storage
* fix typo
* change label
* change label to arc03
* change needs in the last step
* add generate csv in harness/make_table_results.py
* modify needs in the last job
* add csv to html
* fix path issue in llm-harness-summary-nightly
* modify output_path
* modify args in make_table_results.py
* modify make table command in summary
* change pr env label
* remove irrelevant code in summary; add set output path step; add limit in harness run
* re-organize code structure
* modify limit in run harness
* modify csv_to_html input path
* modify needs in summary-nightly
2024-02-08 19:01:05 +08:00
Yishuo Wang
4d33aac7f9
quick fix qwen2 fp8 kv cache ( #10135 )
2024-02-08 17:04:59 +08:00
Cengguang Zhang
39d90839aa
LLM: add quantize kv cache for llama. ( #10086 )
...
* feat: add quantize kv cache for llama.
* fix style.
* add quantized attention forward function.
* revert style.
* fix style.
* fix style.
* update quantized kv cache and add quantize_qkv
* fix style.
* fix style.
* optimize quantize kv cache.
* fix style.
2024-02-08 16:49:22 +08:00
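(The quantize-kv-cache entries above all share the same idea: store cached keys/values in int8 with per-token scales. A minimal sketch under that assumption, with made-up shapes and function names, not the repository's actual implementation:)

```python
# Illustrative per-token symmetric int8 quantization of a cached key/value tensor.
import torch

def quantize_kv(t: torch.Tensor):
    # t: (batch, heads, seq_len, head_dim); one scale per (batch, head, token)
    scale = t.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(t / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

k = torch.randn(1, 8, 16, 64)
k_q, k_scale = quantize_kv(k)
k_hat = dequantize_kv(k_q, k_scale)
print((k - k_hat).abs().max())  # small error; cache is roughly 4x smaller
```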
Yishuo Wang
d848efe17c
add quantize kv cache support for qwen2 ( #10134 )
2024-02-08 16:17:21 +08:00
SONG Ge
3f79128ed7
[LLM] Enable kv_cache optimization for Qwen2 on transformers-v4.37.0 ( #10131 )
...
* add support for kv_cache optimization on transformers-v4.37.0
* enable attention forward
* style fix
* disable rotary for now
2024-02-08 14:20:26 +08:00
Ruonan Wang
063dc145ac
LLM: basic support for q2k ( #10132 )
...
* basic support for q2k
* fix style
2024-02-08 13:52:01 +08:00
binbin Deng
11fe5a87ec
LLM: add Modelscope model example ( #10126 )
2024-02-08 11:18:07 +08:00