Yuwen Hu
fac49f15e3
Remove manual importing ipex in all-in-one benchmark ( #11272 )
2024-06-11 09:32:13 +08:00
Wenjing Margaret Mao
70b17c87be
Merge multiple batches ( #11264 )
...
* add merge steps
* move to pr mode
* remove build + add merge.py
* add tohtml and change cp
* change test_batch folder path
* change merge_temp path
* change to html folder
* revert
* change place
* revert 437
* revert space
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-06-07 18:38:45 +08:00
Xiangyu Tian
4b07712fd8
LLM: Fix vLLM CPU model convert mismatch ( #11254 )
...
Fix vLLM CPU model convert mismatch.
2024-06-07 15:54:34 +08:00
Yishuo Wang
42fab480ea
support stablelm2 12b ( #11265 )
2024-06-07 15:46:00 +08:00
Xin Qiu
dbc3c2d72d
glm4 sdp ( #11253 )
...
* glm4 sdp
* fix style
* update comment
2024-06-07 15:42:23 +08:00
Xin Qiu
151fcf37bb
check device name in use_flash_attention ( #11263 )
2024-06-07 15:07:47 +08:00
Yishuo Wang
2623944604
qwen2 sdpa small fix ( #11261 )
2024-06-07 14:42:18 +08:00
Yishuo Wang
ea0d03fd28
Refactor baichuan1 7B and 13B ( #11258 )
2024-06-07 14:29:20 +08:00
Qiyuan Gong
1aa9c9597a
Avoid duplicate import in IPEX auto importer ( #11227 )
...
* Add custom import to avoid ipex duplicate importing
* Add scope limitation
2024-06-07 14:08:00 +08:00
Wang, Jian4
6f2684e5c9
Update pp llama.py to save memory ( #11233 )
2024-06-07 13:18:16 +08:00
Yishuo Wang
ef8e9b2ecd
Refactor qwen2 moe ( #11244 )
2024-06-07 13:14:54 +08:00
Zijie Li
7b753dc8ca
Update sample output for HF Qwen2 GPU and CPU ( #11257 )
2024-06-07 11:36:22 +08:00
Zhao Changmin
b7948671de
[WIP] Add look up table in 1st token stage ( #11193 )
...
* lookuptb
2024-06-07 10:51:05 +08:00
Yuwen Hu
8c36b5bdde
Add qwen2 example ( #11252 )
...
* Add GPU example for Qwen2
* Update comments in README
* Update README for Qwen2 GPU example
* Add CPU example for Qwen2
Sample Output under README pending
* Update generate.py and README for CPU Qwen2
* Update GPU example for Qwen2
* Small update
* Small fix
* Add Qwen2 table
* Update README for Qwen2 CPU and GPU
Update sample output under README
---------
Co-authored-by: Zijie Li <michael20001122@gmail.com>
2024-06-07 10:29:33 +08:00
Shaojun Liu
85df5e7699
fix nightly perf test ( #11251 )
2024-06-07 09:33:14 +08:00
Xin Qiu
2f809116e2
optimize Chatglm4 ( #11239 )
...
* chatglm4
* update
* update
* add rms norm
* chatglm4
2024-06-06 18:25:20 +08:00
hxsz1997
b6234eb4e2
Add task in allinone ( #11226 )
...
* add task
* update prompt
* modify typos
* add more cases in summarize
* Make the summarize & QA prompt preprocessing as a util function
2024-06-06 17:22:40 +08:00
Yishuo Wang
2e4ccd541c
fix qwen2 cpu ( #11240 )
2024-06-06 16:24:19 +08:00
Yishuo Wang
e738ec38f4
disable quantize kv in specific qwen model ( #11238 )
2024-06-06 14:08:39 +08:00
Yishuo Wang
c4e5806e01
add latest optimization in starcoder2 ( #11236 )
2024-06-06 14:02:17 +08:00
Yishuo Wang
ba27e750b1
refactor yuan2 ( #11235 )
2024-06-06 13:17:54 +08:00
Shaojun Liu
6be24fdd28
OSPDT: add tpp licenses ( #11165 )
...
* add tpp licenses
* add licenses
* add licenses
* delete mitchellh-mapstructure license
* delete stb-image public domain license
* add README.md
* remove core-xe related licenses
2024-06-06 10:59:06 +08:00
Guoqiong Song
09c6780d0c
phi-2 transformers 4.37 ( #11161 )
...
* phi-2 transformers 4.37
2024-06-05 13:36:41 -07:00
Guoqiong Song
f6d5c6af78
fix issue 1407 ( #11171 )
2024-06-05 13:35:57 -07:00
Zijie Li
bfa1367149
Add CPU and GPU example for MiniCPM ( #11202 )
...
* Change installation address
Change former address: "https://docs.conda.io/en/latest/miniconda.html# " to new address: "https://conda-forge.org/download/ " for 63 occurrences under python\llm\example
* Change Prompt
Change "Anaconda Prompt" to "Miniforge Prompt" for 1 occurrence
* Create and update model minicpm
* Update model minicpm
Update model minicpm under GPU/PyTorch-Models
* Update readme and generate.py
change "prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False)" and delete "pip install transformers==4.37.0"
* Update comments for minicpm GPU
Update comments for generate.py at minicpm GPU
* Add CPU example for MiniCPM
* Update minicpm README for CPU
* Update README for MiniCPM and Llama3
* Update Readme for Llama3 CPU Pytorch
* Update and fix comments for MiniCPM
2024-06-05 18:09:53 +08:00
Yuwen Hu
af96579c76
Update installation guide for pipeline parallel inference ( #11224 )
...
* Update installation guide for pipeline parallel inference
* Small fix
* further fix
* Small fix
* Small fix
* Update based on comments
* Small fix
* Small fix
* Small fix
2024-06-05 17:54:29 +08:00
Yina Chen
ed67435491
Support Fp6 k in ipex-llm ( #11222 )
...
* support fp6_k
* support fp6_k
* remove
* fix style
2024-06-05 17:34:36 +08:00
binbin Deng
a6674f5bce
Fix should_use_fuse_rope error of Qwen1.5-MoE-A2.7B-Chat ( #11216 )
2024-06-05 15:56:10 +08:00
Wenjing Margaret Mao
231b968aba
Modify the check_results.py to support batch 2&4 ( #11133 )
...
* add batch 2&4 and exclude to perf_test
* modify the perf-test&437 yaml
* modify llm_performance_test.yml
* remove batch 4
* modify check_results.py to support batch 2&4
* change the batch_size format
* remove genxir
* add str(batch_size)
* change actual_test_casese in check_results file to support batch_size
* change html highlight
* less models to test html and html_path
* delete the moe model
* split batch html
* split
* use installing from pypi
* use installing from pypi - batch2
* revert cpp
* revert cpp
* merge two jobs into one, test batch_size in one job
* merge two jobs into one, test batch_size in one job
* change file directory in workflow
* try catch deal with odd file without batch_size
* modify pandas version
* change the dir
* organize the code
* organize the code
* remove Qwen-MOE
* modify based on feedback
* modify based on feedback
* modify based on second round of feedback
* modify based on second round of feedback + change run-arc.sh mode
* modify based on second round of feedback + revert config
* modify based on second round of feedback + revert config
* modify based on second round of feedback + remove comments
* modify based on second round of feedback + remove comments
* modify based on second round of feedback + revert arc-perf-test
* modify based on third round of feedback
* change error type
* change error type
* modify check_results.html
* split batch into two folders
* add all models
* move csv_name
* revert pr test
* revert pr test
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-06-05 15:04:55 +08:00
Xin Qiu
566691c5a3
quantized attention forward for minicpm ( #11200 )
...
* quantized minicpm
* fix style check
2024-06-05 09:15:25 +08:00
Jiao Wang
bb83bc23fd
Fix Starcoder issue on CPU on transformers 4.36+ ( #11190 )
...
* fix starcoder for sdpa
* update
* style
2024-06-04 10:05:40 -07:00
Kai Huang
f93664147c
Update config.yaml ( #11208 )
...
* update config.yaml
* fix
* minor
* style
2024-06-04 19:58:18 +08:00
Xiangyu Tian
ac3d53ff5d
LLM: Fix vLLM CPU version error ( #11206 )
...
Fix vLLM CPU version error
2024-06-04 19:10:23 +08:00
Ruonan Wang
1dde204775
update q6k ( #11205 )
2024-06-04 17:14:33 +08:00
Qiyuan Gong
ce3f08b25a
Fix IPEX auto importer ( #11192 )
...
* Fix ipex auto importer with Python builtins.
* Raise errors if the user imports ipex manually before importing ipex_llm. Do nothing if they import ipex after importing ipex_llm.
* Remove import ipex in examples.
2024-06-04 16:57:18 +08:00
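The import-order rule described in the #11192 entry above (error if ipex was imported manually before ipex_llm, no action if it is imported afterwards) could be sketched roughly as below. This is only an illustration, not the project's actual importer: the function name `check_ipex_import_order` and its `loaded_modules` parameter are hypothetical, and the sketch assumes the check runs at the moment ipex_llm itself is imported. Only the package name `intel_extension_for_pytorch` is taken as given.

```python
import sys


def check_ipex_import_order(loaded_modules=None):
    """Raise if IPEX is already loaded when this check runs.

    Hypothetical helper: assumed to be called while ipex_llm is being imported.
    """
    modules = sys.modules if loaded_modules is None else loaded_modules
    if "intel_extension_for_pytorch" in modules:
        raise ImportError(
            "intel_extension_for_pytorch was imported before ipex_llm; "
            "remove the manual import and let ipex_llm load it."
        )
    # Otherwise do nothing: importing IPEX after ipex_llm is allowed.
```

Passing a plain dict as `loaded_modules` makes the rule easy to exercise without touching the real `sys.modules`.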
Yina Chen
711fa0199e
Fix fp6k phi3 ppl core dump ( #11204 )
2024-06-04 16:44:27 +08:00
Xiangyu Tian
f02f097002
Fix vLLM version in CPU/vLLM-Serving example README ( #11201 )
2024-06-04 15:56:55 +08:00
Yishuo Wang
6454655dcc
use sdp in baichuan2 13b ( #11198 )
2024-06-04 15:39:00 +08:00
Yishuo Wang
d90cd977d0
refactor stablelm ( #11195 )
2024-06-04 13:14:43 +08:00
Zijie Li
a644e9409b
Miniconda/Anaconda -> Miniforge update in examples ( #11194 )
...
* Change installation address
Change former address: "https://docs.conda.io/en/latest/miniconda.html# " to new address: "https://conda-forge.org/download/ " for 63 occurrences under python\llm\example
* Change Prompt
Change "Anaconda Prompt" to "Miniforge Prompt" for 1 occurrence
2024-06-04 10:14:02 +08:00
Xin Qiu
5f13700c9f
optimize Minicpm ( #11189 )
...
* minicpm optimize
* update
2024-06-03 18:28:29 +08:00
Qiyuan Gong
15a6205790
Fix LoRA tokenizer for Llama and chatglm ( #11186 )
...
* Set pad_token to eos_token if it's None. Otherwise, use model config.
2024-06-03 15:35:38 +08:00
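The first half of the #11186 note above (use the EOS token for padding when no pad token is set) might look like the following sketch. The helper name `ensure_pad_token` is hypothetical, and a `SimpleNamespace` stands in for a real tokenizer so the snippet runs on its own. The "otherwise, use model config" branch is not shown because the log does not say how it is implemented.

```python
from types import SimpleNamespace


def ensure_pad_token(tokenizer):
    """Fall back to the EOS token when no padding token is defined."""
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer


# Stand-in object for illustration only; it mimics the `pad_token` and
# `eos_token` attributes that Hugging Face tokenizers expose.
tok = ensure_pad_token(SimpleNamespace(pad_token=None, eos_token="</s>"))
```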
Cengguang Zhang
3eb13ccd8c
LLM: fix input length condition in deepspeed all-in-one benchmark. ( #11185 )
2024-06-03 10:05:43 +08:00
Shaojun Liu
401013a630
Remove chatglm_C Module to Eliminate LGPL Dependency ( #11178 )
...
* remove chatglm_C.**.pyd to solve ngsolve weak copyright vuln
* fix style check error
* remove chatglm native int4 from langchain
2024-05-31 17:03:11 +08:00
Ruonan Wang
50b5f4476f
update q4k convert ( #11179 )
2024-05-31 11:36:53 +08:00
Wang, Jian4
c0f1be6aea
Fix pp logic ( #11175 )
...
* only send non-None batch and rank1-n sending first
* always send first
2024-05-30 16:40:59 +08:00
ZehuaCao
4127b99ed6
Fix null pointer dereferences error. ( #11125 )
...
* delete unused function on tgi_server
* update
* update
* fix style
2024-05-30 16:16:10 +08:00
Guancheng Fu
50ee004ac7
Fix vllm condition ( #11169 )
...
* add use-vllm
* done
* fix style
* fix done
2024-05-30 15:23:17 +08:00
Jin Qiao
dcbf4d3d0a
Add phi-3-vision example ( #11156 )
...
* Add phi-3-vision example (HF-Automodels)
* fix
* fix
* fix
* Add phi-3-vision CPU example (HF-Automodels)
* add in readme
* fix
* fix
* fix
* fix
* use fp8 for gpu example
* remove eval
2024-05-30 10:02:47 +08:00
Jiao Wang
93146b9433
Reconstruct Speculative Decoding example directory ( #11136 )
...
* update
* update
* update
2024-05-29 13:15:27 -07:00
Xiangyu Tian
2299698b45
Refine Pipeline Parallel FastAPI example ( #11168 )
2024-05-29 17:16:50 +08:00
Ruonan Wang
9bfbf78bf4
update api usage of xe_batch & fp16 ( #11164 )
...
* update api usage
* update setup.py
2024-05-29 15:15:14 +08:00
Yina Chen
e29e2f1c78
Support new fp8 e4m3 ( #11158 )
2024-05-29 14:27:14 +08:00
Wang, Jian4
8e25de1126
LLM: Add codegeex2 example ( #11143 )
...
* add codegeex example
* update
* update cpu
* add GPU
* add gpu
* update readme
2024-05-29 10:00:26 +08:00
ZehuaCao
751e1a4e29
Fix concurrent issue in autoTP streaming. ( #11150 )
...
* add benchmark test
* update
2024-05-29 08:22:38 +08:00
Yishuo Wang
bc5008f0d5
disable sdp_causal in phi-3 to fix overflow ( #11157 )
2024-05-28 17:25:53 +08:00
SONG Ge
33852bd23e
Refactor pipeline parallel device config ( #11149 )
...
* refactor pipeline parallel device config
* meet comments
* update example
* add warnings and update code doc
2024-05-28 16:52:46 +08:00
hxsz1997
62b2d8af6b
Add lookahead in all-in-one ( #11142 )
...
* add lookahead in allinone
* delete save to csv in run_transformer_int4_gpu
* change lookup to lookahead
* fix the error of add model.peak_memory
* Set transformer_int4_gpu as the default option
* add comment of transformer_int4_fp16_lookahead_gpu
2024-05-28 15:39:58 +08:00
Xiangyu Tian
b44cf405e2
Refine Pipeline-Parallel-Fastapi example README ( #11155 )
2024-05-28 15:18:21 +08:00
Yishuo Wang
d307622797
fix first token sdp with batch ( #11153 )
2024-05-28 15:03:06 +08:00
Yina Chen
3464440839
fix qwen import error ( #11154 )
2024-05-28 14:50:12 +08:00
Jin Qiao
25b6402315
Add Windows GPU unit test ( #11050 )
...
* Add Windows GPU UT
* temporarily remove ut on arc
* retry
* retry
* retry
* fix
* retry
* retry
* fix
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* fix
* retry
* retry
* retry
* retry
* retry
* retry
* merge into single workflow
* retry inference test
* retry
* retrigger
* try to fix inference test
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* check lower_bound
* retry
* retry
* try example test
* try fix example test
* retry
* fix
* separate function into shell script
* remove cygpath
* try remove all cygpath
* retry
* retry
* Revert "try remove all cygpath"
This reverts commit 7ceeff3e48f08429062ecef548c1a3ad3488756f.
* Revert "retry"
This reverts commit 40ea2457843bff6991b8db24316cde5de1d35418.
* Revert "retry"
This reverts commit 817d0db3e5aec3bd449d3deaf4fb01d3ecfdc8a3.
* enable ut
* fix
* retrigger
* retrigger
* update download url
* fix
* fix
* retry
* add comment
* fix
2024-05-28 13:29:47 +08:00
Yina Chen
b6b70d1ba0
Divide core-xe packages ( #11131 )
...
* temp
* add batch
* fix style
* update package name
* fix style
* add workflow
* use temp version to run uts
* trigger performance test
* trigger win igpu perf
* revert workflow & setup
2024-05-28 12:00:18 +08:00
binbin Deng
c9168b85b7
Fix error during merging adapter ( #11145 )
2024-05-27 19:41:42 +08:00
Guancheng Fu
daf7b1cd56
[Docker] Fix image using two cards error ( #11144 )
...
* fix all
* done
2024-05-27 16:20:13 +08:00
Xiangyu Tian
5c8ccf0ba9
LLM: Add Pipeline-Parallel-FastAPI example ( #10917 )
...
Add multi-stage Pipeline-Parallel-FastAPI example
---------
Co-authored-by: hzjane <a1015616934@qq.com>
2024-05-27 14:46:29 +08:00
Ruonan Wang
d550af957a
fix security issue of eagle ( #11140 )
...
* fix security issue of eagle
* small fix
2024-05-27 10:15:28 +08:00
binbin Deng
367de141f2
Fix mixtral-8x7b with transformers=4.37.0 ( #11132 )
2024-05-27 09:50:54 +08:00
Jean Yu
ab476c7fe2
Eagle Speculative Sampling examples ( #11104 )
...
* Eagle Speculative Sampling examples
* rm multi-gpu and ray content
* updated README to include Arc A770
2024-05-24 11:13:43 -07:00
Guancheng Fu
fabc395d0d
add langchain vllm interface ( #11121 )
...
* done
* fix
* fix
* add vllm
* add langchain vllm examples
* add docs
* temp
2024-05-24 17:19:27 +08:00
ZehuaCao
63e95698eb
[LLM]Reopen autotp generate_stream ( #11120 )
...
* reopen autotp generate_stream
* fix style error
* update
2024-05-24 17:16:14 +08:00
Yishuo Wang
1dc680341b
fix phi-3-vision import ( #11129 )
2024-05-24 15:57:15 +08:00
Guancheng Fu
7f772c5a4f
Add half precision for fastchat models ( #11130 )
2024-05-24 15:41:14 +08:00
Zhao Changmin
65f4212f89
Fix qwen 14b run into register attention fwd ( #11128 )
...
* fix qwen 14b
2024-05-24 14:45:07 +08:00
Shaojun Liu
373f9e6c79
add ipex-llm-init.bat for Windows ( #11082 )
...
* add ipex-llm-init.bat for Windows
* update setup.py
2024-05-24 14:26:25 +08:00
Qiyuan Gong
120a0035ac
Fix type mismatch in eval for Baichuan2 QLora example ( #11117 )
...
* During the evaluation stage, Baichuan2 will raise type mismatch when training with bfloat16. Fix this issue by modifying modeling_baichuan.py. Add doc about how to modify this file.
2024-05-24 14:14:30 +08:00
Yishuo Wang
1db9d9a63b
optimize internlm2 xcomposer again ( #11124 )
2024-05-24 13:44:52 +08:00
Yishuo Wang
9372ce87ce
fix internlm xcomposer2 fp16 ( #11123 )
2024-05-24 11:03:31 +08:00
Cengguang Zhang
011b9faa5c
LLM: unify baichuan2-13b alibi mask dtype with model dtype. ( #11107 )
...
* LLM: unify alibi mask dtype.
* fix comments.
2024-05-24 10:27:53 +08:00
Jiao Wang
0a06a6e1d4
Update tests for transformers 4.36 ( #10858 )
...
* update unit test
* update
* update
* update
* update
* update
* fix gpu attention test
* update
* update
* update
* update
* update
* update
* update example test
* replace replit code
* update
* update
* update
* update
* set safe_serialization false
* perf test
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* delete
* update
* update
* update
* update
* update
* update
* revert
* update
2024-05-24 10:26:38 +08:00
Xiangyu Tian
b3f6faa038
LLM: Add CPU vLLM entrypoint ( #11083 )
...
Add CPU vLLM entrypoint and update CPU vLLM serving example.
2024-05-24 09:16:59 +08:00
Yishuo Wang
797dbc48b8
fix phi-2 and phi-3 convert ( #11116 )
2024-05-23 17:37:37 +08:00
Yishuo Wang
37b98a531f
support running internlm xcomposer2 on gpu and add sdp optimization ( #11115 )
2024-05-23 17:26:24 +08:00
Zhao Changmin
c5e8b90c8d
Add Qwen register attention implementation ( #11110 )
...
* qwen_register
2024-05-23 17:17:45 +08:00
Yishuo Wang
0e53f20edb
support running internlm-xcomposer2 on cpu ( #11111 )
2024-05-23 16:36:09 +08:00
Yuwen Hu
d36b41d59e
Add setuptools limitation for ipex-llm[xpu] ( #11102 )
...
* Add setuptools limitation for ipex-llm[xpu]
* llamaindex option update
2024-05-22 18:20:30 +08:00
Yishuo Wang
cd4dff09ee
support phi-3 vision ( #11101 )
2024-05-22 17:43:50 +08:00
Zhao Changmin
15d906a97b
Update linux igpu run script ( #11098 )
...
* update run script
2024-05-22 17:18:07 +08:00
Kai Huang
f63172ef63
Align ppl with llama.cpp ( #11055 )
...
* update script
* remove
* add header
* update readme
2024-05-22 16:43:11 +08:00
Qiyuan Gong
f6c9ffe4dc
Add WANDB_MODE and HF_HUB_OFFLINE to XPU finetune README ( #11097 )
...
* Add WANDB_MODE=offline to avoid multi-GPUs finetune errors.
* Add HF_HUB_OFFLINE=1 to avoid Hugging Face related errors.
2024-05-22 15:20:53 +08:00
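The two environment variables named in the #11097 entry above can also be set from Python before the training libraries are imported. This is a small sketch under that assumption; the README change itself presumably documents setting them in the shell.

```python
import os

# Set these before importing wandb or any Hugging Face library,
# since such settings are typically read when the library loads.
os.environ["WANDB_MODE"] = "offline"  # keep Weights & Biases runs local
os.environ["HF_HUB_OFFLINE"] = "1"    # use only locally cached Hub files
```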
Shaojun Liu
584439e498
update homepage url for ipex-llm ( #11094 )
...
* update homepage url
* Update python version to 3.11
* Update long description
2024-05-22 11:10:44 +08:00
Xin Qiu
71bcd18f44
fix qwen vl ( #11090 )
2024-05-21 18:40:29 +08:00
Yishuo Wang
f00625f9a4
refactor qwen2 ( #11087 )
2024-05-21 16:53:42 +08:00
Qiyuan Gong
492ed3fd41
Add verified models to GPU finetune README ( #11088 )
...
* Add verified models to GPU finetune README
2024-05-21 15:49:15 +08:00
Qiyuan Gong
1210491748
ChatGLM3, Baichuan2 and Qwen1.5 QLoRA example ( #11078 )
...
* Add chatglm3, qwen15-7b and baichuan-7b QLoRA alpaca example
* Remove unnecessary tokenization setting.
2024-05-21 15:29:43 +08:00
ZehuaCao
842d6dfc2d
Further Modify CPU example ( #11081 )
...
* modify CPU example
* update
2024-05-21 13:55:47 +08:00
Yishuo Wang
d830a63bb7
refactor qwen ( #11074 )
2024-05-20 18:08:37 +08:00
Wang, Jian4
74950a152a
Fix tgi_api_server error file name ( #11075 )
2024-05-20 16:48:40 +08:00
Yishuo Wang
4e97047d70
fix baichuan2 13b fp16 ( #11071 )
2024-05-20 11:21:20 +08:00
binbin Deng
7170dd9192
Update guide for running qwen with AutoTP ( #11065 )
2024-05-20 10:53:17 +08:00
Wang, Jian4
a2e1578fd9
Merge tgi_api_server to main ( #11036 )
...
* init
* fix style
* speculative can not use benchmark
* add tgi server readme
2024-05-20 09:15:03 +08:00
Yishuo Wang
31ce3e0c13
refactor baichuan2-13b ( #11064 )
2024-05-17 16:25:30 +08:00
ZehuaCao
56cb992497
LLM: Modify CPU Installation Command for most examples ( #11049 )
...
* init
* refine
* refine
* refine
* modify hf-agent example
* modify all CPU model example
* remove readthedoc modify
* replace powershell with cmd
* fix repo
* fix repo
* update
* remove comment on windows code block
* update
* update
* update
* update
---------
Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
2024-05-17 15:52:20 +08:00
Ruonan Wang
f1156e6b20
support gguf_q4k_m / gguf_q4k_s ( #10887 )
...
* initial commit
* UPDATE
* fix style
* fix style
* add gguf_q4k_s
* update comment
* fix
2024-05-17 14:30:09 +08:00
Yishuo Wang
981d668be6
refactor baichuan2-7b ( #11062 )
2024-05-17 13:01:34 +08:00
Xiangyu Tian
d963e95363
LLM: Modify CPU Installation Command for documentation ( #11042 )
...
* init
* refine
* refine
* refine
* refine comments
2024-05-17 10:14:00 +08:00
Ruonan Wang
3a72e5df8c
disable mlp fusion of fp6 on mtl ( #11059 )
2024-05-17 10:10:16 +08:00
SONG Ge
192ae35012
Add support for llama2 quantize_kv with transformers 4.38.0 ( #11054 )
...
* add support for llama2 quantize_kv with transformers 4.38.0
* fix code style
* fix code style
2024-05-16 22:23:39 +08:00
SONG Ge
16b2a418be
hotfix native_sdp ut ( #11046 )
...
* hotfix native_sdp
* update
2024-05-16 17:15:37 +08:00
Xin Qiu
6be70283b7
fix chatglm run error ( #11045 )
...
* fix chatglm
* update
* fix style
2024-05-16 15:39:18 +08:00
Yishuo Wang
8cae897643
use new rope in phi3 ( #11047 )
2024-05-16 15:12:35 +08:00
Jin Qiao
9a96af4232
Remove oneAPI pip install command in related examples ( #11030 )
...
* Remove pip install command in windows installation guide
* fix chatglm3 installation guide
* Fix gemma cpu example
* Apply on other examples
* fix
2024-05-16 10:46:29 +08:00
Xiangyu Tian
612a365479
LLM: Install CPU version torch with extras [all] ( #10868 )
...
Modify setup.py to install CPU version torch with extras [all]
2024-05-16 10:39:55 +08:00
Yishuo Wang
59df750326
Use new sdp again ( #11025 )
2024-05-16 09:33:34 +08:00
SONG Ge
9942a4ba69
[WIP] Support llama2 with transformers==4.38.0 ( #11024 )
...
* support llama2 with transformers==4.38.0
* add support for quantize_qkv
* add original support for 4.38.0 now
* code style fix
2024-05-15 18:07:00 +08:00
Yina Chen
686f6038a8
Support fp6 save & load ( #11034 )
2024-05-15 17:52:02 +08:00
Ruonan Wang
ac384e0f45
add fp6 mlp fusion ( #11032 )
...
* add fp6 fusion
* add qkv fusion for fp6
* remove qkv first
2024-05-15 17:42:50 +08:00
Wang, Jian4
2084ebe4ee
Enable fastchat benchmark latency ( #11017 )
...
* enable fastchat benchmark
* add readme
* update readme
* update
2024-05-15 14:52:09 +08:00
hxsz1997
93d40ab127
Update lookahead strategy ( #11021 )
...
* update lookahead strategy
* remove lines
* fix python style check
2024-05-15 14:48:05 +08:00
Wang, Jian4
d9f71f1f53
Update benchmark util for example using ( #11027 )
...
* mv benchmark_util.py to utils/
* remove
* update
2024-05-15 14:16:35 +08:00
binbin Deng
4053a6ef94
Update environment variable setting in AutoTP with arc ( #11018 )
2024-05-15 10:23:58 +08:00
Yishuo Wang
fad1dbaf60
use sdp fp8 causal kernel ( #11023 )
2024-05-15 10:22:35 +08:00
Yishuo Wang
ee325e9cc9
fix phi3 ( #11022 )
2024-05-15 09:32:12 +08:00
Ziteng Zhang
7d3791c819
[LLM] Add llama3 alpaca qlora example ( #11011 )
...
* Add llama3 finetune example based on alpaca qlora example
2024-05-15 09:17:32 +08:00
Zhao Changmin
0a732bebe7
Add phi3 cached RotaryEmbedding ( #11013 )
...
* phi3cachedrotaryembed
* pep8
2024-05-15 08:16:43 +08:00
Yina Chen
893197434d
Add fp6 support on gpu ( #11008 )
...
* add fp6 support
* fix style
2024-05-14 16:31:44 +08:00
Zhao Changmin
b03c859278
Add phi3RMS ( #10988 )
...
* phi3RMS
2024-05-14 15:16:27 +08:00
Yishuo Wang
170e3d65e0
use new sdp and fp32 sdp ( #11007 )
2024-05-14 14:29:18 +08:00
Qiyuan Gong
c957ea3831
Add axolotl main support and axolotl Llama-3-8B QLoRA example ( #10984 )
...
* Support axolotl main (796a085).
* Add axolotl Llama-3-8B QLoRA example.
* Change `sequence_len` to 256 for alpaca, and revert `lora_r` value.
* Add example to quick_start.
2024-05-14 13:43:59 +08:00
Yuwen Hu
fb656fbf74
Add requirements for oneAPI pypi packages for windows Intel GPU users ( #11009 )
2024-05-14 13:40:54 +08:00
Shaojun Liu
7f8c5b410b
Quickstart: Run PyTorch Inference on Intel GPU using Docker (on Linux or WSL) ( #10970 )
...
* add entrypoint.sh
* add quickstart
* remove entrypoint
* update
* Install related library of benchmarking
* update
* print out results
* update docs
* minor update
* update
* update quickstart
* update
* update
* update
* update
* update
* update
* add chat & example section
* add more details
* minor update
* rename quickstart
* update
* minor update
* update
* update config.yaml
* update readme
* use --gpu
* add tips
* minor update
* update
2024-05-14 12:58:31 +08:00
Guancheng Fu
a465111cf4
Update README.md ( #11003 )
2024-05-13 16:44:48 +08:00
Guancheng Fu
74997a3ed1
Adding load_low_bit interface for ipex_llm_worker ( #11000 )
...
* initial implementation, need tests
* fix
* fix baichuan issue
* fix typo
2024-05-13 15:30:19 +08:00
Yishuo Wang
1b3c7a6928
remove phi3 empty cache ( #10997 )
2024-05-13 14:09:55 +08:00
ZehuaCao
99255fe36e
fix ppl ( #10996 )
2024-05-13 13:57:19 +08:00
Kai Huang
f8dd2e52ad
Fix Langchain upstream ut ( #10985 )
...
* Fix Langchain upstream ut
* Small fix
* Install bigdl-llm
* Update run-langchain-upstream-tests.sh
* Update run-langchain-upstream-tests.sh
* Update llm_unit_tests.yml
* Update run-langchain-upstream-tests.sh
* Update llm_unit_tests.yml
* Update run-langchain-upstream-tests.sh
* fix git checkout
* fix
---------
Co-authored-by: Zhangky11 <2321096202@qq.com>
Co-authored-by: Keyan (Kyrie) Zhang <79576162+Zhangky11@users.noreply.github.com>
2024-05-11 14:40:37 +08:00
Yuwen Hu
9f6358e4c2
Deprecate support for pytorch 2.0 on Linux for ipex-llm >= 2.1.0b20240511 ( #10986 )
...
* Remove xpu_2.0 option in setup.py
* Disable xpu_2.0 test in UT and nightly
* Update docs for deprecated pytorch 2.0
* Small doc update
2024-05-11 12:33:35 +08:00
Yishuo Wang
ad96f32ce0
optimize phi3 1st token performance ( #10981 )
2024-05-10 17:33:46 +08:00
Cengguang Zhang
cfed76b2ed
LLM: add long-context support for Qwen1.5-7B/Baichuan2-7B/Mistral-7B. ( #10937 )
...
* LLM: add split tensor support for baichuan2-7b and qwen1.5-7b.
* fix style.
* fix style.
* fix style.
* add support for mistral and fix condition threshold.
* fix style.
* fix comments.
2024-05-10 16:40:15 +08:00
binbin Deng
f9615f12d1
Add driver related packages version check in env script ( #10977 )
2024-05-10 15:02:58 +08:00
Kai Huang
a6342cc068
Empty cache after phi first attention to support 4k input ( #10972 )
...
* empty cache
* fix style
2024-05-09 19:50:04 +08:00
Yishuo Wang
e753125880
use fp16_sdp when head_dim=96 ( #10976 )
2024-05-09 17:02:59 +08:00
Yishuo Wang
697ca79eca
use quantize kv and sdp in phi3-mini ( #10973 )
2024-05-09 15:16:18 +08:00
Wang, Jian4
f4c615b1ee
Add cohere example ( #10954 )
...
* add link first
* add_cpu_example
* add GPU example
2024-05-08 17:19:59 +08:00
Wang, Jian4
3209d6b057
Fix speculative llama3 no stop error ( #10963 )
...
* fix normal
* add eos_tokens_id on sp and add list if
* update
* no none
2024-05-08 17:09:47 +08:00
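The #10963 notes above mention handling a list of EOS ids and a None case. One way such a stop check could be written is sketched below. The function name `is_stop_token` is hypothetical, and the sketch assumes `eos_token_id` may be None, a single int, or a list of ints; it is not the code from the commit.

```python
def is_stop_token(token_id, eos_token_id):
    """Return True if token_id matches any configured EOS id.

    eos_token_id may be None, a single int, or a list of ints.
    """
    if eos_token_id is None:
        return False
    if isinstance(eos_token_id, int):
        eos_token_id = [eos_token_id]
    return token_id in eos_token_id
```

Normalizing to a list first keeps the generation loop's stop condition to a single membership test regardless of how the model config stores its EOS ids.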
Xiangyu Tian
02870dc385
LLM: Refine README of AutoTP-FastAPI example ( #10960 )
2024-05-08 16:55:23 +08:00
Yishuo Wang
2ebec0395c
optimize phi-3-mini-128 ( #10959 )
2024-05-08 16:33:17 +08:00
Xin Qiu
dfa3147278
update ( #10944 )
2024-05-08 14:28:05 +08:00
Xin Qiu
5973d6c753
make gemma's output better ( #10943 )
2024-05-08 14:27:51 +08:00
Jin Qiao
15ee3fd542
Update igpu perf internlm ( #10958 )
2024-05-08 14:16:43 +08:00
Zhao Changmin
0d6e12036f
Disable fast_init_ in load_low_bit ( #10945 )
...
* fast_init_ disable
2024-05-08 10:46:19 +08:00
Qiyuan Gong
164e6957af
Refine axolotl quickstart ( #10957 )
...
* Add default accelerate config for axolotl quickstart.
* Fix requirement link.
* Upgrade peft to 0.10.0 in requirement.
2024-05-08 09:34:02 +08:00
Yishuo Wang
c801c37bc6
optimize phi3 again: use quantize kv if possible ( #10953 )
2024-05-07 17:26:19 +08:00
Yishuo Wang
aa2fa9fde1
optimize phi3 again: use sdp if possible ( #10951 )
2024-05-07 15:53:08 +08:00
Qiyuan Gong
c11170b96f
Upgrade Peft to 0.10.0 in finetune examples and docker ( #10930 )
...
* Upgrade Peft to 0.10.0 in finetune examples.
* Upgrade Peft to 0.10.0 in docker.
2024-05-07 15:12:26 +08:00
Qiyuan Gong
d7ca5d935b
Upgrade Peft version to 0.10.0 for LLM finetune ( #10886 )
...
* Upgrade Peft version to 0.10.0
* Upgrade Peft version in ARC unit test and HF-Peft example.
2024-05-07 15:09:14 +08:00
Yuwen Hu
0efe26c3b6
Change order of chatglm2-6b and chatglm3-6b in iGPU perf test for more stable performance ( #10948 )
2024-05-07 13:48:39 +08:00
hxsz1997
245c7348bc
Add codegemma example ( #10884 )
...
* add codegemma example in GPU/HF-Transformers-AutoModels/
* add README of codegemma example in GPU/HF-Transformers-AutoModels/
* add codegemma example in GPU/PyTorch-Models/
* add readme of codegemma example in GPU/PyTorch-Models/
* add codegemma example in CPU/HF-Transformers-AutoModels/
* add readme of codegemma example in CPU/HF-Transformers-AutoModels/
* add codegemma example in CPU/PyTorch-Models/
* add readme of codegemma example in CPU/PyTorch-Models/
* fix typos
* fix filename typo
* add codegemma in tables
* add comments of lm_head
* remove comments of use_cache
2024-05-07 13:35:42 +08:00
Shaojun Liu
08ad40b251
improve ipex-llm-init for Linux ( #10928 )
...
* refine ipex-llm-init
* install libtcmalloc.so for Max
* update based on comment
* remove unneeded code
2024-05-07 12:55:14 +08:00
Wang, Jian4
191b184341
LLM: Optimize cohere model ( #10878 )
...
* use mlp and rms
* optimize kv_cache
* add fuse qkv
* add flash attention and fp16 sdp
* error fp8 sdp
* fix optimized
* fix style
* update
* add for pp
2024-05-07 10:19:50 +08:00
Xiangyu Tian
13a44cdacb
LLM: Refine Deepspeed-AutoTP-FastAPI example ( #10916 )
2024-05-07 09:37:31 +08:00
Wang, Jian4
1de878bee1
LLM: Fix speculative llama3 long input error ( #10934 )
2024-05-07 09:25:20 +08:00
Guancheng Fu
49ab5a2b0e
Add embeddings ( #10931 )
2024-05-07 09:07:02 +08:00
Wang, Jian4
0e0bd309e2
LLM: Enable Speculative on Fastchat ( #10909 )
...
* init
* enable streamer
* update
* update
* remove deprecated
* update
* update
* add gpu example
2024-05-06 10:06:20 +08:00
Cengguang Zhang
0edef1f94c
LLM: add min_new_tokens to all in one benchmark. ( #10911 )
2024-05-06 09:32:59 +08:00
Cengguang Zhang
75dbf240ec
LLM: update split tensor conditions. ( #10872 )
...
* LLM: update split tensor condition.
* add cond for split tensor.
* update priority of env.
* fix style.
* update env name.
2024-04-30 17:07:21 +08:00
Guancheng Fu
2c64754eb0
Add vLLM to ipex-llm serving image ( #10807 )
...
* add vllm
* done
* doc work
* fix done
* temp
* add docs
* format
* add start-fastchat-service.sh
* fix
2024-04-29 17:25:42 +08:00
Jin Qiao
1f876fd837
Add example for phi-3 ( #10881 )
...
* Add example for phi-3
* add in readme and index
* fix
* fix
* fix
* fix indent
* fix
2024-04-29 16:43:55 +08:00
Yishuo Wang
d884c62dc4
remove new_layout parameter ( #10906 )
2024-04-29 10:31:50 +08:00
Guancheng Fu
fbcd7bc737
Fix Loader issue with dtype fp16 ( #10907 )
2024-04-29 10:16:02 +08:00
Guancheng Fu
c9fac8c26b
Fix sdp logic ( #10896 )
...
* fix
* fix
2024-04-28 22:02:14 +08:00
Yina Chen
015d07a58f
Fix lookahead sample error & add update strategy ( #10894 )
...
* Fix sample error & add update strategy
* add mtl config
* fix style
* remove print
2024-04-28 17:21:00 +08:00
Yuwen Hu
1a8a93d5e0
Further fix nightly perf ( #10901 )
2024-04-28 10:18:58 +08:00
Yuwen Hu
ddfdaec137
Fix nightly perf ( #10899 )
...
* Fix nightly perf by adding default value in benchmark for use_fp16_torch_dtype
* further fixes
2024-04-28 09:39:29 +08:00
Cengguang Zhang
9752ffe979
LLM: update split qkv native sdp. ( #10895 )
...
* LLM: update split qkv native sdp.
* fix typo.
2024-04-26 18:47:35 +08:00
Guancheng Fu
990535b1cf
Add tensor parallel for vLLM ( #10879 )
...
* initial
* test initial tp
* initial sup
* fix format
* fix
* fix
2024-04-26 17:10:49 +08:00
binbin Deng
f51bf018eb
Add benchmark script for pipeline parallel inference ( #10873 )
2024-04-26 15:28:11 +08:00
Yishuo Wang
46ba962168
use new quantize kv ( #10888 )
2024-04-26 14:42:17 +08:00
Xiangyu Tian
3d4950b0f0
LLM: Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example ( #10876 )
...
Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example.
2024-04-26 13:24:28 +08:00
Wang, Jian4
3e8ed54270
LLM: Fix bigdl_ipex_int8 warning ( #10890 )
2024-04-26 11:18:44 +08:00
Jin Qiao
fb3c268d13
Add phi-3 to perf ( #10883 )
2024-04-25 20:21:56 +08:00
Yina Chen
8811f268ff
Use new fp16 sdp in Qwen and modify the constraint ( #10882 )
2024-04-25 19:23:37 +08:00
Yuxuan Xia
0213c1c1da
Add phi3 to the nightly test ( #10885 )
...
* Add llama3 and phi2 nightly test
* Change llama3-8b to llama3-8b-instruct
* Add phi3 to nightly test
* Add phi3 to nightly test
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-04-25 17:39:12 +08:00
Yuxuan Xia
ca2479be87
Update scripts readme ( #10725 )
...
* Update scripts readme
* Update scripts readme
* Update README
* Update readme
* Update readme
* Update windows env check readme
* Adjust env check readme
* Update windows env check
* Update env check readme
* Adjust the env-check README
* Modify the env-check README
2024-04-25 17:24:37 +08:00
Cengguang Zhang
cd369c2715
LLM: add device id to benchmark utils. ( #10877 )
2024-04-25 14:01:51 +08:00
Yang Wang
1ce8d7bcd9
Support the desc_act feature in GPTQ model ( #10851 )
...
* support act_order
* update versions
* fix style
* fix bug
* clean up
2024-04-24 10:17:13 -07:00
Yina Chen
dc27b3bc35
Use sdp when rest token seq_len > 1 in llama & mistral (for lookup & spec) ( #10790 )
...
* update sdp condition
* update
* fix
* update & test llama
* mistral
* fix style
* update
* fix style
* remove pvc constraint
* update ds on arc
* fix style
2024-04-24 17:24:01 +08:00
Yuxuan Xia
844e18b1db
Add llama3 and phi2 nightly test ( #10874 )
...
* Add llama3 and phi2 nightly test
* Change llama3-8b to llama3-8b-instruct
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-04-24 16:58:56 +08:00
binbin Deng
c9feffff9a
LLM: support Qwen1.5-MoE-A2.7B-Chat pipeline parallel inference ( #10864 )
2024-04-24 16:02:27 +08:00
Yishuo Wang
2d210817ff
add phi3 optimization ( #10871 )
2024-04-24 15:17:40 +08:00
Cengguang Zhang
eb39c61607
LLM: add min new token to perf test. ( #10869 )
2024-04-24 14:32:02 +08:00
Yuwen Hu
fb2a160af3
Add phi-2 to 2048-256 test for fixes ( #10867 )
2024-04-24 10:00:25 +08:00
binbin Deng
fabf54e052
LLM: make pipeline parallel inference example more common ( #10786 )
2024-04-24 09:28:52 +08:00
hxsz1997
328b1a1de9
Fix the not stop issue of llama3 examples ( #10860 )
...
* fix not stop issue in GPU/HF-Transformers-AutoModels
* fix not stop issue in GPU/PyTorch-Models/Model/llama3
* fix not stop issue in CPU/HF-Transformers-AutoModels/Model/llama3
* fix not stop issue in CPU/PyTorch-Models/Model/llama3
* update the output in readme
* update format
* add reference
* update prompt format
* update output format in readme
* update example output in readme
2024-04-23 19:10:09 +08:00
Yuwen Hu
5c9eb5d0f5
Support llama-index install option for upstreaming purposes ( #10866 )
...
* Support llama-index install option for upstreaming purposes
* Small fix
* Small fix
2024-04-23 19:08:29 +08:00
Yuwen Hu
21bb8bd164
Add phi-2 to igpu performance test ( #10865 )
2024-04-23 18:13:14 +08:00
ZehuaCao
36eb8b2e96
Add llama3 speculative example ( #10856 )
...
* Initial llama3 speculative example
* update README
* update README
* update README
2024-04-23 17:03:54 +08:00
Cengguang Zhang
763413b7e1
LLM: support llama split tensor for long context in transformers>=4.36. ( #10844 )
...
* LLM: support llama split tensor for long context in transformers>=4.36.
* fix dtype.
* fix style.
* fix style.
* fix style.
* fix style.
* fix dtype.
* fix style.
2024-04-23 16:13:25 +08:00
ZehuaCao
92ea54b512
Fix speculative decoding bug ( #10855 )
2024-04-23 14:28:31 +08:00
yb-peng
c9dee6cd0e
Update 8192.txt ( #10824 )
...
* Update 8192.txt
* Update 8192.txt with original text
2024-04-23 14:02:09 +08:00
Wang, Jian4
18c032652d
LLM: Add mixtral speculative CPU example ( #10830 )
...
* init mixtral sp example
* use different prompt_format
* update output
* update
2024-04-23 10:05:51 +08:00
Qiyuan Gong
5494aa55f6
Downgrade datasets in axolotl example ( #10849 )
...
* Downgrade datasets to 2.15.0 to address axolotl prepare issue https://github.com/OpenAccess-AI-Collective/axolotl/issues/1544
Thanks to @kwaa for providing the solution in https://github.com/intel-analytics/ipex-llm/issues/10821#issuecomment-2068861571
2024-04-23 09:41:58 +08:00
Yishuo Wang
fe5a082b84
add phi-2 optimization ( #10843 )
2024-04-22 18:56:47 +08:00
Guancheng Fu
47bd5f504c
[vLLM]Remove vllm-v1, refactor v2 ( #10842 )
...
* remove vllm-v1
* fix format
2024-04-22 17:51:32 +08:00
Wang, Jian4
23c6a52fb0
LLM: Fix ipex torchscript=True error ( #10832 )
...
* remove
* update
* remove torchscript
2024-04-22 15:53:09 +08:00
Heyang Sun
fc33aa3721
fix missing import ( #10839 )
2024-04-22 14:34:52 +08:00
Yina Chen
3daad242b8
Fix No module named 'transformers.cache_utils' with transformers < 4.36 ( #10835 )
...
* update sdp condition
* update
* fix
* fix 431 error
* revert sdp & style fix
* fix
* meet comments
2024-04-22 14:05:50 +08:00
Guancheng Fu
ae3b577537
Update README.md ( #10833 )
2024-04-22 11:07:10 +08:00
Wang, Jian4
5f95054f97
LLM: Add qwen moe example libs md ( #10828 )
2024-04-22 10:03:19 +08:00
Guancheng Fu
61c67af386
Fix vLLM-v2 install instructions ( #10822 )
2024-04-22 09:02:48 +08:00
Guancheng Fu
caf75beef8
Disable sdpa ( #10814 )
2024-04-19 17:33:18 +08:00
Yishuo Wang
57edf2033c
fix lookahead with transformers >= 4.36 ( #10808 )
2024-04-19 16:24:56 +08:00
Ovo233
1a885020ee
Updated importing of top_k_top_p_filtering for transformers>=4.39.0 ( #10794 )
...
* In transformers>=4.39.0, the top_k_top_p_filtering function has been deprecated and moved to the hugging face package trl. Thus, for versions >= 4.39.0, import this function from trl.
2024-04-19 15:34:39 +08:00
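The version gate described in #10794 can be sketched roughly as below. This is an illustration only, not the code from the commit: the helper names `filtering_module` and `load_top_k_top_p_filtering` are hypothetical, and the `trl.core` module path is an assumption (the commit message only says the function moved to `trl`).

```python
import importlib
import re


def _major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings such as '4.39.0' or '4.41.0.dev0'."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:2])


def filtering_module(transformers_version: str) -> str:
    """Hypothetical helper: name the module expected to hold top_k_top_p_filtering."""
    if _major_minor(transformers_version) >= (4, 39):
        return "trl.core"  # assumed location inside trl
    return "transformers"


def load_top_k_top_p_filtering():
    """Import the function from whichever package should provide it."""
    import transformers  # imported lazily so filtering_module works without it

    module = importlib.import_module(filtering_module(transformers.__version__))
    return module.top_k_top_p_filtering
```

Keeping the version decision in a pure function makes it easy to check without either package installed; only `load_top_k_top_p_filtering` needs them at call time.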
Yuwen Hu
07e8b045a9
Add Meta-llama-3-8B-Instruct and Yi-6B-Chat to igpu nightly perf ( #10810 )
2024-04-19 15:09:58 +08:00
Yishuo Wang
08458b4f74
remove rms norm copy ( #10793 )
2024-04-19 13:57:48 +08:00
Yang Wang
8153c3008e
Initial llama3 example ( #10799 )
...
* Add initial hf huggingface GPU example
* Small fix
* Add llama3 gpu pytorch model example
* Add llama 3 hf transformers CPU example
* Add llama 3 pytorch model CPU example
* Fixes
* Small fix
* Small fixes
* Small fix
* Small fix
* Add links
* update repo id
* change prompt tuning url
* remove system header if there is no system prompt
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
Co-authored-by: Yuwen Hu <54161268+Oscilloscope98@users.noreply.github.com>
2024-04-18 11:01:33 -07:00
Ruonan Wang
754b0ffecf
Fix pvc llama ( #10798 )
...
* fix
* update
2024-04-18 10:44:57 -07:00
Ruonan Wang
439c834ed3
LLM: add mixed precision for lm_head ( #10795 )
...
* add mixed_quantization
* meet code review
* update
* fix style
* meet review
2024-04-18 19:11:31 +08:00
Yina Chen
8796401b08
Support q4k in ipex-llm ( #10796 )
...
* support q4k
* update
2024-04-18 18:55:28 +08:00
Ruonan Wang
0e8aac19e3
add q6k precision in ipex-llm ( #10792 )
...
* add q6k
* add initial 16k
* update
* fix style
2024-04-18 16:52:09 +08:00
Qiyuan Gong
e90e31719f
axolotl lora example ( #10789 )
...
* Add axolotl lora example
* Modify readme
* Add comments in yml
2024-04-18 16:38:32 +08:00
Wang, Jian4
14ca42a048
LLM: Fix moe indexes error on cpu ( #10791 )
2024-04-18 15:56:52 +08:00
Guancheng Fu
cbe7b5753f
Add vLLM[xpu] related code ( #10779 )
...
* Add ipex-llm side change
* add runnable offline_inference
* refactor to call vllm2
* Verified async server
* add new v2 example
* add README
* fix
* change dir
* refactor readme.md
* add experimental
* fix
2024-04-18 15:29:20 +08:00
Kai Huang
053ec30737
Transformers ppl evaluation on wikitext ( #10784 )
...
* transformers code
* cache
2024-04-18 15:27:18 +08:00
Wang, Jian4
209c3501e6
LLM: Optimize qwen1.5 moe model ( #10706 )
...
* update moe block
* fix style
* enable optimize MLP
* enable kv_cache
* enable fuse rope
* enable fused qkv
* enable flash_attention
* error sdp quantize
* use old api
* use fuse
* use xetla
* fix python style
* update moe_blocks num
* fix output error
* add cpu sdpa
* update
* update
* update
2024-04-18 14:54:05 +08:00
Ziteng Zhang
ff040c8f01
LISA Finetuning Example ( #10743 )
...
* enabling xetla only supports qtype=SYM_INT4 or FP8E5
* LISA Finetuning Example on gpu
* update readme
* add licence
* Explain parameters of lisa & Move backend codes to src dir
* fix style
* fix style
* update readme
* support chatglm
* fix style
* fix style
* update readme
* fix
2024-04-18 13:48:10 +08:00
Heyang Sun
581ebf6104
GaLore Finetuning Example ( #10722 )
...
* GaLore Finetuning Example
* Update README.md
* Update README.md
* change data to HuggingFaceH4/helpful_instructions
* Update README.md
* Update README.md
* shrink train size and delete cache before starting training to save memory
* Update README.md
* Update galore_finetuning.py
* change model to llama2 3b
* Update README.md
2024-04-18 13:47:41 +08:00
Yang Wang
952e517db9
use config rope_theta ( #10787 )
...
* use config rope_theta
* fix style
2024-04-17 20:39:11 -07:00
Guancheng Fu
31ea2f9a9f
Fix wrong output for Llama models on CPU ( #10742 )
2024-04-18 11:07:27 +08:00
Xin Qiu
e764f9b1b1
Disable fast fused rope on UHD ( #10780 )
...
* use decoding fast path
* update
* update
* cleanup
2024-04-18 10:03:53 +08:00
Yina Chen
ea5b373a97
Add lookahead GPU example ( #10785 )
...
* Add lookahead example
* fix style & attn mask
* fix typo
* address comments
2024-04-17 17:41:55 +08:00
Wang, Jian4
a20271ffe4
LLM: Fix yi-6b fp16 error on pvc ( #10781 )
...
* update for yi fp16
* update
* update
2024-04-17 16:49:59 +08:00
ZehuaCao
0646e2c062
Fix short prompt for IPEX_CPU speculative decoding cause no_attr error ( #10783 )
2024-04-17 16:19:57 +08:00
Cengguang Zhang
7ec82c6042
LLM: add README.md for Long-Context examples. ( #10765 )
...
* LLM: add readme to long-context examples.
* add precision.
* update wording.
* add GPU type.
* add Long-Context example to GPU examples.
* fix comments.
* update max input length.
* update max length.
* add output length.
* fix wording.
2024-04-17 15:34:59 +08:00
Yina Chen
766fe45222
Fix spec error caused by lookup pr ( #10777 )
...
* Fix spec error
* remove
* fix style
2024-04-17 11:27:35 +08:00
Qiyuan Gong
9e5069437f
Fix gradio version in axolotl example ( #10776 )
...
* Change to gradio>=4.19.2
2024-04-17 10:23:43 +08:00
Qiyuan Gong
f2e923b3ca
Axolotl v0.4.0 support ( #10773 )
...
* Add Axolotl 0.4.0, remove legacy 0.3.0 support.
* replace is_torch_bf16_gpu_available
* Add HF_HUB_OFFLINE=1
* Move transformers out of requirement
* Refine readme and qlora.yml
2024-04-17 09:49:11 +08:00
Heyang Sun
26cae0a39c
Update FLEX in Deepspeed README ( #10774 )
...
* Update FLEX in Deepspeed README
* Update README.md
2024-04-17 09:28:24 +08:00
Wenjing Margaret Mao
c41730e024
edit 'ppl_result does not exist' issue, delete useless code ( #10767 )
...
* edit ppl_result not exist issue, delete useless code
* delete nonzero_min function
---------
Co-authored-by: jenniew <jenniewang123@gmail.com>
2024-04-16 18:11:56 +08:00
Yina Chen
899d392e2f
Support prompt lookup in ipex-llm ( #10768 )
...
* lookup init
* add lookup
* fix style
* remove redundant code
* change param name
* fix style
2024-04-16 16:52:38 +08:00
Qiyuan Gong
d30b22a81b
Refine axolotl 0.3.0 documents and links ( #10764 )
...
* Refine axolotl 0.3 based on comments
* Rename requirements to requirement-xpu
* Add comments for paged_adamw_32bit
* change lora_r from 8 to 16
2024-04-16 14:47:45 +08:00
ZehuaCao
599a88db53
Add deepspeed-autoTP-Fastapi serving ( #10748 )
...
* add deepspeed-autoTP-Fastapi serving
* add readme
* add license
* update
* update
* fix
2024-04-16 14:03:23 +08:00
binbin Deng
0a62933d36
LLM: fix qwen AutoTP ( #10766 )
2024-04-16 09:56:17 +08:00
Cengguang Zhang
3e2662c87e
LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. ( #10771 )
2024-04-16 09:32:30 +08:00
Jin Qiao
73a67804a4
GPU configuration update for examples (windows pip installer, etc.) ( #10762 )
...
* renew chatglm3-6b gpu example readme
fix
fix
fix
* fix for comments
* fix
* fix
* fix
* fix
* fix
* apply on HF-Transformers-AutoModels
* apply on PyTorch-Models
* fix
* fix
2024-04-15 17:42:52 +08:00
yb-peng
b5209d3ec1
Update example/GPU/PyTorch-Models/Model/llava/README.md ( #10757 )
...
* Update example/GPU/PyTorch-Models/Model/llava/README.md
* Update README.md
fix path in windows installation
2024-04-15 13:01:37 +08:00
binbin Deng
3d561b60ac
LLM: add enable_xetla parameter for optimize_model API ( #10753 )
2024-04-15 12:18:25 +08:00
Jiao Wang
a9a6b6b7af
Fix baichuan-13b issue on portable zip under transformers 4.36 ( #10746 )
...
* fix baichuan-13b issue
* update
* update
2024-04-12 16:27:01 -07:00
Jiao Wang
9e668a5bf0
fix_internlm-chat-7b-8k repo name in examples ( #10747 )
2024-04-12 10:15:48 -07:00
binbin Deng
c3fc8f4b90
LLM: add bs limitation for llama softmax upcast to fp32 ( #10752 )
2024-04-12 15:40:25 +08:00
hxsz1997
0d518aab8d
Merge pull request #10697 from MargarettMao/ceval
...
combine english and chinese, remove nan
2024-04-12 14:37:47 +08:00
jenniew
dd0d2df5af
Change fp16.csv mistral-7b-v0.1 into Mistral-7B-v0.1
2024-04-12 14:28:46 +08:00
jenniew
7309f1ddf9
Modify Typos
2024-04-12 14:23:13 +08:00
jenniew
cb594e1fc5
Modify Typos
2024-04-12 14:22:09 +08:00
jenniew
382c18e600
Modify Typos
2024-04-12 14:15:48 +08:00
jenniew
1a360823ce
Modify Typos
2024-04-12 14:13:21 +08:00
jenniew
cdbb1de972
Mark Color Modification
2024-04-12 14:00:50 +08:00
jenniew
9bbfcaf736
Mark Color Modification
2024-04-12 13:30:16 +08:00
jenniew
bb34c6e325
Mark Color Modification
2024-04-12 13:26:36 +08:00
Yishuo Wang
8086554d33
use new fp16 sdp in llama and mistral ( #10734 )
2024-04-12 10:49:02 +08:00
Yang Wang
019293e1b9
Fuse MOE indexes computation ( #10716 )
...
* try moe
* use c++ cpu to compute indexes
* fix style
2024-04-11 10:12:55 -07:00
jenniew
b151a9b672
edit csv_to_html to combine en & zh
2024-04-11 17:35:36 +08:00
binbin Deng
70ed9397f9
LLM: fix AttributeError of FP16Linear ( #10740 )
2024-04-11 17:03:56 +08:00
Keyan (Kyrie) Zhang
1256a2cc4e
Add chatglm3 long input example ( #10739 )
...
* Add long context input example for chatglm3
* Small fix
* Small fix
* Small fix
2024-04-11 16:33:43 +08:00
hxsz1997
fd473ddb1b
Merge pull request #10730 from MargarettMao/MargarettMao-parent_folder
...
Edit ppl update_HTML_parent_folder
2024-04-11 15:45:24 +08:00
Qiyuan Gong
2d64630757
Remove transformers version in axolotl example ( #10736 )
...
* Remove transformers version in axolotl requirements.txt
2024-04-11 14:02:31 +08:00
yb-peng
2685c41318
Modify all-in-one benchmark ( #10726 )
...
* Update 8192 prompt in all-in-one
* Add cpu_embedding param for linux api
* Update run.py
* Update README.md
2024-04-11 13:38:50 +08:00
Xiangyu Tian
301504aa8d
Fix transformers version warning ( #10732 )
2024-04-11 13:12:49 +08:00
Wenjing Margaret Mao
9bec233e4d
Delete python/llm/test/benchmark/perplexity/update_html_in_parent_folder.py
...
Delete due to repetition
2024-04-11 07:21:12 +08:00
Cengguang Zhang
4b024b7aac
LLM: optimize chatglm2 8k input. ( #10723 )
...
* LLM: optimize chatglm2 8k input.
* rename.
2024-04-10 16:59:06 +08:00
Yuxuan Xia
cd22cb8257
Update Env check Script ( #10709 )
...
* Update env check bash file
* Update env-check
2024-04-10 15:06:00 +08:00
Shaojun Liu
29bf28bd6f
Upgrade python to 3.11 in Docker Image ( #10718 )
...
* install python 3.11 for cpu-inference docker image
* update xpu-inference dockerfile
* update cpu-serving image
* update qlora image
* update lora image
* update document
2024-04-10 14:41:27 +08:00
Qiyuan Gong
b727767f00
Add axolotl v0.3.0 with ipex-llm on Intel GPU ( #10717 )
...
* Add axolotl v0.3.0 support on Intel GPU.
* Add finetune example on llama-2-7B with Alpaca dataset.
2024-04-10 14:38:29 +08:00
Wang, Jian4
c9e6d42ad1
LLM: Fix chatglm3-6b-32k error ( #10719 )
...
* fix chatglm3-6b-32k
* update style
2024-04-10 11:24:06 +08:00
Keyan (Kyrie) Zhang
585c174e92
Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables ( #10707 )
...
* Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables.
* Fix style
2024-04-10 10:48:46 +08:00
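Reading this setting from the environment, with the explicit integer conversion that #10771 later addresses, might look like the sketch below. The function name `kv_cache_alloc_block_length` and the default of 256 are assumptions for illustration, not values taken from the commit.

```python
import os

DEFAULT_BLOCK_LENGTH = 256  # assumed default, not taken from the commit


def kv_cache_alloc_block_length(environ=None) -> int:
    """Return KV_CACHE_ALLOC_BLOCK_LENGTH as an int, or the default if unset."""
    env = os.environ if environ is None else environ
    raw = env.get("KV_CACHE_ALLOC_BLOCK_LENGTH")
    if raw is None:
        return DEFAULT_BLOCK_LENGTH
    # Environment values are always strings, so convert explicitly.
    return int(raw)
```

Passing a mapping in place of `os.environ` keeps the helper testable without touching the real process environment.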
Jiao Wang
d1eaea509f
update chatglm readme ( #10659 )
2024-04-09 14:24:46 -07:00
Jiao Wang
878a97077b
Fix llava example to support transformers 4.36 ( #10614 )
...
* fix llava example
* update
2024-04-09 13:47:07 -07:00
Jiao Wang
1e817926ba
Fix low memory generation example issue in transformers 4.36 ( #10702 )
...
* update cache in low memory generate
* update
2024-04-09 09:56:52 -07:00
Yuwen Hu
97db2492c8
Update setup.py for bigdl-core-xe-esimd-21 on Windows ( #10705 )
...
* Support bigdl-core-xe-esimd-21 for windows in setup.py
* Update setup-llm-env accordingly
2024-04-09 18:21:21 +08:00
Zhicun
b4147a97bb
Fix dtype mismatch error ( #10609 )
...
* fix llama
* fix
* fix code style
* add torch type in model.py
---------
Co-authored-by: arda <arda@arda-arc19.sh.intel.com>
2024-04-09 17:50:33 +08:00
Shaojun Liu
f37a1f2a81
Upgrade to python 3.11 ( #10711 )
...
* create conda env with python 3.11
* recommend to use Python 3.11
* update
2024-04-09 17:41:17 +08:00
Yishuo Wang
8f45e22072
fix llama2 ( #10710 )
2024-04-09 17:28:37 +08:00
Yishuo Wang
e438f941f2
disable rwkv5 fp16 ( #10699 )
2024-04-09 16:42:11 +08:00
Cengguang Zhang
6a32216269
LLM: add llama2 8k input example. ( #10696 )
...
* LLM: add llama2-32K example.
* refactor name.
* fix comments.
* add IPEX_LLM_LOW_MEM notes and update sample output.
2024-04-09 16:02:37 +08:00
Wenjing Margaret Mao
289cc99cd6
Update README.md ( #10700 )
...
Edit "summarize the results"
2024-04-09 16:01:12 +08:00
Wenjing Margaret Mao
d3116de0db
Update README.md ( #10701 )
...
edit "summarize the results"
2024-04-09 15:50:25 +08:00
Chen, Zhentao
d59e0cce5c
Migrate harness to ipexllm ( #10703 )
...
* migrate to ipexlm
* fix workflow
* fix run_multi
* fix precision map
* rename ipexlm to ipexllm
* rename bigdl to ipex in comments
2024-04-09 15:48:53 +08:00
Keyan (Kyrie) Zhang
1e27e08322
Modify example from fp32 to fp16 ( #10528 )
...
* Modify example from fp32 to fp16
* Remove Falcon from fp16 example for now
* Remove MPT from fp16 example
2024-04-09 15:45:49 +08:00
binbin Deng
44922bb5c2
LLM: support baichuan2-13b using AutoTP ( #10691 )
2024-04-09 14:06:01 +08:00
Yina Chen
c7422712fc
mistral 4.36 use fp16 sdp ( #10704 )
2024-04-09 13:50:33 +08:00
Ovo233
dcb2038aad
Enable optimization for sentence_transformers ( #10679 )
...
* enable optimization for sentence_transformers
* fix python style check failure
2024-04-09 12:33:46 +08:00
Yang Wang
5a1f446d3c
support fp8 in xetla ( #10555 )
...
* support fp8 in xetla
* change name
* adjust model file
* support convert back to cpu
* factor
* fix bug
* fix style
2024-04-08 13:22:09 -07:00
jenniew
591bae092c
combine english and chinese, remove nan
2024-04-08 19:37:51 +08:00
Cengguang Zhang
7c43ac0164
LLM: optimize llama native sdp for split qkv tensor ( #10693 )
...
* LLM: optimize llama native sdp for split qkv tensor.
* fix block real size.
* fix comment.
* fix style.
* refactor.
2024-04-08 17:48:11 +08:00
Xin Qiu
1274cba79b
stablelm fp8 kv cache ( #10672 )
...
* stablelm fp8 kvcache
* update
* fix
* change to fp8 matmul
* fix style
* fix
* fix
* meet code review
* add comment
2024-04-08 15:16:46 +08:00
Yishuo Wang
65127622aa
fix UT threshold ( #10689 )
2024-04-08 14:58:20 +08:00
Cengguang Zhang
c0cd238e40
LLM: support llama2 8k input with w4a16. ( #10677 )
...
* LLM: support llama2 8k input with w4a16.
* fix comment and style.
* fix style.
* fix comments and split tensor to quantized attention forward.
* fix style.
* refactor name.
* fix style.
* fix style.
* fix style.
* refactor checker name.
* refactor native sdp split qkv tensor name.
* fix style.
* fix comment rename variables.
* fix co-exist of intermediate results.
2024-04-08 11:43:15 +08:00
Zhicun
321bc69307
Fix llamaindex ut ( #10673 )
...
* fix llamaindex ut
* add GPU ut
2024-04-08 09:47:51 +08:00
yb-peng
2d88bb9b4b
add test api transformer_int4_fp16_gpu ( #10627 )
...
* add test api transformer_int4_fp16_gpu
* update config.yaml and README.md in all-in-one
* modify run.py in all-in-one
* re-order test-api
* re-order test-api in config
* modify README.md in all-in-one
* modify README.md in all-in-one
* modify config.yaml
---------
Co-authored-by: pengyb2001 <arda@arda-arc21.sh.intel.com>
Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>
2024-04-07 15:47:17 +08:00
Wang, Jian4
47cabe8fcc
LLM: Fix no return_last_logit running bigdl_ipex chatglm3 ( #10678 )
...
* fix no return_last_logits
* update only for chatglm
2024-04-07 15:27:58 +08:00
Wang, Jian4
9ad4b29697
LLM: CPU benchmark using tcmalloc ( #10675 )
2024-04-07 14:17:01 +08:00
binbin Deng
d9a1153b4e
LLM: upgrade deepspeed in AutoTP on GPU ( #10647 )
2024-04-07 14:05:19 +08:00
Jin Qiao
56dfcb2ade
Migrate portable zip to ipex-llm ( #10617 )
...
* change portable zip prompt to ipex-llm
* fix chat with ui
* add no proxy
2024-04-07 13:58:58 +08:00
Zhicun
9d8ba64c0d
Llamaindex: add tokenizer_id and support chat ( #10590 )
...
* add tokenizer_id
* fix
* modify
* add from_model_id and from_model_id_low_bit
* fix typo and add comment
* fix python code style
---------
Co-authored-by: pengyb2001 <284261055@qq.com>
2024-04-07 13:51:34 +08:00
Jin Qiao
10ee786920
Replace with IPEX-LLM in example comments ( #10671 )
...
* Replace with IPEX-LLM in example comments
* More replacement
* revert some changes
2024-04-07 13:29:51 +08:00
Xiangyu Tian
08018a18df
Remove not-imported MistralConfig ( #10670 )
2024-04-07 10:32:05 +08:00
Cengguang Zhang
1a9b8204a4
LLM: support int4 fp16 chatglm2-6b 8k input. ( #10648 )
2024-04-07 09:39:21 +08:00
Jiao Wang
69bdbf5806
Fix vllm print error message issue ( #10664 )
...
* update chatglm readme
* Add condition to invalidInputError
* update
* update
* style
2024-04-05 15:08:13 -07:00
Jason Dai
29d97e4678
Update readme ( #10665 )
2024-04-05 18:01:57 +08:00
Xin Qiu
4c3e493b2d
fix stablelm2 1.6b ( #10656 )
...
* fix stablelm2 1.6b
* meet code review
2024-04-03 22:15:32 +08:00
Jin Qiao
cc8b3be11c
Add GPU and CPU example for stablelm-zephyr-3b ( #10643 )
...
* Add example for StableLM
* fix
* add to readme
2024-04-03 16:28:31 +08:00
Heyang Sun
6000241b10
Add Deepspeed Example of FLEX Mistral ( #10640 )
2024-04-03 16:04:17 +08:00
Shaojun Liu
d18dbfb097
update spr perf test ( #10644 )
2024-04-03 15:53:55 +08:00
Yishuo Wang
702e686901
optimize starcoder normal kv cache ( #10642 )
2024-04-03 15:27:02 +08:00
Xin Qiu
3a9ab8f1ae
fix stablelm logits diff ( #10636 )
...
* fix logits diff
* Small fixes
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-04-03 15:08:12 +08:00
Zhicun
b827f534d5
Add tokenizer_id in Langchain ( #10588 )
...
* fix low-bit
* fix
* fix style
---------
Co-authored-by: arda <arda@arda-arc12.sh.intel.com>
2024-04-03 14:25:35 +08:00
Zhicun
f6fef09933
fix prompt format for llama-2 in langchain ( #10637 )
2024-04-03 14:17:34 +08:00
Jiao Wang
330d4b4f4b
update readme ( #10631 )
2024-04-02 23:08:02 -07:00
Kai Huang
c875b3c858
Add seq len check for llama softmax upcast to fp32 ( #10629 )
2024-04-03 12:05:13 +08:00
Jiao Wang
4431134ec5
update readme ( #10632 )
2024-04-02 19:54:30 -07:00
Jiao Wang
23e33a0ca1
Fix qwen-vl style ( #10633 )
...
* update
* update
2024-04-02 18:41:38 -07:00
binbin Deng
2bbd8a1548
LLM: fix llama2 FP16 & bs>1 & autotp on PVC and ARC ( #10611 )
2024-04-03 09:28:04 +08:00
Jiao Wang
654dc5ba57
Fix Qwen-VL example problem ( #10582 )
...
* update
* update
* update
* update
2024-04-02 12:17:30 -07:00
Yuwen Hu
fd384ddfb8
Optimize StableLM ( #10619 )
...
* Initial commit for stablelm optimizations
* Small style fix
* add dependency
* Add mlp optimizations
* Small fix
* add attention forward
* Remove quantize kv for now as head_dim=80
* Add merged qkv
* fix license
* Python style fix
---------
Co-authored-by: qiuxin2012 <qiuxin2012cs@gmail.com>
2024-04-02 18:58:38 +08:00
binbin Deng
27be448920
LLM: add cpu_embedding and peak memory record for deepspeed autotp script ( #10621 )
2024-04-02 17:32:50 +08:00
Yishuo Wang
ba8cc6bd68
optimize starcoder2-3b ( #10625 )
2024-04-02 17:16:29 +08:00
Shaojun Liu
a10f5a1b8d
add python style check ( #10620 )
...
* add python style check
* fix style checks
* update runner
* add ipex-llm-finetune-qlora-cpu-k8s to manually_build workflow
* update tag to 2.1.0-SNAPSHOT
2024-04-02 16:17:56 +08:00
Cengguang Zhang
58b57177e3
LLM: support bigdl quantize kv cache env and add warning. ( #10623 )
...
* LLM: support bigdl quantize kv cache env and add warning.
* fix style.
* fix comments.
2024-04-02 15:41:08 +08:00
Kai Huang
0a95c556a1
Fix starcoder first token perf ( #10612 )
...
* add bias check
* update
2024-04-02 09:21:38 +08:00
Cengguang Zhang
e567956121
LLM: add memory optimization for llama. ( #10592 )
...
* add initial memory optimization.
* fix logic.
* fix logic.
* remove env var check in mlp split.
2024-04-02 09:07:50 +08:00
Keyan (Kyrie) Zhang
01f491757a
Modify the link in Langchain-upstream ut ( #10608 )
...
* Modify the link in Langchain-upstream ut
* fix langchain-upstream ut
2024-04-01 17:03:40 +08:00
Ruonan Wang
bfc1caa5e5
LLM: support iq1s for llama2-70b-hf ( #10596 )
2024-04-01 13:13:13 +08:00
Ruonan Wang
d6af4877dd
LLM: remove ipex.optimize for gpt-j ( #10606 )
...
* remove ipex.optimize
* fix
* fix
2024-04-01 12:21:49 +08:00
Yishuo Wang
437a349dd6
fix rwkv with pip installer ( #10591 )
2024-03-29 17:56:45 +08:00
WeiguangHan
9a83f21b86
LLM: check user env ( #10580 )
...
* LLM: check user env
* small fix
* small fix
* small fix
2024-03-29 17:19:34 +08:00
Keyan (Kyrie) Zhang
848fa04dd6
Fix typo in Baichuan2 example ( #10589 )
2024-03-29 13:31:47 +08:00
Ruonan Wang
0136fad1d4
LLM: support iq1_s ( #10564 )
...
* init version
* update utils
* remove unused code
2024-03-29 09:43:55 +08:00
Qiyuan Gong
f4537798c1
Enable kv cache quantization by default for flex when 1 < batch <= 8 ( #10584 )
...
* Enable kv cache quantization by default for flex when 1 < batch <= 8.
* Change up bound from <8 to <=8.
2024-03-29 09:43:42 +08:00
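The batch-size window named in #10584 can be written as a small predicate. This is a sketch under stated assumptions: the function name `quantize_kv_by_default` is hypothetical, and matching "flex" in the device name is an assumed way of detecting the device, not the check used in the commit.

```python
def quantize_kv_by_default(device_name: str, batch_size: int) -> bool:
    """Return True when KV cache quantization would be enabled by default.

    Assumption: a Flex GPU is recognised by 'flex' appearing in its name.
    The window is 1 < batch_size <= 8, with the upper bound inclusive.
    """
    is_flex = "flex" in device_name.lower()
    return is_flex and 1 < batch_size <= 8
```

The inclusive upper bound reflects the second bullet of the commit, which changes the limit from `<8` to `<=8`.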
Cengguang Zhang
b44f7adbad
LLM: Disable esimd sdp for PVC GPU when batch size>1 ( #10579 )
...
* llm: disable esimd sdp for pvc bz>1.
* fix logic.
* fix: avoid call get device name twice.
2024-03-28 22:55:48 +08:00
Xin Qiu
5963239b46
Fix qwen's position_ids not enough ( #10572 )
...
* fix position_ids
* fix position_ids
2024-03-28 17:05:49 +08:00
ZehuaCao
52a2135d83
Replace ipex with ipex-llm ( #10554 )
...
* fix ipex with ipex_llm
* fix ipex with ipex_llm
* update
* update
* update
* update
* update
* update
* update
* update
2024-03-28 13:54:40 +08:00
Cheen Hau, 俊豪
1c5eb14128
Update pip install to use --extra-index-url for ipex package ( #10557 )
...
* Change to 'pip install .. --extra-index-url' for readthedocs
* Change to 'pip install .. --extra-index-url' for examples
* Change to 'pip install .. --extra-index-url' for remaining files
* Fix URL for ipex
* Add links for ipex US and CN servers
* Update ipex cpu url
* remove readme
* Update for github actions
* Update for dockerfiles
2024-03-28 09:56:23 +08:00
binbin Deng
92dfed77be
LLM: fix abnormal output of fp16 deepspeed autotp ( #10558 )
2024-03-28 09:35:48 +08:00
Jason Dai
c450c85489
Delete llm/readme.md ( #10569 )
2024-03-27 20:06:40 +08:00
Xiangyu Tian
51d34ca68e
Fix wrong import in speculative ( #10562 )
2024-03-27 18:21:07 +08:00
Cheen Hau, 俊豪
f239bc329b
Specify oneAPI minor version in documentation ( #10561 )
2024-03-27 17:58:57 +08:00
WeiguangHan
fbeb10c796
LLM: Set different env based on different Linux kernels ( #10566 )
2024-03-27 17:56:33 +08:00
hxsz1997
d86477f14d
Remove native_int4 in LangChain examples ( #10510 )
...
* rebase the modify to ipex-llm
* modify the typo
2024-03-27 17:48:16 +08:00
Guancheng Fu
04baac5a2e
Fix fastchat top_k ( #10560 )
...
* fix -1 top_k
* fix
* done
2024-03-27 16:01:58 +08:00
binbin Deng
fc8c7904f0
LLM: fix torch_dtype setting of apply fp16 optimization through optimize_model ( #10556 )
2024-03-27 14:18:45 +08:00
Ruonan Wang
ea4bc450c4
LLM: add esimd sdp for pvc ( #10543 )
...
* add esimd sdp for pvc
* update
* fix
* fix batch
2024-03-26 19:04:40 +08:00
Jin Qiao
b78289a595
Remove ipex-llm dependency in readme ( #10544 )
2024-03-26 18:25:14 +08:00
Xiangyu Tian
11550d3f25
LLM: Add length check for IPEX-CPU speculative decoding ( #10529 )
...
Add length check for IPEX-CPU speculative decoding.
2024-03-26 17:47:10 +08:00
Guancheng Fu
a3b007f3b1
[Serving] Fix fastchat breaks ( #10548 )
...
* fix fastchat
* fix doc
2024-03-26 17:03:52 +08:00
Yishuo Wang
69a28d6b4c
fix chatglm ( #10540 )
2024-03-26 16:01:00 +08:00
Shaojun Liu
c563b41491
add nightly_build workflow ( #10533 )
...
* add nightly_build workflow
* add create-job-status-badge action
* update
* update
* update
* update setup.py
* release
* revert
2024-03-26 12:47:38 +08:00
binbin Deng
0a3e4e788f
LLM: fix mistral hidden_size setting for deepspeed autotp ( #10527 )
2024-03-26 10:55:44 +08:00
Xin Qiu
1dd40b429c
enable fp4 fused mlp and qkv ( #10531 )
...
* enable fp4 fused mlp and qkv
* update qwen
* update qwen2
2024-03-26 08:34:00 +08:00
Wang, Jian4
16b2ef49c6
Update_document by heyang ( #30 )
2024-03-25 10:06:02 +08:00
Wang, Jian4
a1048ca7f6
Update setup.py and add new actions and add compatible mode ( #25 )
...
* update setup.py
* add new action
* add compatible mode
2024-03-22 15:44:59 +08:00
Wang, Jian4
9df70d95eb
Refactor bigdl.llm to ipex_llm ( #24 )
...
* Rename bigdl/llm to ipex_llm
* rm python/llm/src/bigdl
* from bigdl.llm to from ipex_llm
2024-03-22 15:41:21 +08:00
Jin Qiao
cc5806f4bc
LLM: add save/load example for hf-transformers ( #10432 )
2024-03-22 13:57:47 +08:00
Wang, Jian4
34d0a9328c
LLM: Speed-up mixtral in pipeline parallel inference ( #10472 )
...
* speed-up mixtral
* fix style
2024-03-22 11:06:28 +08:00
Cengguang Zhang
b9d4280892
LLM: fix baichuan7b quantize kv abnormal output. ( #10504 )
...
* fix abnormal output.
* fix style.
* fix style.
2024-03-22 10:00:08 +08:00
Yishuo Wang
f0f317b6cf
fix a typo in yuan ( #10503 )
2024-03-22 09:40:04 +08:00
Guancheng Fu
3a3756b51d
Add FastChat bigdl_worker ( #10493 )
...
* done
* fix format
* add licence
* done
* fix doc
* refactor folder
* add license
2024-03-21 18:35:05 +08:00
Xin Qiu
dba7ddaab3
add sdp fp8 for qwen llama436 baichuan mistral baichuan2 ( #10485 )
...
* add sdp fp8
* fix style
* fix qwen
* fix baichuan 13
* revert baichuan 13b and baichuan2-13b
* fix style
* update
2024-03-21 17:23:05 +08:00
Kai Huang
30f111cd32
lm_head empty_cache for more models ( #10490 )
...
* modify constraint
* fix style
2024-03-21 17:11:43 +08:00
Yuwen Hu
1579ee4421
[LLM] Add nightly igpu perf test for INT4+FP16 1024-128 ( #10496 )
2024-03-21 16:07:06 +08:00
binbin Deng
2958ca49c0
LLM: add patching function for llm finetuning ( #10247 )
2024-03-21 16:01:01 +08:00
Zhicun
5b97fdb87b
update deepseek example readme ( #10420 )
...
* update readme
* update
* update readme
2024-03-21 15:21:48 +08:00
hxsz1997
a5f35757a4
Migrate langchain rag cpu example to gpu ( #10450 )
...
* add langchain rag on gpu
* add rag example in readme
* add trust_remote_code in TransformersEmbeddings.from_model_id
* add trust_remote_code in TransformersEmbeddings.from_model_id in cpu
2024-03-21 15:20:46 +08:00
binbin Deng
85ef3f1d99
LLM: add empty cache in deepspeed autotp benchmark script ( #10488 )
2024-03-21 10:51:23 +08:00
Xiangyu Tian
5a5fd5af5b
LLM: Add speculative benchmark on CPU/XPU ( #10464 )
...
Add speculative benchmark on CPU/XPU.
2024-03-21 09:51:06 +08:00
Ruonan Wang
28c315a5b9
LLM: fix deepspeed error of finetuning on xpu ( #10484 )
2024-03-21 09:46:25 +08:00
Kai Huang
021d77fd22
Remove softmax upcast fp32 in llama ( #10481 )
...
* update
* fix style
2024-03-20 18:17:34 +08:00
Yishuo Wang
cfdf8ad496
Fix modules_not_to_convert argument ( #10483 )
2024-03-20 17:47:03 +08:00
Xiangyu Tian
cbe24cc7e6
LLM: Enable BigDL IPEX Int8 ( #10480 )
...
Enable BigDL IPEX Int8
2024-03-20 15:59:54 +08:00
ZehuaCao
1d062e24db
Update serving doc ( #10475 )
...
* update serving doc
* add tob
* update
* update
* update
* update vllm worker
2024-03-20 14:44:43 +08:00
Cengguang Zhang
4581e4f17f
LLM: fix whisper model missing config. ( #10473 )
...
* fix whisper model missing config.
* fix style.
* fix style.
* style.
2024-03-20 14:22:37 +08:00
Jin Qiao
e41d556436
LLM: change fp16 benchmark to model.half ( #10477 )
...
* LLM: change fp16 benchmark to model.half
* fix
2024-03-20 13:38:39 +08:00
Yishuo Wang
749bedaf1e
fix rwkv v5 fp16 ( #10474 )
2024-03-20 13:15:08 +08:00
Yuwen Hu
72bcc27da9
[LLM] Add TransformersBgeEmbeddings class in bigdl.llm.langchain.embeddings ( #10459 )
...
* Add TransformersBgeEmbeddings class in bigdl.llm.langchain.embeddings
* Small fixes
2024-03-19 18:04:35 +08:00
Cengguang Zhang
463a86cd5d
LLM: fix qwen-vl interpolation gpu abnormal results. ( #10457 )
...
* fix qwen-vl interpolation gpu abnormal results.
* fix style.
* update qwen-vl gpu example.
* fix comment and update example.
* fix style.
2024-03-19 16:59:39 +08:00
Jin Qiao
e9055c32f9
LLM: fix fp16 mem record in benchmark ( #10461 )
...
* LLM: fix fp16 mem record in benchmark
* change style
2024-03-19 16:17:23 +08:00
Jiao Wang
f3fefdc9ce
fix pad_token_id issue ( #10425 )
2024-03-18 23:30:28 -07:00
Yuxuan Xia
74e7490fda
Fix Baichuan2 prompt format ( #10334 )
...
* Fix Baichuan2 prompt format
* Fix Baichuan2 README
* Change baichuan2 prompt info
* Change baichuan2 prompt info
2024-03-19 12:48:07 +08:00
Jin Qiao
0451103a43
LLM: add int4+fp16 benchmark script for windows benchmarking ( #10449 )
...
* LLM: add fp16 for benchmark script
* remove transformer_int4_fp16_loadlowbit_gpu_win
2024-03-19 11:11:25 +08:00
Xin Qiu
bbd749dceb
qwen2 fp8 cache ( #10446 )
...
* qwen2 fp8 cache
* fix style check
2024-03-19 08:32:39 +08:00
Yang Wang
9e763b049c
Support running pipeline parallel inference by vertically partitioning model to different devices ( #10392 )
...
* support pipeline parallel inference
* fix logging
* remove benchmark file
* fix
* need to warmup twice
* support qwen and qwen2
* fix lint
* remove genxir
* refine
2024-03-18 13:04:45 -07:00
Ruonan Wang
66b4bb5c5d
LLM: update setup to provide cpp for windows ( #10448 )
2024-03-18 18:20:55 +08:00
Xiangyu Tian
dbdeaddd6a
LLM: Fix log condition for BIGDL_OPT_IPEX ( #10441 )
...
remove log for BIGDL_OPT_IPEX
2024-03-18 16:03:51 +08:00
Wang, Jian4
1de13ea578
LLM: remove CPU english_quotes dataset and update docker example ( #10399 )
...
* update dataset
* update readme
* update docker cpu
* update xpu docker
2024-03-18 10:45:14 +08:00
Xin Qiu
399843faf0
Baichuan 7b fp16 sdp and qwen2 pvc sdp ( #10435 )
...
* add baichuan sdp
* update
* baichuan2
* fix
* fix style
* revert 13b
* revert
2024-03-18 10:15:34 +08:00
Jiao Wang
5ab52ef5b5
update ( #10424 )
2024-03-15 09:24:26 -07:00
Yishuo Wang
bd64488b2a
add mask support for llama/chatglm fp8 sdp ( #10433 )
...
* add mask support for fp8 sdp
* fix chatglm2 dtype
* update
2024-03-15 17:36:52 +08:00
Keyan (Kyrie) Zhang
444b11af22
Add LangChain upstream ut test for ipynb ( #10387 )
...
* Add LangChain upstream ut test for ipynb
* Integrate unit test for LangChain upstream ut and ipynb into one file
* Modify file name
* Remove LangChain version update in unit test
* Move Langchain upstream ut job to arc
* Modify path in .yml file
* Modify path in llm_unit_tests.yml
* Avoid create directory repeatedly
2024-03-15 16:31:01 +08:00
Jin Qiao
ca372f6dab
LLM: add save/load example for ModelScope ( #10397 )
...
* LLM: add sl example for modelscope
* fix according to comments
* move file
2024-03-15 15:17:50 +08:00
Xin Qiu
24473e331a
Qwen2 fp16 sdp ( #10427 )
...
* qwen2 sdp and refine
* update
* update
* fix style
* remove use_flash_attention
2024-03-15 13:12:03 +08:00
Kai Huang
1315150e64
Add baichuan2-13b 1k to arc nightly perf ( #10406 )
2024-03-15 10:29:11 +08:00
Ruonan Wang
b036205be2
LLM: add fp8 sdp for chatglm2/3 ( #10411 )
...
* add fp8 sdp for chatglm2
* fix style
2024-03-15 09:38:18 +08:00
Wang, Jian4
fe8976a00f
LLM: Support gguf models use low_bit and fix no json ( #10408 )
...
* support others model use low_bit
* update readme
* update to add *.json
2024-03-15 09:34:18 +08:00
Xin Qiu
cda38f85a9
Qwen fp16 sdp ( #10401 )
...
* qwen sdp
* fix
* update
* update
* update sdp
* update
* fix style check
* add to origin type
2024-03-15 08:51:50 +08:00
dingbaorong
1c0f7ed3fa
add xpu support ( #10419 )
2024-03-14 17:13:48 +08:00
Heyang Sun
7d29765092
refactor qwen2 forward to enable XPU ( #10409 )
...
* refactor qwen2 forward to enable XPU
* Update qwen2.py
2024-03-14 11:03:05 +08:00
Yuxuan Xia
f36224aac4
Fix ceval run.sh ( #10410 )
2024-03-14 10:57:25 +08:00
ZehuaCao
f66329e35d
Fix multiple get_enable_ipex function error ( #10400 )
...
* fix multiple get_enable_ipex function error
* remove get_enable_ipex_low_bit function
2024-03-14 10:14:13 +08:00
Kai Huang
76e30d8ec8
Empty cache for lm_head ( #10317 )
...
* empty cache
* add comments
2024-03-13 20:31:53 +08:00
Ruonan Wang
2be8bbd236
LLM: add cpp option in setup.py ( #10403 )
...
* add llama_cpp option
* meet code review
2024-03-13 20:12:59 +08:00
Ovo233
0dbce53464
LLM: Add decoder/layernorm unit tests ( #10211 )
...
* add decoder/layernorm unit tests
* update tests
* delete decoder tests
* address comments
* remove none type check
* restore nonetype checks
* delete nonetype checks; add decoder tests for Llama
* add gc
* deal with tuple output
2024-03-13 19:41:47 +08:00
Yishuo Wang
06a851afa9
support new baichuan model ( #10404 )
2024-03-13 17:45:50 +08:00
Yuxuan Xia
a90e9b6ec2
Fix C-Eval Workflow ( #10359 )
...
* Fix Baichuan2 prompt format
* Fix ceval workflow errors
* Fix ceval workflow error
* Fix ceval error
* Fix ceval error
* Test ceval
* Fix ceval
* Fix ceval
* Fix ceval
* Fix ceval
* Fix ceval
* Fix ceval
* Fix ceval
* Fix ceval
* Fix ceval
* Fix ceval
* Fix ceval
* Fix ceval
* Add ceval dependency test
* Fix ceval
* Fix ceval
* Test full ceval
* Test full ceval
* Fix ceval
* Fix ceval
2024-03-13 17:23:17 +08:00
Yishuo Wang
b268baafd6
use fp8 sdp in llama ( #10396 )
2024-03-13 16:45:38 +08:00
Xiangyu Tian
60043a3ae8
LLM: Support Baichuan2-13b in BigDL-vLLM ( #10398 )
...
Support Baichuan2-13b in BigDL-vLLM.
2024-03-13 16:21:06 +08:00
Xiangyu Tian
e10de2c42d
[Fix] LLM: Fix condition check error for speculative decoding on CPU ( #10402 )
...
Fix condition check error for speculative decoding on CPU
2024-03-13 16:05:06 +08:00
Keyan (Kyrie) Zhang
f158b49835
[LLM] Recover arc ut test for Falcon ( #10385 )
2024-03-13 13:31:35 +08:00
Heyang Sun
d72c0fad0d
Qwen2 SDPA forward on CPU ( #10395 )
...
* Fix Qwen1.5 CPU forward
* Update convert.py
* Update qwen2.py
2024-03-13 13:10:03 +08:00
Yishuo Wang
ca58a69b97
fix arc rms norm UT ( #10394 )
2024-03-13 13:09:15 +08:00
Wang, Jian4
0193f29411
LLM : Enable gguf float16 and Yuan2 model ( #10372 )
...
* enable float16
* add yun files
* enable yun
* enable set low_bit on yuan2
* update
* update license
* update generate
* update readme
* update python style
* update
2024-03-13 10:19:18 +08:00
Yina Chen
f5d65203c0
First token lm_head optimization ( #10318 )
...
* add lm head linear
* update
* address comments and fix style
* address comment
2024-03-13 10:11:32 +08:00
Keyan (Kyrie) Zhang
7cf01e6ec8
Add LangChain upstream ut test ( #10349 )
...
* Add LangChain upstream ut test
* Add LangChain upstream ut test
* Specify version numbers in yml script
* Correct langchain-community version
2024-03-13 09:52:45 +08:00
Xin Qiu
28c4a8cf5c
Qwen fused qkv ( #10368 )
...
* fused qkv + rope for qwen
* quantized kv cache
* fix
* update qwen
* fixed quantized qkv
* fix
* meet code review
* update split
* convert.py
* extend when not enough kv
* fix
2024-03-12 17:39:00 +08:00
Yishuo Wang
741c2bf1df
use new rms norm ( #10384 )
2024-03-12 17:29:51 +08:00
Xiangyu Tian
0ded0b4b13
LLM: Enable BigDL IPEX optimization for int4 ( #10319 )
...
Enable BigDL IPEX optimization for int4
2024-03-12 17:08:50 +08:00
binbin Deng
5d7e044dbc
LLM: add low bit option in deepspeed autotp example ( #10382 )
2024-03-12 17:07:09 +08:00
binbin Deng
df3bcc0e65
LLM: remove english_quotes dataset ( #10370 )
2024-03-12 16:57:40 +08:00
Zhao Changmin
df2b84f7de
Enable kv cache on arc batch ( #10308 )
2024-03-12 16:46:04 +08:00
Lilac09
5809a3f5fe
Add run-hbm.sh & add user guide for spr and hbm ( #10357 )
...
* add run-hbm.sh
* add spr and hbm guide
* only support quad mode
* only support quad mode
* update special cases
* update special cases
2024-03-12 16:15:27 +08:00
binbin Deng
5d996a5caf
LLM: add benchmark script for deepspeed autotp on gpu ( #10380 )
2024-03-12 15:19:57 +08:00
Keyan (Kyrie) Zhang
f9c144dc4c
Fix final logits ut failure ( #10377 )
...
* Fix final logits ut failure
* Fix final logits ut failure
* Remove Falcon from completion test for now
* Remove Falcon from unit test for now
2024-03-12 14:34:01 +08:00
Guancheng Fu
cc4148636d
[FastChat-integration] Add initial implementation for loader ( #10323 )
...
* add initial implementation for loader
* add test method for model_loader
* data
* Refine
2024-03-12 10:54:59 +08:00
WeiguangHan
17bdb1a60b
LLM: add whisper models into nightly test ( #10193 )
...
* LLM: add whisper models into nightly test
* small fix
* small fix
* add more whisper models
* test all cases
* test specific cases
* collect the csv
* store the result
* to html
* small fix
* small test
* test all cases
* modify whisper_csv_to_html
2024-03-11 20:00:47 +08:00
binbin Deng
dbcfc5c2fa
LLM: fix error of 'AI-ModelScope/phi-2' hosted by ModelScope hub ( #10364 )
2024-03-11 16:19:17 +08:00
binbin Deng
fe27a6971c
LLM: update modelscope version ( #10367 )
2024-03-11 16:18:27 +08:00
Chen, Zhentao
a425eaabfc
fix from_pretrained when device_map=None ( #10361 )
...
* pr trigger
* fix error when device_map=None
* fix device_map=None
2024-03-11 16:06:12 +08:00
Yina Chen
d7b765fd3f
serving xpu memory opt ( #10358 )
2024-03-11 15:21:22 +08:00
Ruonan Wang
be29833b2b
LLM: fix qwen2 ( #10356 )
2024-03-11 09:29:08 +08:00
Zhicun
9026c08633
Fix llamaindex AutoTokenizer bug ( #10345 )
...
* fix tokenizer
* fix AutoTokenizer bug
* modify code style
2024-03-08 16:24:50 +08:00
Zhicun
2a10b53d73
rename docqa.py->rag.py ( #10353 )
2024-03-08 16:07:09 +08:00
Keyan (Kyrie) Zhang
f1825d7408
Add RMSNorm unit test ( #10190 )
2024-03-08 15:51:03 +08:00
Shengsheng Huang
370c52090c
Langchain readme ( #10348 )
...
* update langchain readme
* update readme
* create new README
* Update README_nativeint4.md
2024-03-08 14:57:24 +08:00
Keyan (Kyrie) Zhang
7a621a4db0
Fix device_map bug by raising an error when using device_map=xpu ( #10340 )
...
* Fix device_map bug by raising an error when using device_map=xpu
* Fix sync error
* Fix python style
* Use invalidInputError instead of invalidOperationError
2024-03-08 13:38:52 +08:00
Yishuo Wang
1ac193ba02
add rope theta argument ( #10343 )
2024-03-07 17:27:19 +08:00
Yuxuan Xia
0c8d3c9830
Add C-Eval HTML report ( #10294 )
...
* Add C-Eval HTML report
* Fix C-Eval workflow pr trigger path
* Fix C-Eval workflow typos
* Add permissions to C-Eval workflow
* Fix C-Eval workflow typo
* Add pandas dependency
* Fix C-Eval workflow typo
2024-03-07 16:44:49 +08:00
Cengguang Zhang
496d18ab6d
LLM: add quantize kv cache support for baichuan 7b and 13b. ( #10330 )
...
* add quantize kv cache for baichuan 7b and 13b.
* fix typo.
* fix.
* fix style.
* fix style.
2024-03-07 16:17:38 +08:00
hxsz1997
b7db21414e
Update llamaindex ut ( #10338 )
...
* add test_llamaindex of gpu
* add llamaindex gpu tests bash
* add llamaindex cpu tests bash
* update name of Run LLM langchain GPU test
* import llama_index in llamaindex gpu ut
* update the dependency of test_llamaindex
* add Run LLM llamaindex GPU test
* modify import dependency of llamaindex cpu test
* add Run LLM llamaindex test
* update llama_model_path
* delete unused model path
* add LLAMA2_7B_ORIGIN_PATH in llamaindex cpu test
2024-03-07 10:06:16 +08:00
ZehuaCao
267de7abc3
fix fschat DEP version error ( #10325 )
2024-03-06 16:15:27 +08:00
Yina Chen
9ea499ca68
Optimize speculative decoding PVC memory usage ( #10329 )
...
* optimize memory
* update
* update
* update
* support other models
* update
* fix style
2024-03-06 09:54:21 +08:00
dingbaorong
cc796848ea
fix typos ( #10274 )
...
Co-authored-by: Ariadne <wyn2000330@126.com>
2024-03-05 18:38:22 +08:00
hxsz1997
af11c53473
Add the installation step of postgresql and pgvector on windows in LlamaIndex GPU support ( #10328 )
...
* add the installation of postgresql and pgvector of windows
* fix some format
2024-03-05 18:31:19 +08:00
Yishuo Wang
0011ff9f64
optimize bge large performance ( #10324 )
2024-03-05 17:06:03 +08:00
Shaojun Liu
178eea5009
upload bigdl-llm wheel to sourceforge for backup ( #10321 )
...
* test: upload to sourceforge
* update scripts
* revert
2024-03-05 16:36:01 +08:00
Cengguang Zhang
30d009bca7
LLM: support quantized kv cache for Mistral in transformers >=4.36.0 ( #10326 )
...
* support quantize kv for mistral in transformers 4.36
* update mistral support.
* fix style.
2024-03-05 16:23:50 +08:00
dingbaorong
1e6f0c6f1a
Add llamaindex gpu example ( #10314 )
...
* add llamaindex example
* fix core dump
* refine readme
* add trouble shooting
* refine readme
---------
Co-authored-by: Ariadne <wyn2000330@126.com>
2024-03-05 13:36:00 +08:00
dingbaorong
fc7f10cd12
add langchain gpu example ( #10277 )
...
* first draft
* fix
* add readme for transformer_int4_gpu
* fix doc
* check device_map
* add arc ut test
* fix ut test
* fix langchain ut
* Refine README
* fix gpu mem too high
* fix ut test
---------
Co-authored-by: Ariadne <wyn2000330@126.com>
2024-03-05 13:33:57 +08:00
Yuwen Hu
5dbbe1a826
[LLM] Support for new arc ut runner ( #10311 )
...
* Support for new arc ut runner
* Comment unnecessary OMP_NUM_THREADS related settings for arc uts
2024-03-04 18:42:02 +08:00
Yuwen Hu
d45e577d8c
[LLM] Test load_low_bit in iGPU perf test on Windows ( #10313 )
2024-03-04 18:03:57 +08:00
WeiguangHan
fd81d66047
LLM: Compress some models to save space ( #10315 )
...
* LLM: compress some models to save space
* add deleted comments
2024-03-04 17:53:03 +08:00
Shaojun Liu
bab2ee5f9e
update nightly spr perf test ( #10178 )
...
* update nightly spr perf test
* update
* update runner label
* update
* update
* update folder
* revert
2024-03-04 13:46:33 +08:00
Cengguang Zhang
ab9fc2485f
LLM: add quantize kv support for llama transformer 4.36 ( #10298 )
...
* add quantize kv support for llama transformer 4.36
* fix style.
* fix style.
2024-03-04 10:33:35 +08:00
Xin Qiu
58208a5883
Update FAQ document. ( #10300 )
...
* Update install_gpu.md
* Update resolve_error.md
* Update README.md
* Update resolve_error.md
* Update README.md
* Update resolve_error.md
2024-03-04 08:35:11 +08:00
Yuwen Hu
27d9a14989
[LLM] all-in-one update: memory optimize and streaming output ( #10302 )
...
* Memory saving for continuous in-out pair run and add support for streaming output on MTL iGPU
* Small fix
* Small fix
* Add things back
2024-03-01 18:02:30 +08:00
SONG Ge
0ab40917fb
[LLM] Split merged_qk to separated q/k linear ( #10299 )
...
* modify merge_qk_linear to separated q/k linear
* update
2024-03-01 16:48:55 +08:00
Yang Wang
f4d7dbcde2
use fused qkv forward in qwen2 ( #10185 )
...
* use fused qkv forward in qwen2
* support both
* fix style
* fix rope
* remove print
* fix style
* clean up
2024-03-01 16:46:35 +08:00
Xin Qiu
509e206de0
update doc about gemma random and unreadable output. ( #10297 )
...
* Update install_gpu.md
* Update README.md
* Update README.md
2024-03-01 15:41:16 +08:00
Wang, Jian4
beb9433cec
LLM: Reduce speculative _ipex_optimize_model memory use ( #10281 )
...
* use tpp
* update ipex
2024-03-01 13:48:23 +08:00
Yuwen Hu
f0ff0eebe1
[LLM] Support quantize kv cache for Baichuan2 7B ( #10280 )
...
* Add quantized kv cache framework for Baichuan2 7B
* Support quantize kv cache for baichuan2
* Small fix
* Fix python style
2024-03-01 13:35:42 +08:00
SONG Ge
273de341d7
hot-fix silu error import ( #10292 )
2024-03-01 10:11:37 +08:00
Shengsheng Huang
bcfad555df
revise llamaindex readme ( #10283 )
2024-02-29 17:19:23 +08:00
Xin Qiu
232273a1b5
Enable Gemma fused mlp + Gelu ( #10276 )
...
* update llama mlp forward
* add all
* fix style check
* split
* update
* update
* update
* fix style
2024-02-29 16:53:24 +08:00
Guancheng Fu
2d930bdca8
Add vLLM bf16 support ( #10278 )
...
* add argument load_in_low_bit
* add docs
* modify gpu doc
* done
---------
Co-authored-by: ivy-lv11 <lvzc@lamda.nju.edu.cn>
2024-02-29 16:33:42 +08:00
SONG Ge
13b0bc9075
[LLM] Add quantize_kv optimization for yuan2 model ( #10243 )
...
* add initial quantize_kv support for yuan2 model
* fix yuan2 quantize_kv generation
* apply fp16 conv layer optimizations
* disable mlp for quantize_kv
2024-02-29 16:33:26 +08:00
Zhicun
4e6cc424f1
Add LlamaIndex RAG ( #10263 )
...
* run demo
* format code
* add llamaindex
* add custom LLM with bigdl
* update
* add readme
* begin ut
* add unit test
* add license
* add license
* revised
* update
* modify docs
* remove data folder
* update
* modify prompt
* fixed
* fixed
* fixed
2024-02-29 15:21:19 +08:00
Jin Qiao
5d7243067c
LLM: add Baichuan2-13B-Chat 2048-256 to MTL perf ( #10273 )
2024-02-29 13:48:55 +08:00
Ruonan Wang
a9fd20b6ba
LLM: Update qkv fusion for GGUF-IQ2 ( #10271 )
...
* first commit
* update mistral
* fix transformers==4.36.0
* fix
* disable qk for mixtral now
* fix style
2024-02-29 12:49:53 +08:00
Jiao Wang
6fb65bb9d2
fix in transformers 4.36 ( #10150 )
2024-02-28 18:43:01 -08:00
Shengsheng Huang
43dac97e03
Update README.md ( #10260 )
2024-02-29 10:41:14 +08:00
Ruonan Wang
4b08bc1417
LLM: relax batch check of flash attention by double check attention mask ( #10270 )
...
* relax batch check
* fix
* fix style
2024-02-29 09:39:55 +08:00
Yina Chen
07f36fbfcc
Fix gptj failed to extend ( #10269 )
2024-02-29 09:39:27 +08:00
Yishuo Wang
cccb02dad1
fix baichuan2 13b 2k input ( #10267 )
2024-02-28 17:20:20 +08:00
Heyang Sun
7244fd1ba5
Fix Arc StarCoder wrong query_shape when input is long ( #10268 )
...
* Fix Arc StarCoder wrong query_shape when input is long
* Update gptbigcode.py
2024-02-28 17:07:08 +08:00
Cengguang Zhang
a4de3095f3
LLM: Support quantize kv cache in mistral. ( #10261 )
...
* init
* update quantize kv.
2024-02-28 14:08:08 +08:00
Shengsheng Huang
db0d129226
Revert "Add rwkv example ( #9432 )" ( #10264 )
...
This reverts commit 6930422b42 .
2024-02-28 11:48:31 +08:00
Yining Wang
6930422b42
Add rwkv example ( #9432 )
...
* codeshell fix wrong urls
* restart runner
* add RWKV CPU & GPU example (rwkv-4-world-7b)
* restart runner
* update submodule
* fix runner
* runner-test
---------
Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>
2024-02-28 11:41:00 +08:00
Keyan (Kyrie) Zhang
59861f73e5
Add Deepseek-6.7B ( #9991 )
...
* Add new example Deepseek
* Add new example Deepseek
* Add new example Deepseek
* Add new example Deepseek
* Add new example Deepseek
* modify deepseek
* modify deepseek
* Add verified model in README
* Turn cpu_embedding=True in Deepseek example
---------
Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>
2024-02-28 11:36:39 +08:00
Yuxuan Xia
2524273198
Update AutoGen README ( #10255 )
...
* Update AutoGen README
* Fix AutoGen README typos
* Update AutoGen README
* Update AutoGen README
2024-02-28 11:34:45 +08:00
Zheng, Yi
2347f611cf
Add cpu and gpu examples of Mamba ( #9797 )
...
* Add mamba cpu example
* Add mamba gpu example
* Use a smaller model as the example
* minor fixes
---------
Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>
2024-02-28 11:33:29 +08:00
Zhao Changmin
937e1f7c74
rebase ( #9104 )
...
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2024-02-28 11:18:21 +08:00
JunX
4833067489
fix GPU example link in README.md ( #9533 )
...
* fix GPU example link in README.md
* fix GPU links in llm README.md
2024-02-28 11:13:18 +08:00
Zhicun
308e637d0d
Add DeepSeek-MoE-16B-Chat ( #10155 )
...
* dsmoe-hf add
* add dsmoe pytorch
* update README
* modify comment
* remove GPU example
* update model name
* format code
2024-02-28 10:12:09 +08:00
Guoqiong Song
f4a2e32106
Stream llm example for both GPU and CPU ( #9390 )
2024-02-27 15:54:47 -08:00
Yang Wang
c581c6db30
draft mmint4 ( #10031 )
...
change to llm.cpp
support transposed format
revert
implement qkv fuse
fix style
change to vertically pack
change to enable_xetla
fix mlp_fusion_check
remove comments
address comments
add some comments
fix style
2024-02-27 14:55:16 -08:00
hxsz1997
cba61a2909
Add html report of ppl ( #10218 )
...
* remove include and language option, select the corresponding dataset based on the model name in Run
* change the nightly test time
* change the nightly test time of harness and ppl
* save the ppl result to json file
* generate csv file and print table result
* generate html
* modify the way to get parent folder
* update html in parent folder
* add llm-ppl-summary and llm-ppl-summary-html
* modify echo single result
* remove download fp16.csv
* change model name of PR
* move ppl nightly related files to llm/test folder
* reformat
* separate make_table from make_table_and_csv.py
* separate make_csv from make_table_and_csv.py
* update llm-ppl-html
* remove comment
* add Download fp16.results
2024-02-27 17:37:08 +08:00
Zhicun
6d60982746
Env script: add license ( #10257 )
...
* env script
* update README.md
* modify README
* modify cpu info output
* add env-check.sh
* add env-check.bat
* add windows
* modify bat
* add license
2024-02-27 15:29:20 +08:00
Yishuo Wang
b4fa4ab46f
optimize yuan 2.0 again ( #10252 )
2024-02-27 14:51:42 +08:00
Zhicun
03b9c4930a
UX: Script to print env info ( #10088 )
...
* env script
* update README.md
* modify README
* modify cpu info output
* add env-check.sh
* add env-check.bat
* add windows
* modify bat
2024-02-27 14:45:36 +08:00
Keyan (Kyrie) Zhang
843fe546b0
Add CPU and GPU examples for DeciLM-7B ( #9867 )
...
* Add cpu and gpu examples for DeciLM-7B
* Add cpu and gpu examples for DeciLM-7B
* Add DeciLM-7B to README table
* modify deciLM
* modify deciLM
* modify deciLM
* Add verified model in README
* Add cpu_embedding=True
2024-02-27 13:15:49 +08:00
Yuwen Hu
38ae4b372f
Add yuan2-2b to win igpu perf test ( #10250 )
2024-02-27 11:08:33 +08:00
Heyang Sun
36a9e88104
Speculative Starcoder on CPU ( #10138 )
...
* Speculative Starcoder on CPU
* enable kv-cache pre-allocation
* refine codes
* refine
* fix style
* fix style
* fix style
* refine
* refine
* Update speculative.py
* Update gptbigcode.py
* fix style
* Update speculative.py
* enable mixed-datatype layernorm on top of torch API
* adaptive dtype
* Update README.md
2024-02-27 09:57:29 +08:00
Yishuo Wang
a47989c860
optimize yuan 2.0 performance ( #10244 )
2024-02-26 17:20:10 +08:00
Wang, Jian4
6c74b99a28
LLM: Update qwen readme ( #10245 )
2024-02-26 17:03:09 +08:00
hxsz1997
15ad2fd72e
Merge pull request #10226 from zhentaocc/fix_harness
...
Fix harness
2024-02-26 16:49:27 +08:00
Wang, Jian4
f9b75f900b
LLM: Enable qwen target_model ipex ( #10232 )
...
* change order
* enable qwen ipex
* update qwen example
* update
* fix style
* update
2024-02-26 16:41:12 +08:00
Jin Qiao
3e6d188553
LLM: add baichuan2-13b to mtl perf ( #10238 )
2024-02-26 15:55:56 +08:00
Yuwen Hu
e38e29511c
[LLM] Yuan2 MLP and Rotary optimization ( #10231 )
...
* Add optimization for rotary embedding
* Add mlp fused optimization
* Python style fix
* Fix rotary embedding due to logits difference
* Small fix
2024-02-26 15:10:08 +08:00
Ziteng Zhang
ea23afc8ec
[LLM] update ipex part in mistral example readme ( #10239 )
...
* update ipex part in mistral example readme
2024-02-26 14:35:20 +08:00
SONG Ge
df2f3885ba
[LLM] Enable kv_cache and forward_qkv optimizations for yuan2 ( #10225 )
...
* add init kv_cache support for yuan2
* add forward qkv in yuan
2024-02-26 11:29:48 +08:00
Xiangyu Tian
85a99e13e8
LLM: Fix ChatGLM3 Speculative Example ( #10236 )
...
Fix ChatGLM3 Speculative Example.
2024-02-26 10:57:28 +08:00
Chen, Zhentao
213ef06691
fix readme
2024-02-24 00:38:08 +08:00
Ruonan Wang
28513f3978
LLM: support fp16 embedding & add mlp fusion for iq2_xxs ( #10219 )
...
* add fp16 embed
* small fixes
* fix style
* fix style
* fix comment
2024-02-23 17:26:24 +08:00
Yuwen Hu
eeecd9fc08
Python style fix ( #10230 )
2024-02-23 17:21:23 +08:00
Yuwen Hu
e511bbd8f1
[LLM] Add basic optimization framework for Yuan2 ( #10227 )
...
* Add basic optimization framework for Yuan2
* Small fix
* Python style fix
* Small fix
* Small fix
2024-02-23 17:05:00 +08:00
Xin Qiu
8ef5482da2
update Gemma readme ( #10229 )
...
* Update README.md
* Update README.md
* Update README.md
* Update README.md
2024-02-23 16:57:08 +08:00
Chen, Zhentao
6fe5344fa6
separate make_csv from the file
2024-02-23 16:33:38 +08:00
Chen, Zhentao
bfa98666a6
fall back to make_table.py
2024-02-23 16:33:38 +08:00
Ruonan Wang
19260492c7
LLM: fix action/installation error of mpmath ( #10223 )
...
* fix
* test
* fix
* update
2024-02-23 16:14:53 +08:00
Xin Qiu
aabfc06977
add gemma example ( #10224 )
...
* add gemma gpu example
* Update README.md
* add cpu example
* Update README.md
* Update README.md
* Update generate.py
* Update generate.py
2024-02-23 15:20:57 +08:00
yb-peng
a2c1675546
Add CPU and GPU examples for Yuan2-2B-hf ( #9946 )
...
* Add a new CPU example of Yuan2-2B-hf
* Add a new CPU generate.py of Yuan2-2B-hf example
* Add a new GPU example of Yuan2-2B-hf
* Add Yuan2 to README table
* In CPU example:1.Use English as default prompt; 2.Provide modified files in yuan2-2B-instruct
* In GPU example:1.Use English as default prompt;2.Provide modified files
* GPU example:update README
* update Yuan2-2B-hf in README table
* Add CPU example for Yuan2-2B in Pytorch-Models
* Add GPU example for Yuan2-2B in Pytorch-Models
* Add license in generate.py; Modify README
* In GPU Add license in generate.py; Modify README
* In CPU yuan2 modify README
* In GPU yuan2 modify README
* In CPU yuan2 modify README
* In GPU example, updated the readme for Windows GPU supports
* In GPU torch example, updated the readme for Windows GPU supports
* GPU hf example README modified
* GPU example README modified
2024-02-23 14:09:30 +08:00
yb-peng
f1f4094a09
Add CPU and GPU examples of phi-2 ( #10014 )
...
* Add CPU and GPU examples of phi-2
* In GPU hf example, updated the readme for Windows GPU supports
* In GPU torch example, updated the readme for Windows GPU supports
* update the table in BigDL/README.md
* update the table in BigDL/python/llm/README.md
2024-02-23 14:05:53 +08:00
Chen, Zhentao
f315c7f93a
Move harness nightly related files to llm/test folder ( #10209 )
...
* move harness nightly files to test folder
* change workflow file path accordingly
* use arc01 when pr
* fix path
* fix fp16 csv path
2024-02-23 11:12:36 +08:00
Xin Qiu
30795bdfbc
Gemma optimization: rms_norm, kv_cache, fused_rope, fused_rope+qkv ( #10212 )
...
* gemma optimization
* update
* update
* fix style
* meet code review
2024-02-23 10:07:24 +08:00
Guoqiong Song
63681af97e
falcon for transformers 4.36 ( #9960 )
...
* falcon for transformers 4.36
2024-02-22 17:04:40 -08:00
Jason Dai
84d5f40936
Update README.md ( #10213 )
2024-02-22 17:22:59 +08:00
Yina Chen
ce5840a8b7
GPT-J rope optimization on xpu ( #10182 )
...
* optimize
* update
* fix style & move use_fuse_rope
* add ipex version check
* fix style
* update
* fix style
* meet comments
* address comments
* fix style
2024-02-22 16:25:12 +08:00
Xiangyu Tian
f445217d02
LLM: Update IPEX to 2.2.0+cpu and Refactor for _ipex_optimize ( #10189 )
...
Update IPEX to 2.2.0+cpu and refactor for _ipex_optimize.
2024-02-22 16:01:11 +08:00
Heyang Sun
c876d9b5ca
Support for MPT rotary embedding ( #10208 )
2024-02-22 15:16:31 +08:00
Ruonan Wang
5e1fee5e05
LLM: add GGUF-IQ2 examples ( #10207 )
...
* add iq2 examples
* small fix
* meet code review
* fix
* meet review
* small fix
2024-02-22 14:18:45 +08:00
Yuwen Hu
21de2613ce
[LLM] Add model loading time record for all-in-one benchmark ( #10201 )
...
* Add model loading time record in csv for all-in-one benchmark
* Small fix
* Small fix to number after .
2024-02-22 13:57:18 +08:00
Ovo233
60e11b6739
LLM: Add mlp layer unit tests ( #10200 )
...
* add mlp layer unit tests
* add download baichuan-13b
* exclude llama for now
* install additional packages
* rename bash file
* switch to Baichuan2
* delete attention related code
* fix name errors in yml file
2024-02-22 13:44:45 +08:00
SONG Ge
ca1166a0e5
[LLM] Add quantize kv_cache for Baichuan2-13B ( #10203 )
...
* add quantize kv_cache for baichuan2-13b
* style fix
2024-02-22 13:43:35 +08:00
Ruonan Wang
34ee1aa91f
LLM: add esimd sdp support for chatglm3 ( #10205 )
...
* add esimd sdp support
* fix style
2024-02-22 13:37:16 +08:00
Yuxuan Xia
7cbc2429a6
Fix C-Eval ChatGLM loading issue ( #10206 )
...
* Add c-eval workflow and modify running files
* Modify the chatglm evaluator file
* Modify the ceval workflow for triggering test
* Modify the ceval workflow file
* Modify the ceval workflow file
* Modify ceval workflow
* Adjust the ceval dataset download
* Add ceval workflow dependencies
* Modify ceval workflow dataset download
* Add ceval test dependencies
* Add ceval test dependencies
* Correct the result print
* Fix the nightly test trigger time
* Fix ChatGLM loading issue
2024-02-22 10:00:43 +08:00
Yuwen Hu
94cb16fe40
[LLM] Small updates to Win GPU Install Doc ( #10199 )
...
* Make Offline installer as default for win gpu doc for oneAPI
* Small other fixes
2024-02-21 17:58:40 +08:00
binbin Deng
9975b029c5
LLM: add qlora finetuning example using trl.SFTTrainer ( #10183 )
2024-02-21 16:40:04 +08:00
Ruonan Wang
f7c96b19ef
LLM: support iq2 for mixtral ( #10191 )
...
* support name mapping for mixtral
* support mixtral mixed quantization
* fix style
* fix
2024-02-21 16:00:29 +08:00
yb-peng
b1a97b71a9
Harness eval: Add is_last parameter and fix logical operator in highlight_vals ( #10192 )
...
* Add is_last parameter and fix logical operator in highlight_vals
* Add script to update HTML files in parent folder
* Add running update_html_in_parent_folder.py in summarize step
* Add licence info
* Remove update_html_in_parent_folder.py in Summarize the results for pull request
2024-02-21 14:45:32 +08:00
Zhicun
c7e839e66c
Add Qwen1.5-7B-Chat ( #10113 )
...
* add Qwen1.5-7B-Chat
* modify Qwen1.5 example
* update README
* update prompt format
* update folder name and example README
* add Chinese prompt sample output
* update link in README
* correct the link
* update transformer version
2024-02-21 13:29:29 +08:00
Xin Qiu
56ad781f2f
qwen2 cpu fix ( #10187 )
2024-02-21 11:23:51 +08:00
Chen, Zhentao
39d37bd042
upgrade harness package version in workflow ( #10188 )
...
* upgrade harness
* update readme
2024-02-21 11:21:30 +08:00
Yuwen Hu
001c13243e
[LLM] Add support for low_low_bit benchmark on Windows GPU ( #10167 )
...
* Add support for low_low_bit performance test on Windows GPU
* Small fix
* Small fix
* Save memory during converting model process
* Drop the results for first time when loading in low bit on mtl igpu for better performance
* Small fix
2024-02-21 10:51:52 +08:00
Ziteng Zhang
276ef0e885
Speculative Ziya on CPU ( #10160 )
...
* Speculative Ziya on CPU
* Without part of Accelerate with BIGDL_OPT_IPEX
2024-02-21 10:30:39 +08:00
Zhao Changmin
4fbf449c2d
for rwkv4 ( #10179 )
2024-02-21 10:11:10 +08:00
yb-peng
de3dc609ee
Modify harness evaluation workflow ( #10174 )
...
* Modify table head in harness
* Specify the file path of fp16.csv
* change run to run nightly and run pr to debug
* Modify the way to get fp16.csv to downloading from github
* Change the method to calculate diff in html table
* Change the method to calculate diff in html table
* Re-arrange job order
* Re-arrange job order
* Change limit
* Change fp16.csv path
* Change highlight rules
* Change limit
2024-02-20 18:55:43 +08:00
Ruonan Wang
3288acb8de
LLM : Support embedding quantization (only q2k now) ( #10170 )
...
* basic logic added
* basic support
* support save&load, update mixed strategy
* fix style
* use int8 for lm_head
* add check for xpu
2024-02-20 16:56:57 +08:00
hxsz1997
6e10d98a8d
Fix some typos ( #10175 )
...
* add llm-ppl workflow
* update the DATASET_DIR
* test multiple precisions
* modify nightly test
* match the updated ppl code
* add matrix.include
* fix the include error
* update the include
* add more model
* update the precision of include
* update nightly time and add more models
* fix the workflow_dispatch description, change default model of pr and modify the env
* modify workflow_dispatch language options
* modify options
* modify language options
* modify workflow_dispatch type
* modify type
* modify the type of language
* change seq_len type
* fix some typos
* revert changes to stress_test.txt
2024-02-20 14:14:53 +08:00
Zhicun
add3899311
Add ziya CPU example ( #10114 )
...
* ziya on CPU
* add README for ziya
* specify use_cache
* add arc CPU
* update prompt format
* update link
* add comments to emphasize use_cache
* update pip cmd
2024-02-20 13:59:52 +08:00
binbin Deng
2bb96c775c
LLM: fix device setting during saving optimized model ( #10154 )
2024-02-20 09:52:59 +08:00
Xin Qiu
1f6d5b9f30
enable fused rmsnorm and rope qwen2 ( #10163 )
...
* qwen2
* change convert
* cleanup
2024-02-20 08:33:09 +08:00
yb-peng
e31210ba00
Modify html table style and add fp16.csv in harness ( #10169 )
...
* Specify the version of pandas in harness evaluation workflow
* Specify the version of pandas in harness evaluation workflow
* Modify html table style and add fp16.csv in harness
* Modify comments
2024-02-19 18:13:40 +08:00
WeiguangHan
6c09aed90d
LLM: add qwen_1.5_7b model for arc perf test ( #10166 )
...
* LLM: add qwen_1.5_7b model for arc perf test
* small fix
* revert some codes
2024-02-19 17:21:00 +08:00