Lilac09
5842f7530e
upgrade ubuntu version in llm-inference cpu image ( #9307 )
2023-10-30 16:51:38 +08:00
Cheen Hau, 俊豪
cee9eaf542
[LLM] Fix llm arc ut oom ( #9300 )
...
* Move model to cpu after testing so that gpu memory is deallocated
* Add code comment
---------
Co-authored-by: sgwhat <ge.song@intel.com>
2023-10-30 14:38:34 +08:00
dingbaorong
ee5becdd61
use coco image in Qwen-VL ( #9298 )
...
* use coco image
* add output
* address yuwen's comments
2023-10-30 14:32:35 +08:00
Yang Wang
163d033616
Support qlora in CPU ( #9233 )
...
* support qlora in CPU
* revert example
* fix style
2023-10-27 14:01:15 -07:00
Yang Wang
8838707009
Add deepspeed autotp example readme ( #9289 )
...
* Add deepspeed autotp example readme
* change word
2023-10-27 13:04:38 -07:00
dingbaorong
f053688cad
add cpu example of LLaVA ( #9269 )
...
* add LLaVA cpu example
* Small text updates
* update link
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2023-10-27 18:59:20 +08:00
Zheng, Yi
7f2ad182fd
Minor Fixes of README ( #9294 )
2023-10-27 18:25:46 +08:00
Zheng, Yi
1bff54a378
Display demo.jpg in the README.md of HuggingFace Transformers Agent ( #9293 )
...
* Display demo.jpg
* remove demo.jpg
2023-10-27 18:00:03 +08:00
Zheng, Yi
a4a1dec064
Add a cpu example of HuggingFace Transformers Agent (use vicuna-7b-v1.5) ( #9284 )
...
* Add examples of HF Agent
* Modify folder structure and add link of demo.jpg
* Fixes of readme
* Merge applications and Applications
2023-10-27 17:14:12 +08:00
Guoqiong Song
aa319de5e8
Add streaming-llm using llama2 on CPU ( #9265 )
...
Enable streaming-llm to let the model take infinite inputs; tested on desktop and SPR10
2023-10-27 01:30:39 -07:00
Yuwen Hu
21631209a9
[LLM] Skip CPU performance test for now ( #9291 )
...
* Skip llm cpu performance test for now
* Add install for wheel package
2023-10-27 12:55:04 +08:00
Ziteng Zhang
46ab0419b8
Merge pull request #9279 from Jasonzzt/main
...
Add bigdl-llm-finetune-cpu to manually_build to upload image on hub
2023-10-27 09:55:08 +08:00
Cheen Hau, 俊豪
6c9ae420a5
Add regression test for optimize_model on gpu ( #9268 )
...
* Add MPT model to transformer API test
* Add regression test for optimize_model on gpu.
---------
Co-authored-by: sgwhat <ge.song@intel.com>
2023-10-27 09:23:19 +08:00
Yuwen Hu
733df28a2b
[LLM] Migrate Arc UT to another runner ( #9286 )
...
* Separate arc llm ut to another runner
* Add dependency for einops
2023-10-26 19:08:57 +08:00
Cengguang Zhang
44b5fcc190
LLM: fix pretraining_tp argument issue. ( #9281 )
2023-10-26 18:43:58 +08:00
WeiguangHan
6b2a32eba2
LLM: add missing function for PyTorch InternLM model ( #9285 )
2023-10-26 18:05:23 +08:00
Yina Chen
f879c48f98
fp8 convert use ggml code ( #9277 )
2023-10-26 17:03:29 +08:00
Ziteng Zhang
916ccc0779
Update manually_build_for_testing.yml
2023-10-26 16:26:14 +08:00
Ziteng Zhang
14a23015f8
Update manually_build.yml
2023-10-26 16:24:03 +08:00
Jasonzzt
37b1708d16
Add bigdl-llm-finetune-cpu to manually_build
2023-10-26 15:53:44 +08:00
Jasonzzt
f2d1f5349c
Merge branch 'main' of https://github.com/Jasonzzt/BigDL
2023-10-26 15:46:50 +08:00
Lilac09
4ed7f066d3
add bigdl-llm-finetune-xpu to manually_build ( #9278 )
2023-10-26 15:30:05 +08:00
Yina Chen
e2264e8845
Support arc fp4 ( #9266 )
...
* support arc fp4
* fix style
* fix style
2023-10-25 15:42:48 +08:00
Cheen Hau, 俊豪
ab40607b87
Enable unit test workflow on Arc ( #9213 )
...
* Add gpu workflow and a transformers API inference test
* Set device-specific env variables in script instead of workflow
* Fix status message
---------
Co-authored-by: sgwhat <ge.song@intel.com>
2023-10-25 15:17:18 +08:00
SONG Ge
160a1e5ee7
[WIP] Add UT for Mistral Optimized Model ( #9248 )
...
* add ut for mistral model
* update
* fix model path
* upgrade transformers version for mistral model
* refactor correctness ut for mistral model
* refactor mistral correctness ut
* revert test_optimize_model back
* remove mistral from test_optimize_model
* add to revert transformers version back to 4.31.0
2023-10-25 15:14:17 +08:00
Yang Wang
067c7e8098
Support deepspeed AutoTP ( #9230 )
...
* Support deepspeed
* add test script
* refactor convert
* refine example
* refine
* refine example
* fix style
* refine example and adapt to latest ipex
* fix style
2023-10-24 23:46:28 -07:00
Yining Wang
a6a8afc47e
Add qwen vl CPU example ( #9221 )
...
* eee
* add examples on CPU and GPU
* fix
* fix
* optimize model examples
* add Qwen-VL-Chat CPU example
* Add Qwen-VL CPU example
* fix optimize problem
* fix error
* Have updated, benchmark fix removed from this PR
* add generate API example
* Change formats in qwen-vl example
* Add CPU transformer int4 example for qwen-vl
* fix repo-id problem and add Readme
* change picture url
* Remove unnecessary file
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2023-10-25 13:22:12 +08:00
binbin Deng
f597a9d4f5
LLM: update perf test configuration ( #9264 )
2023-10-25 12:35:48 +08:00
binbin Deng
770ac70b00
LLM: add low_bit option in benchmark scripts ( #9257 )
2023-10-25 10:27:48 +08:00
WeiguangHan
ec9195da42
LLM: using html to visualize the perf result for Arc ( #9228 )
...
* LLM: using html to visualize the perf result for Arc
* deploy the html file
* add python license
* resolve some comments
2023-10-24 18:05:25 +08:00
Jin Qiao
90162264a3
LLM: replace torch.float32 with auto type ( #9261 )
2023-10-24 17:12:13 +08:00
SONG Ge
bd5215d75b
[LLM] Reimplement chatglm fuse rms optimization ( #9260 )
...
* re-implement chatglm rope rms
* update
2023-10-24 16:35:12 +08:00
dingbaorong
5a2ce421af
add cpu and gpu examples of flan-t5 ( #9171 )
...
* add cpu and gpu examples of flan-t5
* address yuwen's comments
* Add explanation why we add modules to not convert
* Refine prompt and add a translation example
* Add an empty line at the end of files
* add examples of flan-t5 using optimize_model api
* address bin's comments
* address binbin's comments
* add flan-t5 in readme
2023-10-24 15:24:01 +08:00
Yining Wang
4a19f50d16
phi-1_5 CPU and GPU examples ( #9173 )
...
* eee
* add examples on CPU and GPU
* fix
* fix
* optimize model examples
* have updated
* Warmup and configs added
* Update two tables
2023-10-24 15:08:04 +08:00
Ziteng Zhang
ca2965fb9f
hosted k8s.png on readthedocs ( #9258 )
2023-10-24 15:07:16 +08:00
Jasonzzt
ed76205e0b
hosted k8s.png on readthedocs
2023-10-24 14:44:55 +08:00
SONG Ge
bfc1e2d733
add fused rms optimization for chatglm model ( #9256 )
2023-10-24 14:40:58 +08:00
Ruonan Wang
b15656229e
LLM: fix benchmark issue ( #9255 )
2023-10-24 14:15:05 +08:00
Guancheng Fu
f37547249d
Refine README/CICD ( #9253 )
2023-10-24 12:56:03 +08:00
Guancheng Fu
9faa2f1eef
Fix bigdl-llm-serving-tdx image ( #9251 )
2023-10-24 10:49:35 +08:00
binbin Deng
db37edae8a
LLM: update langchain api document page ( #9222 )
2023-10-24 10:13:41 +08:00
Xin Qiu
0c5055d38c
add position_ids and fuse embedding for falcon ( #9242 )
...
* add position_ids for falcon
* add cpu
* add cpu
* add license
2023-10-24 09:58:20 +08:00
Guancheng Fu
7f66bc5c14
Fix bigdl-llm-serving-cpu Dockerfile ( #9247 )
2023-10-23 16:51:30 +08:00
Guancheng Fu
6cb884d82d
Fix missing manually_build_for_testing entry ( #9245 )
2023-10-23 16:35:09 +08:00
Guancheng Fu
2ead3f7d54
add manually build ( #9244 )
2023-10-23 15:53:30 +08:00
Wang, Jian4
c14a61681b
Add low-bit loading in model-serving to reduce EPC ( #9239 )
...
* init load low-bit
* fix
* fix
2023-10-23 11:28:20 +08:00
Yina Chen
0383306688
Add arc fp8 support ( #9232 )
...
* add fp8 support
* add log
* fix style
2023-10-20 17:15:07 +08:00
Jason Dai
26850ebd36
Update readme ( #9237 )
2023-10-20 16:13:25 +08:00
Yang Wang
118249b011
support transformers 4.34+ for llama ( #9229 )
2023-10-19 22:36:30 -07:00
binbin Deng
7e96d3e79a
LLM: improve gpu support key feature doc page ( #9212 )
2023-10-19 18:40:48 +08:00