Xin Qiu
0f9a440b06
doc for Multi gpu selection ( #9414 )
2023-11-20 09:25:58 +08:00
Xin Qiu
50b01058f1
enable new q4_1 ( #9479 )
2023-11-17 14:58:57 +08:00
binbin Deng
3dac21ac7b
LLM: add more example usages about alpaca qlora on different hardware ( #9458 )
2023-11-17 09:56:43 +08:00
Heyang Sun
921b263d6a
update deepspeed install and run guide in README ( #9441 )
2023-11-17 09:11:39 +08:00
Zhao Changmin
30abd304a7
LLM: Fix baichuan pre-normalize model tensor assigning issue when loading ( #9481 )
...
* No need to normalize when loading
2023-11-16 21:57:28 +08:00
WeiguangHan
bc06bec90e
LLM: modify the script to generate html results more accurately ( #9445 )
...
* modify the script to generate html results more accurately
* resolve some comments
* revert some code
2023-11-16 19:50:23 +08:00
Ruonan Wang
c0ef70df02
llm: quick fix of fast_rms_norm ( #9480 )
2023-11-16 14:42:16 +08:00
Yina Chen
d5263e6681
Add awq load support ( #9453 )
...
* Support directly loading GPTQ models from huggingface
* fix style
* fix tests
* change example structure
* address comments
* fix style
* init
* address comments
* add examples
* fix style
* fix style
* fix style
* fix style
* update
* remove
* meet comments
* fix style
---------
Co-authored-by: Yang Wang <yang3.wang@intel.com>
2023-11-16 14:06:25 +08:00
Ruonan Wang
d2c064124a
LLM: update rms related usage to support ipex 2.1 new api ( #9466 )
...
* update rms related usage
* fix style
2023-11-16 11:21:50 +08:00
Yuwen Hu
731b0aaade
Empty cache after embedding to cpu ( #9477 )
2023-11-16 10:52:30 +08:00
WeiguangHan
c487b53f21
LLM: only run arc perf test nightly ( #9448 )
...
* LLM: only run arc perf test nightly
* deleted unused python scripts
* rebase main
2023-11-15 19:38:14 +08:00
WeiguangHan
0d55bbd9f1
LLM: adjust the order of some models ( #9470 )
2023-11-15 17:04:59 +08:00
Lilac09
13f6eb77b4
Add exec bash to entrypoint.sh to keep container running after being booted. ( #9471 )
...
* add bigdl-llm-init
* boot bash
2023-11-15 16:09:16 +08:00
Xin Qiu
170e0072af
chatglm2 correctness test ( #9450 )
...
* chatglm2 ut
* some update
* chatglm2 path
* fix
* add print
2023-11-15 15:44:56 +08:00
Lilac09
24146d108f
add bigdl-llm-init ( #9468 )
2023-11-15 14:55:33 +08:00
Ruonan Wang
0f82b8c3a0
LLM: update qlora example ( #9454 )
...
* update qlora example
* fix loss=0
2023-11-15 09:24:15 +08:00
Chen, Zhentao
dbbdb53a18
fix multiple gpu usage ( #9459 )
2023-11-14 17:06:27 +08:00
Chen, Zhentao
d19ca21957
patch bigdl-llm model to harness by binding instead of patch file ( #9420 )
...
* add run_llb.py
* fix args interpret
* modify outputs
* update workflow
* add license
* test mixed 4 bit
* update readme
* use autotokenizer
* add timeout
* refactor workflow file
* fix working directory
* fix env
* throw exception if some jobs failed
* improve terminal outputs
* Disable var which causes the run to get stuck
* fix unknown precision
* fix key error
* directly output config instead
* rm harness submodule
2023-11-14 12:51:39 +08:00
Yang Wang
51d07a9fd8
Support directly loading gptq models from huggingface ( #9391 )
...
* Support directly loading GPTQ models from huggingface
* fix style
* fix tests
* change example structure
* address comments
* fix style
* address comments
2023-11-13 20:48:12 -08:00
Lilac09
b2b085550b
Remove bigdl-nano and add ipex into inference-cpu image ( #9452 )
...
* remove bigdl-nano and add ipex into inference-cpu image
* remove bigdl-nano in docker
* remove bigdl-nano in docker
2023-11-14 10:50:52 +08:00
Wang, Jian4
0f78ebe35e
LLM: Add qlora cpu finetune docker image ( #9271 )
...
* init qlora cpu docker image
* update
* remove ipex and update
* update
* update readme
* update example and readme
2023-11-14 10:36:53 +08:00
WeiguangHan
d109275333
temporarily disable the test of some models ( #9434 )
2023-11-13 18:50:53 +08:00
Chen, Zhentao
0ecb9efb05
use AutoTokenizer to enable more models ( #9446 )
2023-11-13 17:47:43 +08:00
Cengguang Zhang
ece5805572
LLM: add chatglm3-6b to latency benchmark test. ( #9442 )
2023-11-13 17:24:37 +08:00
Shaojun Liu
0e5ab5ebfc
update docker tag to 2.5.0-SNAPSHOT ( #9443 )
2023-11-13 16:53:40 +08:00
Chen, Zhentao
5747e2fe69
fix multiple gpu usage of harness ( #9444 )
2023-11-13 16:53:23 +08:00
Heyang Sun
da6bbc8c11
fix deepspeed dependencies to install ( #9400 )
...
* remove redundant parameter from deepspeed install
* Update install.sh
* Update install.sh
2023-11-13 16:42:50 +08:00
Shaojun Liu
8c603014ac
update 2.5.0-snapshot readthedoc ( #9440 )
2023-11-13 16:31:43 +08:00
Chen, Zhentao
f36d7b2d59
Fix harness stuck ( #9435 )
...
* remove env to avoid being stuck
* use small model for test
2023-11-13 15:29:53 +08:00
Yuwen Hu
4faf5af8f1
[LLM] Add perf test for core on Windows ( #9397 )
...
* temporarily stop other perf test
* Add framework for core performance test with one test model
* Small fix and add platform control
* Comment out lp for now
* Add missing yaml file
* Small fix
* Fix sed contents
* Small fix
* Small path fixes
* Small fix
* Add update to ftp
* Small upload fix
* add chatglm3-6b
* LLM: add model names
* Keep repo id same as ftp and temporarily make baichuan2 first priority
* change order
* Remove temp if false and separate pr and nightly results
* Small fix
---------
Co-authored-by: jinbridge <2635480475@qq.com>
2023-11-13 13:58:40 +08:00
Lilac09
5d4ec44488
Add all-in-one benchmark into inference-cpu docker image ( #9433 )
...
* add all-in-one into inference-cpu image
* manually_build
* revise files
2023-11-13 13:07:56 +08:00
Zheng, Yi
9b5d0e9c75
Add examples for Yi-6B ( #9421 )
2023-11-13 10:53:15 +08:00
SONG Ge
2888818b3a
[LLM] Support mixed_fp8 on Arc ( #9415 )
...
* ut gpu allocation memory fix
* support mix_8bit on arc
* rename mixed_4bit to mixed_fp4 and mixed_8bit to mixed_fp8
* revert unexpected changes
* revert unexpected changes
* unify common logits
* rename in llm xmx_checker
* fix typo error and re-unify
2023-11-13 09:26:30 +08:00
Wang, Jian4
ac7fbe77e2
Update qlora readme ( #9416 )
2023-11-12 19:29:29 +08:00
Yining Wang
d7334513e1
codeshell: fix wrong links ( #9417 )
2023-11-12 19:22:33 +08:00
WeiguangHan
2cfef5ef1e
LLM: store the nightly test and pr results separately ( #9404 )
...
* LLM: store the csv results separately
* modify the trigger files of LLM Performance Test
2023-11-11 06:35:27 +08:00
Zheng, Yi
0674146cfb
Add cpu and gpu examples of distil-whisper ( #9374 )
...
* Add distil-whisper examples
* Fixes based on comments
* Minor fixes
---------
Co-authored-by: Ariadne330 <wyn2000330@126.com>
2023-11-10 16:09:55 +08:00
Ziteng Zhang
ad81b5d838
Update qlora README.md ( #9422 )
2023-11-10 15:19:25 +08:00
Heyang Sun
b23b91407c
fix llm-init on deepspeed missing lib ( #9419 )
2023-11-10 13:51:24 +08:00
SONG Ge
dfb00e37e9
[LLM] Add model correctness test on ARC for llama and falcon ( #9347 )
...
* add correctness test on arc for llama model
* modify layer name
* add falcon ut
* refactor and add ut for falcon model
* modify lambda positions and update docs
* replace loading pre input with last decodelayer output
* switch lower bound to single model instead of using the common one
* make the code implementation simple
* fix gpu action allocation memory issue
2023-11-10 13:48:57 +08:00
Yuwen Hu
3d107f6d25
[LLM] Separate windows build UT and build runner ( #9403 )
...
* Separate windows build UT and build runner
* Small fix
2023-11-09 18:47:38 +08:00
dingbaorong
36fbe2144d
Add CPU examples of fuyu ( #9393 )
...
* add fuyu cpu examples
* add gpu example
* add comments
* add license
* remove gpu example
* fix inference time
2023-11-09 15:29:19 +08:00
Heyang Sun
df8e4d7889
[LLM] apply allreduce and bias to training in LowBitLinear ( #9395 )
2023-11-09 14:35:54 +08:00
Wang, Jian4
40cead6b5b
LLM: Fix CPU qlora dtype convert issue ( #9394 )
2023-11-09 14:34:01 +08:00
WeiguangHan
34449cb4bb
LLM: add remaining models to the arc perf test ( #9384 )
...
* add remaining models
* modify the filepath which stores the test result on ftp server
* resolve some comments
2023-11-09 14:28:42 +08:00
Yuwen Hu
d4b248fcd4
Add windows binary build label AVX_VNNI ( #9387 )
2023-11-08 18:13:35 +08:00
Ruonan Wang
bfca76dfa7
LLM: optimize QLoRA by updating lora convert logic ( #9372 )
...
* update convert logic of qlora
* update
* refactor and further improve performance
* fix style
* meet code review
2023-11-08 17:46:49 +08:00
binbin Deng
54d95e4907
LLM: add alpaca qlora finetuning example ( #9276 )
2023-11-08 16:25:17 +08:00
binbin Deng
97316bbb66
LLM: highlight transformers version requirement in mistral examples ( #9380 )
2023-11-08 16:05:03 +08:00
Ruonan Wang
7e8fb29b7c
LLM: optimize QLoRA by reducing convert time ( #9370 )
2023-11-08 13:14:34 +08:00