Ziteng Zhang
65934c9f4f
[LLM] Fix Qwen causal_mask and attention_mask size mismatching ( #9600 )
...
* Fix #9582 , caused by Qwen modified modeling_qwen.py 7f62181c94
2023-12-05 15:15:54 +08:00
Jinyi Wan
b721138132
Add cpu and gpu examples for BlueLM ( #9589 )
...
* Add cpu int4 example for BlueLM
* add example optimize_model cpu for bluelm
* add example gpu int4 blueLM
* add example optimize_model GPU for bluelm
* Fixing naming issues and BigDL package version.
* Fixing naming issues...
* Add BlueLM in README.md "Verified Models"
2023-12-05 13:59:02 +08:00
Guancheng Fu
8b00653039
fix doc ( #9599 )
2023-12-05 13:49:31 +08:00
Qiyuan Gong
f211f136b6
Configurable TORCH_LINEAR_THRESHOLD from env ( #9588 )
...
* Add TORCH_LINEAR_THRESHOLD from env (BIGDL_LLM_LINEAR_THRESHOLD)
* Change default to 512
2023-12-05 13:19:47 +08:00
Yuwen Hu
1012507a40
[LLM] Fix performance tests ( #9596 )
...
* Fix missing key for cpu_embedding
* Remove 512 as it is stuck for now
* Small fix
2023-12-05 10:59:28 +08:00
Chen, Zhentao
8c8a27ded7
Add harness summary job ( #9457 )
...
* format yml
* add make_table_results
* add summary job
* add a job to print single result
* upload full directory
2023-12-05 10:04:10 +08:00
Yuwen Hu
3f4ad97929
[LLM] Add performance tests for windows iGPU ( #9584 )
...
* Add support for win gpu benchmark with peak gpu memory monitoring
* Add win igpu tests
* Small fix
* Forward outputs
* Small fix
* Test and small fixes
* Small fix
* Small fix and test
* Small fixes
* Add tests for 512-64 and change back to nightly tests
* Small fix
2023-12-04 20:50:02 +08:00
Chen, Zhentao
9557aa9c21
Fix harness nightly ( #9586 )
...
* update golden
* loosen the restriction of diff
* only compare results when scheduled
2023-12-04 11:45:00 +08:00
Xiangyu Tian
5c03651309
[LLM] vLLM: Add Preempt for scheduler ( #9568 )
...
Implement Preempt_by_recompute method for vllm.
2023-12-03 20:16:25 +08:00
Chen, Zhentao
cb228c70ea
Add harness nightly ( #9552 )
...
* modify output_path as a directory
* schedule nightly at 21 on Friday
* add tasks and models for nightly
* add accuracy regression
* comment out if to test
* mixed fp4
* for test
* add missing delimiter
* remove comma
* fixed golden results
* add mixed 4 golden result
* add more options
* add mistral results
* get golden result of stable lm
* move nightly scripts and results to test folder
* add license
* add fp8 stable lm golden
* run on all available devices
* trigger only when ready for review
* fix new line
* update golden
* add mistral
2023-12-01 14:16:35 +08:00
Chen, Zhentao
4d7d5d4c59
Add 3 leaderboard tasks ( #9566 )
...
* update leaderboard map
* download model and dataset without overwriting
* fix task drop
* run on all available devices
2023-12-01 14:01:14 +08:00
Wang, Jian4
ed0dc57c6e
LLM: Add cpu qlora support other models guide ( #9567 )
...
* use bf16 flag
* add using baichuan model
* update merge
* remove
* update
2023-12-01 11:18:04 +08:00
Jason Dai
bda404fc8f
Update readme ( #9575 )
2023-11-30 22:45:52 +08:00
Xin Qiu
69c49d21f5
use fused rms norm ( #9572 )
...
* use fused rms norm
* meet code review
2023-11-30 21:47:41 +08:00
Yishuo Wang
66f5b45f57
[LLM] add a llama2 gguf example ( #9553 )
2023-11-30 16:37:17 +08:00
Yishuo Wang
7f6465518a
support loading llama tokenizer from gguf model ( #9565 )
2023-11-30 14:56:12 +08:00
Wang, Jian4
a0a80d232e
LLM: Add qlora cpu distributed readme ( #9561 )
...
* init readme
* add distributed guide
* update
2023-11-30 13:42:30 +08:00
Chen, Zhentao
c8e0c2ed48
Fixed dumped logs in harness ( #9549 )
...
* install transformers==4.34.0
* modify output_path as a directory
* add device and task to output dir parents
2023-11-30 12:47:56 +08:00
Qiyuan Gong
d85a430a8c
Using bigdl-llm-init instead of bigdl-nano-init ( #9558 )
...
* Replace `bigdl-nano-init` with `bigdl-llm-init`.
* Install `bigdl-llm` instead of `bigdl-nano`.
* Remove nano in README.
2023-11-30 10:10:29 +08:00
Yuwen Hu
34503efa6a
Fix cpu pinned embedding ( #9556 )
2023-11-29 18:27:56 +08:00
binbin Deng
4ff2ca9d0d
LLM: fix loss error on Arc ( #9550 )
2023-11-29 15:16:18 +08:00
Yishuo Wang
65121c7997
support loading q4_1/q5_0/q5_1/q8_0 gguf model ( #9546 )
2023-11-29 14:40:37 +08:00
Wang, Jian4
b824754256
LLM: Update for cpu qlora mpirun ( #9548 )
2023-11-29 10:56:17 +08:00
Yuwen Hu
5f5ca38b74
[LLM Doc] Fix api doc rendering error ( #9542 )
...
* Fix api rendering error
* Fix python style
2023-11-29 09:17:09 +08:00
Yishuo Wang
a86c6e0b56
[LLM] support loading gguf model ( #9544 )
2023-11-28 15:51:15 +08:00
Xiangyu Tian
916c338772
fix bugs in vllm length check ( #9543 )
2023-11-28 11:09:54 +08:00
WeiguangHan
5098bc3544
LLM: enable previous models ( #9505 )
...
* enable previous models
* test mistral model
* for test
* run models separately
* test all models
* for test
* revert the llm_performance_test.yaml
2023-11-28 10:21:07 +08:00
Zhao Changmin
e7e0cd3b5e
CPU Pinned embedding Layer ( #9538 )
...
* CPU Pinned embedding
2023-11-28 09:46:31 +08:00
Guancheng Fu
963a5c8d79
Add vLLM-XPU version's README/examples ( #9536 )
...
* test
* test
* fix last kv cache
* add xpu readme
* remove numactl for xpu example
* fix link error
* update max_num_batched_tokens logic
* add explanation
* add xpu environment version requirement
* refine gpu memory
* fix
* fix style
2023-11-28 09:44:03 +08:00
Guancheng Fu
b6c3520748
Remove xformers from vLLM-CPU ( #9535 )
2023-11-27 11:21:25 +08:00
binbin Deng
2b9c7d2a59
LLM: quick fix alpaca qlora finetuning script ( #9534 )
2023-11-27 11:04:27 +08:00
Yuwen Hu
11fa3de290
Add setup support of win gpu for bigdl-llm ( #9512 )
2023-11-24 17:49:21 +08:00
Chen, Zhentao
45820cf3b9
add optimize model option ( #9530 )
2023-11-24 17:10:49 +08:00
binbin Deng
6bec0faea5
LLM: support Mistral AWQ models ( #9520 )
2023-11-24 16:20:22 +08:00
Ruonan Wang
914a5a5a27
LLM: fix abnormal Mistral GPU accuracy by updating rms_norm ( #9529 )
2023-11-24 15:37:50 +08:00
SONG Ge
3d24823cda
hot-fix mistral kv_cache ( #9528 )
2023-11-24 14:33:04 +08:00
Zhao Changmin
42b7a16bc5
Replace torch.bmm with safe_bmm ( #9519 )
...
* replace bmm with safe one
* rename args and deprecated warning
2023-11-24 12:16:48 +08:00
Jason Dai
b3178d449f
Update README.md ( #9525 )
2023-11-23 21:45:20 +08:00
Jason Dai
82898a4203
Update GPU example README ( #9524 )
2023-11-23 21:20:26 +08:00
Jason Dai
064848028f
Update README.md ( #9523 )
2023-11-23 21:16:21 +08:00
Ruonan Wang
b63aae8a8e
LLM: add flash attention support for llama ( #9518 )
...
* add initial flash attention for llama
* accelerate fp32 first token by changing to fp16 in advance
* support fp32
2023-11-23 18:40:18 +08:00
Guancheng Fu
bf579507c2
Integrate vllm ( #9310 )
...
* done
* Rename structure
* add models
* Add structure/sampling_params,sequence
* add input_metadata
* add outputs
* Add policy,logger
* add and update
* add parallelconfig back
* core/scheduler.py
* Add llm_engine.py
* Add async_llm_engine.py
* Add tested entrypoint
* fix minor error
* Fix everything
* fix kv cache view
* fix
* fix
* fix
* format&refine
* remove logger from repo
* try to add token latency
* remove logger
* Refine config.py
* finish worker.py
* delete utils.py
* add license
* refine
* refine sequence.py
* remove sampling_params.py
* finish
* add license
* format
* add license
* refine
* refine
* Refine line too long
* remove exception
* so dumb style-check
* refine
* refine
* refine
* refine
* refine
* refine
* add README
* refine README
* add warning instead error
* fix padding
* add license
* format
* format
* format fix
* Refine vllm dependency (#1 )
vllm dependency clear
* fix licence
* fix format
* fix format
* fix
* adapt LLM engine
* fix
* add license
* fix format
* fix
* Moving README.md to the correct position
* Fix readme.md
* done
* guide for adding models
* fix
* Fix README.md
* Add new model readme
* remove ray-logic
* refactor arg_utils.py
* remove distributed_init_method logic
* refactor entrypoints
* refactor input_metadata
* refactor model_loader
* refactor utils.py
* refactor models
* fix api server
* remove vllm.structure
* revert by txy 1120
* remove utils
* format
* fix license
* add bigdl model
* Refer to a specific commit
* Change code base
* add comments
* add async_llm_engine comment
* refine
* formatted
* add worker comments
* add comments
* add comments
* fix style
* add changes
---------
Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2023-11-23 16:46:45 +08:00
Heyang Sun
48fbb1eb94
support ccl (MPI) distributed mode in alpaca_qlora_finetuning_cpu ( #9507 )
2023-11-23 10:58:09 +08:00
Qiyuan Gong
0f0c6bb631
[LLM] Fix Qwen registered_causal_mask is None ( #9513 )
...
* Add registered_causal_mask init based on 2abd8e5777 .
2023-11-23 09:28:04 +08:00
Heyang Sun
11fa5a8a0e
Fix QLoRA CPU dispatch_model issue about accelerate ( #9506 )
2023-11-23 08:41:25 +08:00
Heyang Sun
1453046938
install bigdl-llm in deepspeed cpu inference example ( #9508 )
2023-11-23 08:39:21 +08:00
binbin Deng
86743fb57b
LLM: fix transformers version in CPU finetuning example ( #9511 )
2023-11-22 15:53:07 +08:00
binbin Deng
1a2129221d
LLM: support resume from checkpoint in Alpaca QLoRA ( #9502 )
2023-11-22 13:49:14 +08:00
Ruonan Wang
139e98aa18
LLM: quick fix benchmark ( #9509 )
2023-11-22 10:19:57 +08:00
WeiguangHan
c2aeb4d1e8
del model after test ( #9504 )
2023-11-21 18:41:50 +08:00
Ruonan Wang
076d106ef5
LLM: GPU QLoRA update to bf16 to accelerate gradient checkpointing ( #9499 )
...
* update to bf16 to accelerate gradient checkpoint
* add utils and fix ut
2023-11-21 17:08:36 +08:00
Cheen Hau, 俊豪
3e39828420
Update all in one benchmark readme ( #9496 )
...
* Add gperftools install to all in one benchmark readme
* Update readme
2023-11-21 14:57:16 +08:00
binbin Deng
b7ae572ac3
LLM: update Alpaca QLoRA finetuning example on GPU ( #9492 )
2023-11-21 14:22:19 +08:00
Wang, Jian4
c5cb3ab82e
LLM: Add CPU alpaca qlora example ( #9469 )
...
* init
* update xpu to cpu
* update
* update readme
* update example
* update
* add refer
* add guide to train different datasets
* update readme
* update
2023-11-21 09:19:58 +08:00
binbin Deng
96fd26759c
LLM: fix QLoRA finetuning example on CPU ( #9489 )
2023-11-20 14:31:24 +08:00
Xin Qiu
50b01058f1
enable new q4_1 ( #9479 )
2023-11-17 14:58:57 +08:00
binbin Deng
3dac21ac7b
LLM: add more example usages about alpaca qlora on different hardware ( #9458 )
2023-11-17 09:56:43 +08:00
Heyang Sun
921b263d6a
update deepspeed install and run guide in README ( #9441 )
2023-11-17 09:11:39 +08:00
Zhao Changmin
30abd304a7
LLM: Fix baichuan pre-normalize model tensor assigning issue when loading ( #9481 )
...
* No need to normalized when loading
2023-11-16 21:57:28 +08:00
WeiguangHan
bc06bec90e
LLM: modify the script to generate html results more accurately ( #9445 )
...
* modify the script to generate html results more accurately
* resolve some comments
* revert some codes
2023-11-16 19:50:23 +08:00
Ruonan Wang
c0ef70df02
llm: quick fix of fast_rms_norm ( #9480 )
2023-11-16 14:42:16 +08:00
Yina Chen
d5263e6681
Add awq load support ( #9453 )
...
* Support directly loading GPTQ models from huggingface
* fix style
* fix tests
* change example structure
* address comments
* fix style
* init
* address comments
* add examples
* fix style
* fix style
* fix style
* fix style
* update
* remove
* meet comments
* fix style
---------
Co-authored-by: Yang Wang <yang3.wang@intel.com>
2023-11-16 14:06:25 +08:00
Ruonan Wang
d2c064124a
LLM: update rms related usage to support ipex 2.1 new api ( #9466 )
...
* update rms related usage
* fix style
2023-11-16 11:21:50 +08:00
Yuwen Hu
731b0aaade
Empty cache after embedding to cpu ( #9477 )
2023-11-16 10:52:30 +08:00
WeiguangHan
c487b53f21
LLM: only run arc perf test nightly ( #9448 )
...
* LLM: only run arc perf test nightly
* deleted unused python scripts
* rebase main
2023-11-15 19:38:14 +08:00
WeiguangHan
0d55bbd9f1
LLM: adjust the order of some models ( #9470 )
2023-11-15 17:04:59 +08:00
Xin Qiu
170e0072af
chatglm2 correctness test ( #9450 )
...
* chatglm2 ut
* some update
* chatglm2 path
* fix
* add print
2023-11-15 15:44:56 +08:00
Ruonan Wang
0f82b8c3a0
LLM: update qlora example ( #9454 )
...
* update qlora example
* fix loss=0
2023-11-15 09:24:15 +08:00
Chen, Zhentao
dbbdb53a18
fix multiple gpu usage ( #9459 )
2023-11-14 17:06:27 +08:00
Chen, Zhentao
d19ca21957
patch bigdl-llm model to harness by binding instead of patch file ( #9420 )
...
* add run_llb.py
* fix args interpret
* modify outputs
* update workflow
* add license
* test mixed 4 bit
* update readme
* use autotokenizer
* add timeout
* refactor workflow file
* fix working directory
* fix env
* throw exception if some jobs failed
* improve terminal outputs
* Disable var which causes the run to get stuck
* fix unknown precision
* fix key error
* directly output config instead
* rm harness submodule
2023-11-14 12:51:39 +08:00
Yang Wang
51d07a9fd8
Support directly loading gptq models from huggingface ( #9391 )
...
* Support directly loading GPTQ models from huggingface
* fix style
* fix tests
* change example structure
* address comments
* fix style
* address comments
2023-11-13 20:48:12 -08:00
WeiguangHan
d109275333
temporarily disable the test of some models ( #9434 )
2023-11-13 18:50:53 +08:00
Chen, Zhentao
0ecb9efb05
use AutoTokenizer to enable more models ( #9446 )
2023-11-13 17:47:43 +08:00
Cengguang Zhang
ece5805572
LLM: add chatglm3-6b to latency benchmark test. ( #9442 )
2023-11-13 17:24:37 +08:00
Chen, Zhentao
5747e2fe69
fix multiple gpu usage of harness ( #9444 )
2023-11-13 16:53:23 +08:00
Heyang Sun
da6bbc8c11
fix deepspeed dependencies to install ( #9400 )
...
* remove redundant parameter from deepspeed install
* Update install.sh
* Update install.sh
2023-11-13 16:42:50 +08:00
Yuwen Hu
4faf5af8f1
[LLM] Add perf test for core on Windows ( #9397 )
...
* temporary stop other perf test
* Add framework for core performance test with one test model
* Small fix and add platform control
* Comment out lp for now
* Add missing yaml file
* Small fix
* Fix sed contents
* Small fix
* Small path fixes
* Small fix
* Add update to ftp
* Small upload fix
* add chatglm3-6b
* LLM: add model names
* Keep repo id same as ftp and temporary make baichuan2 first priority
* change order
* Remove temp if false and separate pr and nightly results
* Small fix
---------
Co-authored-by: jinbridge <2635480475@qq.com>
2023-11-13 13:58:40 +08:00
Zheng, Yi
9b5d0e9c75
Add examples for Yi-6B ( #9421 )
2023-11-13 10:53:15 +08:00
SONG Ge
2888818b3a
[LLM] Support mixed_fp8 on Arc ( #9415 )
...
* ut gpu allocation memory fix
* support mix_8bit on arc
* rename mixed_4bit to mixed_fp4 and mixed_8bit to mixed_fp8
* revert unexpected changes
* revert unexpected changes
* unify common logits
* rename in llm xmx_checker
* fix typo error and re-unify
2023-11-13 09:26:30 +08:00
Wang, Jian4
ac7fbe77e2
Update qlora readme ( #9416 )
2023-11-12 19:29:29 +08:00
Yining Wang
d7334513e1
codeshell: fix wrong links ( #9417 )
2023-11-12 19:22:33 +08:00
Zheng, Yi
0674146cfb
Add cpu and gpu examples of distil-whisper ( #9374 )
...
* Add distil-whisper examples
* Fixes based on comments
* Minor fixes
---------
Co-authored-by: Ariadne330 <wyn2000330@126.com>
2023-11-10 16:09:55 +08:00
Ziteng Zhang
ad81b5d838
Update qlora README.md ( #9422 )
2023-11-10 15:19:25 +08:00
Heyang Sun
b23b91407c
fix llm-init on deepspeed missing lib ( #9419 )
2023-11-10 13:51:24 +08:00
SONG Ge
dfb00e37e9
[LLM] Add model correctness test on ARC for llama and falcon ( #9347 )
...
* add correctness test on arc for llama model
* modify layer name
* add falcon ut
* refactor and add ut for falcon model
* modify lambda positions and update docs
* replace loading pre input with last decodelayer output
* switch lower bound to single model instead of using the common one
* make the code implementation simple
* fix gpu action allocation memory issue
2023-11-10 13:48:57 +08:00
dingbaorong
36fbe2144d
Add CPU examples of fuyu ( #9393 )
...
* add fuyu cpu examples
* add gpu example
* add comments
* add license
* remove gpu example
* fix inference time
2023-11-09 15:29:19 +08:00
Heyang Sun
df8e4d7889
[LLM] apply allreduce and bias to training in LowBitLinear ( #9395 )
2023-11-09 14:35:54 +08:00
Wang, Jian4
40cead6b5b
LLM: Fix CPU qlora dtype convert issue ( #9394 )
2023-11-09 14:34:01 +08:00
WeiguangHan
34449cb4bb
LLM: add remaining models to the arc perf test ( #9384 )
...
* add remaining models
* modify the filepath which stores the test result on ftp server
* resolve some comments
2023-11-09 14:28:42 +08:00
Ruonan Wang
bfca76dfa7
LLM: optimize QLoRA by updating lora convert logic ( #9372 )
...
* update convert logic of qlora
* update
* refactor and further improve performance
* fix style
* meet code review
2023-11-08 17:46:49 +08:00
binbin Deng
54d95e4907
LLM: add alpaca qlora finetuning example ( #9276 )
2023-11-08 16:25:17 +08:00
binbin Deng
97316bbb66
LLM: highlight transformers version requirement in mistral examples ( #9380 )
2023-11-08 16:05:03 +08:00
Ruonan Wang
7e8fb29b7c
LLM: optimize QLoRA by reducing convert time ( #9370 )
2023-11-08 13:14:34 +08:00
Chen, Zhentao
298b64217e
add auto triggered acc test ( #9364 )
...
* add auto triggered acc test
* use llama 7b instead
* fix env
* debug download
* fix download prefix
* add cut dirs
* fix env of model path
* fix dataset download
* full job
* source xpu env vars
* use matrix to trigger model run
* reset batch=1
* remove redirect
* remove some trigger
* add task matrix
* add precision list
* test llama-7b-chat
* use /mnt/disk1 to store model and datasets
* remove installation test
* correct downloading path
* fix HF vars
* add bigdl-llm env vars
* rename file
* fix hf_home
* fix script path
* rename as harness evaluation
* rerun
2023-11-08 10:22:27 +08:00
Yishuo Wang
bfd9f88f0d
[LLM] Use fp32 as dtype when batch_size <=8 and qtype is q4_0/q8_0/fp8 ( #9365 )
2023-11-08 09:54:53 +08:00
WeiguangHan
84ab614aab
LLM: add more models and skip runtime error ( #9349 )
...
* add more models and skip runtime error
* upgrade transformers
* temporarily removed Mistral-7B-v0.1
* temporarily disable the upload of arc perf result
2023-11-08 09:45:53 +08:00
Heyang Sun
fae6db3ddc
[LLM] refactor cpu low-bit forward logic ( #9366 )
...
* [LLM] refactor cpu low-bit forward logic
* fix style
* Update low_bit_linear.py
* Update low_bit_linear.py
* refine
2023-11-07 15:09:16 +08:00
Heyang Sun
af94058203
[LLM] Support CPU deepspeed distributed inference ( #9259 )
...
* [LLM] Support CPU Deepspeed distributed inference
* Update run_deepspeed.py
* Rename
* fix style
* add new codes
* refine
* remove annotated codes
* refine
* Update README.md
* refine doc and example code
2023-11-06 17:56:42 +08:00
Jin Qiao
f9bf5382ff
Fix: add aquila2 in README ( #9362 )
2023-11-06 16:37:57 +08:00
Jin Qiao
e6b6afa316
LLM: add aquila2 model example ( #9356 )
2023-11-06 15:47:39 +08:00
Xin Qiu
1420e45cc0
Chatglm2 rope optimization on xpu ( #9350 )
2023-11-06 13:56:34 +08:00
Yining Wang
9377b9c5d7
add CodeShell CPU example ( #9345 )
...
* add CodeShell CPU example
* fix some problems
2023-11-03 13:15:54 +08:00
ZehuaCao
ef83c3302e
Use to test llm-performance on spr-perf ( #9316 )
...
* Update llm_performance_tests.yml
* Update llm_performance_tests.yml
* Update action.yml
* Create cpu-perf-test.yaml
* Update action.yml
* Update action.yml
* Update llm_performance_tests.yml
* Update llm_performance_tests.yml
* Update llm_performance_tests.yml
* Update llm_performance_tests.yml
* Update llm_performance_tests.yml
* Update llm_performance_tests.yml
* Update llm_performance_tests.yml
* Update llm_performance_tests.yml
* Update llm_performance_tests.yml
* Update llm_performance_tests.yml
* Update llm_performance_tests.yml
2023-11-03 11:17:16 +08:00
Yuwen Hu
a0150bb205
[LLM] Move embedding layer to CPU for iGPU inference ( #9343 )
...
* Move embedding layer to CPU for iGPU llm inference
* Empty cache after to cpu
* Remove empty cache as it seems to have some negative effect to first token
2023-11-03 11:13:45 +08:00
Cheen Hau, 俊豪
8f23fb04dc
Add inference test for Whisper model on Arc ( #9330 )
...
* Add inference test for Whisper model
* Remove unnecessary inference time measurement
2023-11-03 10:15:52 +08:00
Zheng, Yi
63411dff75
Add cpu examples of WizardCoder ( #9344 )
...
* Add wizardcoder example
* Minor fixes
2023-11-02 20:22:43 +08:00
dingbaorong
2e3bfbfe1f
Add internlm_xcomposer cpu examples ( #9337 )
...
* add internlm-xcomposer cpu examples
* use chat
* some fixes
* add license
* address shengsheng's comments
* use demo.jpg
2023-11-02 15:50:02 +08:00
Jin Qiao
97a38958bd
LLM: add CodeLlama CPU and GPU examples ( #9338 )
...
* LLM: add codellama CPU pytorch examples
* LLM: add codellama CPU transformers examples
* LLM: add codellama GPU transformers examples
* LLM: add codellama GPU pytorch examples
* LLM: add codellama in readme
* LLM: add LLaVA link
2023-11-02 15:34:25 +08:00
Chen, Zhentao
d4dffbdb62
Merge harness ( #9319 )
...
* add harness patch and llb script
* add readme
* add license
* use patch instead
* update readme
* rename tests to evaluation
* fix typo
* remove nano dependency
* add original harness link
* rename title of usage
* rename BigDLGPULM as BigDLLM
* empty commit to rerun job
2023-11-02 15:14:19 +08:00
Zheng, Yi
63b2556ce2
Add cpu examples of skywork ( #9340 )
2023-11-02 15:10:45 +08:00
dingbaorong
f855a864ef
add llava gpu example ( #9324 )
...
* add llava gpu example
* use 7b model
* fix typo
* add in README
2023-11-02 14:48:29 +08:00
Ziteng Zhang
dd3cf2f153
LLM: Add python 3.10 & 3.11 UT
...
LLM: Add python 3.10 & 3.11 UT
2023-11-02 14:09:29 +08:00
Wang, Jian4
149146004f
LLM: Add qlora finetuning CPU example ( #9275 )
...
* add qlora finetuning example
* update readme
* update example
* remove merge.py and update readme
2023-11-02 09:45:42 +08:00
WeiguangHan
9722e811be
LLM: add more models to the arc perf test ( #9297 )
...
* LLM: add more models to the arc perf test
* remove some old models
* install some dependencies
2023-11-01 16:56:32 +08:00
Jin Qiao
6a128aee32
LLM: add ui for portable-zip ( #9262 )
2023-11-01 15:36:59 +08:00
Jasonzzt
cb7ef38e86
rerun
2023-11-01 15:30:34 +08:00
Jasonzzt
ba148ff3ff
test py311
2023-11-01 14:08:49 +08:00
Yishuo Wang
726203d778
[LLM] Replace Embedding layer to fix it on CPU ( #9254 )
2023-11-01 13:58:10 +08:00
Jasonzzt
7c7a7f2ec1
spr & arc ut with python 3.9 & 3.10 & 3.11
2023-11-01 13:17:13 +08:00
Yang Wang
e1bc18f8eb
fix import ipex problem ( #9323 )
...
* fix import ipex problem
* fix style
2023-10-31 20:31:34 -07:00
Cengguang Zhang
9f3d4676c6
LLM: Add qwen-vl gpu example ( #9290 )
...
* create qwen-vl gpu example.
* add readme.
* fix.
* change input figure and update outputs.
* add qwen-vl pytorch model gpu example.
* fix.
* add readme.
2023-11-01 11:01:39 +08:00
Ruonan Wang
7e73c354a6
LLM: decoupling bigdl-llm and bigdl-nano ( #9306 )
2023-11-01 11:00:54 +08:00
Yina Chen
2262ae4d13
Support MoFQ4 on arc ( #9301 )
...
* init
* update
* fix style
* fix style
* fix style
* meet comments
2023-11-01 10:59:46 +08:00
binbin Deng
8ef8e25178
LLM: improve response speed in multi-turn chat ( #9299 )
...
* update
* fix stop word and add chatglm2 support
* remove system prompt
2023-11-01 10:30:44 +08:00
Cengguang Zhang
d4ab5904ef
LLM: Add python 3.10 llm UT ( #9302 )
...
* add py310 test for llm-unit-test.
* add py310 llm-unit-tests
* add llm-cpp-build-py310
* test
* test
* test.
* test
* test
* fix deactivate.
* fix
* fix.
* fix
* test
* test
* test
* add build chatglm for win.
* test.
* fix
2023-11-01 10:15:32 +08:00
WeiguangHan
03aa368776
LLM: add the comparison between latest arc perf test and last one ( #9296 )
...
* add the comparison between latest test and last one to html
* resolve some comments
* modify some code logics
2023-11-01 09:53:02 +08:00
Jin Qiao
96f8158fe2
LLM: adjust dolly v2 GPU example README ( #9318 )
2023-11-01 09:50:22 +08:00
Jin Qiao
c44c6dc43a
LLM: add chatglm3 examples ( #9305 )
2023-11-01 09:50:05 +08:00
Xin Qiu
06447a3ef6
add malloc and intel openmp to llm deps ( #9322 )
2023-11-01 09:47:45 +08:00
Cheen Hau, 俊豪
d638b93dfe
Add test script and workflow for qlora fine-tuning ( #9295 )
...
* Add test script and workflow for qlora fine-tuning
* Test fix export model
* Download dataset
* Fix export model issue
* Reduce number of training steps
* Rename script
* Correction
2023-11-01 09:39:53 +08:00
Ruonan Wang
d383ee8efb
LLM: update QLoRA example about accelerate version ( #9314 )
2023-10-31 13:54:38 +08:00
Cheen Hau, 俊豪
cee9eaf542
[LLM] Fix llm arc ut oom ( #9300 )
...
* Move model to cpu after testing so that gpu memory is deallocated
* Add code comment
---------
Co-authored-by: sgwhat <ge.song@intel.com>
2023-10-30 14:38:34 +08:00
dingbaorong
ee5becdd61
use coco image in Qwen-VL ( #9298 )
...
* use coco image
* add output
* address yuwen's comments
2023-10-30 14:32:35 +08:00
Yang Wang
163d033616
Support qlora in CPU ( #9233 )
...
* support qlora in CPU
* revert example
* fix style
2023-10-27 14:01:15 -07:00
Yang Wang
8838707009
Add deepspeed autotp example readme ( #9289 )
...
* Add deepspeed autotp example readme
* change word
2023-10-27 13:04:38 -07:00
dingbaorong
f053688cad
add cpu example of LLaVA ( #9269 )
...
* add LLaVA cpu example
* Small text updates
* update link
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2023-10-27 18:59:20 +08:00
Zheng, Yi
7f2ad182fd
Minor Fixes of README ( #9294 )
2023-10-27 18:25:46 +08:00
Zheng, Yi
1bff54a378
Display demo.jpg in the README.md of HuggingFace Transformers Agent ( #9293 )
...
* Display demo.jpg
* remove demo.jpg
2023-10-27 18:00:03 +08:00
Zheng, Yi
a4a1dec064
Add a cpu example of HuggingFace Transformers Agent (use vicuna-7b-v1.5) ( #9284 )
...
* Add examples of HF Agent
* Modify folder structure and add link of demo.jpg
* Fixes of readme
* Merge applications and Applications
2023-10-27 17:14:12 +08:00
Guoqiong Song
aa319de5e8
Add streaming-llm using llama2 on CPU ( #9265 )
...
Enable streaming-llm to let model take infinite inputs, tested on desktop and SPR10
2023-10-27 01:30:39 -07:00
Cheen Hau, 俊豪
6c9ae420a5
Add regression test for optimize_model on gpu ( #9268 )
...
* Add MPT model to transformer API test
* Add regression test for optimize_model on gpu.
---------
Co-authored-by: sgwhat <ge.song@intel.com>
2023-10-27 09:23:19 +08:00
Cengguang Zhang
44b5fcc190
LLM: fix pretraining_tp argument issue. ( #9281 )
2023-10-26 18:43:58 +08:00
WeiguangHan
6b2a32eba2
LLM: add missing function for PyTorch InternLM model ( #9285 )
2023-10-26 18:05:23 +08:00
Yina Chen
f879c48f98
fp8 convert use ggml code ( #9277 )
2023-10-26 17:03:29 +08:00
Yina Chen
e2264e8845
Support arc fp4 ( #9266 )
...
* support arc fp4
* fix style
* fix style
2023-10-25 15:42:48 +08:00
Cheen Hau, 俊豪
ab40607b87
Enable unit test workflow on Arc ( #9213 )
...
* Add gpu workflow and a transformers API inference test
* Set device-specific env variables in script instead of workflow
* Fix status message
---------
Co-authored-by: sgwhat <ge.song@intel.com>
2023-10-25 15:17:18 +08:00
SONG Ge
160a1e5ee7
[WIP] Add UT for Mistral Optimized Model ( #9248 )
...
* add ut for mistral model
* update
* fix model path
* upgrade transformers version for mistral model
* refactor correctness ut for mistral model
* refactor mistral correctness ut
* revert test_optimize_model back
* remove mistral from test_optimize_model
* add to revert transformers version back to 4.31.0
2023-10-25 15:14:17 +08:00
Yang Wang
067c7e8098
Support deepspeed AutoTP ( #9230 )
...
* Support deepspeed
* add test script
* refactor convert
* refine example
* refine
* refine example
* fix style
* refine example and adapt to latest ipex
* fix style
2023-10-24 23:46:28 -07:00
Yining Wang
a6a8afc47e
Add qwen vl CPU example ( #9221 )
...
* eee
* add examples on CPU and GPU
* fix
* fix
* optimize model examples
* add Qwen-VL-Chat CPU example
* Add Qwen-VL CPU example
* fix optimize problem
* fix error
* Have updated, benchmark fix removed from this PR
* add generate API example
* Change formats in qwen-vl example
* Add CPU transformer int4 example for qwen-vl
* fix repo-id problem and add Readme
* change picture url
* Remove unnecessary file
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2023-10-25 13:22:12 +08:00
binbin Deng
f597a9d4f5
LLM: update perf test configuration ( #9264 )
2023-10-25 12:35:48 +08:00
binbin Deng
770ac70b00
LLM: add low_bit option in benchmark scripts ( #9257 )
2023-10-25 10:27:48 +08:00
WeiguangHan
ec9195da42
LLM: using html to visualize the perf result for Arc ( #9228 )
...
* LLM: using html to visualize the perf result for Arc
* deploy the html file
* add python license
* resolve some comments
2023-10-24 18:05:25 +08:00
Jin Qiao
90162264a3
LLM: replace torch.float32 with auto type ( #9261 )
2023-10-24 17:12:13 +08:00
SONG Ge
bd5215d75b
[LLM] Reimplement chatglm fuse rms optimization ( #9260 )
...
* re-implement chatglm rope rms
* update
2023-10-24 16:35:12 +08:00
dingbaorong
5a2ce421af
add cpu and gpu examples of flan-t5 ( #9171 )
...
* add cpu and gpu examples of flan-t5
* address yuwen's comments
* Add explanation why we add modules to not convert
* Refine prompt and add a translation example
* Add an empty line at the end of files
* add examples of flan-t5 using optimize_model api
* address bin's comments
* address binbin's comments
* add flan-t5 in readme
2023-10-24 15:24:01 +08:00
Yining Wang
4a19f50d16
phi-1_5 CPU and GPU examples ( #9173 )
...
* eee
* add examples on CPU and GPU
* fix
* fix
* optimize model examples
* have updated
* Warmup and configs added
* Update two tables
2023-10-24 15:08:04 +08:00
SONG Ge
bfc1e2d733
add fused rms optimization for chatglm model ( #9256 )
2023-10-24 14:40:58 +08:00
Ruonan Wang
b15656229e
LLM: fix benchmark issue ( #9255 )
2023-10-24 14:15:05 +08:00
Guancheng Fu
f37547249d
Refine README/CICD ( #9253 )
2023-10-24 12:56:03 +08:00
binbin Deng
db37edae8a
LLM: update langchain api document page ( #9222 )
2023-10-24 10:13:41 +08:00
Xin Qiu
0c5055d38c
add position_ids and fuse embedding for falcon ( #9242 )
...
* add position_ids for falcon
* add cpu
* add cpu
* add license
2023-10-24 09:58:20 +08:00
Wang, Jian4
c14a61681b
Add load low-bit in model-serving to reduce EPC ( #9239 )
...
* init load low-bit
* fix
* fix
2023-10-23 11:28:20 +08:00
Yina Chen
0383306688
Add arc fp8 support ( #9232 )
...
* add fp8 support
* add log
* fix style
2023-10-20 17:15:07 +08:00
Yang Wang
118249b011
support transformers 4.34+ for llama ( #9229 )
2023-10-19 22:36:30 -07:00
Chen, Zhentao
5850241423
correct Readme GPU example and API docstring ( #9225 )
...
* update readme to correct GPU usage
* update from_pretrained supported low bit options
* fix style check
2023-10-19 16:08:47 +08:00
WeiguangHan
f87f67ee1c
LLM: arc perf test for some popular models ( #9188 )
2023-10-19 15:56:15 +08:00
Yang Wang
b0ddde0410
Fix removing convert dtype bug ( #9216 )
...
* Fix removing convert dtype bug
* fix style
2023-10-18 11:24:22 -07:00
Ruonan Wang
942d6418e7
LLM: fix chatglm kv cache ( #9215 )
2023-10-18 19:09:53 +08:00
SONG Ge
0765f94770
[LLM] Optimize kv_cache for mistral model family ( #9189 )
...
* add kv_cache optimization for mistral model
* kv_cache optimize for mistral
* update style
* update
2023-10-18 15:13:37 +08:00
Ruonan Wang
3555ebc148
LLM: fix wrong length in gptj kv_cache optimization ( #9210 )
...
* fix wrong length in gptj kv cache
* update
2023-10-18 14:59:02 +08:00
Shengsheng Huang
6dad8d16df
optimize NormHead for Baichuan2 ( #9205 )
...
* optimize NormHead for Baichuan2
* fix ut and change name
* rename functions
2023-10-18 14:05:07 +08:00
Jin Qiao
a3b664ed03
LLM: add GPU More-Data-Types and Save/Load example ( #9199 )
2023-10-18 13:13:45 +08:00
WeiguangHan
b9194c5786
LLM: skip some model tests using certain api ( #9163 )
...
* LLM: Skip some model tests using certain api
* initialize variable named result
2023-10-18 09:39:27 +08:00
Ruonan Wang
09815f7064
LLM: fix RMSNorm optimization of Baichuan2-13B/Baichuan-13B ( #9204 )
...
* fix rmsnorm of baichuan2-13B
* update baichuan1-13B too
* fix style
2023-10-17 18:40:34 +08:00
Jin Qiao
d7ce78edf0
LLM: fix portable zip README image link ( #9201 )
...
* LLM: fix portable zip readme img link
* LLM: make README first image center align
2023-10-17 16:38:22 +08:00
Cheen Hau, 俊豪
66c2e45634
Add unit tests for optimized model correctness ( #9151 )
...
* Add test to check correctness of optimized model
* Refactor optimized model test
* Use models in llm-unit-test
* Use AutoTokenizer for bloom
* Print out each passed test
* Remove unused tokenizer from import
2023-10-17 14:46:41 +08:00
Jin Qiao
d946bd7c55
LLM: add CPU More-Data-Types and Save-Load examples ( #9179 )
2023-10-17 14:38:52 +08:00
Ruonan Wang
c0497ab41b
LLM: support kv_cache optimization for Qwen-VL-Chat ( #9193 )
...
* support qwen_vl_chat
* fix style
2023-10-17 13:33:56 +08:00
binbin Deng
1cd9ab15b8
LLM: fix ChatGLMConfig check ( #9191 )
2023-10-17 11:52:56 +08:00
Yang Wang
7160afd4d1
Support XPU DDP training and autocast for LowBitMatmul ( #9167 )
...
* support autocast in low bit matmul
* Support XPU DDP training
* fix amp
2023-10-16 20:47:19 -07:00
Ruonan Wang
77afb8796b
LLM: fix convert of chatglm ( #9190 )
2023-10-17 10:48:13 +08:00
dingbaorong
af3b575c7e
expose modules_to_not_convert in optimize_model ( #9180 )
...
* expose modules_to_not_convert in optimize_model
* some fixes
2023-10-17 09:50:26 +08:00
Cengguang Zhang
5ca8a851e9
LLM: add fuse optimization for Mistral. ( #9184 )
...
* add fuse optimization for mistral.
* fix.
* fix
* fix style.
* fix.
* fix error.
* fix style.
* fix style.
2023-10-16 16:50:31 +08:00
Jiao Wang
49e1381c7f
update rope ( #9155 )
2023-10-15 21:51:45 -07:00
Jason Dai
b192a8032c
Update llm-readme ( #9176 )
2023-10-16 10:54:52 +08:00
binbin Deng
a164c24746
LLM: add kv_cache optimization for chatglm2-6b-32k ( #9165 )
2023-10-16 10:43:15 +08:00
Yang Wang
7a2de00b48
Fixes for xpu Bf16 training ( #9156 )
...
* Support bf16 training
* Use a stable transformer version
* remove env
* fix style
2023-10-14 21:28:59 -07:00
Cengguang Zhang
51a133de56
LLM: add fuse rope and norm optimization for Baichuan. ( #9166 )
...
* add fuse rope optimization.
* add rms norm optimization.
2023-10-13 17:36:52 +08:00
Jin Qiao
db7f938fdc
LLM: add replit and starcoder to gpu pytorch model example ( #9154 )
2023-10-13 15:44:17 +08:00
Jin Qiao
797b156a0d
LLM: add dolly-v1 and dolly-v2 to gpu pytorch model example ( #9153 )
2023-10-13 15:43:35 +08:00
Yishuo Wang
259cbb4126
[LLM] add initial bigdl-llm-init ( #9150 )
2023-10-13 15:31:45 +08:00
Cengguang Zhang
433f408081
LLM: Add fuse rope and norm optimization for Aquila. ( #9161 )
...
* add fuse norm optimization.
* add fuse rope optimization
2023-10-13 14:18:37 +08:00
SONG Ge
e7aa67e141
[LLM] Add rope optimization for internlm ( #9159 )
...
* add rope and norm optimization for internlm and gptneox
* revert gptneox back and split with pr #9155
* add norm_forward
* style fix
* update
* update
2023-10-13 14:18:28 +08:00
Jin Qiao
f754ab3e60
LLM: add baichuan and baichuan2 to gpu pytorch model example ( #9152 )
2023-10-13 13:44:31 +08:00
Ruonan Wang
b8aee7bb1b
LLM: Fix Qwen kv_cache optimization ( #9148 )
...
* first commit
* ut pass
* accelerate rotate half by using common util function
* fix style
2023-10-12 15:49:42 +08:00
binbin Deng
69942d3826
LLM: fix model check before attention optimization ( #9149 )
2023-10-12 15:21:51 +08:00
JIN Qiao
1a1ddc4144
LLM: Add Replit CPU and GPU example ( #9028 )
2023-10-12 13:42:14 +08:00
JIN Qiao
d74834ff4c
LLM: add gpu pytorch-models example llama2 and chatglm2 ( #9142 )
2023-10-12 13:41:48 +08:00
Ruonan Wang
4f34557224
LLM: support num_beams in all-in-one benchmark ( #9141 )
...
* support num_beams
* fix
2023-10-12 13:35:12 +08:00
Ruonan Wang
62ac7ae444
LLM: fix inaccurate input / output tokens of current all-in-one benchmark ( #9137 )
...
* first fix
* fix all apis
* fix
2023-10-11 17:13:34 +08:00