Cengguang Zhang
cca84b0a64
LLM: update llm benchmark scripts. ( #8943 )
...
* update llm benchmark scripts.
* change transformer_bf16 to pytorch_autocast_bf16.
* add autocast in transformer int4.
* revert autocast.
* add "pytorch_autocast_bf16" to doc
* fix comments.
2023-09-13 12:23:28 +08:00
SONG Ge
7132ef6081
[LLM Doc] Add optimize_model doc in transformers api ( #8957 )
...
* add optimize in from_pretrained
* add api doc for load_low_bit
* update api docs following comments
* update api docs
* update
* reword comments
2023-09-13 10:42:33 +08:00
Zhao Changmin
c32c260ce2
LLM: Add save/load API in optimize_model to support general pytorch model ( #8956 )
...
* support hf format SL
2023-09-13 10:22:00 +08:00
Ruonan Wang
4de73f592e
LLM: add gpu example of chinese-llama-2-7b ( #8960 )
...
* add gpu example of chinese-llama2
* update model name and link
* update name
2023-09-13 10:16:51 +08:00
Guancheng Fu
0bf5857908
[LLM] Integrate FastChat as a serving framework for BigDL-LLM ( #8821 )
...
* Finish changing
* format
* add licence
* Add licence
* fix
* fix
* Add xpu support for fschat
* Fix patch
* Also install webui dependencies
* change setup.py dependency installs
* fix
* format
* final test
2023-09-13 09:28:05 +08:00
Yuwen Hu
cb534ed5c4
[LLM] Add Arc demo gif to readme and readthedocs ( #8958 )
...
* Add arc demo in main readme
* Small style fix
* Realize using table
* Update based on comments
* Small update
* Try to solve with height problem
* Small fix
* Update demo for inner llm readme
* Update demo video for readthedocs
* Small fix
* Update based on comments
2023-09-13 09:23:52 +08:00
Zhao Changmin
dcaa4dc130
LLM: Support GQA on llama kvcache ( #8938 )
...
* support GQA
2023-09-12 12:18:40 +08:00
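The GQA support above rests on expanding grouped key/value heads so every query head has a KV head to attend with. A minimal pure-Python sketch of that expansion (function name and shapes are hypothetical, not the actual BigDL-LLM implementation):

```python
def repeat_kv(kv_heads, n_rep):
    """Expand grouped KV heads for grouped-query attention (GQA).

    kv_heads: list of per-head data (here, plain lists), length n_kv_heads.
    n_rep:    n_query_heads // n_kv_heads; each KV head serves n_rep queries.
    """
    if n_rep == 1:  # multi-head attention: nothing to expand
        return list(kv_heads)
    expanded = []
    for head in kv_heads:
        expanded.extend([head] * n_rep)  # each query group reuses one KV head
    return expanded

# e.g. 32 query heads sharing 8 KV heads -> each KV head repeated 4 times
kv = [[i] for i in range(8)]
out = repeat_kv(kv, 4)
```

The payoff is that the KV cache stores only `n_kv_heads` heads, shrinking cache memory by the grouping factor.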
binbin Deng
2d81521019
LLM: add optimize_model examples for llama2 and chatglm ( #8894 )
...
* add llama2 and chatglm optimize_model examples
* update default usage
* update command and some descriptions
* move folder and remove general_int4 descriptions
* change folder name
2023-09-12 10:36:29 +08:00
Zhao Changmin
f00c442d40
fix accelerate ( #8946 )
...
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-09-12 09:27:58 +08:00
Yang Wang
16761c58be
Make llama attention stateless ( #8928 )
...
* Make llama attention stateless
* fix style
* fix chatglm
* fix chatglm xpu
2023-09-11 18:21:50 -07:00
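"Stateless" attention here means the KV cache travels as an explicit argument and return value instead of living on the module, which avoids hidden mutation between calls. A simplified sketch of that calling convention (names are illustrative, not the real code):

```python
def attention_step(token_kv, past_kv=None):
    """Stateless KV-cache update: the cache is passed in and handed back,
    never stored as module state (a simplified sketch).

    token_kv: (key, value) for the newly generated token.
    past_kv:  optional (keys, values) accumulated so far.
    """
    keys, values = ([], []) if past_kv is None else past_kv
    k, v = token_kv
    # Build a new cache rather than mutating the caller's lists.
    return (keys + [k], values + [v])

kv = None
for t in range(3):
    kv = attention_step((f"k{t}", f"v{t}"), kv)
```

Because each step returns a fresh cache, the same module can serve concurrent generation streams without cross-talk.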
Zhao Changmin
e62eda74b8
refine ( #8912 )
...
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-09-11 16:40:33 +08:00
Yina Chen
df165ad165
init ( #8933 )
2023-09-11 14:30:55 +08:00
Ruonan Wang
b3f5dd5b5d
LLM: update q8 convert xpu&cpu ( #8930 )
2023-09-08 16:01:17 +08:00
Yina Chen
33d75adadf
[LLM] Support q5_0 on arc ( #8926 )
...
* support q5_0
* delete
* fix style
2023-09-08 15:52:36 +08:00
Yuwen Hu
ca35c93825
[LLM] Fix langchain UT ( #8929 )
...
* Change dependency version for langchain uts
* Downgrade pandas version instead; and update example readme accordingly
2023-09-08 13:51:04 +08:00
Xin Qiu
ea0853c0b5
update benchmark_utils readme ( #8925 )
...
* update readme
* meet code review
2023-09-08 10:30:26 +08:00
Yang Wang
ee98cdd85c
Support latest transformer version ( #8923 )
...
* Support latest transformer version
* fix style
2023-09-07 19:01:32 -07:00
Yang Wang
25428b22b4
Fix chatglm2 attention and kv cache ( #8924 )
...
* fix chatglm2 attention
* fix bf16 bug
* make model stateless
* add utils
* cleanup
* fix style
2023-09-07 18:54:29 -07:00
Yina Chen
b209b8f7b6
[LLM] Fix arc qtype != q4_0 generate issue ( #8920 )
...
* Fix arc precision!=q4_0 generate issue
* meet comments
2023-09-07 08:56:36 -07:00
Cengguang Zhang
3d2efe9608
LLM: update llm latency benchmark. ( #8922 )
2023-09-07 19:00:19 +08:00
binbin Deng
7897eb4b51
LLM: add benchmark scripts on GPU ( #8916 )
2023-09-07 18:08:17 +08:00
Xin Qiu
d8a01d7c4f
fix chatglm in run.py ( #8919 )
2023-09-07 16:44:10 +08:00
Xin Qiu
e9de9d9950
benchmark for native int4 ( #8918 )
...
* native4
* update
* update
* update
2023-09-07 15:56:15 +08:00
Ruonan Wang
c0797ea232
LLM: update setup to specify bigdl-core-xe version ( #8913 )
2023-09-07 15:11:55 +08:00
Ruonan Wang
057e77e229
LLM: update benchmark_utils.py to handle do_sample=True ( #8903 )
2023-09-07 14:20:47 +08:00
Yang Wang
c34400e6b0
Use new layout for xpu qlinear ( #8896 )
...
* use new layout for xpu qlinear
* fix style
2023-09-06 21:55:33 -07:00
Zhao Changmin
8bc1d8a17c
LLM: Fix discards in optimize_model with non-hf models and add openai whisper example ( #8877 )
...
* openai-whisper
2023-09-07 10:35:59 +08:00
Xin Qiu
5d9942a3ca
transformer int4 and native int4's benchmark script for 32 256 1k 2k input ( #8871 )
...
* transformer
* move
* update
* add header
* update all-in-one
* clean up
2023-09-07 09:49:55 +08:00
Yina Chen
bfc71fbc15
Add known issue in arc voice assistant example ( #8902 )
...
* add known issue in voice assistant example
* update cpu
2023-09-07 09:28:26 +08:00
Yuwen Hu
db26c7b84d
[LLM] Update readme gif & image url to the ones hosted on readthedocs ( #8900 )
2023-09-06 20:04:17 +08:00
SONG Ge
7a71ced78f
[LLM Docs] Remaining API Docs Issues Solution ( #8780 )
...
* langchain readthedocs update
* solve langchain.llms.transformersllm issues
* langchain.embeddings.transformersembeddings/transformersllms issues
* update docs for get_num_tokens
* add low_bit api doc
* add optimizer model api doc
* update rst index
* fix comments style
* update docs following the comments
* update api doc
2023-09-06 16:29:34 +08:00
Xin Qiu
49a39452c6
update benchmark ( #8899 )
2023-09-06 15:11:43 +08:00
Kai Huang
4a9ff050a1
Add qlora nf4 ( #8782 )
...
* add nf4
* dequant nf4
* style
2023-09-06 09:39:22 +08:00
xingyuan li
704a896e90
[LLM] Add perf test on xpu for bigdl-llm ( #8866 )
...
* add xpu latency job
* update install way
* remove duplicated workflow
* add perf upload
2023-09-05 17:36:24 +09:00
Zhao Changmin
95271f10e0
LLM: Rename low bit layer ( #8875 )
...
* rename lowbit
---------
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-09-05 13:21:12 +08:00
Yina Chen
74a2c2ddf5
Update optimize_model=True in llama2 chatglm2 arc examples ( #8878 )
...
* add optimize_model=True in llama2 chatglm2 examples
* add ipex optimize in gpt-j example
2023-09-05 10:35:37 +08:00
Jason Dai
5e58f698cd
Update readthedocs ( #8882 )
2023-09-04 15:42:16 +08:00
Song Jiaming
7b3ac66e17
[LLM] auto performance test fix specific settings to template ( #8876 )
2023-09-01 15:49:04 +08:00
Yang Wang
242c9d6036
Fix chatglm2 multi-turn streamchat ( #8867 )
2023-08-31 22:13:49 -07:00
Song Jiaming
c06f1ca93e
[LLM] auto perf test to output to csv ( #8846 )
2023-09-01 10:48:00 +08:00
Zhao Changmin
9c652fbe95
LLM: Whisper long segment recognize example ( #8826 )
...
* LLM: Long segment recognize example
2023-08-31 16:41:25 +08:00
Yishuo Wang
a232c5aa21
[LLM] add protobuf in bigdl-llm dependency ( #8861 )
2023-08-31 15:23:31 +08:00
xingyuan li
de6c6bb17f
[LLM] Downgrade amx build gcc version and remove avx flag display ( #8856 )
...
* downgrade to gcc 11
* remove avx display
2023-08-31 14:08:13 +09:00
Yang Wang
3b4f4e1c3d
Fix llama attention optimization for XPU ( #8855 )
...
* Fix llama attention optimization for XPU
* fix chatglm2
* fix typo
2023-08-30 21:30:49 -07:00
Shengsheng Huang
7b566bf686
[LLM] add new API for optimize any pytorch models ( #8827 )
...
* add new API for optimize any pytorch models
* change test util name
* revise API and update UT
* fix python style
* update ut config, change default value
* change defaults, disable ut transcribe
2023-08-30 19:41:53 +08:00
Xin Qiu
8eca982301
windows add env ( #8852 )
2023-08-30 15:54:52 +08:00
Zhao Changmin
731916c639
LLM: Enable attempting loading method automatically ( #8841 )
...
* enable auto load method
* warning error
* logger info
---------
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-08-30 15:41:55 +08:00
Yishuo Wang
bba73ec9d2
[LLM] change chatglm native int4 checkpoint name ( #8851 )
2023-08-30 15:05:19 +08:00
Yina Chen
55e705a84c
[LLM] Support the rest of AutoXXX classes in Transformers API ( #8815 )
...
* add transformers auto models
* fix
2023-08-30 11:16:14 +08:00
Zhao Changmin
887018b0f2
Update ut save&load ( #8847 )
...
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-08-30 10:32:57 +08:00
Yina Chen
3462fd5c96
Add arc gpt-j example ( #8840 )
2023-08-30 10:31:24 +08:00
Ruonan Wang
f42c0bad1b
LLM: update GPU doc ( #8845 )
2023-08-30 09:24:19 +08:00
Jason Dai
aab7deab1f
Reorganize GPU examples ( #8844 )
2023-08-30 08:32:08 +08:00
Yang Wang
a386ad984e
Add Data Center GPU Flex Series to Readme ( #8835 )
...
* Add Data Center GPU Flex Series to Readme
* remove
* update starcoder
2023-08-29 11:19:09 -07:00
Yishuo Wang
7429ea0606
[LLM] support transformer int4 + amx int4 ( #8838 )
2023-08-29 17:27:18 +08:00
Ruonan Wang
ddff7a6f05
Update readme of GPU to specify oneapi version ( #8820 )
2023-08-29 13:14:22 +08:00
Zhao Changmin
bb31d4fe80
LLM: Implement hf low_cpu_mem_usage with 1x binary file peak memory on transformer int4 ( #8731 )
...
* 1x peak memory
2023-08-29 09:33:17 +08:00
Yina Chen
35fdf94031
[LLM] Arc starcoder example ( #8814 )
...
* arc starcoder example init
* add log
* meet comments
2023-08-28 16:48:00 +08:00
xingyuan li
6a902b892e
[LLM] Add amx build step ( #8822 )
...
* add amx build step
2023-08-28 17:41:18 +09:00
Ruonan Wang
eae92bc7da
llm: quick fix path ( #8810 )
2023-08-25 16:02:31 +08:00
Ruonan Wang
0186f3ab2f
llm: update all ARC int4 examples ( #8809 )
...
* update GPU examples
* update other examples
* fix
* update based on comment
2023-08-25 15:26:10 +08:00
Song Jiaming
b8b1b6888b
[LLM] Performance test ( #8796 )
2023-08-25 14:31:45 +08:00
Yang Wang
9d0f6a8cce
rename math.py in example to avoid conflict ( #8805 )
2023-08-24 21:06:31 -07:00
SONG Ge
d2926c7672
[LLM] Unify Langchain Native and Transformers LLM API ( #8752 )
...
* deprecate BigDLNativeTransformers and add specific LMEmbedding method
* deprecate and add LM methods for langchain llms
* add native params to native langchain
* new imple for embedding
* move ut from bigdlnative to causal llm
* rename embeddings api and examples update align with usage updating
* docqa example hot-fix
* add more api docs
* add langchain ut for starcoder
* support model_kwargs for transformer methods when calling causalLM and add ut
* ut fix for transformers embedding
* update for langchain causal supporting transformers
* remove model_family in readme doc
* add model_families params to support more models
* update api docs and remove chatglm embeddings for now
* remove chatglm embeddings in examples
* new refactor for ut to add bloom and transformers llama ut
* disable llama transformers embedding ut
2023-08-25 11:14:21 +08:00
binbin Deng
5582872744
LLM: update chatglm example to be more friendly for beginners ( #8795 )
2023-08-25 10:55:01 +08:00
Yina Chen
7c37424a63
Fix voice assistant example input error on Linux ( #8799 )
...
* fix linux error
* update
* remove alsa log
2023-08-25 10:47:27 +08:00
Yang Wang
bf3591e2ff
Optimize chatglm2 for bf16 ( #8725 )
...
* make chatglm works with bf16
* fix style
* support chatglm v1
* fix style
* fix style
* add chatglm2 file
2023-08-24 10:04:25 -07:00
xingyuan li
c94bdd3791
[LLM] Merge windows & linux nightly test ( #8756 )
...
* fix download statement
* add check before build wheel
* use curl to upload files
* windows unittest won't upload converted model
* split llm-cli test into windows & linux versions
* update tempdir creation
* fix nightly converted model name
* windows llm-cli starcoder test temporarily disabled
* remove taskset dependency
* rename llm_unit_tests_linux to llm_unit_tests
2023-08-23 12:48:41 +09:00
Jason Dai
dcadd09154
Update llm document ( #8784 )
2023-08-21 22:34:44 +08:00
Yishuo Wang
611c1fb628
[LLM] change default n_threads of native int4 langchain API ( #8779 )
2023-08-21 13:30:12 +08:00
Yishuo Wang
3d1f2b44f8
LLM: change default n_threads of native int4 models ( #8776 )
2023-08-18 15:46:19 +08:00
Yishuo Wang
2ba2133613
fix starcoder chinese output ( #8773 )
2023-08-18 13:37:02 +08:00
binbin Deng
548f7a6cf7
LLM: update convert of llama family to support llama2-70B ( #8747 )
2023-08-18 09:30:35 +08:00
Yina Chen
4afea496ab
support q8_0 ( #8765 )
2023-08-17 15:06:36 +08:00
Ruonan Wang
e9aa2bd890
LLM: reduce GPU 1st token latency and update example ( #8763 )
...
* reduce 1st token latency
* update example
* fix
* fix style
* update readme of gpu benchmark
2023-08-16 18:01:23 +08:00
binbin Deng
06609d9260
LLM: add qwen example on arc ( #8757 )
2023-08-16 17:11:08 +08:00
SONG Ge
f4164e4492
[BigDL LLM] Update readme for unifying transformers API ( #8737 )
...
* update readme doc
* fix readthedocs error
* update comment
* update exception error info
* invalidInputError instead
* fix readme typo error and remove import error
* fix more typo
2023-08-16 14:22:32 +08:00
Song Jiaming
c1f9af6d97
[LLM] chatglm example and transformers low-bit examples ( #8751 )
2023-08-16 11:41:44 +08:00
Ruonan Wang
8805186f2f
LLM: add benchmark tool for gpu ( #8760 )
...
* add benchmark tool for gpu
* update
2023-08-16 11:22:10 +08:00
binbin Deng
97283c033c
LLM: add falcon example on arc ( #8742 )
2023-08-15 17:38:38 +08:00
binbin Deng
8c55911308
LLM: add baichuan-13B on arc example ( #8755 )
2023-08-15 15:07:04 +08:00
binbin Deng
be2ae6eb7c
LLM: fix langchain native int4 voice assistant example ( #8750 )
2023-08-14 17:23:33 +08:00
Ruonan Wang
d28ad8f7db
LLM: add whisper example for arc transformer int4 ( #8749 )
...
* add whisper example for arc int4
* fix
2023-08-14 17:05:48 +08:00
Yishuo Wang
77844125f2
[LLM] Support chatglm cache ( #8745 )
2023-08-14 15:10:46 +08:00
Ruonan Wang
faaccb64a2
LLM: add chatglm2 example for Arc ( #8741 )
...
* add chatglm2 example
* update
* fix readme
2023-08-14 10:43:08 +08:00
binbin Deng
b10d7e1adf
LLM: add mpt example on arc ( #8723 )
2023-08-14 09:40:01 +08:00
binbin Deng
e9a1afffc5
LLM: add internlm example on arc ( #8722 )
2023-08-14 09:39:39 +08:00
SONG Ge
aceea4dc29
[LLM] Unify Transformers and Native API ( #8713 )
...
* re-open pr to run on latest runner
* re-add examples and ut
* rename ut and move deprecate to warning instead of raising an error
* ut fix
2023-08-11 19:45:47 +08:00
Yishuo Wang
f91035c298
[LLM] fix chatglm native int4 emoji output ( #8739 )
2023-08-11 15:38:41 +08:00
binbin Deng
77efcf7b1d
LLM: fix ChatGLM2 native int4 stream output ( #8733 )
2023-08-11 14:51:50 +08:00
Ruonan Wang
ca3e59a1dc
LLM: support stop for starcoder native int4 stream ( #8734 )
2023-08-11 14:51:30 +08:00
Song Jiaming
e292dfd970
[WIP] LLM transformers api for langchain ( #8642 )
2023-08-11 13:32:35 +08:00
Yishuo Wang
3d5a7484a2
[LLM] fix bloom and starcoder memory release ( #8728 )
2023-08-11 11:18:19 +08:00
xingyuan li
02ec01cb48
[LLM] Add bigdl-core-xe dependency when installing bigdl-llm[xpu] ( #8716 )
...
* add bigdl-core-xe dependency
2023-08-10 17:41:42 +09:00
Shengsheng Huang
7c56c39e36
Fix GPU examples README to use bigdl-core-xe ( #8714 )
...
* Update README.md
* Update README.md
2023-08-10 12:53:49 +08:00
Yina Chen
6d1ca88aac
add voice assistant example ( #8711 )
2023-08-10 12:42:14 +08:00
Song Jiaming
e717e304a6
LLM first example test and template ( #8658 )
2023-08-10 10:03:11 +08:00
Ruonan Wang
1a7b698a83
[LLM] support ipex arc int4 & add basic llama2 example ( #8700 )
...
* first support of xpu
* make it work on gpu
update setup
update
add GPU llama2 examples
add use_optimize flag to disable optimize for gpu
fix style
update gpu example readme
fix
* update example, and update env
* fix setup to add cpp files
* replace jit with aot to avoid data leak
* rename to bigdl-core-xe
* update installation in example readme
2023-08-09 22:20:32 +08:00
Jason Dai
d03218674a
Update llm readme ( #8703 )
2023-08-09 14:47:26 +08:00
Kai Huang
1b65288bdb
Add api doc for LLM ( #8605 )
...
* api doc initial
* update desc
2023-08-08 18:17:16 +08:00
binbin Deng
4c44153584
LLM: add Qwen transformers int4 example ( #8699 )
2023-08-08 11:23:09 +08:00
Yishuo Wang
710b9b8982
[LLM] add linux chatglm pybinding binary file ( #8698 )
2023-08-08 11:16:30 +08:00
binbin Deng
ea5d7aff5b
LLM: add chatglm native int4 transformers API ( #8695 )
2023-08-07 17:52:47 +08:00
Yishuo Wang
6da830cf7e
[LLM] add chatglm pybinding binary file in setup.py ( #8692 )
2023-08-07 09:41:03 +08:00
Cengguang Zhang
ebcf75d506
feat: set transformers lib version. ( #8683 )
2023-08-04 15:01:59 +08:00
Yishuo Wang
ef08250c21
[LLM] chatglm pybinding support ( #8672 )
2023-08-04 14:27:29 +08:00
Yishuo Wang
5837cc424a
[LLM] add chatglm pybinding binary file release ( #8677 )
2023-08-04 11:45:27 +08:00
Yang Wang
b6468bac43
optimize chatglm2 long sequence ( #8662 )
...
* add chatglm2
* optimize a little
* optimize chatglm long sequence
* fix style
* address comments and fix style
* fix bug
2023-08-03 17:56:24 -07:00
Yang Wang
3407f87075
Fix llama kv cache bug ( #8674 )
2023-08-03 17:54:55 -07:00
Yina Chen
59903ea668
llm linux support avx & avx2 ( #8669 )
2023-08-03 17:10:59 +08:00
xingyuan li
110cfb5546
[LLM] Remove old windows nightly test code ( #8668 )
...
Remove old Windows nightly test code triggered by task scheduler
Add new Windows nightly workflow for nightly testing
2023-08-03 17:12:23 +09:00
xingyuan li
610084e3c0
[LLM] Complete windows unittest ( #8611 )
...
* add windows nightly test workflow
* use github runner to run pr test
* model load should use lowbit
* remove tmp dir after testing
2023-08-03 14:48:42 +09:00
binbin Deng
a15a2516e6
add ( #8659 )
2023-08-03 10:12:10 +08:00
Xin Qiu
0714888705
build windows avx dll ( #8657 )
...
* windows avx
* add to actions
2023-08-03 02:06:24 +08:00
Yina Chen
119bf6d710
[LLM] Support linux cpp dynamic load .so ( #8655 )
...
* support linux cpp dynamic load .so
* update cli
2023-08-02 20:15:45 +08:00
Zhao Changmin
ca998cc6f2
LLM: Mute shape mismatch output ( #8601 )
...
* LLM: Mute shape mismatch output
2023-08-02 16:46:22 +08:00
Zhao Changmin
04c713ef06
LLM: Disable transformer api pretraining_tp ( #8645 )
...
* disable pretraining_tp
2023-08-02 11:26:01 +08:00
binbin Deng
6fc31bb4cf
LLM: first update descriptions for ChatGLM transformers int4 example ( #8646 )
2023-08-02 11:00:56 +08:00
Yang Wang
cbeae97a26
Optimize Llama Attention to reduce KV cache memory copy ( #8580 )
...
* Optimize llama attention to reduce KV cache memory copy
* fix bug
* fix style
* remove git
* fix style
* fix style
* fix style
* fix tests
* move llama attention to another file
* revert
* fix style
* remove jit
* fix
2023-08-01 16:37:58 -07:00
binbin Deng
39994738d1
LLM: add chat & stream chat example for ChatGLM2 transformers int4 ( #8636 )
2023-08-01 14:57:45 +08:00
xingyuan li
cdfbe652ca
[LLM] Add chatglm support for llm-cli ( #8641 )
...
* add chatglm build
* add llm-cli support
* update git
* install cmake
* add ut for chatglm
* add files to setup
* fix bug that caused permission error when sf lacks file
2023-08-01 14:30:17 +09:00
Zhao Changmin
d6cbfc6d2c
LLM: Add requirements in whisper example ( #8644 )
...
* LLM: Add requirements in whisper example
2023-08-01 12:07:14 +08:00
Zhao Changmin
3e10260c6d
LLM: llm-convert support chatglm family ( #8643 )
...
* convert chatglm
2023-08-01 11:16:18 +08:00
Yina Chen
a607972c0b
[LLM] LLM windows load -api.dll ( #8631 )
...
* temp
* update
* revert setup.py
2023-07-31 13:47:20 +08:00
xingyuan li
3361b66449
[LLM] Revert llm-cli to disable selecting executables on Windows ( #8630 )
...
* revert vnni file select
* revert setup.py
* add model-api.dll
2023-07-31 11:15:44 +09:00
binbin Deng
3dbab9087b
LLM: add llama2-7b native int4 example ( #8629 )
2023-07-28 10:56:16 +08:00
binbin Deng
fb32fefcbe
LLM: support tensor input of native int4 generate ( #8620 )
2023-07-27 17:59:49 +08:00
Zhao Changmin
5b484ab48d
LLM: Support load_low_bit loading models in shards format ( #8612 )
...
* shards_model
---------
Co-authored-by: leonardozcm <leonaordo1997zcm@gmail.com>
2023-07-26 13:30:01 +08:00
binbin Deng
fcf8c085e3
LLM: add llama2-13b native int4 example ( #8613 )
2023-07-26 10:12:52 +08:00
Song Jiaming
650b82fa6e
[LLM] add CausalLM and Speech UT ( #8597 )
2023-07-25 11:22:36 +08:00
Zhao Changmin
af201052db
avoid malloc all missing keys in fp32 ( #8600 )
2023-07-25 09:48:51 +08:00
binbin Deng
3f24202e4c
[LLM] Add more transformers int4 example (Llama 2) ( #8602 )
2023-07-25 09:21:12 +08:00
Jason Dai
0f8201c730
llm readme update ( #8595 )
2023-07-24 09:47:49 +08:00
Yuwen Hu
ba42a6da63
[LLM] Set torch_dtype default value to 'auto' for transformers low bit from_pretrained API
2023-07-21 17:55:00 +08:00
Yuwen Hu
bbde423349
[LLM] Add current Linux UT inference tests to nightly tests ( #8578 )
...
* Add current inference uts to nightly tests
* Change test model from chatglm-6b to chatglm2-6b
* Add thread num env variable for nightly test
* Fix urls
* Small fix
2023-07-21 13:26:38 +08:00
Yang Wang
feb3af0567
Optimize transformer int4 memory footprint ( #8579 )
2023-07-20 20:22:13 -07:00
Yang Wang
57e880f63a
[LLM] use pytorch linear for large input matrix ( #8492 )
...
* use pytorch linear for large input matrix
* only works on server
* fix style
* optimize memory
* first check server
* revert
* address comments
* fix style
2023-07-20 09:54:25 -07:00
Yuwen Hu
6504e31a97
Small fix ( #8577 )
2023-07-20 16:37:04 +08:00
Yuwen Hu
2266ca7d2b
[LLM] Small updates to transformers int4 ut ( #8574 )
...
* Small fix to transformers int4 ut
* Small fix
2023-07-20 13:20:25 +08:00
xingyuan li
7b8d9c1b0d
[LLM] Add dependency file check in setup.py ( #8565 )
...
* add package file check
2023-07-20 14:20:08 +09:00
Song Jiaming
411d896636
LLM first transformers UT ( #8514 )
...
* ut
* transformers api first ut
* name
* dir issue
* use chatglm instead of chatglm2
* omp
* set omp in sh
* source
* taskset
* test
* test omp
* add test
2023-07-20 10:16:27 +08:00
Yuwen Hu
cad78740a7
[LLM] Small fixes to the Whisper transformers INT4 example ( #8573 )
...
* Small fixes to the whisper example
* Small fix
* Small fix
2023-07-20 10:11:33 +08:00
binbin Deng
7a9fdf74df
[LLM] Add more transformers int4 example (Dolly v2) ( #8571 )
...
* add
* add trust_remote_code
2023-07-19 18:20:16 +08:00
Zhao Changmin
e680af45ea
LLM: Optimize Langchain Pipeline ( #8561 )
...
* LLM: Optimize Langchain Pipeline
* load in low bit
2023-07-19 17:43:13 +08:00
Shengsheng Huang
616b7cb0a2
add more langchain examples ( #8542 )
...
* update langchain descriptions
* add mathchain example
* update readme
* update readme
2023-07-19 17:42:18 +08:00
binbin Deng
457571b44e
[LLM] Add more transformers int4 example (InternLM) ( #8557 )
2023-07-19 15:15:38 +08:00
xingyuan li
b6510fa054
fix move/download dll step ( #8564 )
2023-07-19 12:17:07 +09:00
xingyuan li
c52ed37745
fix starcoder dll name ( #8563 )
2023-07-19 11:55:06 +09:00
Zhao Changmin
3dbe3bf18e
transformer_int4 ( #8553 )
2023-07-19 08:33:58 +08:00
Zhao Changmin
49d636e295
[LLM] whisper model transformer int4 verification and example ( #8511 )
...
* LLM: transformer api support
* va
* example
* revert
* pep8
* pep8
2023-07-19 08:33:20 +08:00
Yina Chen
9a7bc17ca1
[LLM] llm supports vnni link on windows ( #8543 )
...
* support win vnni link
* fix style
* fix style
* use isa_checker
* fix
* typo
* fix
* update
2023-07-18 16:43:45 +08:00
Yina Chen
4582b6939d
[LLM] llm gptneox chat ( #8527 )
...
* linux
* support win
* merge upstream & support vnni lib in chat
2023-07-18 11:17:17 +08:00
Jason Dai
1ebc43b151
Update READMEs ( #8554 )
2023-07-18 11:06:06 +08:00
Yuwen Hu
ee70977c07
[LLM] Transformers int4 example small typo fixes ( #8550 )
2023-07-17 18:15:32 +08:00
Yuwen Hu
1344f50f75
[LLM] Add more transformers int4 examples (Falcon) ( #8546 )
...
* Initial commit
* Add Falcon examples and other small fix
* Small fix
* Small fix
* Update based on comments
* Small fix
2023-07-17 17:36:21 +08:00
Yuwen Hu
de772e7a80
Update mpt for prompt tuning ( #8547 )
2023-07-17 17:33:54 +08:00
binbin Deng
f1fd746722
[LLM] Add more transformers int4 example (vicuna) ( #8544 )
2023-07-17 16:59:55 +08:00
Xin Qiu
fccae91461
Add load_low_bit save_load_bit to AutoModelForCausalLM ( #8531 )
...
* transformers save_low_bit load_low_bit
* update example and add readme
* update
* update
* update
* add ut
* update
2023-07-17 15:29:55 +08:00
binbin Deng
808a64d53a
[LLM] Add more transformers int4 example (starcoder) ( #8540 )
2023-07-17 14:41:19 +08:00
xingyuan li
e57db777e0
[LLM] Setup.py & llm-cli update for windows vnni binary files ( #8537 )
...
* update setup.py
* update llm-cli
2023-07-17 12:28:38 +09:00
binbin Deng
f56b5ade4c
[LLM] Add more transformers int4 example (chatglm2) ( #8539 )
2023-07-14 17:58:33 +08:00
binbin Deng
92d33cf35a
[LLM] Add more transformers int4 example (phoenix) ( #8520 )
2023-07-14 17:58:04 +08:00
Yuwen Hu
e0f0def279
Remove unused example for now ( #8538 )
2023-07-14 17:32:50 +08:00
binbin Deng
b397e40015
[LLM] Add more transformers int4 example (RedPajama) ( #8523 )
2023-07-14 17:30:28 +08:00
Yuwen Hu
7bf3e10415
[LLM] Add more int4 transformers examples (MOSS) ( #8532 )
...
* Add Moss example
* Small fix
2023-07-14 16:41:41 +08:00
Yuwen Hu
59b7287ef5
[LLM] Add more transformers int4 example (Baichuan) ( #8522 )
...
* Add example model Baichuan
* Small updates to client windows settings
* Small refactor
* Small fix
2023-07-14 16:41:29 +08:00
Yuwen Hu
ca6e38607c
[LLM] Add more transformers examples (ChatGLM) ( #8521 )
...
* Add example for chatglm v1 and other small fixes
* Small fix
* Small further fix
* Small fix
* Update based on comments & updates for client windows recommended settings
* Small fix
* Small refactor
* Small fix
* Small fix
* Small fix to dolly v1
* Small fix
2023-07-14 16:41:13 +08:00
xingyuan li
c87853233b
[LLM] Add windows vnni binary build step ( #8518 )
...
* add windows vnni build step
* update build info
* add download command
2023-07-14 17:24:39 +09:00
Yishuo Wang
6320bf201e
LLM: fix memory access violation ( #8519 )
2023-07-13 17:08:08 +08:00
xingyuan li
60c2c0c3dc
Bug fix for merged pr #8503 ( #8516 )
2023-07-13 17:26:30 +09:00
Yuwen Hu
349bcb4bae
[LLM] Add more transformers int4 example (Dolly v1) ( #8517 )
...
* Initial commit for dolly v1
* Add example for Dolly v1 and other small fix
* Small output updates
* Small fix
* fix based on comments
2023-07-13 16:13:47 +08:00
Xin Qiu
90e3d86bce
rename low bit type name ( #8512 )
...
* change qx_0 to sym_intx
* update
* fix typo
* update
* fix type
* fix style
* add python doc
* meet code review
* fix style
2023-07-13 15:53:31 +08:00
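The `sym_intx` naming adopted above refers to symmetric quantization: weights are scaled by their maximum magnitude with no zero point, so zero maps to zero. A toy sym_int8 round-trip illustrating the idea (illustrative only; BigDL-LLM's actual sym_int4/sym_int8 kernels are block-wise and far more involved):

```python
def sym_int8_quantize(weights):
    """Symmetric int8 quantization: one scale per group, no zero point."""
    scale = max(abs(w) for w in weights) / 127.0  # map max |w| to 127
    q = [round(w / scale) for w in weights]       # values land in [-127, 127]
    return q, scale

def sym_int8_dequantize(q, scale):
    """Recover approximate floats; error shrinks as bit width grows."""
    return [x * scale for x in q]

q, s = sym_int8_quantize([0.5, -1.0, 0.25])
approx = sym_int8_dequantize(q, s)
```

sym_int4 follows the same scheme with a 15-level range, trading accuracy for a 2x smaller footprint than int8.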
xingyuan li
4f152b4e3a
[LLM] Merge the llm.cpp build and the pypi release ( #8503 )
...
* checkout llm.cpp to build new binary
* use artifact to get latest built binary files
* rename quantize
* modify all release workflow
2023-07-13 16:34:24 +09:00
Yuwen Hu
bcde8ec83e
[LLM] Small fix to MPT Example ( #8513 )
2023-07-13 14:33:21 +08:00
Zhao Changmin
ba0da17b40
LLM: Support AutoModelForSeq2SeqLM transformer API ( #8449 )
...
* LLM: support AutoModelForSeq2SeqLM transformer API
2023-07-13 13:33:51 +08:00
Yishuo Wang
86b5938075
LLM: fix llm pybinding ( #8509 )
2023-07-13 10:27:08 +08:00
Yuwen Hu
fcc352eee3
[LLM] Add more transformers_int4 examples (MPT) ( #8498 )
...
* Update transformers_int4 readme, and initial commit for mpt
* Update example for mpt
* Small fix and recover transformers_int4_pipeline_readme.md for now
* Update based on comments
* Small fix
* Small fix
* Update based on comments
2023-07-13 09:41:16 +08:00
Zhao Changmin
23f6a4c21f
LLM: Optimize transformer int4 loading ( #8499 )
...
* LLM: Optimize transformer int4 loading
2023-07-12 15:25:42 +08:00
Yishuo Wang
dd3f953288
Support vnni check ( #8497 )
2023-07-12 10:11:15 +08:00
Xin Qiu
cd7a980ec4
Transformer int4 add qtype, support q4_1 q5_0 q5_1 q8_0 ( #8481 )
...
* quant in Q4 5 8
* meet code review
* update readme
* style
* update
* fix error
* fix error
* update
* fix style
* update
* Update README.md
* Add load_in_low_bit
2023-07-12 08:23:08 +08:00
Yishuo Wang
db39d0a6b3
LLM: disable mmap by default for better performance ( #8467 )
2023-07-11 09:26:26 +08:00
Yuwen Hu
52c6b057d6
Initial LLM Transformers example refactor ( #8491 )
2023-07-10 17:53:57 +08:00
Junwei Deng
254a7aa3c4
bigdl-llm: add voice-assistant example migrated from langchain use-case document ( #8468 )
2023-07-10 16:51:45 +08:00
Yishuo Wang
98bac815e4
specify numpy version ( #8489 )
2023-07-10 16:50:16 +08:00
Zhao Changmin
81d655cda9
LLM: transformer int4 save and load ( #8462 )
...
* LLM: transformer int4 save and load
2023-07-10 16:34:41 +08:00
binbin Deng
d489775d2c
LLM: fix inconsistency between output token number and max_new_token ( #8479 )
2023-07-07 17:31:05 +08:00
Jason Dai
bcc1eae322
Llm readme update ( #8472 )
2023-07-06 20:04:04 +08:00
Ruonan Wang
2f77d485d8
Llm: Initial support of langchain transformer int4 API ( #8459 )
...
* first commit of transformer int4 and pipeline
* basic examples
temp save for embeddings
support embeddings and docqa example
* fix based on comment
* small fix
2023-07-06 17:50:05 +08:00
binbin Deng
14626fe05b
LLM: refactor transformers and langchain class name ( #8470 )
2023-07-06 17:16:44 +08:00
binbin Deng
70bc8ea8ae
LLM: update langchain and cpp-python style API examples ( #8456 )
2023-07-06 14:36:42 +08:00
Ruonan Wang
64b38e1dc8
llm: benchmark tool for transformers int4 (separate 1st token and rest) ( #8460 )
...
* add benchmark utils
* fix
* fix bug and add readme
* hidden latency data
2023-07-06 09:49:52 +08:00
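Reporting the first token separately matters because it includes prompt prefill and typically dwarfs per-token decode latency. A generic timing sketch of that split, using a stand-in step function rather than the benchmark tool's real API:

```python
import time

def benchmark_generate(step_fn, n_tokens):
    """Time the first generated token apart from the rest.

    step_fn stands in for one decode step; the first call represents
    prefill + first token, the remainder are averaged as decode latency.
    """
    t0 = time.perf_counter()
    step_fn()                      # first token (includes prompt prefill)
    first = time.perf_counter() - t0
    t1 = time.perf_counter()
    for _ in range(n_tokens - 1):  # remaining tokens
        step_fn()
    rest_avg = (time.perf_counter() - t1) / (n_tokens - 1)
    return first, rest_avg

# Dummy step: sleep to simulate a fixed per-token cost
first, rest = benchmark_generate(lambda: time.sleep(0.001), 5)
```

Averaging only the post-first tokens keeps the steady-state tokens/second number honest.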
binbin Deng
77808fa124
LLM: fix n_batch in starcoder pybinding ( #8461 )
2023-07-05 17:06:50 +08:00
Yina Chen
f2bb469847
[WIP] LLM llm-cli chat mode ( #8440 )
...
* fix timezone
* temp
* Update linux interactive mode
* modify init text for interactive mode
* meet comments
* update
* win script
* meet comments
2023-07-05 14:04:17 +08:00
binbin Deng
1970bcf14e
LLM: add readme for transformer examples ( #8444 )
2023-07-04 17:25:58 +08:00
binbin Deng
e54e52b438
LLM: fix n_batch in bloom pybinding ( #8454 )
2023-07-04 15:10:32 +08:00
Yuwen Hu
372c775cb4
[LLM] Change default runner for LLM Linux tests to the ones with AVX512 ( #8448 )
...
* Basic change for AVX512 runner
* Remove conda channel and action rename
* Small fix
* Small fix and reduce peak convert disk space
* Define n_threads based on runner status
* Small thread num fix
* Define thread_num for cli
* test
* Add self-hosted label and other small fix
2023-07-04 14:53:03 +08:00
Jason Dai
edf23a95be
Update llm readme ( #8446 )
2023-07-03 16:58:44 +08:00
Jason Dai
a38f927fc0
Update README.md ( #8439 )
2023-07-03 14:59:55 +08:00
binbin Deng
c956a46c40
LLM: first fix example/transformers ( #8438 )
2023-07-03 14:13:33 +08:00
Jason Dai
e5b384aaa2
Update README.md ( #8437 )
2023-07-03 10:54:29 +08:00