Commit graph

429 commits

Author SHA1 Message Date
Cengguang Zhang
cca84b0a64 LLM: update llm benchmark scripts. (#8943)
* update llm benchmark scripts.

* change transformer_bf16 to pytorch_autocast_bf16.

* add autocast in transformer int4.

* revert autocast.

* add "pytorch_autocast_bf16" to doc

* fix comments.
2023-09-13 12:23:28 +08:00
SONG Ge
7132ef6081 [LLM Doc] Add optimize_model doc in transformers api (#8957)
* add optimize in from_pretrained

* add api doc for load_low_bit

* update api docs following comments

* update api docs

* update

* reword comments
2023-09-13 10:42:33 +08:00
Zhao Changmin
c32c260ce2 LLM: Add save/load API in optimize_model to support general pytorch model (#8956)
* support hf format SL
2023-09-13 10:22:00 +08:00
Ruonan Wang
4de73f592e LLM: add gpu example of chinese-llama-2-7b (#8960)
* add gpu example of chinese-llama2

* update model name and link

* update name
2023-09-13 10:16:51 +08:00
Guancheng Fu
0bf5857908 [LLM] Integrate FastChat as a serving framework for BigDL-LLM (#8821)
* Finish changing

* format

* add licence

* Add licence

* fix

* fix

* Add xpu support for fschat

* Fix patch

* Also install webui dependencies

* change setup.py dependency installs

* fix

* format

* final test
2023-09-13 09:28:05 +08:00
Yuwen Hu
cb534ed5c4 [LLM] Add Arc demo gif to readme and readthedocs (#8958)
* Add arc demo in main readme

* Small style fix

* Realize using table

* Update based on comments

* Small update

* Try to solve with height problem

* Small fix

* Update demo for inner llm readme

* Update demo video for readthedocs

* Small fix

* Update based on comments
2023-09-13 09:23:52 +08:00
Zhao Changmin
dcaa4dc130 LLM: Support GQA on llama kvcache (#8938)
* support GQA
2023-09-12 12:18:40 +08:00
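Supporting GQA (grouped-query attention) in a llama-style KV cache typically means expanding the cached K/V heads at attention time so each KV head serves several query heads. A minimal numpy sketch of that expansion (function name and shapes are illustrative, not BigDL's actual code):

```python
import numpy as np

def repeat_kv(kv: np.ndarray, n_rep: int) -> np.ndarray:
    """Expand (batch, n_kv_heads, seq, head_dim) K/V states so that each
    KV head is shared by n_rep query heads, as GQA requires."""
    if n_rep == 1:
        return kv
    b, n_kv, s, d = kv.shape
    # add a repeat axis next to the KV-head axis, then merge the two axes
    expanded = np.broadcast_to(kv[:, :, None, :, :], (b, n_kv, n_rep, s, d))
    return expanded.reshape(b, n_kv * n_rep, s, d)

# 2 KV heads serving 8 query heads (n_rep = 4)
k = np.arange(2 * 2 * 3 * 4, dtype=np.float32).reshape(2, 2, 3, 4)
k_expanded = repeat_kv(k, 4)
print(k_expanded.shape)  # (2, 8, 3, 4)
```

Query heads 0-3 then attend against copies of KV head 0, heads 4-7 against KV head 1.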
binbin Deng
2d81521019 LLM: add optimize_model examples for llama2 and chatglm (#8894)
* add llama2 and chatglm optimize_model examples

* update default usage

* update command and some descriptions

* move folder and remove general_int4 descriptions

* change folder name
2023-09-12 10:36:29 +08:00
Zhao Changmin
f00c442d40 fix accelerate (#8946)
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-09-12 09:27:58 +08:00
Yang Wang
16761c58be Make llama attention stateless (#8928)
* Make llama attention stateless

* fix style

* fix chatglm

* fix chatglm xpu
2023-09-11 18:21:50 -07:00
Zhao Changmin
e62eda74b8 refine (#8912)
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-09-11 16:40:33 +08:00
Yina Chen
df165ad165 init (#8933) 2023-09-11 14:30:55 +08:00
Ruonan Wang
b3f5dd5b5d LLM: update q8 convert xpu&cpu (#8930) 2023-09-08 16:01:17 +08:00
Yina Chen
33d75adadf [LLM]Support q5_0 on arc (#8926)
* support q5_0

* delete

* fix style
2023-09-08 15:52:36 +08:00
Yuwen Hu
ca35c93825 [LLM] Fix langchain UT (#8929)
* Change dependency version for langchain uts

* Downgrade pandas version instead; and update example readme accordingly
2023-09-08 13:51:04 +08:00
Xin Qiu
ea0853c0b5 update benchmark_utils readme (#8925)
* update readme

* meet code review
2023-09-08 10:30:26 +08:00
Yang Wang
ee98cdd85c Support latest transformer version (#8923)
* Support latest transformer version

* fix style
2023-09-07 19:01:32 -07:00
Yang Wang
25428b22b4 Fix chatglm2 attention and kv cache (#8924)
* fix chatglm2 attention

* fix bf16 bug

* make model stateless

* add utils

* cleanup

* fix style
2023-09-07 18:54:29 -07:00
Yina Chen
b209b8f7b6 [LLM] Fix arc qtype != q4_0 generate issue (#8920)
* Fix arc precision!=q4_0 generate issue

* meet comments
2023-09-07 08:56:36 -07:00
Cengguang Zhang
3d2efe9608 LLM: update llm latency benchmark. (#8922) 2023-09-07 19:00:19 +08:00
binbin Deng
7897eb4b51 LLM: add benchmark scripts on GPU (#8916) 2023-09-07 18:08:17 +08:00
Xin Qiu
d8a01d7c4f fix chatglm in run.py (#8919) 2023-09-07 16:44:10 +08:00
Xin Qiu
e9de9d9950 benchmark for native int4 (#8918)
* native4

* update

* update

* update
2023-09-07 15:56:15 +08:00
Ruonan Wang
c0797ea232 LLM: update setup to specify bigdl-core-xe version (#8913) 2023-09-07 15:11:55 +08:00
Ruonan Wang
057e77e229 LLM: update benchmark_utils.py to handle do_sample=True (#8903) 2023-09-07 14:20:47 +08:00
Yang Wang
c34400e6b0 Use new layout for xpu qlinear (#8896)
* use new layout for xpu qlinear

* fix style
2023-09-06 21:55:33 -07:00
Zhao Changmin
8bc1d8a17c LLM: Fix discards in optimize_model with non-hf models and add openai whisper example (#8877)
* openai-whisper
2023-09-07 10:35:59 +08:00
Xin Qiu
5d9942a3ca transformer int4 and native int4's benchmark script for 32 256 1k 2k input (#8871)
* transformer

* move

* update

* add header

* update all-in-one

* clean up
2023-09-07 09:49:55 +08:00
Yina Chen
bfc71fbc15 Add known issue in arc voice assistant example (#8902)
* add known issue in voice assistant example

* update cpu
2023-09-07 09:28:26 +08:00
Yuwen Hu
db26c7b84d [LLM] Update readme gif & image url to the ones hosted on readthedocs (#8900) 2023-09-06 20:04:17 +08:00
SONG Ge
7a71ced78f [LLM Docs] Remain API Docs Issues Solution (#8780)
* langchain readthedocs update

* solve langchain.llms.transformersllm issues

* langchain.embeddings.transformersembeddings/transformersllms issues

* update docs for get_num_tokens

* add low_bit api doc

* add optimizer model api doc

* update rst index

* fix comments style

* update docs following the comments

* update api doc
2023-09-06 16:29:34 +08:00
Xin Qiu
49a39452c6 update benchmark (#8899) 2023-09-06 15:11:43 +08:00
Kai Huang
4a9ff050a1 Add qlora nf4 (#8782)
* add nf4

* dequant nf4

* style
2023-09-06 09:39:22 +08:00
xingyuan li
704a896e90 [LLM] Add perf test on xpu for bigdl-llm (#8866)
* add xpu latency job
* update install way
* remove duplicated workflow
* add perf upload
2023-09-05 17:36:24 +09:00
Zhao Changmin
95271f10e0 LLM: Rename low bit layer (#8875)
* rename lowbit

---------

Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-09-05 13:21:12 +08:00
Yina Chen
74a2c2ddf5 Update optimize_model=True in llama2 chatglm2 arc examples (#8878)
* add optimize_model=True in llama2 chatglm2 examples

* add ipex optimize in gpt-j example
2023-09-05 10:35:37 +08:00
Jason Dai
5e58f698cd Update readthedocs (#8882) 2023-09-04 15:42:16 +08:00
Song Jiaming
7b3ac66e17 [LLM] auto performance test fix specific settings to template (#8876) 2023-09-01 15:49:04 +08:00
Yang Wang
242c9d6036 Fix chatglm2 multi-turn streamchat (#8867) 2023-08-31 22:13:49 -07:00
Song Jiaming
c06f1ca93e [LLM] auto perf test to output to csv (#8846) 2023-09-01 10:48:00 +08:00
Zhao Changmin
9c652fbe95 LLM: Whisper long segment recognize example (#8826)
* LLM: Long segment recognize example
2023-08-31 16:41:25 +08:00
Yishuo Wang
a232c5aa21 [LLM] add protobuf in bigdl-llm dependency (#8861) 2023-08-31 15:23:31 +08:00
xingyuan li
de6c6bb17f [LLM] Downgrade amx build gcc version and remove avx flag display (#8856)
* downgrade to gcc 11
* remove avx display
2023-08-31 14:08:13 +09:00
Yang Wang
3b4f4e1c3d Fix llama attention optimization for XPU (#8855)
* Fix llama attention optimization for XPU

* fix chatglm2

* fix typo
2023-08-30 21:30:49 -07:00
Shengsheng Huang
7b566bf686 [LLM] add new API for optimize any pytorch models (#8827)
* add new API for optimize any pytorch models

* change test util name

* revise API and update UT

* fix python style

* update ut config, change default value

* change defaults, disable ut transcribe
2023-08-30 19:41:53 +08:00
Xin Qiu
8eca982301 windows add env (#8852) 2023-08-30 15:54:52 +08:00
Zhao Changmin
731916c639 LLM: Enable attempting loading method automatically (#8841)
* enable auto load method

* warning error

* logger info

---------

Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-08-30 15:41:55 +08:00
Yishuo Wang
bba73ec9d2 [LLM] change chatglm native int4 checkpoint name (#8851) 2023-08-30 15:05:19 +08:00
Yina Chen
55e705a84c [LLM] Support the rest of AutoXXX classes in Transformers API (#8815)
* add transformers auto models

* fix
2023-08-30 11:16:14 +08:00
Zhao Changmin
887018b0f2 Update ut save&load (#8847)
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-08-30 10:32:57 +08:00
Yina Chen
3462fd5c96 Add arc gpt-j example (#8840) 2023-08-30 10:31:24 +08:00
Ruonan Wang
f42c0bad1b LLM: update GPU doc (#8845) 2023-08-30 09:24:19 +08:00
Jason Dai
aab7deab1f Reorganize GPU examples (#8844) 2023-08-30 08:32:08 +08:00
Yang Wang
a386ad984e Add Data Center GPU Flex Series to Readme (#8835)
* Add Data Center GPU Flex Series to Readme

* remove

* update starcoder
2023-08-29 11:19:09 -07:00
Yishuo Wang
7429ea0606 [LLM] support transformer int4 + amx int4 (#8838) 2023-08-29 17:27:18 +08:00
Ruonan Wang
ddff7a6f05 Update readme of GPU to specify oneapi version(#8820) 2023-08-29 13:14:22 +08:00
Zhao Changmin
bb31d4fe80 LLM: Implement hf low_cpu_mem_usage with 1xbinary file peak memory on transformer int4 (#8731)
* 1x peak memory
2023-08-29 09:33:17 +08:00
Yina Chen
35fdf94031 [LLM]Arc starcoder example (#8814)
* arc starcoder example init

* add log

* meet comments
2023-08-28 16:48:00 +08:00
xingyuan li
6a902b892e [LLM] Add amx build step (#8822)
* add amx build step
2023-08-28 17:41:18 +09:00
Ruonan Wang
eae92bc7da llm: quick fix path (#8810) 2023-08-25 16:02:31 +08:00
Ruonan Wang
0186f3ab2f llm: update all ARC int4 examples (#8809)
* update GPU examples

* update other examples

* fix

* update based on comment
2023-08-25 15:26:10 +08:00
Song Jiaming
b8b1b6888b [LLM] Performance test (#8796) 2023-08-25 14:31:45 +08:00
Yang Wang
9d0f6a8cce rename math.py in example to avoid conflict (#8805) 2023-08-24 21:06:31 -07:00
SONG Ge
d2926c7672 [LLM] Unify Langchain Native and Transformers LLM API (#8752)
* deprecate BigDLNativeTransformers and add specific LMEmbedding method

* deprecate and add LM methods for langchain llms

* add native params to native langchain

* new imple for embedding

* move ut from bigdlnative to causal llm

* rename embeddings api and examples update align with usage updating

* docqa example hot-fix

* add more api docs

* add langchain ut for starcoder

* support model_kwargs for transformer methods when calling causalLM and add ut

* ut fix for transformers embedding

* update for langchain causal supporting transformers

* remove model_family in readme doc

* add model_families params to support more models

* update api docs and remove chatglm embeddings for now

* remove chatglm embeddings in examples

* new refactor for ut to add bloom and transformers llama ut

* disable llama transformers embedding ut
2023-08-25 11:14:21 +08:00
binbin Deng
5582872744 LLM: update chatglm example to be more friendly for beginners (#8795) 2023-08-25 10:55:01 +08:00
Yina Chen
7c37424a63 Fix voice assistant example input error on Linux (#8799)
* fix linux error

* update

* remove alsa log
2023-08-25 10:47:27 +08:00
Yang Wang
bf3591e2ff Optimize chatglm2 for bf16 (#8725)
* make chatglm works with bf16

* fix style

* support chatglm v1

* fix style

* fix style

* add chatglm2 file
2023-08-24 10:04:25 -07:00
xingyuan li
c94bdd3791 [LLM] Merge windows & linux nightly test (#8756)
* fix download statement
* add check before build wheel
* use curl to upload files
* windows unittest won't upload converted model
* split llm-cli test into windows & linux versions
* update tempdir create way
* fix nightly converted model name
* windows llm-cli starcoder test temporarily disabled
* remove taskset dependency
* rename llm_unit_tests_linux to llm_unit_tests
2023-08-23 12:48:41 +09:00
Jason Dai
dcadd09154 Update llm document (#8784) 2023-08-21 22:34:44 +08:00
Yishuo Wang
611c1fb628 [LLM] change default n_threads of native int4 langchain API (#8779) 2023-08-21 13:30:12 +08:00
Yishuo Wang
3d1f2b44f8 LLM: change default n_threads of native int4 models (#8776) 2023-08-18 15:46:19 +08:00
Yishuo Wang
2ba2133613 fix starcoder chinese output (#8773) 2023-08-18 13:37:02 +08:00
binbin Deng
548f7a6cf7 LLM: update convert of llama family to support llama2-70B (#8747) 2023-08-18 09:30:35 +08:00
Yina Chen
4afea496ab support q8_0 (#8765) 2023-08-17 15:06:36 +08:00
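q8_0 in the ggml family of formats stores weights in blocks of 32 int8 values with one floating-point scale per block. A rough numpy sketch of that symmetric block-quantization scheme (illustrative only, not the actual kernel layout):

```python
import numpy as np

BLOCK = 32  # q8_0 block size in the ggml family of formats

def quantize_q8_0(w: np.ndarray):
    """Symmetric 8-bit block quantization: one scale per 32 weights."""
    blocks = w.reshape(-1, BLOCK)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(blocks / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_q8_0(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.default_rng(0).standard_normal(64).astype(np.float32)
q, s = quantize_q8_0(w)
err = np.abs(dequantize_q8_0(q, s) - w).max()
```

The per-block round-trip error is bounded by half the block scale, which is why 8-bit blocks lose very little accuracy compared with the 4- and 5-bit variants (q4_0, q5_0) added elsewhere in this history.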
Ruonan Wang
e9aa2bd890 LLM: reduce GPU 1st token latency and update example (#8763)
* reduce 1st token latency

* update example

* fix

* fix style

* update readme of gpu benchmark
2023-08-16 18:01:23 +08:00
binbin Deng
06609d9260 LLM: add qwen example on arc (#8757) 2023-08-16 17:11:08 +08:00
SONG Ge
f4164e4492 [BigDL LLM] Update readme for unifying transformers API (#8737)
* update readme doc

* fix readthedocs error

* update comment

* update exception error info

* invalidInputError instead

* fix readme typo error and remove import error

* fix more typo
2023-08-16 14:22:32 +08:00
Song Jiaming
c1f9af6d97 [LLM] chatglm example and transformers low-bit examples (#8751) 2023-08-16 11:41:44 +08:00
Ruonan Wang
8805186f2f LLM: add benchmark tool for gpu (#8760)
* add benchmark tool for gpu

* update
2023-08-16 11:22:10 +08:00
binbin Deng
97283c033c LLM: add falcon example on arc (#8742) 2023-08-15 17:38:38 +08:00
binbin Deng
8c55911308 LLM: add baichuan-13B on arc example (#8755) 2023-08-15 15:07:04 +08:00
binbin Deng
be2ae6eb7c LLM: fix langchain native int4 voice assistant example (#8750) 2023-08-14 17:23:33 +08:00
Ruonan Wang
d28ad8f7db LLM: add whisper example for arc transformer int4 (#8749)
* add whisper example for arc int4

* fix
2023-08-14 17:05:48 +08:00
Yishuo Wang
77844125f2 [LLM] Support chatglm cache (#8745) 2023-08-14 15:10:46 +08:00
Ruonan Wang
faaccb64a2 LLM: add chatglm2 example for Arc (#8741)
* add chatglm2 example

* update

* fix readme
2023-08-14 10:43:08 +08:00
binbin Deng
b10d7e1adf LLM: add mpt example on arc (#8723) 2023-08-14 09:40:01 +08:00
binbin Deng
e9a1afffc5 LLM: add internlm example on arc (#8722) 2023-08-14 09:39:39 +08:00
SONG Ge
aceea4dc29 [LLM] Unify Transformers and Native API (#8713)
* re-open pr to run on latest runner

* re-add examples and ut

* rename ut and move deprecate to warning instead of raising an error info

* ut fix
2023-08-11 19:45:47 +08:00
Yishuo Wang
f91035c298 [LLM] fix chatglm native int4 emoji output (#8739) 2023-08-11 15:38:41 +08:00
binbin Deng
77efcf7b1d LLM: fix ChatGLM2 native int4 stream output (#8733) 2023-08-11 14:51:50 +08:00
Ruonan Wang
ca3e59a1dc LLM: support stop for starcoder native int4 stream (#8734) 2023-08-11 14:51:30 +08:00
Song Jiaming
e292dfd970 [WIP] LLM transformers api for langchain (#8642) 2023-08-11 13:32:35 +08:00
Yishuo Wang
3d5a7484a2 [LLM] fix bloom and starcoder memory release (#8728) 2023-08-11 11:18:19 +08:00
xingyuan li
02ec01cb48 [LLM] Add bigdl-core-xe dependency when installing bigdl-llm[xpu] (#8716)
* add bigdl-core-xe dependency
2023-08-10 17:41:42 +09:00
Shengsheng Huang
7c56c39e36 Fix GPU examples README to use bigdl-core-xe (#8714)
* Update README.md

* Update README.md
2023-08-10 12:53:49 +08:00
Yina Chen
6d1ca88aac add voice assistant example (#8711) 2023-08-10 12:42:14 +08:00
Song Jiaming
e717e304a6 LLM first example test and template (#8658) 2023-08-10 10:03:11 +08:00
Ruonan Wang
1a7b698a83 [LLM] support ipex arc int4 & add basic llama2 example (#8700)
* first support of xpu

* make it works on gpu

update setup

update

add GPU llama2 examples

add use_optimize flag to disable optimize for gpu

fix style

update gpu example readme

fix

* update example, and update env

* fix setup to add cpp files

* replace jit with aot to avoid data leak

* rename to bigdl-core-xe

* update installation in example readme
2023-08-09 22:20:32 +08:00
Jason Dai
d03218674a Update llm readme (#8703) 2023-08-09 14:47:26 +08:00
Kai Huang
1b65288bdb Add api doc for LLM (#8605)
* api doc initial

* update desc
2023-08-08 18:17:16 +08:00
binbin Deng
4c44153584 LLM: add Qwen transformers int4 example (#8699) 2023-08-08 11:23:09 +08:00
Yishuo Wang
710b9b8982 [LLM] add linux chatglm pybinding binary file (#8698) 2023-08-08 11:16:30 +08:00
binbin Deng
ea5d7aff5b LLM: add chatglm native int4 transformers API (#8695) 2023-08-07 17:52:47 +08:00
Yishuo Wang
6da830cf7e [LLM] add chatglm pybinding binary file in setup.py (#8692) 2023-08-07 09:41:03 +08:00
Cengguang Zhang
ebcf75d506 feat: set transformers lib version. (#8683) 2023-08-04 15:01:59 +08:00
Yishuo Wang
ef08250c21 [LLM] chatglm pybinding support (#8672) 2023-08-04 14:27:29 +08:00
Yishuo Wang
5837cc424a [LLM] add chatglm pybinding binary file release (#8677) 2023-08-04 11:45:27 +08:00
Yang Wang
b6468bac43 optimize chatglm2 long sequence (#8662)
* add chatglm2

* optimize a little

* optimize chatglm long sequence

* fix style

* address comments and fix style

* fix bug
2023-08-03 17:56:24 -07:00
Yang Wang
3407f87075 Fix llama kv cache bug (#8674) 2023-08-03 17:54:55 -07:00
Yina Chen
59903ea668 llm linux support avx & avx2 (#8669) 2023-08-03 17:10:59 +08:00
xingyuan li
110cfb5546 [LLM] Remove old windows nightly test code (#8668)
Remove old Windows nightly test code triggered by task scheduler
Add new Windows nightly workflow for nightly testing
2023-08-03 17:12:23 +09:00
xingyuan li
610084e3c0 [LLM] Complete windows unittest (#8611)
* add windows nightly test workflow
* use github runner to run pr test
* model load should use lowbit
* remove tmp dir after testing
2023-08-03 14:48:42 +09:00
binbin Deng
a15a2516e6 add (#8659) 2023-08-03 10:12:10 +08:00
Xin Qiu
0714888705 build windows avx dll (#8657)
* windows avx

* add to actions
2023-08-03 02:06:24 +08:00
Yina Chen
119bf6d710 [LLM] Support linux cpp dynamic load .so (#8655)
* support linux cpp dynamic load .so

* update cli
2023-08-02 20:15:45 +08:00
Zhao Changmin
ca998cc6f2 LLM: Mute shape mismatch output (#8601)
* LLM: Mute shape mismatch output
2023-08-02 16:46:22 +08:00
Zhao Changmin
04c713ef06 LLM: Disable transformer api pretraining_tp (#8645)
* disable pretraining_tp
2023-08-02 11:26:01 +08:00
binbin Deng
6fc31bb4cf LLM: first update descriptions for ChatGLM transformers int4 example (#8646) 2023-08-02 11:00:56 +08:00
Yang Wang
cbeae97a26 Optimize Llama Attention to reduce KV cache memory copy (#8580)
* Optimize llama attention to reduce KV cache memory copy

* fix bug

* fix style

* remove git

* fix style

* fix style

* fix style

* fix tests

* move llama attention to another file

* revert

* fix style

* remove jit

* fix
2023-08-01 16:37:58 -07:00
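Reducing KV cache memory copy usually comes down to pre-allocating a larger cache tensor once and writing each new token's K/V slice into it in place, instead of concatenating (and re-copying) the whole cache on every decode step. A hedged numpy sketch of the idea (buffer names and sizes are made up for illustration):

```python
import numpy as np

# Pre-allocate room for max_len tokens up front.
head_dim, max_len = 4, 16
k_cache = np.zeros((max_len, head_dim), dtype=np.float32)
cur_len = 0

def append_kv(k_new: np.ndarray) -> np.ndarray:
    """Write the new token's key into the pre-allocated buffer in place."""
    global cur_len
    k_cache[cur_len] = k_new   # O(1): copies one row, never the whole cache
    cur_len += 1
    return k_cache[:cur_len]   # view over the valid prefix, no copy

for t in range(3):
    view = append_kv(np.full(head_dim, float(t), dtype=np.float32))
print(view.shape)  # (3, 4)
```

The naive alternative (`np.concatenate([k_cache, k_new])` per token) copies the entire cache every step, making decode cost quadratic in sequence length.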
binbin Deng
39994738d1 LLM: add chat & stream chat example for ChatGLM2 transformers int4 (#8636) 2023-08-01 14:57:45 +08:00
xingyuan li
cdfbe652ca [LLM] Add chatglm support for llm-cli (#8641)
* add chatglm build
* add llm-cli support
* update git
* install cmake
* add ut for chatglm
* add files to setup
* fix bug cause permission error when sf lack file
2023-08-01 14:30:17 +09:00
Zhao Changmin
d6cbfc6d2c LLM: Add requirements in whisper example (#8644)
* LLM: Add requirements in whisper example
2023-08-01 12:07:14 +08:00
Zhao Changmin
3e10260c6d LLM: llm-convert support chatglm family (#8643)
* convert chatglm
2023-08-01 11:16:18 +08:00
Yina Chen
a607972c0b [LLM]LLM windows load -api.dll (#8631)
* temp

* update

* revert setup.py
2023-07-31 13:47:20 +08:00
xingyuan li
3361b66449 [LLM] Revert llm-cli to disable selecting executables on Windows (#8630)
* revert vnni file select
* revert setup.py
* add model-api.dll
2023-07-31 11:15:44 +09:00
binbin Deng
3dbab9087b LLM: add llama2-7b native int4 example (#8629) 2023-07-28 10:56:16 +08:00
binbin Deng
fb32fefcbe LLM: support tensor input of native int4 generate (#8620) 2023-07-27 17:59:49 +08:00
Zhao Changmin
5b484ab48d LLM: Support load_low_bit loading models in shards format (#8612)
* shards_model

---------

Co-authored-by: leonardozcm <leonaordo1997zcm@gmail.com>
2023-07-26 13:30:01 +08:00
binbin Deng
fcf8c085e3 LLM: add llama2-13b native int4 example (#8613) 2023-07-26 10:12:52 +08:00
Song Jiaming
650b82fa6e [LLM] add CausalLM and Speech UT (#8597) 2023-07-25 11:22:36 +08:00
Zhao Changmin
af201052db avoid malloc all missing keys in fp32 (#8600) 2023-07-25 09:48:51 +08:00
binbin Deng
3f24202e4c [LLM] Add more transformers int4 example (Llama 2) (#8602) 2023-07-25 09:21:12 +08:00
Jason Dai
0f8201c730 llm readme update (#8595) 2023-07-24 09:47:49 +08:00
Yuwen Hu
ba42a6da63 [LLM] Set torch_dtype default value to 'auto' for transformers low bit from_pretrained API 2023-07-21 17:55:00 +08:00
Yuwen Hu
bbde423349 [LLM] Add current Linux UT inference tests to nightly tests (#8578)
* Add current inference uts to nightly tests

* Change test model from chatglm-6b to chatglm2-6b

* Add thread num env variable for nightly test

* Fix urls

* Small fix
2023-07-21 13:26:38 +08:00
Yang Wang
feb3af0567 Optimize transformer int4 memory footprint (#8579) 2023-07-20 20:22:13 -07:00
Yang Wang
57e880f63a [LLM] use pytorch linear for large input matrix (#8492)
* use pytorch linear for large input matrix

* only works on server

* fix style

* optimize memory

* first check server

* revert

* address comments

* fix style
2023-07-20 09:54:25 -07:00
Yuwen Hu
6504e31a97 Small fix (#8577) 2023-07-20 16:37:04 +08:00
Yuwen Hu
2266ca7d2b [LLM] Small updates to transformers int4 ut (#8574)
* Small fix to transformers int4 ut

* Small fix
2023-07-20 13:20:25 +08:00
xingyuan li
7b8d9c1b0d [LLM] Add dependency file check in setup.py (#8565)
* add package file check
2023-07-20 14:20:08 +09:00
Song Jiaming
411d896636 LLM first transformers UT (#8514)
* ut

* transformers api first ut

* name

* dir issue

* use chatglm instead of chatglm2

* omp

* set omp in sh

* source

* taskset

* test

* test omp

* add test
2023-07-20 10:16:27 +08:00
Yuwen Hu
cad78740a7 [LLM] Small fixes to the Whisper transformers INT4 example (#8573)
* Small fixes to the whisper example

* Small fix

* Small fix
2023-07-20 10:11:33 +08:00
binbin Deng
7a9fdf74df [LLM] Add more transformers int4 example (Dolly v2) (#8571)
* add

* add trust_remote_mode
2023-07-19 18:20:16 +08:00
Zhao Changmin
e680af45ea LLM: Optimize Langchain Pipeline (#8561)
* LLM: Optimize Langchain Pipeline

* load in low bit
2023-07-19 17:43:13 +08:00
Shengsheng Huang
616b7cb0a2 add more langchain examples (#8542)
* update langchain descriptions

* add mathchain example

* update readme

* update readme
2023-07-19 17:42:18 +08:00
binbin Deng
457571b44e [LLM] Add more transformers int4 example (InternLM) (#8557) 2023-07-19 15:15:38 +08:00
xingyuan li
b6510fa054 fix move/download dll step (#8564) 2023-07-19 12:17:07 +09:00
xingyuan li
c52ed37745 fix starcoder dll name (#8563) 2023-07-19 11:55:06 +09:00
Zhao Changmin
3dbe3bf18e transformer_int4 (#8553) 2023-07-19 08:33:58 +08:00
Zhao Changmin
49d636e295 [LLM] whisper model transformer int4 verification and example (#8511)
* LLM: transformer api support

* va

* example

* revert

* pep8

* pep8
2023-07-19 08:33:20 +08:00
Yina Chen
9a7bc17ca1 [LLM] llm supports vnni link on windows (#8543)
* support win vnni link

* fix style

* fix style

* use isa_checker

* fix

* typo

* fix

* update
2023-07-18 16:43:45 +08:00
Yina Chen
4582b6939d [LLM]llm gptneox chat (#8527)
* linux

* support win

* merge upstream & support vnni lib in chat
2023-07-18 11:17:17 +08:00
Jason Dai
1ebc43b151 Update READMEs (#8554) 2023-07-18 11:06:06 +08:00
Yuwen Hu
ee70977c07 [LLM] Transformers int4 example small typo fixes (#8550) 2023-07-17 18:15:32 +08:00
Yuwen Hu
1344f50f75 [LLM] Add more transformers int4 examples (Falcon) (#8546)
* Initial commit

* Add Falcon examples and other small fix

* Small fix

* Small fix

* Update based on comments

* Small fix
2023-07-17 17:36:21 +08:00
Yuwen Hu
de772e7a80 Update mpt for prompt tuning (#8547) 2023-07-17 17:33:54 +08:00
binbin Deng
f1fd746722 [LLM] Add more transformers int4 example (vicuna) (#8544) 2023-07-17 16:59:55 +08:00
Xin Qiu
fccae91461 Add load_low_bit save_load_bit to AutoModelForCausalLM (#8531)
* transformers save_low_bit load_low_bit

* update example and add readme

* update

* update

* update

* add ut

* update
2023-07-17 15:29:55 +08:00
binbin Deng
808a64d53a [LLM] Add more transformers int4 example (starcoder) (#8540) 2023-07-17 14:41:19 +08:00
xingyuan li
e57db777e0 [LLM] Setup.py & llm-cli update for windows vnni binary files (#8537)
* update setup.py
* update llm-cli
2023-07-17 12:28:38 +09:00
binbin Deng
f56b5ade4c [LLM] Add more transformers int4 example (chatglm2) (#8539) 2023-07-14 17:58:33 +08:00
binbin Deng
92d33cf35a [LLM] Add more transformers int4 example (phoenix) (#8520) 2023-07-14 17:58:04 +08:00
Yuwen Hu
e0f0def279 Remove unused example for now (#8538) 2023-07-14 17:32:50 +08:00
binbin Deng
b397e40015 [LLM] Add more transformers int4 example (RedPajama) (#8523) 2023-07-14 17:30:28 +08:00
Yuwen Hu
7bf3e10415 [LLM] Add more int4 transformers examples (MOSS) (#8532)
* Add Moss example

* Small fix
2023-07-14 16:41:41 +08:00
Yuwen Hu
59b7287ef5 [LLM] Add more transformers int4 example (Baichuan) (#8522)
* Add example model Baichuan

* Small updates to client windows settings

* Small refactor

* Small fix
2023-07-14 16:41:29 +08:00
Yuwen Hu
ca6e38607c [LLM] Add more transformers examples (ChatGLM) (#8521)
* Add example for chatglm v1 and other small fixes

* Small fix

* Small further fix

* Small fix

* Update based on comments & updates for client windows recommended settings

* Small fix

* Small refactor

* Small fix

* Small fix

* Small fix to dolly v1

* Small fix
2023-07-14 16:41:13 +08:00
xingyuan li
c87853233b [LLM] Add windows vnni binary build step (#8518)
* add windows vnni build step
* update build info
* add download command
2023-07-14 17:24:39 +09:00
Yishuo Wang
6320bf201e LLM: fix memory access violation (#8519) 2023-07-13 17:08:08 +08:00
xingyuan li
60c2c0c3dc Bug fix for merged pr #8503 (#8516) 2023-07-13 17:26:30 +09:00
Yuwen Hu
349bcb4bae [LLM] Add more transformers int4 example (Dolly v1) (#8517)
* Initial commit for dolly v1

* Add example for Dolly v1 and other small fix

* Small output updates

* Small fix

* fix based on comments
2023-07-13 16:13:47 +08:00
Xin Qiu
90e3d86bce rename low bit type name (#8512)
* change qx_0 to sym_intx

* update

* fix typo

* update

* fix type

* fix style

* add python doc

* meet code review

* fix style
2023-07-13 15:53:31 +08:00
xingyuan li
4f152b4e3a [LLM] Merge the llm.cpp build and the pypi release (#8503)
* checkout llm.cpp to build new binary
* use artifact to get latest built binary files
* rename quantize
* modify all release workflow
2023-07-13 16:34:24 +09:00
Yuwen Hu
bcde8ec83e [LLM] Small fix to MPT Example (#8513) 2023-07-13 14:33:21 +08:00
Zhao Changmin
ba0da17b40 LLM: Support AutoModelForSeq2SeqLM transformer API (#8449)
* LLM: support AutoModelForSeq2SeqLM transformer API
2023-07-13 13:33:51 +08:00
Yishuo Wang
86b5938075 LLM: fix llm pybinding (#8509) 2023-07-13 10:27:08 +08:00
Yuwen Hu
fcc352eee3 [LLM] Add more transformers_int4 examples (MPT) (#8498)
* Update transformers_int4 readme, and initial commit for mpt

* Update example for mpt

* Small fix and recover transformers_int4_pipeline_readme.md for now

* Update based on comments

* Small fix

* Small fix

* Update based on comments
2023-07-13 09:41:16 +08:00
Zhao Changmin
23f6a4c21f LLM: Optimize transformer int4 loading (#8499)
* LLM: Optimize transformer int4 loading
2023-07-12 15:25:42 +08:00
Yishuo Wang
dd3f953288 Support vnni check (#8497) 2023-07-12 10:11:15 +08:00
Xin Qiu
cd7a980ec4 Transformer int4 add qtype, support q4_1 q5_0 q5_1 q8_0 (#8481)
* quant in Q4 5 8

* meet code review

* update readme

* style

* update

* fix error

* fix error

* update

* fix style

* update

* Update README.md

* Add load_in_low_bit
2023-07-12 08:23:08 +08:00
Yishuo Wang
db39d0a6b3 LLM: disable mmap by default for better performance (#8467) 2023-07-11 09:26:26 +08:00
Yuwen Hu
52c6b057d6 Initial LLM Transformers example refactor (#8491) 2023-07-10 17:53:57 +08:00
Junwei Deng
254a7aa3c4 bigdl-llm: add voice-assistant example that is migrated from langchain use-case document (#8468) 2023-07-10 16:51:45 +08:00
Yishuo Wang
98bac815e4 specify numpy version (#8489) 2023-07-10 16:50:16 +08:00
Zhao Changmin
81d655cda9 LLM: transformer int4 save and load (#8462)
* LLM: transformer int4 save and load
2023-07-10 16:34:41 +08:00
binbin Deng
d489775d2c LLM: fix inconsistency between output token number and max_new_token (#8479) 2023-07-07 17:31:05 +08:00
Jason Dai
bcc1eae322 Llm readme update (#8472) 2023-07-06 20:04:04 +08:00
Ruonan Wang
2f77d485d8 Llm: Initial support of langchain transformer int4 API (#8459)
* first commit of transformer int4 and pipeline

* basic examples

temp save for embeddings

support embeddings and docqa exaple

* fix based on comment

* small fix
2023-07-06 17:50:05 +08:00
binbin Deng
14626fe05b LLM: refactor transformers and langchain class name (#8470) 2023-07-06 17:16:44 +08:00
binbin Deng
70bc8ea8ae LLM: update langchain and cpp-python style API examples (#8456) 2023-07-06 14:36:42 +08:00
Ruonan Wang
64b38e1dc8 llm: benchmark tool for transformers int4 (separate 1st token and rest) (#8460)
* add benchmark utils

* fix

* fix bug and add readme

* hidden latency data
2023-07-06 09:49:52 +08:00
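Separating first-token latency (prefill) from the rest (decode) means timing each generation step individually rather than the whole call. A simplified stand-alone sketch of that split (the real benchmark utility hooks into transformers' generate; `fake_forward` here is a stand-in for one model step):

```python
import time

def fake_forward(t: int) -> None:
    """Stand-in model step: prefill (t == 0) is slower than decode steps."""
    time.sleep(0.02 if t == 0 else 0.005)

def timed_generate(n_tokens: int):
    """Time each step, then report 1st-token latency and mean rest latency."""
    per_token = []
    for t in range(n_tokens):
        start = time.perf_counter()
        fake_forward(t)
        per_token.append(time.perf_counter() - start)
    first = per_token[0]
    rest = sum(per_token[1:]) / max(len(per_token) - 1, 1)
    return first, rest

first, rest = timed_generate(5)
print(f"1st token: {first * 1000:.1f} ms, avg rest: {rest * 1000:.1f} ms")
```

Reporting the two numbers separately matters because prefill cost grows with prompt length while per-token decode cost is roughly constant, so a single averaged latency hides both.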
binbin Deng
77808fa124 LLM: fix n_batch in starcoder pybinding (#8461) 2023-07-05 17:06:50 +08:00
Yina Chen
f2bb469847 [WIP] LLM llm-cli chat mode (#8440)
* fix timezone

* temp

* Update linux interactive mode

* modify init text for interactive mode

* meet comments

* update

* win script

* meet comments
2023-07-05 14:04:17 +08:00
binbin Deng
1970bcf14e LLM: add readme for transformer examples (#8444) 2023-07-04 17:25:58 +08:00
binbin Deng
e54e52b438 LLM: fix n_batch in bloom pybinding (#8454) 2023-07-04 15:10:32 +08:00
Yuwen Hu
372c775cb4 [LLM] Change default runner for LLM Linux tests to the ones with AVX512 (#8448)
* Basic change for AVX512 runner

* Remove conda channel and action rename

* Small fix

* Small fix and reduce peak convert disk space

* Define n_threads based on runner status

* Small thread num fix

* Define thread_num for cli

* test

* Add self-hosted label and other small fix
2023-07-04 14:53:03 +08:00
Jason Dai
edf23a95be Update llm readme (#8446) 2023-07-03 16:58:44 +08:00
Jason Dai
a38f927fc0 Update README.md (#8439) 2023-07-03 14:59:55 +08:00
binbin Deng
c956a46c40 LLM: first fix example/transformers (#8438) 2023-07-03 14:13:33 +08:00
Jason Dai
e5b384aaa2 Update README.md (#8437) 2023-07-03 10:54:29 +08:00