Shengsheng Huang
7b566bf686
[LLM] add new API for optimizing any pytorch model ( #8827 )
...
* add new API for optimizing any pytorch model
* change test util name
* revise API and update UT
* fix python style
* update ut config, change default value
* change defaults, disable ut transcribe
2023-08-30 19:41:53 +08:00
Xin Qiu
8eca982301
windows add env ( #8852 )
2023-08-30 15:54:52 +08:00
Zhao Changmin
731916c639
LLM: Enable attempting loading method automatically ( #8841 )
...
* enable auto load method
* warning error
* logger info
---------
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-08-30 15:41:55 +08:00
Yishuo Wang
bba73ec9d2
[LLM] change chatglm native int4 checkpoint name ( #8851 )
2023-08-30 15:05:19 +08:00
Yina Chen
55e705a84c
[LLM] Support the rest of AutoXXX classes in Transformers API ( #8815 )
...
* add transformers auto models
* fix
2023-08-30 11:16:14 +08:00
Yishuo Wang
7429ea0606
[LLM] support transformer int4 + amx int4 ( #8838 )
2023-08-29 17:27:18 +08:00
Zhao Changmin
bb31d4fe80
LLM: Implement hf low_cpu_mem_usage with 1xbinary file peak memory on transformer int4 ( #8731 )
...
* 1x peak memory
2023-08-29 09:33:17 +08:00
SONG Ge
d2926c7672
[LLM] Unify Langchain Native and Transformers LLM API ( #8752 )
...
* deprecate BigDLNativeTransformers and add specific LMEmbedding method
* deprecate and add LM methods for langchain llms
* add native params to native langchain
* new implementation for embedding
* move ut from bigdlnative to causal llm
* rename embeddings api and update examples to align with the usage changes
* docqa example hot-fix
* add more api docs
* add langchain ut for starcoder
* support model_kwargs for transformer methods when calling causalLM and add ut
* ut fix for transformers embedding
* update for langchain causal supporting transformers
* remove model_family in readme doc
* add model_families params to support more models
* update api docs and remove chatglm embeddings for now
* remove chatglm embeddings in examples
* new refactor for ut to add bloom and transformers llama ut
* disable llama transformers embedding ut
2023-08-25 11:14:21 +08:00
Yang Wang
bf3591e2ff
Optimize chatglm2 for bf16 ( #8725 )
...
* make chatglm work with bf16
* fix style
* support chatglm v1
* fix style
* fix style
* add chatglm2 file
2023-08-24 10:04:25 -07:00
Yishuo Wang
611c1fb628
[LLM] change default n_threads of native int4 langchain API ( #8779 )
2023-08-21 13:30:12 +08:00
Yishuo Wang
3d1f2b44f8
LLM: change default n_threads of native int4 models ( #8776 )
2023-08-18 15:46:19 +08:00
Yishuo Wang
2ba2133613
fix starcoder chinese output ( #8773 )
2023-08-18 13:37:02 +08:00
binbin Deng
548f7a6cf7
LLM: update convert of llama family to support llama2-70B ( #8747 )
2023-08-18 09:30:35 +08:00
Yina Chen
4afea496ab
support q8_0 ( #8765 )
2023-08-17 15:06:36 +08:00
Ruonan Wang
e9aa2bd890
LLM: reduce GPU 1st token latency and update example ( #8763 )
...
* reduce 1st token latency
* update example
* fix
* fix style
* update readme of gpu benchmark
2023-08-16 18:01:23 +08:00
SONG Ge
f4164e4492
[BigDL LLM] Update readme for unifying transformers API ( #8737 )
...
* update readme doc
* fix readthedocs error
* update comment
* update exception error info
* invalidInputError instead
* fix readme typo error and remove import error
* fix more typos
2023-08-16 14:22:32 +08:00
Yishuo Wang
77844125f2
[LLM] Support chatglm cache ( #8745 )
2023-08-14 15:10:46 +08:00
SONG Ge
aceea4dc29
[LLM] Unify Transformers and Native API ( #8713 )
...
* re-open pr to run on latest runner
* re-add examples and ut
* rename ut and change deprecation to a warning instead of raising an error
* ut fix
2023-08-11 19:45:47 +08:00
Yishuo Wang
f91035c298
[LLM] fix chatglm native int4 emoji output ( #8739 )
2023-08-11 15:38:41 +08:00
binbin Deng
77efcf7b1d
LLM: fix ChatGLM2 native int4 stream output ( #8733 )
2023-08-11 14:51:50 +08:00
Ruonan Wang
ca3e59a1dc
LLM: support stop for starcoder native int4 stream ( #8734 )
2023-08-11 14:51:30 +08:00
Yishuo Wang
3d5a7484a2
[LLM] fix bloom and starcoder memory release ( #8728 )
2023-08-11 11:18:19 +08:00
Ruonan Wang
1a7b698a83
[LLM] support ipex arc int4 & add basic llama2 example ( #8700 )
...
* first support of xpu
* make it work on gpu
update setup
update
add GPU llama2 examples
add use_optimize flag to disable optimize for gpu
fix style
update gpu example readme
fix
* update example, and update env
* fix setup to add cpp files
* replace jit with aot to avoid data leak
* rename to bigdl-core-xe
* update installation in example readme
2023-08-09 22:20:32 +08:00
Kai Huang
1b65288bdb
Add api doc for LLM ( #8605 )
...
* api doc initial
* update desc
2023-08-08 18:17:16 +08:00
binbin Deng
ea5d7aff5b
LLM: add chatglm native int4 transformers API ( #8695 )
2023-08-07 17:52:47 +08:00
Yishuo Wang
ef08250c21
[LLM] chatglm pybinding support ( #8672 )
2023-08-04 14:27:29 +08:00
Yang Wang
b6468bac43
optimize chatglm2 long sequence ( #8662 )
...
* add chatglm2
* optimize a little
* optimize chatglm long sequence
* fix style
* address comments and fix style
* fix bug
2023-08-03 17:56:24 -07:00
Yang Wang
3407f87075
Fix llama kv cache bug ( #8674 )
2023-08-03 17:54:55 -07:00
binbin Deng
a15a2516e6
add ( #8659 )
2023-08-03 10:12:10 +08:00
Yina Chen
119bf6d710
[LLM] Support linux cpp dynamic load .so ( #8655 )
...
* support linux cpp dynamic load .so
* update cli
2023-08-02 20:15:45 +08:00
Zhao Changmin
ca998cc6f2
LLM: Mute shape mismatch output ( #8601 )
...
* LLM: Mute shape mismatch output
2023-08-02 16:46:22 +08:00
Zhao Changmin
04c713ef06
LLM: Disable transformer api pretraining_tp ( #8645 )
...
* disable pretraining_tp
2023-08-02 11:26:01 +08:00
Yang Wang
cbeae97a26
Optimize Llama Attention to reduce KV cache memory copy ( #8580 )
...
* Optimize llama attention to reduce KV cache memory copy
* fix bug
* fix style
* remove git
* fix style
* fix style
* fix style
* fix tests
* move llama attention to another file
* revert
* fix style
* remove jit
* fix
2023-08-01 16:37:58 -07:00
xingyuan li
cdfbe652ca
[LLM] Add chatglm support for llm-cli ( #8641 )
...
* add chatglm build
* add llm-cli support
* update git
* install cmake
* add ut for chatglm
* add files to setup
* fix bug causing permission error when sf lacks file
2023-08-01 14:30:17 +09:00
Zhao Changmin
3e10260c6d
LLM: llm-convert support chatglm family ( #8643 )
...
* convert chatglm
2023-08-01 11:16:18 +08:00
Yina Chen
a607972c0b
[LLM]LLM windows load -api.dll ( #8631 )
...
* temp
* update
* revert setup.py
2023-07-31 13:47:20 +08:00
xingyuan li
3361b66449
[LLM] Revert llm-cli to disable selecting executables on Windows ( #8630 )
...
* revert vnni file select
* revert setup.py
* add model-api.dll
2023-07-31 11:15:44 +09:00
binbin Deng
fb32fefcbe
LLM: support tensor input of native int4 generate ( #8620 )
2023-07-27 17:59:49 +08:00
Zhao Changmin
5b484ab48d
LLM: Support load_low_bit loading models in shards format ( #8612 )
...
* shards_model
---------
Co-authored-by: leonardozcm <leonaordo1997zcm@gmail.com>
2023-07-26 13:30:01 +08:00
Zhao Changmin
af201052db
avoid malloc all missing keys in fp32 ( #8600 )
2023-07-25 09:48:51 +08:00
Yuwen Hu
ba42a6da63
[LLM] Set torch_dtype default value to 'auto' for transformers low bit from_pretrained API
2023-07-21 17:55:00 +08:00
Yang Wang
feb3af0567
Optimize transformer int4 memory footprint ( #8579 )
2023-07-20 20:22:13 -07:00
Yang Wang
57e880f63a
[LLM] use pytorch linear for large input matrix ( #8492 )
...
* use pytorch linear for large input matrix
* only works on server
* fix style
* optimize memory
* first check server
* revert
* address comments
* fix style
2023-07-20 09:54:25 -07:00
Zhao Changmin
e680af45ea
LLM: Optimize Langchain Pipeline ( #8561 )
...
* LLM: Optimize Langchain Pipeline
* load in low bit
2023-07-19 17:43:13 +08:00
Zhao Changmin
49d636e295
[LLM] whisper model transformer int4 verification and example ( #8511 )
...
* LLM: transformer api support
* va
* example
* revert
* pep8
* pep8
2023-07-19 08:33:20 +08:00
Yina Chen
9a7bc17ca1
[LLM] llm supports vnni link on windows ( #8543 )
...
* support win vnni link
* fix style
* fix style
* use isa_checker
* fix
* typo
* fix
* update
2023-07-18 16:43:45 +08:00
Yina Chen
4582b6939d
[LLM]llm gptneox chat ( #8527 )
...
* linux
* support win
* merge upstream & support vnni lib in chat
2023-07-18 11:17:17 +08:00
Xin Qiu
fccae91461
Add load_low_bit and save_low_bit to AutoModelForCausalLM ( #8531 )
...
* transformers save_low_bit load_low_bit
* update example and add readme
* update
* update
* update
* add ut
* update
2023-07-17 15:29:55 +08:00
xingyuan li
e57db777e0
[LLM] Setup.py & llm-cli update for windows vnni binary files ( #8537 )
...
* update setup.py
* update llm-cli
2023-07-17 12:28:38 +09:00
Yishuo Wang
6320bf201e
LLM: fix memory access violation ( #8519 )
2023-07-13 17:08:08 +08:00