Zhao Changmin
e62eda74b8
refine ( #8912 )
...
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-09-11 16:40:33 +08:00
Yina Chen
df165ad165
init ( #8933 )
2023-09-11 14:30:55 +08:00
Ruonan Wang
b3f5dd5b5d
LLM: update q8 convert xpu&cpu ( #8930 )
2023-09-08 16:01:17 +08:00
Yina Chen
33d75adadf
[LLM]Support q5_0 on arc ( #8926 )
...
* support q5_0
* delete
* fix style
2023-09-08 15:52:36 +08:00
Yang Wang
ee98cdd85c
Support latest transformer version ( #8923 )
...
* Support latest transformer version
* fix style
2023-09-07 19:01:32 -07:00
Yang Wang
25428b22b4
Fix chatglm2 attention and kv cache ( #8924 )
...
* fix chatglm2 attention
* fix bf16 bug
* make model stateless
* add utils
* cleanup
* fix style
2023-09-07 18:54:29 -07:00
Yina Chen
b209b8f7b6
[LLM] Fix arc qtype != q4_0 generate issue ( #8920 )
...
* Fix arc precision!=q4_0 generate issue
* meet comments
2023-09-07 08:56:36 -07:00
Yang Wang
c34400e6b0
Use new layout for xpu qlinear ( #8896 )
...
* use new layout for xpu qlinear
* fix style
2023-09-06 21:55:33 -07:00
Zhao Changmin
8bc1d8a17c
LLM: Fix discards in optimize_model with non-hf models and add openai whisper example ( #8877 )
...
* openai-whisper
2023-09-07 10:35:59 +08:00
SONG Ge
7a71ced78f
[LLM Docs] Remain API Docs Issues Solution ( #8780 )
...
* langchain readthedocs update
* solve langchain.llms.transformersllm issues
* langchain.embeddings.transformersembeddings/transformersllms issues
* update docs for get_num_tokens
* add low_bit api doc
* add optimizer model api doc
* update rst index
* fix comments style
* update docs following the comments
* update api doc
2023-09-06 16:29:34 +08:00
Kai Huang
4a9ff050a1
Add qlora nf4 ( #8782 )
...
* add nf4
* dequant nf4
* style
2023-09-06 09:39:22 +08:00
Zhao Changmin
95271f10e0
LLM: Rename low bit layer ( #8875 )
...
* rename lowbit
---------
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-09-05 13:21:12 +08:00
Yang Wang
242c9d6036
Fix chatglm2 multi-turn streamchat ( #8867 )
2023-08-31 22:13:49 -07:00
xingyuan li
de6c6bb17f
[LLM] Downgrade amx build gcc version and remove avx flag display ( #8856 )
...
* downgrade to gcc 11
* remove avx display
2023-08-31 14:08:13 +09:00
Yang Wang
3b4f4e1c3d
Fix llama attention optimization for XPU ( #8855 )
...
* Fix llama attention optimization for XPU
* fix chatglm2
* fix typo
2023-08-30 21:30:49 -07:00
Shengsheng Huang
7b566bf686
[LLM] add new API for optimize any pytorch models ( #8827 )
...
* add new API for optimize any pytorch models
* change test util name
* revise API and update UT
* fix python style
* update ut config, change default value
* change defaults, disable ut transcribe
2023-08-30 19:41:53 +08:00
Xin Qiu
8eca982301
windows add env ( #8852 )
2023-08-30 15:54:52 +08:00
Zhao Changmin
731916c639
LLM: Enable attempting loading method automatically ( #8841 )
...
* enable auto load method
* warning error
* logger info
---------
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-08-30 15:41:55 +08:00
Yishuo Wang
bba73ec9d2
[LLM] change chatglm native int4 checkpoint name ( #8851 )
2023-08-30 15:05:19 +08:00
Yina Chen
55e705a84c
[LLM] Support the rest of AutoXXX classes in Transformers API ( #8815 )
...
* add transformers auto models
* fix
2023-08-30 11:16:14 +08:00
Yishuo Wang
7429ea0606
[LLM] support transformer int4 + amx int4 ( #8838 )
2023-08-29 17:27:18 +08:00
Zhao Changmin
bb31d4fe80
LLM: Implement hf low_cpu_mem_usage with 1xbinary file peak memory on transformer int4 ( #8731 )
...
* 1x peak memory
2023-08-29 09:33:17 +08:00
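The low_cpu_mem_usage commit above keeps peak memory near the size of a single checkpoint binary by quantizing each tensor as it is loaded and releasing the full-precision copy before the next shard is read, instead of materializing the whole fp32 model first. A minimal, hypothetical sketch of that loading pattern (`load_quantized` and `quantize` are illustrative stand-ins, not the repo's actual functions):

```python
# Sketch: shard-by-shard loading keeps at most one shard's worth of
# full-precision weights alive at a time, so peak memory stays near
# the size of one checkpoint file rather than the whole model.

def quantize(tensor):
    # stand-in for fp32 -> int4 conversion; only the metadata changes here
    return {"data": tensor["data"], "bits": 4}

def load_quantized(shards):
    """shards: iterable of dicts mapping param name -> fp32 tensor dict."""
    model = {}
    for shard in shards:                # only one shard in memory at once
        for name, tensor in shard.items():
            model[name] = quantize(tensor)
        del shard                       # drop fp32 copies before next shard
    return model
```

The same idea generalizes to any conversion pipeline: interleave load and convert per shard instead of load-all-then-convert.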
SONG Ge
d2926c7672
[LLM] Unify Langchain Native and Transformers LLM API ( #8752 )
...
* deprecate BigDLNativeTransformers and add specific LMEmbedding method
* deprecate and add LM methods for langchain llms
* add native params to native langchain
* new imple for embedding
* move ut from bigdlnative to causal llm
* rename embeddings api and examples update align with usage updating
* docqa example hot-fix
* add more api docs
* add langchain ut for starcoder
* support model_kwargs for transformer methods when calling causalLM and add ut
* ut fix for transformers embedding
* update for langchain causal supporting transformers
* remove model_family in readme doc
* add model_families params to support more models
* update api docs and remove chatglm embeddings for now
* remove chatglm embeddings in examples
* new refactor for ut to add bloom and transformers llama ut
* disable llama transformers embedding ut
2023-08-25 11:14:21 +08:00
Yang Wang
bf3591e2ff
Optimize chatglm2 for bf16 ( #8725 )
...
* make chatglm works with bf16
* fix style
* support chatglm v1
* fix style
* fix style
* add chatglm2 file
2023-08-24 10:04:25 -07:00
Yishuo Wang
611c1fb628
[LLM] change default n_threads of native int4 langchain API ( #8779 )
2023-08-21 13:30:12 +08:00
Yishuo Wang
3d1f2b44f8
LLM: change default n_threads of native int4 models ( #8776 )
2023-08-18 15:46:19 +08:00
Yishuo Wang
2ba2133613
fix starcoder chinese output ( #8773 )
2023-08-18 13:37:02 +08:00
binbin Deng
548f7a6cf7
LLM: update convert of llama family to support llama2-70B ( #8747 )
2023-08-18 09:30:35 +08:00
Yina Chen
4afea496ab
support q8_0 ( #8765 )
2023-08-17 15:06:36 +08:00
Ruonan Wang
e9aa2bd890
LLM: reduce GPU 1st token latency and update example ( #8763 )
...
* reduce 1st token latency
* update example
* fix
* fix style
* update readme of gpu benchmark
2023-08-16 18:01:23 +08:00
SONG Ge
f4164e4492
[BigDL LLM] Update readme for unifying transformers API ( #8737 )
...
* update readme doc
* fix readthedocs error
* update comment
* update exception error info
* invalidInputError instead
* fix readme typo error and remove import error
* fix more typo
2023-08-16 14:22:32 +08:00
Yishuo Wang
77844125f2
[LLM] Support chatglm cache ( #8745 )
2023-08-14 15:10:46 +08:00
SONG Ge
aceea4dc29
[LLM] Unify Transformers and Native API ( #8713 )
...
* re-open pr to run on latest runner
* re-add examples and ut
* rename ut and move deprecate to warning instead of raising an error info
* ut fix
2023-08-11 19:45:47 +08:00
Yishuo Wang
f91035c298
[LLM] fix chatglm native int4 emoji output ( #8739 )
2023-08-11 15:38:41 +08:00
binbin Deng
77efcf7b1d
LLM: fix ChatGLM2 native int4 stream output ( #8733 )
2023-08-11 14:51:50 +08:00
Ruonan Wang
ca3e59a1dc
LLM: support stop for starcoder native int4 stream ( #8734 )
2023-08-11 14:51:30 +08:00
Yishuo Wang
3d5a7484a2
[LLM] fix bloom and starcoder memory release ( #8728 )
2023-08-11 11:18:19 +08:00
Ruonan Wang
1a7b698a83
[LLM] support ipex arc int4 & add basic llama2 example ( #8700 )
...
* first support of xpu
* make it works on gpu
update setup
update
add GPU llama2 examples
add use_optimize flag to disable optimize for gpu
fix style
update gpu example readme
fix
* update example, and update env
* fix setup to add cpp files
* replace jit with aot to avoid data leak
* rename to bigdl-core-xe
* update installation in example readme
2023-08-09 22:20:32 +08:00
Kai Huang
1b65288bdb
Add api doc for LLM ( #8605 )
...
* api doc initial
* update desc
2023-08-08 18:17:16 +08:00
binbin Deng
ea5d7aff5b
LLM: add chatglm native int4 transformers API ( #8695 )
2023-08-07 17:52:47 +08:00
Yishuo Wang
ef08250c21
[LLM] chatglm pybinding support ( #8672 )
2023-08-04 14:27:29 +08:00
Yang Wang
b6468bac43
optimize chatglm2 long sequence ( #8662 )
...
* add chatglm2
* optimize a little
* optimize chatglm long sequence
* fix style
* address comments and fix style
* fix bug
2023-08-03 17:56:24 -07:00
Yang Wang
3407f87075
Fix llama kv cache bug ( #8674 )
2023-08-03 17:54:55 -07:00
binbin Deng
a15a2516e6
add ( #8659 )
2023-08-03 10:12:10 +08:00
Yina Chen
119bf6d710
[LLM] Support linux cpp dynamic load .so ( #8655 )
...
* support linux cpp dynamic load .so
* update cli
2023-08-02 20:15:45 +08:00
Zhao Changmin
ca998cc6f2
LLM: Mute shape mismatch output ( #8601 )
...
* LLM: Mute shape mismatch output
2023-08-02 16:46:22 +08:00
Zhao Changmin
04c713ef06
LLM: Disable transformer api pretraining_tp ( #8645 )
...
* disable pretraining_tp
2023-08-02 11:26:01 +08:00
Yang Wang
cbeae97a26
Optimize Llama Attention to reduce KV cache memory copy ( #8580 )
...
* Optimize llama attention to reduce KV cache memory copy
* fix bug
* fix style
* remove git
* fix style
* fix style
* fix style
* fix tests
* move llama attention to another file
* revert
* fix style
* remove jit
* fix
2023-08-01 16:37:58 -07:00
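The KV-cache commit above reduces memory copies by writing each step's keys/values into a pre-allocated buffer instead of concatenating onto the past cache every token. A framework-free sketch of the idea (hypothetical names, plain Python lists standing in for tensors, not the repo's actual implementation):

```python
# Sketch: growing a KV cache by concatenation copies all past entries on
# every step (O(n^2) data movement overall), while a pre-allocated buffer
# writes each new entry in place (O(n) overall).

class PreallocatedKVCache:
    def __init__(self, max_len):
        self.buf = [None] * max_len    # allocated once, up front
        self.len = 0

    def append(self, kv):
        self.buf[self.len] = kv        # in-place write, no copy of the past
        self.len += 1

    def view(self):
        return self.buf[:self.len]     # the valid prefix of the buffer

def append_by_concat(cache, kv):
    # naive baseline: allocates a new list and copies every old entry
    return cache + [kv]
```

With real tensors the buffer would be sized to the max sequence length (or grown in chunks) and sliced per step, which is the shape of the optimization the commit describes.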
xingyuan li
cdfbe652ca
[LLM] Add chatglm support for llm-cli ( #8641 )
...
* add chatglm build
* add llm-cli support
* update git
* install cmake
* add ut for chatglm
* add files to setup
* fix bug causing permission error when sf lacks file
2023-08-01 14:30:17 +09:00
Zhao Changmin
3e10260c6d
LLM: llm-convert support chatglm family ( #8643 )
...
* convert chatglm
2023-08-01 11:16:18 +08:00
Yina Chen
a607972c0b
[LLM]LLM windows load -api.dll ( #8631 )
...
* temp
* update
* revert setup.py
2023-07-31 13:47:20 +08:00
xingyuan li
3361b66449
[LLM] Revert llm-cli to disable selecting executables on Windows ( #8630 )
...
* revert vnni file select
* revert setup.py
* add model-api.dll
2023-07-31 11:15:44 +09:00
binbin Deng
fb32fefcbe
LLM: support tensor input of native int4 generate ( #8620 )
2023-07-27 17:59:49 +08:00
Zhao Changmin
5b484ab48d
LLM: Support load_low_bit loading models in shards format ( #8612 )
...
* shards_model
---------
Co-authored-by: leonardozcm <leonaordo1997zcm@gmail.com>
2023-07-26 13:30:01 +08:00
Zhao Changmin
af201052db
avoid malloc all missing keys in fp32 ( #8600 )
2023-07-25 09:48:51 +08:00
Yuwen Hu
ba42a6da63
[LLM] Set torch_dtype default value to 'auto' for transformers low bit from_pretrained API
2023-07-21 17:55:00 +08:00
Yang Wang
feb3af0567
Optimize transformer int4 memory footprint ( #8579 )
2023-07-20 20:22:13 -07:00
Yang Wang
57e880f63a
[LLM] use pytorch linear for large input matrix ( #8492 )
...
* use pytorch linear for large input matrix
* only works on server
* fix style
* optimize memory
* first check server
* revert
* address comments
* fix style
2023-07-20 09:54:25 -07:00
Zhao Changmin
e680af45ea
LLM: Optimize Langchain Pipeline ( #8561 )
...
* LLM: Optimize Langchain Pipeline
* load in low bit
2023-07-19 17:43:13 +08:00
Zhao Changmin
49d636e295
[LLM] whisper model transformer int4 verification and example ( #8511 )
...
* LLM: transformer api support
* va
* example
* revert
* pep8
* pep8
2023-07-19 08:33:20 +08:00
Yina Chen
9a7bc17ca1
[LLM] llm supports vnni link on windows ( #8543 )
...
* support win vnni link
* fix style
* fix style
* use isa_checker
* fix
* typo
* fix
* update
2023-07-18 16:43:45 +08:00
Yina Chen
4582b6939d
[LLM]llm gptneox chat ( #8527 )
...
* linux
* support win
* merge upstream & support vnni lib in chat
2023-07-18 11:17:17 +08:00
Xin Qiu
fccae91461
Add load_low_bit save_load_bit to AutoModelForCausalLM ( #8531 )
...
* transformers save_low_bit load_low_bit
* update example and add readme
* update
* update
* update
* add ut
* update
2023-07-17 15:29:55 +08:00
xingyuan li
e57db777e0
[LLM] Setup.py & llm-cli update for windows vnni binary files ( #8537 )
...
* update setup.py
* update llm-cli
2023-07-17 12:28:38 +09:00
Yishuo Wang
6320bf201e
LLM: fix memory access violation ( #8519 )
2023-07-13 17:08:08 +08:00
Xin Qiu
90e3d86bce
rename low bit type name ( #8512 )
...
* change qx_0 to sym_intx
* update
* fix typo
* update
* fix type
* fix style
* add python doc
* meet code review
* fix style
2023-07-13 15:53:31 +08:00
Zhao Changmin
ba0da17b40
LLM: Support AutoModelForSeq2SeqLM transformer API ( #8449 )
...
* LLM: support AutoModelForSeq2SeqLM transformer API
2023-07-13 13:33:51 +08:00
Yishuo Wang
86b5938075
LLM: fix llm pybinding ( #8509 )
2023-07-13 10:27:08 +08:00
Zhao Changmin
23f6a4c21f
LLM: Optimize transformer int4 loading ( #8499 )
...
* LLM: Optimize transformer int4 loading
2023-07-12 15:25:42 +08:00
Yishuo Wang
dd3f953288
Support vnni check ( #8497 )
2023-07-12 10:11:15 +08:00
Xin Qiu
cd7a980ec4
Transformer int4 add qtype, support q4_1 q5_0 q5_1 q8_0 ( #8481 )
...
* quant in Q4 5 8
* meet code review
* update readme
* style
* update
* fix error
* fix error
* update
* fix style
* update
* Update README.md
* Add load_in_low_bit
2023-07-12 08:23:08 +08:00
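Several commits above add new ggml quantization types (q4_0, q4_1, q5_0, q8_0). As an illustrative sketch of the simplest family member, q4_0-style symmetric block quantization stores one scale per block of 32 weights plus small signed integers. This is a heavily simplified rendition of the scheme, not ggml's exact bit layout (real q4_0 packs two 4-bit values per byte and uses fp16 scales):

```python
# Simplified sketch of q4_0-style symmetric block quantization.
# One scale per 32-weight block; weights stored as signed ints in -7..7.

BLOCK_SIZE = 32

def quantize_q4_0(weights):
    """Quantize a flat list of floats into (scale, int list) blocks."""
    blocks = []
    for i in range(0, len(weights), BLOCK_SIZE):
        block = weights[i:i + BLOCK_SIZE]
        amax = max(abs(w) for w in block)
        scale = amax / 7.0 if amax else 1.0   # symmetric int4 range
        q = [max(-7, min(7, round(w / scale))) for w in block]
        blocks.append((scale, q))
    return blocks

def dequantize_q4_0(blocks):
    return [scale * v for scale, q in blocks for v in q]
```

The per-weight round-trip error is bounded by half a quantization step (scale / 2), which is why larger qtypes like q8_0 trade memory for accuracy.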
Yishuo Wang
db39d0a6b3
LLM: disable mmap by default for better performance ( #8467 )
2023-07-11 09:26:26 +08:00
Zhao Changmin
81d655cda9
LLM: transformer int4 save and load ( #8462 )
...
* LLM: transformer int4 save and load
2023-07-10 16:34:41 +08:00
binbin Deng
d489775d2c
LLM: fix inconsistency between output token number and max_new_token ( #8479 )
2023-07-07 17:31:05 +08:00
Ruonan Wang
2f77d485d8
LLM: Initial support of langchain transformer int4 API ( #8459 )
...
* first commit of transformer int4 and pipeline
* basic examples
temp save for embeddings
support embeddings and docqa example
* fix based on comment
* small fix
2023-07-06 17:50:05 +08:00
binbin Deng
14626fe05b
LLM: refactor transformers and langchain class name ( #8470 )
2023-07-06 17:16:44 +08:00
binbin Deng
77808fa124
LLM: fix n_batch in starcoder pybinding ( #8461 )
2023-07-05 17:06:50 +08:00
Yina Chen
f2bb469847
[WIP] LLM llm-cli chat mode ( #8440 )
...
* fix timezone
* temp
* Update linux interactive mode
* modify init text for interactive mode
* meet comments
* update
* win script
* meet comments
2023-07-05 14:04:17 +08:00
binbin Deng
e54e52b438
LLM: fix n_batch in bloom pybinding ( #8454 )
2023-07-04 15:10:32 +08:00
Yang Wang
449aea7ffc
Optimize transformer int4 loading memory ( #8400 )
...
* Optimize transformer int4 loading memory
* move cast to convert
* default setting low_cpu_mem_usage
2023-06-30 20:12:12 -07:00
Zhao Changmin
cc76ec809a
check out dir ( #8395 )
2023-06-27 21:28:39 +08:00
Xin Qiu
e68d631c0a
gptq2ggml: support loading safetensors model. ( #8401 )
...
* update convert gptq to ggml
* update convert gptq to ggml
* gptq to ggml
* update script
* meet code review
* meet code review
2023-06-27 11:19:33 +08:00
binbin Deng
19e19efb4c
LLM: raise warning instead of error when use unsupported parameters ( #8382 )
2023-06-26 13:23:55 +08:00
Shengsheng Huang
c113ecb929
[LLM] langchain bloom, UT's, default parameters ( #8357 )
...
* update langchain default parameters to align w/ api
* add ut's for llm and embeddings
* update inference test script to install langchain deps
* update tests workflows
---------
Co-authored-by: leonardozcm <changmin.zhao@intel.com>
2023-06-25 17:38:00 +08:00
Shengsheng Huang
446175cc05
transformer api refactor ( #8389 )
...
* transformer api refactor
* fix style
* add huggingface tokenizer usage in example and make ggml tokenizer as option 1 and huggingface tokenizer as option 2
* fix style
2023-06-25 17:15:33 +08:00
Yang Wang
ce6d06eb0a
Support directly quantizing huggingface transformers into 4bit format ( #8371 )
...
* Support directly quantizing huggingface transformers into 4bit format
* refine example
* license
* fix bias
* address comments
* move to ggml transformers
* fix example
* fix style
* fix style
* address comments
* rename
* change API
* fix style
* add lm head to conversion
* address comments
2023-06-25 16:35:06 +08:00
binbin Deng
03c5fb71a8
LLM: fix ModuleNotFoundError when use llm-cli ( #8378 )
2023-06-21 15:03:14 +08:00
Ruonan Wang
7296453f07
LLM: support starcoder in llm-cli ( #8377 )
...
* support starcoder in cli
* small fix
2023-06-21 14:38:30 +08:00
Ruonan Wang
50af0251e4
LLM: First commit of StarCoder pybinding ( #8354 )
...
* first commit of starcoder
* update setup.py and fix style
* add starcoder_cpp, fix style
* fix style
* support windows binary
* update pybinding
* fix style, add avx2 binary
* small fix
* fix style
2023-06-21 13:23:06 +08:00
Yuwen Hu
7ef1c890eb
[LLM] Supports GPTQ convert in transformers-like API, and supports folder outfile for llm-convert ( #8366 )
...
* Add docstrings to llm_convert
* Small docstrings fix
* Unify outfile type to be a folder path for either gptq or pth model_format
* Supports gptq model input for from_pretrained
* Fix example and readme
* Small fix
* Python style fix
* Bug fix in llm_convert
* Python style check
* Fix based on comments
* Small fix
2023-06-20 17:42:38 +08:00
Zhao Changmin
4ec46afa4f
LLM: Align converting GPTQ model API with transformer style ( #8365 )
...
* LLM: Align GPTQ API with transformer style
2023-06-20 14:27:41 +08:00
Ruonan Wang
f99d348954
LLM: convert and quantize support for StarCoder ( #8359 )
...
* basic support for starcoder
* update from_pretrained
* fix bug and fix style
2023-06-20 13:39:35 +08:00
binbin Deng
5f4f399ca7
LLM: fix bugs during supporting bloom in langchain ( #8362 )
2023-06-20 13:30:37 +08:00
Zhao Changmin
30ac9a70f5
LLM: fix expected 2 blank lines ( #8360 )
2023-06-19 18:10:02 +08:00
Zhao Changmin
c256cd136b
LLM: Fix ggml return value ( #8358 )
...
* ggml return original value
2023-06-19 17:02:56 +08:00
Zhao Changmin
d4027d7164
fix typos in llm_convert ( #8355 )
2023-06-19 16:17:21 +08:00
Zhao Changmin
4d177ca0a1
LLM: Merge convert pth/gptq model script into one shell script ( #8348 )
...
* convert model in one
* model type
* license
* readme and pep8
* ut path
* rename
* readme
* fix docs
* without lines
2023-06-19 11:50:05 +08:00
Ruonan Wang
9daf543e2f
LLM: Update convert of gptneox to sync with new libgptneox.so ( #8345 )
2023-06-15 16:28:50 +08:00
Ruonan Wang
f7f4e65788
LLM: support int8 and tmp_path for from_pretrained ( #8338 )
2023-06-15 14:48:21 +08:00
Ruonan Wang
5094970175
LLM: update convert_model to support int8 ( #8326 )
...
* update example and convert_model for int8
* reset example
* fix style
2023-06-15 09:25:07 +08:00
binbin Deng
f64e703083
LLM: first add _tokenize, detokenize and _generate for bloom pybinding ( #8316 )
2023-06-14 17:29:57 +08:00
Xin Qiu
5576679a92
add convert-gptq-to-ggml.py to bigdl-llama ( #8298 )
2023-06-14 14:51:51 +08:00
Ruonan Wang
a6c4b733cb
LLM: Update subprocess to show error message ( #8323 )
...
* update subprocess
* fix style
2023-06-13 16:43:37 +08:00
Shengsheng Huang
02c583144c
[LLM] langchain integrations and examples ( #8256 )
...
* langchain integrations and examples
* add licences and rename
* add licences
* fix license issues and change backbone to model_family
* update examples to use model_family param
* fix linting
* fix code style
* exclude langchain integration from stylecheck
* update langchain examples and update integrations based on latest changes
* update simple llama-cpp-python style API example
* remove bloom in README
* change default n_threads to 2 and remove redundant code
---------
Co-authored-by: leonardozcm <changmin.zhao@intel.com>
2023-06-12 19:22:07 +08:00
xingyuan li
c4028d507c
[LLM] Add unified default value for cli programs ( #8310 )
...
* add unified default value for threads and n_predict
2023-06-12 16:30:27 +08:00
binbin Deng
5d5da7b2c7
LLM: optimize namespace and remove unused import logic ( #8302 )
2023-06-09 15:17:49 +08:00
Ruonan Wang
5d0e130605
LLM: fix convert path error of gptneox and bloom on windows ( #8304 )
2023-06-09 10:10:19 +08:00
Yina Chen
7bfa0fcdf9
fix style ( #8300 )
2023-06-08 16:52:17 +08:00
Yina Chen
637b72f2ad
[LLM] llm transformers api support batch actions ( #8288 )
...
* llm transformers api support batch actions
* align with transformer
* meet comment
2023-06-08 15:10:08 +08:00
xingyuan li
ea3cf6783e
LLM: Command line wrapper for llama/bloom/gptneox ( #8239 )
...
* add llama/bloom/gptneox wrapper
* add readme
* upload binary main file
2023-06-08 14:55:22 +08:00
binbin Deng
08bdfce2d8
LLM: avoid unnecessary import torch except converting process ( #8297 )
2023-06-08 14:24:58 +08:00
binbin Deng
f9e2bda04a
LLM: add stop words and enhance output for bloom pybinding ( #8280 )
2023-06-08 14:06:06 +08:00
Yina Chen
1571ba6425
remove unused import gptneox_cpp ( #8293 )
2023-06-08 11:04:47 +08:00
Yina Chen
2c037e892b
fix-transformers-neox ( #8285 )
2023-06-07 14:44:43 +08:00
Ruonan Wang
39ad68e786
LLM: enhancements for convert_model ( #8278 )
...
* update convert
* change output name
* add description for input_path, add check for input_values
* basic support for command line
* fix style
* update based on comment
* update based on comment
2023-06-07 13:22:14 +08:00
Junwei Deng
2d14e593f0
LLM: Support generate(max_new_tokens=...), tokenize and decode for transformers-like API ( #8283 )
...
* first push
* fix pep8
2023-06-07 11:50:35 +08:00
Yina Chen
11cd2a07e0
[LLM] llm transformers format interface first part ( #8276 )
...
* llm-transformers-format
* update
* fix style
2023-06-06 17:17:37 +08:00
Pingchuan Ma (Henry)
a3f353b939
[LLM] add long time loading disclaimer for LLM model converting ( #8279 )
2023-06-06 17:15:13 +08:00
Yuwen Hu
64bc123dd3
[LLM] Add transformers-like API from_pretrained ( #8271 )
...
* Init commit for bigdl.llm.transformers.AutoModelForCausalLM
* Temp change to avoid name conflicts with external transformers lib
* Support downloading model from huggingface
* Small python style fix
* Change location of transformers to avoid library conflicts
* Add return value for converted ggml binary ckpt path for convert_model
* Avoid repeated loading of shared library and adding some comments
* Small fix
* Path type fix and docstring fix
* Small fix
* Small fix
* Change cache dir to pwd
2023-06-06 17:04:16 +08:00
xingyuan li
38be471140
[LLM] convert_model bug fix ( #8274 )
...
* Renamed all bloomz to bloom in ggml/model & utils/convert_util.py
* Add an optional parameter for specifying the model conversion path to avoid running out of disk space
2023-06-06 15:16:42 +08:00
Ruonan Wang
8bd2992a8d
LLM: accelerate sample of gptneox and update quantize ( #8262 )
...
* update quantize & accelerate sample
* fix style check
* fix style error
2023-06-05 15:36:00 +08:00
Jun Wang
2bc0e7abbb
[llm] Add convert_model api ( #8244 )
...
* add convert_model api
* change the model_path to input_path
* map int4 to q4_0
* fix blank line
* change bloomz to bloom
* remove default model_family
* change dtype to lower first
2023-06-03 10:18:29 +08:00
Yuwen Hu
e290660b20
[LLM] Add so shared library for Bloom family models ( #8258 )
...
* Add so file downloading for bloom family models
* Supports selecting of avx2/avx512 so for bloom
2023-06-02 17:39:40 +08:00
Yina Chen
657ea0ee50
[LLM] Fix linux load libs for NeoX and llama ( #8257 )
...
* init
* add lisence
* fix style
2023-06-02 17:03:17 +08:00
Yuwen Hu
286b010bf1
[LLM] First push for Bloomz pybinding ( #8252 )
...
* Initial commit to move bloom pybinding to bigdl-llm
* Revise path for shared library
* Small fix
2023-06-02 14:41:04 +08:00
Junwei Deng
350d31a472
LLM: first push gptneox pybinding ( #8234 )
...
* first push gptneox pybinding
* fix
* fix code style and add license
---------
Co-authored-by: binbin <binbin1.deng@intel.com>
2023-06-02 09:28:00 +08:00
binbin Deng
3a9aa23835
LLM: fix and update related license in llama pybinding ( #8250 )
2023-06-01 17:09:15 +08:00
binbin Deng
e56f24b424
LLM: first push llama pybinding ( #8241 )
...
* first push llama binding
* update dll
2023-06-01 10:59:15 +08:00
binbin Deng
8421af51ae
LLM: support converting to ggml format ( #8235 )
...
* add convert
* fix
* fix
* fix
* try
* test
* update check
* fix
* fix
2023-05-31 15:20:06 +08:00
Ruonan Wang
c890609d1e
LLM: Support package/quantize for llama.cpp/redpajama.cpp on Windows ( #8236 )
...
* support windows of llama.cpp
* update quantize
* update version of llama.cpp submodule
* add gptneox.dll
* add quantize-gptneox.exe
2023-05-31 14:47:12 +08:00
Pingchuan Ma (Henry)
1f913a6941
[LLM] Add LLM pep8 coding style checking ( #8233 )
...
* add LLM pep8 coding checking
* resolve bugs in testing scripts and code style revision
2023-05-30 15:58:14 +08:00
Ruonan Wang
4638b85f3e
[llm] Initial support of package and quantize ( #8228 )
...
* first commit of CMakeFiles.txt to include llama & gptneox
* initial support of quantize
* update cmake for only consider linux now
* support quantize interface
* update based on comment
2023-05-26 16:36:46 +08:00
Junwei Deng
ea22416525
LLM: add first round files ( #8225 )
2023-05-25 11:29:18 +08:00