Yang Wang
b6468bac43
optimize chatglm2 long sequence ( #8662 )
...
* add chatglm2
* optimize a little
* optimize chatglm long sequence
* fix style
* address comments and fix style
* fix bug
2023-08-03 17:56:24 -07:00
Yang Wang
3407f87075
Fix llama kv cache bug ( #8674 )
2023-08-03 17:54:55 -07:00
binbin Deng
a15a2516e6
add ( #8659 )
2023-08-03 10:12:10 +08:00
Yina Chen
119bf6d710
[LLM] Support linux cpp dynamic load .so ( #8655 )
...
* support linux cpp dynamic load .so
* update cli
2023-08-02 20:15:45 +08:00
Zhao Changmin
ca998cc6f2
LLM: Mute shape mismatch output ( #8601 )
...
* LLM: Mute shape mismatch output
2023-08-02 16:46:22 +08:00
Zhao Changmin
04c713ef06
LLM: Disable transformer api pretraining_tp ( #8645 )
...
* disable pretraining_tp
2023-08-02 11:26:01 +08:00
Yang Wang
cbeae97a26
Optimize Llama Attention to reduce KV cache memory copy ( #8580 )
...
* Optimize llama attention to reduce KV cache memory copy
* fix bug
* fix style
* remove git
* fix style
* fix style
* fix style
* fix tests
* move llama attention to another file
* revert
* fix style
* remove jit
* fix
2023-08-01 16:37:58 -07:00
xingyuan li
cdfbe652ca
[LLM] Add chatglm support for llm-cli ( #8641 )
...
* add chatglm build
* add llm-cli support
* update git
* install cmake
* add ut for chatglm
* add files to setup
* fix bug causing permission error when sf lacks file
2023-08-01 14:30:17 +09:00
Zhao Changmin
3e10260c6d
LLM: llm-convert support chatglm family ( #8643 )
...
* convert chatglm
2023-08-01 11:16:18 +08:00
Yina Chen
a607972c0b
[LLM]LLM windows load -api.dll ( #8631 )
...
* temp
* update
* revert setup.py
2023-07-31 13:47:20 +08:00
xingyuan li
3361b66449
[LLM] Revert llm-cli to disable selecting executables on Windows ( #8630 )
...
* revert vnni file select
* revert setup.py
* add model-api.dll
2023-07-31 11:15:44 +09:00
binbin Deng
fb32fefcbe
LLM: support tensor input of native int4 generate ( #8620 )
2023-07-27 17:59:49 +08:00
Zhao Changmin
5b484ab48d
LLM: Support load_low_bit loading models in shards format ( #8612 )
...
* shards_model
---------
Co-authored-by: leonardozcm <leonaordo1997zcm@gmail.com>
2023-07-26 13:30:01 +08:00
Zhao Changmin
af201052db
avoid malloc all missing keys in fp32 ( #8600 )
2023-07-25 09:48:51 +08:00
Yuwen Hu
ba42a6da63
[LLM] Set torch_dtype default value to 'auto' for transformers low bit from_pretrained API
2023-07-21 17:55:00 +08:00
Yang Wang
feb3af0567
Optimize transformer int4 memory footprint ( #8579 )
2023-07-20 20:22:13 -07:00
Yang Wang
57e880f63a
[LLM] use pytorch linear for large input matrix ( #8492 )
...
* use pytorch linear for large input matrix
* only works on server
* fix style
* optimize memory
* first check server
* revert
* address comments
* fix style
2023-07-20 09:54:25 -07:00
Zhao Changmin
e680af45ea
LLM: Optimize Langchain Pipeline ( #8561 )
...
* LLM: Optimize Langchain Pipeline
* load in low bit
2023-07-19 17:43:13 +08:00
Zhao Changmin
49d636e295
[LLM] whisper model transformer int4 verification and example ( #8511 )
...
* LLM: transformer api support
* va
* example
* revert
* pep8
* pep8
2023-07-19 08:33:20 +08:00
Yina Chen
9a7bc17ca1
[LLM] llm supports vnni link on windows ( #8543 )
...
* support win vnni link
* fix style
* fix style
* use isa_checker
* fix
* typo
* fix
* update
2023-07-18 16:43:45 +08:00
Yina Chen
4582b6939d
[LLM]llm gptneox chat ( #8527 )
...
* linux
* support win
* merge upstream & support vnni lib in chat
2023-07-18 11:17:17 +08:00
Xin Qiu
fccae91461
Add load_low_bit save_low_bit to AutoModelForCausalLM ( #8531 )
...
* transformers save_low_bit load_low_bit
* update example and add readme
* update
* update
* update
* add ut
* update
2023-07-17 15:29:55 +08:00
xingyuan li
e57db777e0
[LLM] Setup.py & llm-cli update for windows vnni binary files ( #8537 )
...
* update setup.py
* update llm-cli
2023-07-17 12:28:38 +09:00
Yishuo Wang
6320bf201e
LLM: fix memory access violation ( #8519 )
2023-07-13 17:08:08 +08:00
Xin Qiu
90e3d86bce
rename low bit type name ( #8512 )
...
* change qx_0 to sym_intx
* update
* fix typo
* update
* fix type
* fix style
* add python doc
* meet code review
* fix style
2023-07-13 15:53:31 +08:00
Zhao Changmin
ba0da17b40
LLM: Support AutoModelForSeq2SeqLM transformer API ( #8449 )
...
* LLM: support AutoModelForSeq2SeqLM transformer API
2023-07-13 13:33:51 +08:00
Yishuo Wang
86b5938075
LLM: fix llm pybinding ( #8509 )
2023-07-13 10:27:08 +08:00
Zhao Changmin
23f6a4c21f
LLM: Optimize transformer int4 loading ( #8499 )
...
* LLM: Optimize transformer int4 loading
2023-07-12 15:25:42 +08:00
Yishuo Wang
dd3f953288
Support vnni check ( #8497 )
2023-07-12 10:11:15 +08:00
Xin Qiu
cd7a980ec4
Transformer int4 add qtype, support q4_1 q5_0 q5_1 q8_0 ( #8481 )
...
* quant in Q4 5 8
* meet code review
* update readme
* style
* update
* fix error
* fix error
* update
* fix style
* update
* Update README.md
* Add load_in_low_bit
2023-07-12 08:23:08 +08:00
Yishuo Wang
db39d0a6b3
LLM: disable mmap by default for better performance ( #8467 )
2023-07-11 09:26:26 +08:00
Zhao Changmin
81d655cda9
LLM: transformer int4 save and load ( #8462 )
...
* LLM: transformer int4 save and load
2023-07-10 16:34:41 +08:00
binbin Deng
d489775d2c
LLM: fix inconsistency between output token number and max_new_token ( #8479 )
2023-07-07 17:31:05 +08:00
Ruonan Wang
2f77d485d8
LLM: Initial support of langchain transformer int4 API ( #8459 )
...
* first commit of transformer int4 and pipeline
* basic examples
* temp save for embeddings
* support embeddings and docqa example
* fix based on comment
* small fix
2023-07-06 17:50:05 +08:00
binbin Deng
14626fe05b
LLM: refactor transformers and langchain class name ( #8470 )
2023-07-06 17:16:44 +08:00
binbin Deng
77808fa124
LLM: fix n_batch in starcoder pybinding ( #8461 )
2023-07-05 17:06:50 +08:00
Yina Chen
f2bb469847
[WIP] LLM llm-cli chat mode ( #8440 )
...
* fix timezone
* temp
* Update linux interactive mode
* modify init text for interactive mode
* meet comments
* update
* win script
* meet comments
2023-07-05 14:04:17 +08:00
binbin Deng
e54e52b438
LLM: fix n_batch in bloom pybinding ( #8454 )
2023-07-04 15:10:32 +08:00
Yang Wang
449aea7ffc
Optimize transformer int4 loading memory ( #8400 )
...
* Optimize transformer int4 loading memory
* move cast to convert
* default setting low_cpu_mem_usage
2023-06-30 20:12:12 -07:00
Zhao Changmin
cc76ec809a
check out dir ( #8395 )
2023-06-27 21:28:39 +08:00
Xin Qiu
e68d631c0a
gptq2ggml: support loading safetensors model. ( #8401 )
...
* update convert gptq to ggml
* update convert gptq to ggml
* gptq to ggml
* update script
* meet code review
* meet code review
2023-06-27 11:19:33 +08:00
binbin Deng
19e19efb4c
LLM: raise warning instead of error when using unsupported parameters ( #8382 )
2023-06-26 13:23:55 +08:00
Shengsheng Huang
c113ecb929
[LLM] langchain bloom, UT's, default parameters ( #8357 )
...
* update langchain default parameters to align w/ api
* add ut's for llm and embeddings
* update inference test script to install langchain deps
* update tests workflows
---------
Co-authored-by: leonardozcm <changmin.zhao@intel.com>
2023-06-25 17:38:00 +08:00
Shengsheng Huang
446175cc05
transformer api refactor ( #8389 )
...
* transformer api refactor
* fix style
* add huggingface tokenizer usage in example and make ggml tokenizer as option 1 and huggingface tokenizer as option 2
* fix style
2023-06-25 17:15:33 +08:00
Yang Wang
ce6d06eb0a
Support directly quantizing huggingface transformers into 4bit format ( #8371 )
...
* Support directly quantizing huggingface transformers into 4bit format
* refine example
* license
* fix bias
* address comments
* move to ggml transformers
* fix example
* fix style
* fix style
* address comments
* rename
* change API
* fix style
* add lm head to conversion
* address comments
2023-06-25 16:35:06 +08:00
binbin Deng
03c5fb71a8
LLM: fix ModuleNotFoundError when using llm-cli ( #8378 )
2023-06-21 15:03:14 +08:00
Ruonan Wang
7296453f07
LLM: support starcoder in llm-cli ( #8377 )
...
* support starcoder in cli
* small fix
2023-06-21 14:38:30 +08:00
Ruonan Wang
50af0251e4
LLM: First commit of StarCoder pybinding ( #8354 )
...
* first commit of starcoder
* update setup.py and fix style
* add starcoder_cpp, fix style
* fix style
* support windows binary
* update pybinding
* fix style, add avx2 binary
* small fix
* fix style
2023-06-21 13:23:06 +08:00
Yuwen Hu
7ef1c890eb
[LLM] Supports GPTQ convert in transformers-like API, and supports folder outfile for llm-convert ( #8366 )
...
* Add docstrings to llm_convert
* Small docstrings fix
* Unify outfile type to be a folder path for either gptq or pth model_format
* Supports gptq model input for from_pretrained
* Fix example and readme
* Small fix
* Python style fix
* Bug fix in llm_convert
* Python style check
* Fix based on comments
* Small fix
2023-06-20 17:42:38 +08:00
Zhao Changmin
4ec46afa4f
LLM: Align converting GPTQ model API with transformer style ( #8365 )
...
* LLM: Align GPTQ API with transformer style
2023-06-20 14:27:41 +08:00