Commit graph

229 commits

Author SHA1 Message Date
Zhao Changmin
af201052db avoid malloc all missing keys in fp32 (#8600) 2023-07-25 09:48:51 +08:00
Yuwen Hu
ba42a6da63 [LLM] Set torch_dtype default value to 'auto' for transformers low bit from_pretrained API 2023-07-21 17:55:00 +08:00
Yang Wang
feb3af0567 Optimize transformer int4 memory footprint (#8579) 2023-07-20 20:22:13 -07:00
Yang Wang
57e880f63a [LLM] use pytorch linear for large input matrix (#8492)
* use pytorch linear for large input matrix

* only works on server

* fix style

* optimize memory

* first check server

* revert

* address comments

* fix style
2023-07-20 09:54:25 -07:00
Zhao Changmin
e680af45ea LLM: Optimize Langchain Pipeline (#8561)
* LLM: Optimize Langchain Pipeline

* load in low bit
2023-07-19 17:43:13 +08:00
Zhao Changmin
49d636e295 [LLM] whisper model transformer int4 verification and example (#8511)
* LLM: transformer api support

* va

* example

* revert

* pep8

* pep8
2023-07-19 08:33:20 +08:00
Yina Chen
9a7bc17ca1 [LLM] llm supports vnni link on windows (#8543)
* support win vnni link

* fix style

* fix style

* use isa_checker

* fix

* typo

* fix

* update
2023-07-18 16:43:45 +08:00
Yina Chen
4582b6939d [LLM]llm gptneox chat (#8527)
* linux

* support win

* merge upstream & support vnni lib in chat
2023-07-18 11:17:17 +08:00
Xin Qiu
fccae91461 Add load_low_bit save_low_bit to AutoModelForCausalLM (#8531)
* transformers save_low_bit load_low_bit

* update example and add readme

* update

* update

* update

* add ut

* update
2023-07-17 15:29:55 +08:00
xingyuan li
e57db777e0 [LLM] Setup.py & llm-cli update for windows vnni binary files (#8537)
* update setup.py
* update llm-cli
2023-07-17 12:28:38 +09:00
Yishuo Wang
6320bf201e LLM: fix memory access violation (#8519) 2023-07-13 17:08:08 +08:00
Xin Qiu
90e3d86bce rename low bit type name (#8512)
* change qx_0 to sym_intx

* update

* fix typo

* update

* fix type

* fix style

* add python doc

* meet code review

* fix style
2023-07-13 15:53:31 +08:00
Zhao Changmin
ba0da17b40 LLM: Support AutoModelForSeq2SeqLM transformer API (#8449)
* LLM: support AutoModelForSeq2SeqLM transformer API
2023-07-13 13:33:51 +08:00
Yishuo Wang
86b5938075 LLM: fix llm pybinding (#8509) 2023-07-13 10:27:08 +08:00
Zhao Changmin
23f6a4c21f LLM: Optimize transformer int4 loading (#8499)
* LLM: Optimize transformer int4 loading
2023-07-12 15:25:42 +08:00
Yishuo Wang
dd3f953288 Support vnni check (#8497) 2023-07-12 10:11:15 +08:00
Xin Qiu
cd7a980ec4 Transformer int4 add qtype, support q4_1 q5_0 q5_1 q8_0 (#8481)
* quant in Q4 5 8

* meet code review

* update readme

* style

* update

* fix error

* fix error

* update

* fix style

* update

* Update README.md

* Add load_in_low_bit
2023-07-12 08:23:08 +08:00
Yishuo Wang
db39d0a6b3 LLM: disable mmap by default for better performance (#8467) 2023-07-11 09:26:26 +08:00
Zhao Changmin
81d655cda9 LLM: transformer int4 save and load (#8462)
* LLM: transformer int4 save and load
2023-07-10 16:34:41 +08:00
binbin Deng
d489775d2c LLM: fix inconsistency between output token number and max_new_token (#8479) 2023-07-07 17:31:05 +08:00
Ruonan Wang
2f77d485d8 Llm: Initial support of langchain transformer int4 API (#8459)
* first commit of transformer int4 and pipeline

* basic examples

temp save for embeddings

support embeddings and docqa example

* fix based on comment

* small fix
2023-07-06 17:50:05 +08:00
binbin Deng
14626fe05b LLM: refactor transformers and langchain class name (#8470) 2023-07-06 17:16:44 +08:00
binbin Deng
77808fa124 LLM: fix n_batch in starcoder pybinding (#8461) 2023-07-05 17:06:50 +08:00
Yina Chen
f2bb469847 [WIP] LLm llm-cli chat mode (#8440)
* fix timezone

* temp

* Update linux interactive mode

* modify init text for interactive mode

* meet comments

* update

* win script

* meet comments
2023-07-05 14:04:17 +08:00
binbin Deng
e54e52b438 LLM: fix n_batch in bloom pybinding (#8454) 2023-07-04 15:10:32 +08:00
Yang Wang
449aea7ffc Optimize transformer int4 loading memory (#8400)
* Optimize transformer int4 loading memory

* move cast to convert

* default setting low_cpu_mem_usage
2023-06-30 20:12:12 -07:00
Zhao Changmin
cc76ec809a check out dir (#8395) 2023-06-27 21:28:39 +08:00
Xin Qiu
e68d631c0a gptq2ggml: support loading safetensors model. (#8401)
* update convert gptq to ggml

* update convert gptq to ggml

* gptq to ggml

* update script

* meet code review

* meet code review
2023-06-27 11:19:33 +08:00
binbin Deng
19e19efb4c LLM: raise warning instead of error when use unsupported parameters (#8382) 2023-06-26 13:23:55 +08:00
Shengsheng Huang
c113ecb929 [LLM] langchain bloom, UT's, default parameters (#8357)
* update langchain default parameters to align w/ api

* add ut's for llm and embeddings

* update inference test script to install langchain deps

* update tests workflows

---------

Co-authored-by: leonardozcm <changmin.zhao@intel.com>
2023-06-25 17:38:00 +08:00
Shengsheng Huang
446175cc05 transformer api refactor (#8389)
* transformer api refactor

* fix style

* add huggingface tokenizer usage in example and make ggml tokenizer as option 1 and huggingface tokenizer as option 2

* fix style
2023-06-25 17:15:33 +08:00
Yang Wang
ce6d06eb0a Support directly quantizing huggingface transformers into 4bit format (#8371)
* Support directly quantizing huggingface transformers into 4bit format

* refine example

* license

* fix bias

* address comments

* move to ggml transformers

* fix example

* fix style

* fix style

* address comments

* rename

* change API

* fix style

* add lm head to conversion

* address comments
2023-06-25 16:35:06 +08:00
binbin Deng
03c5fb71a8 LLM: fix ModuleNotFoundError when use llm-cli (#8378) 2023-06-21 15:03:14 +08:00
Ruonan Wang
7296453f07 LLM: support starcoder in llm-cli (#8377)
* support starcoder in cli

* small fix
2023-06-21 14:38:30 +08:00
Ruonan Wang
50af0251e4 LLM: First commit of StarCoder pybinding (#8354)
* first commit of starcoder

* update setup.py and fix style

* add starcoder_cpp, fix style

* fix style

* support windows binary

* update pybinding

* fix style, add avx2 binary

* small fix

* fix style
2023-06-21 13:23:06 +08:00
Yuwen Hu
7ef1c890eb [LLM] Supports GPTQ convert in transformers-like API, and supports folder outfile for llm-convert (#8366)
* Add docstrings to llm_convert

* Small docstrings fix

* Unify outfile type to be a folder path for either gptq or pth model_format

* Supports gptq model input for from_pretrained

* Fix example and readme

* Small fix

* Python style fix

* Bug fix in llm_convert

* Python style check

* Fix based on comments

* Small fix
2023-06-20 17:42:38 +08:00
Zhao Changmin
4ec46afa4f LLM: Align converting GPTQ model API with transformer style (#8365)
* LLM: Align GPTQ API with transformer style
2023-06-20 14:27:41 +08:00
Ruonan Wang
f99d348954 LLM: convert and quantize support for StarCoder (#8359)
* basic support for starcoder

* update from_pretrained

* fix bug and fix style
2023-06-20 13:39:35 +08:00
binbin Deng
5f4f399ca7 LLM: fix bugs during supporting bloom in langchain (#8362) 2023-06-20 13:30:37 +08:00
Zhao Changmin
30ac9a70f5 LLM: fix expected 2 blank lines (#8360) 2023-06-19 18:10:02 +08:00
Zhao Changmin
c256cd136b LLM: Fix ggml return value (#8358)
* ggml return original value
2023-06-19 17:02:56 +08:00
Zhao Changmin
d4027d7164 fix typos in llm_convert (#8355) 2023-06-19 16:17:21 +08:00
Zhao Changmin
4d177ca0a1 LLM: Merge convert pth/gptq model script into one shell script (#8348)
* convert model in one

* model type

* license

* readme and pep8

* ut path

* rename

* readme

* fix docs

* without lines
2023-06-19 11:50:05 +08:00
Ruonan Wang
9daf543e2f LLM: Update convert of gptneox to sync with new libgptneox.so (#8345) 2023-06-15 16:28:50 +08:00
Ruonan Wang
f7f4e65788 LLM: support int8 and tmp_path for from_pretrained (#8338) 2023-06-15 14:48:21 +08:00
Ruonan Wang
5094970175 LLM: update convert_model to support int8 (#8326)
* update example and convert_model for int8

* reset example

* fix style
2023-06-15 09:25:07 +08:00
binbin Deng
f64e703083 LLM: first add _tokenize, detokenize and _generate for bloom pybinding (#8316) 2023-06-14 17:29:57 +08:00
Xin Qiu
5576679a92 add convert-gptq-to-ggml.py to bigdl-llama (#8298) 2023-06-14 14:51:51 +08:00
Ruonan Wang
a6c4b733cb LLM: Update subprocess to show error message (#8323)
* update subprocess

* fix style
2023-06-13 16:43:37 +08:00
Shengsheng Huang
02c583144c [LLM] langchain integrations and examples (#8256)
* langchain integrations and examples

* add licences and rename

* add licences

* fix license issues and change backbone to model_family

* update examples to use model_family param

* fix linting

* fix code style

* exclude langchain integration from stylecheck

* update langchain examples and update integrations based on latest changes

* update simple llama-cpp-python style API example

* remove bloom in README

* change default n_threads to 2 and remove redundant code

---------

Co-authored-by: leonardozcm <changmin.zhao@intel.com>
2023-06-12 19:22:07 +08:00
xingyuan li
c4028d507c [LLM] Add unified default value for cli programs (#8310)
* add unified default value for threads and n_predict
2023-06-12 16:30:27 +08:00
binbin Deng
5d5da7b2c7 LLM: optimize namespace and remove unused import logic (#8302) 2023-06-09 15:17:49 +08:00
Ruonan Wang
5d0e130605 LLM: fix convert path error of gptneox and bloom on windows (#8304) 2023-06-09 10:10:19 +08:00
Yina Chen
7bfa0fcdf9 fix style (#8300) 2023-06-08 16:52:17 +08:00
Yina Chen
637b72f2ad [LLM] llm transformers api support batch actions (#8288)
* llm transformers api support batch actions

* align with transformer

* meet comment
2023-06-08 15:10:08 +08:00
xingyuan li
ea3cf6783e LLM: Command line wrapper for llama/bloom/gptneox (#8239)
* add llama/bloom/gptneox wrapper
* add readme
* upload binary main file
2023-06-08 14:55:22 +08:00
binbin Deng
08bdfce2d8 LLM: avoid unnecessary import torch except converting process (#8297) 2023-06-08 14:24:58 +08:00
binbin Deng
f9e2bda04a LLM: add stop words and enhance output for bloom pybinding (#8280) 2023-06-08 14:06:06 +08:00
Yina Chen
1571ba6425 remove unused import gptneox_cpp (#8293) 2023-06-08 11:04:47 +08:00
Yina Chen
2c037e892b fix-transformers-neox (#8285) 2023-06-07 14:44:43 +08:00
Ruonan Wang
39ad68e786 LLM: enhancements for convert_model (#8278)
* update convert

* change output name

* add description for input_path, add check for input_values

* basic support for command line

* fix style

* update based on comment

* update based on comment
2023-06-07 13:22:14 +08:00
Junwei Deng
2d14e593f0 LLM: Support generate(max_new_tokens=...), tokenize and decode for transformers-like API (#8283)
* first push

* fix pep8
2023-06-07 11:50:35 +08:00
Yina Chen
11cd2a07e0 [LLM] llm transformers format interface first part (#8276)
* llm-transformers-format

* update

* fix style
2023-06-06 17:17:37 +08:00
Pingchuan Ma (Henry)
a3f353b939 [LLM] add long time loading disclaimer for LLM model converting (#8279) 2023-06-06 17:15:13 +08:00
Yuwen Hu
64bc123dd3 [LLM] Add transformers-like API from_pretrained (#8271)
* Init commit for bigdl.llm.transformers.AutoModelForCausalLM

* Temp change to avoid name conflicts with external transformers lib

* Support downloading model from huggingface

* Small python style fix

* Change location of transformers to avoid library conflicts

* Add return value for converted ggml binary ckpt path for convert_model

* Avoid repeated loading of shared library and adding some comments

* Small fix

* Path type fix and docstring fix

* Small fix

* Small fix

* Change cache dir to pwd
2023-06-06 17:04:16 +08:00
xingyuan li
38be471140 [LLM] convert_model bug fix (#8274)
* Renamed all bloomz to bloom in ggml/model & utls/convert_util.py
* Add an optional parameter to specify the model conversion path to avoid running out of disk space
2023-06-06 15:16:42 +08:00
Ruonan Wang
8bd2992a8d LLM: accelerate sample of gptneox and update quantize (#8262)
* update quantize & accelerate sample

* fix style check

* fix style error
2023-06-05 15:36:00 +08:00
Jun Wang
2bc0e7abbb [llm] Add convert_model api (#8244)
* add convert_model api

* change the model_path to input_path

* map int4 to q4_0

* fix blank line

* change bloomz to bloom

* remove default model_family

* change dtype to lower first
2023-06-03 10:18:29 +08:00
Yuwen Hu
e290660b20 [LLM] Add so shared library for Bloom family models (#8258)
* Add so file downloading for bloom family models

* Supports selecting of avx2/avx512 so for bloom
2023-06-02 17:39:40 +08:00
Yina Chen
657ea0ee50 [LLM] Fix linux load libs for NeoX and llama (#8257)
* init

* add license

* fix style
2023-06-02 17:03:17 +08:00
Yuwen Hu
286b010bf1 [LLM] First push for Bloomz pybinding (#8252)
* Initial commit to move bloom pybinding to bigdl-llm

* Revise path for shared library

* Small fix
2023-06-02 14:41:04 +08:00
Junwei Deng
350d31a472 LLM: first push gptneox pybinding (#8234)
* first push gptneox pybinding

* fix

* fix code style and add license

---------

Co-authored-by: binbin <binbin1.deng@intel.com>
2023-06-02 09:28:00 +08:00
binbin Deng
3a9aa23835 LLM: fix and update related license in llama pybinding (#8250) 2023-06-01 17:09:15 +08:00
binbin Deng
e56f24b424 LLM: first push llama pybinding (#8241)
* first push llama binding

* update dll
2023-06-01 10:59:15 +08:00
binbin Deng
8421af51ae LLM: support converting to ggml format (#8235)
* add convert

* fix

* fix

* fix

* try

* test

* update check

* fix

* fix
2023-05-31 15:20:06 +08:00
Ruonan Wang
c890609d1e LLM: Support package/quantize for llama.cpp/redpajama.cpp on Windows (#8236)
* support windows of llama.cpp

* update quantize

* update version of llama.cpp submodule

* add gptneox.dll

* add quantize-gptneox.exe
2023-05-31 14:47:12 +08:00
Pingchuan Ma (Henry)
1f913a6941 [LLM] Add LLM pep8 coding style checking (#8233)
* add LLM pep8 coding checking

* resolve bugs in testing scripts and code style revision
2023-05-30 15:58:14 +08:00
Ruonan Wang
4638b85f3e [llm] Initial support of package and quantize (#8228)
* first commit of CMakeFiles.txt to include llama & gptneox

* initial support of quantize

* update cmake for only consider linux now

* support quantize interface

* update based on comment
2023-05-26 16:36:46 +08:00
Junwei Deng
ea22416525 LLM: add first round files (#8225) 2023-05-25 11:29:18 +08:00