Commit graph

229 commits

Author SHA1 Message Date
Yang Wang
ee98cdd85c Support latest transformers version (#8923)
* Support latest transformers version

* fix style
2023-09-07 19:01:32 -07:00
Yang Wang
25428b22b4 Fix chatglm2 attention and kv cache (#8924)
* fix chatglm2 attention

* fix bf16 bug

* make model stateless

* add utils

* cleanup

* fix style
2023-09-07 18:54:29 -07:00
Yina Chen
b209b8f7b6 [LLM] Fix arc qtype != q4_0 generate issue (#8920)
* Fix arc precision != q4_0 generate issue

* meet comments
2023-09-07 08:56:36 -07:00
Yang Wang
c34400e6b0 Use new layout for xpu qlinear (#8896)
* use new layout for xpu qlinear

* fix style
2023-09-06 21:55:33 -07:00
Zhao Changmin
8bc1d8a17c LLM: Fix discards in optimize_model with non-hf models and add openai whisper example (#8877)
* openai-whisper
2023-09-07 10:35:59 +08:00
SONG Ge
7a71ced78f [LLM Docs] Resolve remaining API docs issues (#8780)
* langchain readthedocs update

* solve langchain.llms.transformersllm issues

* langchain.embeddings.transformersembeddings/transformersllms issues

* update docs for get_num_tokens

* add low_bit api doc

* add optimizer model api doc

* update rst index

* fix comments style

* update docs following the comments

* update api doc
2023-09-06 16:29:34 +08:00
Kai Huang
4a9ff050a1 Add qlora nf4 (#8782)
* add nf4

* dequant nf4

* style
2023-09-06 09:39:22 +08:00
Zhao Changmin
95271f10e0 LLM: Rename low bit layer (#8875)
* rename lowbit

---------

Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-09-05 13:21:12 +08:00
Yang Wang
242c9d6036 Fix chatglm2 multi-turn streamchat (#8867) 2023-08-31 22:13:49 -07:00
xingyuan li
de6c6bb17f [LLM] Downgrade amx build gcc version and remove avx flag display (#8856)
* downgrade to gcc 11
* remove avx display
2023-08-31 14:08:13 +09:00
Yang Wang
3b4f4e1c3d Fix llama attention optimization for XPU (#8855)
* Fix llama attention optimization for XPU

* fix chatglm2

* fix typo
2023-08-30 21:30:49 -07:00
Shengsheng Huang
7b566bf686 [LLM] add new API to optimize any pytorch model (#8827)
* add new API to optimize any pytorch model

* change test util name

* revise API and update UT

* fix python style

* update ut config, change default value

* change defaults, disable ut transcribe
2023-08-30 19:41:53 +08:00
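The new API referenced in this entry is the generic optimize_model entry point (the name also appears in #8877 above). A minimal sketch of how it is typically applied, assuming the public bigdl.llm.optimize_model import and an illustrative whisper model:

```python
# Hedged sketch of the generic optimization API from #8827; the whisper
# model is only an illustrative choice.
from transformers import WhisperForConditionalGeneration
from bigdl.llm import optimize_model

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
model = optimize_model(model)  # swap supported layers for low-bit equivalents
```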
Xin Qiu
8eca982301 windows add env (#8852) 2023-08-30 15:54:52 +08:00
Zhao Changmin
731916c639 LLM: Enable attempting loading method automatically (#8841)
* enable auto load method

* warning error

* logger info

---------

Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2023-08-30 15:41:55 +08:00
Yishuo Wang
bba73ec9d2 [LLM] change chatglm native int4 checkpoint name (#8851) 2023-08-30 15:05:19 +08:00
Yina Chen
55e705a84c [LLM] Support the rest of AutoXXX classes in Transformers API (#8815)
* add transformers auto models

* fix
2023-08-30 11:16:14 +08:00
Yishuo Wang
7429ea0606 [LLM] support transformer int4 + amx int4 (#8838) 2023-08-29 17:27:18 +08:00
Zhao Changmin
bb31d4fe80 LLM: Implement hf low_cpu_mem_usage with 1x binary file peak memory on transformer int4 (#8731)
* 1x peak memory
2023-08-29 09:33:17 +08:00
SONG Ge
d2926c7672 [LLM] Unify Langchain Native and Transformers LLM API (#8752)
* deprecate BigDLNativeTransformers and add specific LMEmbedding method

* deprecate and add LM methods for langchain llms

* add native params to native langchain

* new imple for embedding

* move ut from bigdlnative to causal llm

* rename embeddings api and update examples to align with usage changes

* docqa example hot-fix

* add more api docs

* add langchain ut for starcoder

* support model_kwargs for transformer methods when calling causalLM and add ut

* ut fix for transformers embedding

* update for langchain causal supporting transformers

* remove model_family in readme doc

* add model_families params to support more models

* update api docs and remove chatglm embeddings for now

* remove chatglm embeddings in examples

* new refactor for ut to add bloom and transformers llama ut

* disable llama transformers embedding ut
2023-08-25 11:14:21 +08:00
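A hedged sketch of the unified LangChain classes this entry describes (TransformersLLM and the renamed embeddings API); the model id and kwargs are illustrative assumptions:

```python
# Sketch of the unified LangChain integration (#8752); model_id and
# model_kwargs are illustrative assumptions.
from bigdl.llm.langchain.llms import TransformersLLM
from bigdl.llm.langchain.embeddings import TransformersEmbeddings

llm = TransformersLLM.from_model_id(
    model_id="meta-llama/Llama-2-7b-chat-hf",
    model_kwargs={"temperature": 0},
)
embeddings = TransformersEmbeddings.from_model_id(
    model_id="meta-llama/Llama-2-7b-chat-hf",
)
print(llm("What is AI?"))
```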
Yang Wang
bf3591e2ff Optimize chatglm2 for bf16 (#8725)
* make chatglm works with bf16

* fix style

* support chatglm v1

* fix style

* fix style

* add chatglm2 file
2023-08-24 10:04:25 -07:00
Yishuo Wang
611c1fb628 [LLM] change default n_threads of native int4 langchain API (#8779) 2023-08-21 13:30:12 +08:00
Yishuo Wang
3d1f2b44f8 LLM: change default n_threads of native int4 models (#8776) 2023-08-18 15:46:19 +08:00
Yishuo Wang
2ba2133613 fix starcoder chinese output (#8773) 2023-08-18 13:37:02 +08:00
binbin Deng
548f7a6cf7 LLM: update convert of llama family to support llama2-70B (#8747) 2023-08-18 09:30:35 +08:00
Yina Chen
4afea496ab support q8_0 (#8765) 2023-08-17 15:06:36 +08:00
Ruonan Wang
e9aa2bd890 LLM: reduce GPU 1st token latency and update example (#8763)
* reduce 1st token latency

* update example

* fix

* fix style

* update readme of gpu benchmark
2023-08-16 18:01:23 +08:00
SONG Ge
f4164e4492 [BigDL LLM] Update readme for unifying transformers API (#8737)
* update readme doc

* fix readthedocs error

* update comment

* update exception error info

* invalidInputError instead

* fix readme typo error and remove import error

* fix more typo
2023-08-16 14:22:32 +08:00
Yishuo Wang
77844125f2 [LLM] Support chatglm cache (#8745) 2023-08-14 15:10:46 +08:00
SONG Ge
aceea4dc29 [LLM] Unify Transformers and Native API (#8713)
* re-open pr to run on latest runner

* re-add examples and ut

* rename ut and change deprecation to a warning instead of raising an error

* ut fix
2023-08-11 19:45:47 +08:00
Yishuo Wang
f91035c298 [LLM] fix chatglm native int4 emoji output (#8739) 2023-08-11 15:38:41 +08:00
binbin Deng
77efcf7b1d LLM: fix ChatGLM2 native int4 stream output (#8733) 2023-08-11 14:51:50 +08:00
Ruonan Wang
ca3e59a1dc LLM: support stop for starcoder native int4 stream (#8734) 2023-08-11 14:51:30 +08:00
Yishuo Wang
3d5a7484a2 [LLM] fix bloom and starcoder memory release (#8728) 2023-08-11 11:18:19 +08:00
Ruonan Wang
1a7b698a83 [LLM] support ipex arc int4 & add basic llama2 example (#8700)
* first support of xpu

* make it work on gpu

update setup

update

add GPU llama2 examples

add use_optimize flag to disable optimize for gpu

fix style

update gpu example readme

fix

* update example, and update env

* fix setup to add cpp files

* replace jit with aot to avoid data leak

* rename to bigdl-core-xe

* update installation in example readme
2023-08-09 22:20:32 +08:00
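A rough sketch of the Arc (XPU) int4 flow this entry introduces, assuming an Intel GPU environment with intel_extension_for_pytorch installed and an illustrative llama2 checkpoint:

```python
# Hedged sketch of int4 inference on an Intel Arc GPU (#8700).
import intel_extension_for_pytorch as ipex  # noqa: F401 -- registers the 'xpu' device
from transformers import LlamaTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf", load_in_4bit=True
).to("xpu")
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids.to("xpu")
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```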
Kai Huang
1b65288bdb Add api doc for LLM (#8605)
* api doc initial

* update desc
2023-08-08 18:17:16 +08:00
binbin Deng
ea5d7aff5b LLM: add chatglm native int4 transformers API (#8695) 2023-08-07 17:52:47 +08:00
Yishuo Wang
ef08250c21 [LLM] chatglm pybinding support (#8672) 2023-08-04 14:27:29 +08:00
Yang Wang
b6468bac43 optimize chatglm2 long sequence (#8662)
* add chatglm2

* optimize a little

* optimize chatglm long sequence

* fix style

* address comments and fix style

* fix bug
2023-08-03 17:56:24 -07:00
Yang Wang
3407f87075 Fix llama kv cache bug (#8674) 2023-08-03 17:54:55 -07:00
binbin Deng
a15a2516e6 add (#8659) 2023-08-03 10:12:10 +08:00
Yina Chen
119bf6d710 [LLM] Support linux cpp dynamic load .so (#8655)
* support linux cpp dynamic load .so

* update cli
2023-08-02 20:15:45 +08:00
Zhao Changmin
ca998cc6f2 LLM: Mute shape mismatch output (#8601)
* LLM: Mute shape mismatch output
2023-08-02 16:46:22 +08:00
Zhao Changmin
04c713ef06 LLM: Disable transformer api pretraining_tp (#8645)
* disable pretraining_tp
2023-08-02 11:26:01 +08:00
Yang Wang
cbeae97a26 Optimize Llama Attention to reduce KV cache memory copy (#8580)
* Optimize llama attention to reduce KV cache memory copy

* fix bug

* fix style

* remove git

* fix style

* fix style

* fix style

* fix tests

* move llama attention to another file

* revert

* fix style

* remove jit

* fix
2023-08-01 16:37:58 -07:00
xingyuan li
cdfbe652ca [LLM] Add chatglm support for llm-cli (#8641)
* add chatglm build
* add llm-cli support
* update git
* install cmake
* add ut for chatglm
* add files to setup
* fix bug causing permission error when sf lacks file
2023-08-01 14:30:17 +09:00
Zhao Changmin
3e10260c6d LLM: llm-convert support chatglm family (#8643)
* convert chatglm
2023-08-01 11:16:18 +08:00
Yina Chen
a607972c0b [LLM] LLM windows load -api.dll (#8631)
* temp

* update

* revert setup.py
2023-07-31 13:47:20 +08:00
xingyuan li
3361b66449 [LLM] Revert llm-cli to disable selecting executables on Windows (#8630)
* revert vnni file select
* revert setup.py
* add model-api.dll
2023-07-31 11:15:44 +09:00
binbin Deng
fb32fefcbe LLM: support tensor input of native int4 generate (#8620) 2023-07-27 17:59:49 +08:00
Zhao Changmin
5b484ab48d LLM: Support load_low_bit loading models in shards format (#8612)
* shards_model

---------

Co-authored-by: leonardozcm <leonaordo1997zcm@gmail.com>
2023-07-26 13:30:01 +08:00
Zhao Changmin
af201052db avoid mallocing all missing keys in fp32 (#8600) 2023-07-25 09:48:51 +08:00
Yuwen Hu
ba42a6da63 [LLM] Set torch_dtype default value to 'auto' for transformers low bit from_pretrained API 2023-07-21 17:55:00 +08:00
Yang Wang
feb3af0567 Optimize transformer int4 memory footprint (#8579) 2023-07-20 20:22:13 -07:00
Yang Wang
57e880f63a [LLM] use pytorch linear for large input matrix (#8492)
* use pytorch linear for large input matrix

* only works on server

* fix style

* optimize memory

* first check server

* revert

* address comments

* fix style
2023-07-20 09:54:25 -07:00
Zhao Changmin
e680af45ea LLM: Optimize Langchain Pipeline (#8561)
* LLM: Optimize Langchain Pipeline

* load in low bit
2023-07-19 17:43:13 +08:00
Zhao Changmin
49d636e295 [LLM] whisper model transformer int4 verification and example (#8511)
* LLM: transformer api support

* va

* example

* revert

* pep8

* pep8
2023-07-19 08:33:20 +08:00
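The int4 whisper verification above follows the usual transformers speech-to-text flow; a sketch under the assumption of the AutoModelForSpeechSeq2Seq wrapper and an illustrative model id:

```python
# Sketch of int4 whisper loading (#8511); "openai/whisper-tiny" is an
# illustrative choice, and the audio preprocessing is left as a comment.
from transformers import WhisperProcessor
from bigdl.llm.transformers import AutoModelForSpeechSeq2Seq

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny",
                                                  load_in_4bit=True)
# input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
# predicted_ids = model.generate(input_features)
```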
Yina Chen
9a7bc17ca1 [LLM] llm supports vnni link on windows (#8543)
* support win vnni link

* fix style

* fix style

* use isa_checker

* fix

* typo

* fix

* update
2023-07-18 16:43:45 +08:00
Yina Chen
4582b6939d [LLM] llm gptneox chat (#8527)
* linux

* support win

* merge upstream & support vnni lib in chat
2023-07-18 11:17:17 +08:00
Xin Qiu
fccae91461 Add load_low_bit save_low_bit to AutoModelForCausalLM (#8531)
* transformers save_low_bit load_low_bit

* update example and add readme

* update

* update

* update

* add ut

* update
2023-07-17 15:29:55 +08:00
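The save_low_bit/load_low_bit pair added here persists already-quantized weights so later loads skip conversion entirely; a minimal sketch with placeholder paths:

```python
# Sketch of the low-bit save/load round trip (#8531); paths are placeholders.
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf",
                                             load_in_4bit=True)
model.save_low_bit("./llama-7b-int4")                         # store quantized weights
model = AutoModelForCausalLM.load_low_bit("./llama-7b-int4")  # reload without re-quantizing
```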
xingyuan li
e57db777e0 [LLM] Setup.py & llm-cli update for windows vnni binary files (#8537)
* update setup.py
* update llm-cli
2023-07-17 12:28:38 +09:00
Yishuo Wang
6320bf201e LLM: fix memory access violation (#8519) 2023-07-13 17:08:08 +08:00
Xin Qiu
90e3d86bce rename low bit type name (#8512)
* change qx_0 to sym_intx

* update

* fix typo

* update

* fix type

* fix style

* add python doc

* meet code review

* fix style
2023-07-13 15:53:31 +08:00
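With this rename, the ggml-style qtype names map onto symmetric/asymmetric int names (q4_0 -> sym_int4, q4_1 -> asym_int4, q8_0 -> sym_int8, and so on). A sketch of selecting one through load_in_low_bit, with an illustrative model id:

```python
# Sketch of choosing a low-bit type by its renamed identifier (#8512).
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    load_in_low_bit="sym_int8",  # formerly q8_0; sym_int4, asym_int4, ... also accepted
)
```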
Zhao Changmin
ba0da17b40 LLM: Support AutoModelForSeq2SeqLM transformer API (#8449)
* LLM: support AutoModelForSeq2SeqLM transformer API
2023-07-13 13:33:51 +08:00
Yishuo Wang
86b5938075 LLM: fix llm pybinding (#8509) 2023-07-13 10:27:08 +08:00
Zhao Changmin
23f6a4c21f LLM: Optimize transformer int4 loading (#8499)
* LLM: Optimize transformer int4 loading
2023-07-12 15:25:42 +08:00
Yishuo Wang
dd3f953288 Support vnni check (#8497) 2023-07-12 10:11:15 +08:00
Xin Qiu
cd7a980ec4 Transformer int4 add qtype, support q4_1 q5_0 q5_1 q8_0 (#8481)
* quantize in q4/q5/q8

* meet code review

* update readme

* style

* update

* fix error

* fix error

* update

* fix style

* update

* Update README.md

* Add load_in_low_bit
2023-07-12 08:23:08 +08:00
Yishuo Wang
db39d0a6b3 LLM: disable mmap by default for better performance (#8467) 2023-07-11 09:26:26 +08:00
Zhao Changmin
81d655cda9 LLM: transformer int4 save and load (#8462)
* LLM: transformer int4 save and load
2023-07-10 16:34:41 +08:00
binbin Deng
d489775d2c LLM: fix inconsistency between output token number and max_new_tokens (#8479) 2023-07-07 17:31:05 +08:00
Ruonan Wang
2f77d485d8 LLM: Initial support of langchain transformer int4 API (#8459)
* first commit of transformer int4 and pipeline

* basic examples

temp save for embeddings

support embeddings and docqa example

* fix based on comment

* small fix
2023-07-06 17:50:05 +08:00
binbin Deng
14626fe05b LLM: refactor transformers and langchain class name (#8470) 2023-07-06 17:16:44 +08:00
binbin Deng
77808fa124 LLM: fix n_batch in starcoder pybinding (#8461) 2023-07-05 17:06:50 +08:00
Yina Chen
f2bb469847 [WIP] LLM llm-cli chat mode (#8440)
* fix timezone

* temp

* Update linux interactive mode

* modify init text for interactive mode

* meet comments

* update

* win script

* meet comments
2023-07-05 14:04:17 +08:00
binbin Deng
e54e52b438 LLM: fix n_batch in bloom pybinding (#8454) 2023-07-04 15:10:32 +08:00
Yang Wang
449aea7ffc Optimize transformer int4 loading memory (#8400)
* Optimize transformer int4 loading memory

* move cast to convert

* default setting low_cpu_mem_usage
2023-06-30 20:12:12 -07:00
Zhao Changmin
cc76ec809a check out dir (#8395) 2023-06-27 21:28:39 +08:00
Xin Qiu
e68d631c0a gptq2ggml: support loading safetensors model. (#8401)
* update convert gptq to ggml

* update convert gptq to ggml

* gptq to ggml

* update script

* meet code review

* meet code review
2023-06-27 11:19:33 +08:00
binbin Deng
19e19efb4c LLM: raise warning instead of error when using unsupported parameters (#8382) 2023-06-26 13:23:55 +08:00
Shengsheng Huang
c113ecb929 [LLM] langchain bloom, UT's, default parameters (#8357)
* update langchain default parameters to align w/ api

* add ut's for llm and embeddings

* update inference test script to install langchain deps

* update tests workflows

---------

Co-authored-by: leonardozcm <changmin.zhao@intel.com>
2023-06-25 17:38:00 +08:00
Shengsheng Huang
446175cc05 transformer api refactor (#8389)
* transformer api refactor

* fix style

* add huggingface tokenizer usage in example and make ggml tokenizer option 1 and huggingface tokenizer option 2

* fix style
2023-06-25 17:15:33 +08:00
Yang Wang
ce6d06eb0a Support directly quantizing huggingface transformers into 4bit format (#8371)
* Support directly quantizing huggingface transformers into 4bit format

* refine example

* license

* fix bias

* address comments

* move to ggml transformers

* fix example

* fix style

* fix style

* address comments

* rename

* change API

* fix style

* add lm head to conversion

* address comments
2023-06-25 16:35:06 +08:00
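This entry is the origin of the drop-in quantize-on-load path; a minimal sketch using the eventual bigdl.llm.transformers entry point, with the model id and prompt as illustrative assumptions:

```python
# Hedged sketch of direct 4-bit quantization while loading (#8371).
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf",
                                             load_in_4bit=True)  # int4 on the fly
tokenizer = AutoTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
inputs = tokenizer("def fib(n):", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```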
binbin Deng
03c5fb71a8 LLM: fix ModuleNotFoundError when using llm-cli (#8378) 2023-06-21 15:03:14 +08:00
Ruonan Wang
7296453f07 LLM: support starcoder in llm-cli (#8377)
* support starcoder in cli

* small fix
2023-06-21 14:38:30 +08:00
Ruonan Wang
50af0251e4 LLM: First commit of StarCoder pybinding (#8354)
* first commit of starcoder

* update setup.py and fix style

* add starcoder_cpp, fix style

* fix style

* support windows binary

* update pybinding

* fix style, add avx2 binary

* small fix

* fix style
2023-06-21 13:23:06 +08:00
Yuwen Hu
7ef1c890eb [LLM] Supports GPTQ convert in transformers-like API, and supports folder outfile for llm-convert (#8366)
* Add docstrings to llm_convert

* Small docstrings fix

* Unify outfile type to be a folder path for either gptq or pth model_format

* Supports gptq model input for from_pretrained

* Fix example and readme

* Small fix

* Python style fix

* Bug fix in llm_convert

* Python style check

* Fix based on comments

* Small fix
2023-06-20 17:42:38 +08:00
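The llm-convert command this entry extends also has a Python counterpart; a hedged sketch of the converter, where the argument names follow the docstrings mentioned in the first bullet and all paths are placeholders:

```python
# Hedged sketch of converting a checkpoint to ggml low-bit format (#8366);
# model_format selects between a pth/HF checkpoint and a GPTQ model.
from bigdl.llm import llm_convert

ggml_path = llm_convert(
    model="/path/to/llama-7b-hf/",      # input checkpoint folder
    outfile="/path/to/output_folder/",  # folder output for either model_format
    model_family="llama",
    model_format="pth",                 # or "gptq" for GPTQ inputs
    outtype="int4",
)
```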
Zhao Changmin
4ec46afa4f LLM: Align converting GPTQ model API with transformer style (#8365)
* LLM: Align GPTQ API with transformer style
2023-06-20 14:27:41 +08:00
Ruonan Wang
f99d348954 LLM: convert and quantize support for StarCoder (#8359)
* basic support for starcoder

* update from_pretrained

* fix bug and fix style
2023-06-20 13:39:35 +08:00
binbin Deng
5f4f399ca7 LLM: fix bugs during supporting bloom in langchain (#8362) 2023-06-20 13:30:37 +08:00
Zhao Changmin
30ac9a70f5 LLM: fix expected 2 blank lines (#8360) 2023-06-19 18:10:02 +08:00
Zhao Changmin
c256cd136b LLM: Fix ggml return value (#8358)
* ggml return original value
2023-06-19 17:02:56 +08:00
Zhao Changmin
d4027d7164 fix typos in llm_convert (#8355) 2023-06-19 16:17:21 +08:00
Zhao Changmin
4d177ca0a1 LLM: Merge convert pth/gptq model script into one shell script (#8348)
* convert model in one

* model type

* license

* readme and pep8

* ut path

* rename

* readme

* fix docs

* without lines
2023-06-19 11:50:05 +08:00
Ruonan Wang
9daf543e2f LLM: Update convert of gptneox to sync with new libgptneox.so (#8345) 2023-06-15 16:28:50 +08:00
Ruonan Wang
f7f4e65788 LLM: support int8 and tmp_path for from_pretrained (#8338) 2023-06-15 14:48:21 +08:00
Ruonan Wang
5094970175 LLM: update convert_model to support int8 (#8326)
* update example and convert_model for int8

* reset example

* fix style
2023-06-15 09:25:07 +08:00
binbin Deng
f64e703083 LLM: first add _tokenize, detokenize and _generate for bloom pybinding (#8316) 2023-06-14 17:29:57 +08:00
Xin Qiu
5576679a92 add convert-gptq-to-ggml.py to bigdl-llama (#8298) 2023-06-14 14:51:51 +08:00
Ruonan Wang
a6c4b733cb LLM: Update subprocess to show error message (#8323)
* update subprocess

* fix style
2023-06-13 16:43:37 +08:00
Shengsheng Huang
02c583144c [LLM] langchain integrations and examples (#8256)
* langchain integrations and examples

* add licences and rename

* add licences

* fix license issues and change backbone to model_family

* update examples to use model_family param

* fix linting

* fix code style

* exclude langchain integration from stylecheck

* update langchain examples and update integrations based on latest changes

* update simple llama-cpp-python style API example

* remove bloom in README

* change default n_threads to 2 and remove redundant code

---------

Co-authored-by: leonardozcm <changmin.zhao@intel.com>
2023-06-12 19:22:07 +08:00
xingyuan li
c4028d507c [LLM] Add unified default value for cli programs (#8310)
* add unified default value for threads and n_predict
2023-06-12 16:30:27 +08:00
binbin Deng
5d5da7b2c7 LLM: optimize namespace and remove unused import logic (#8302) 2023-06-09 15:17:49 +08:00
Ruonan Wang
5d0e130605 LLM: fix convert path error of gptneox and bloom on windows (#8304) 2023-06-09 10:10:19 +08:00
Yina Chen
7bfa0fcdf9 fix style (#8300) 2023-06-08 16:52:17 +08:00
Yina Chen
637b72f2ad [LLM] llm transformers api support batch actions (#8288)
* llm transformers api support batch actions

* align with transformer

* meet comment
2023-06-08 15:10:08 +08:00
xingyuan li
ea3cf6783e LLM: Command line wrapper for llama/bloom/gptneox (#8239)
* add llama/bloom/gptneox wrapper
* add readme
* upload binary main file
2023-06-08 14:55:22 +08:00
binbin Deng
08bdfce2d8 LLM: avoid unnecessary torch import except during converting process (#8297) 2023-06-08 14:24:58 +08:00
binbin Deng
f9e2bda04a LLM: add stop words and enhance output for bloom pybinding (#8280) 2023-06-08 14:06:06 +08:00
Yina Chen
1571ba6425 remove unused import gptneox_cpp (#8293) 2023-06-08 11:04:47 +08:00
Yina Chen
2c037e892b fix-transformers-neox (#8285) 2023-06-07 14:44:43 +08:00
Ruonan Wang
39ad68e786 LLM: enhancements for convert_model (#8278)
* update convert

* change output name

* add description for input_path, add check for input_values

* basic support for command line

* fix style

* update based on comment

* update based on comment
2023-06-07 13:22:14 +08:00
Junwei Deng
2d14e593f0 LLM: Support generate(max_new_tokens=...), tokenize and decode for transformers-like API (#8283)
* first push

* fix pep8
2023-06-07 11:50:35 +08:00
Yina Chen
11cd2a07e0 [LLM] llm transformers format interface first part (#8276)
* llm-transformers-format

* update

* fix style
2023-06-06 17:17:37 +08:00
Pingchuan Ma (Henry)
a3f353b939 [LLM] add disclaimer about long loading time for LLM model converting (#8279) 2023-06-06 17:15:13 +08:00
Yuwen Hu
64bc123dd3 [LLM] Add transformers-like API from_pretrained (#8271)
* Init commit for bigdl.llm.transformers.AutoModelForCausalLM

* Temp change to avoid name conflicts with external transformers lib

* Support downloading model from huggingface

* Small python style fix

* Change location of transformers to avoid library conflicts

* Add return value for converted ggml binary ckpt path for convert_model

* Avoid repeated loading of shared library and adding some comments

* Small fix

* Path type fix and docstring fix

* Small fix

* Small fix

* Change cache dir to pwd
2023-06-06 17:04:16 +08:00
xingyuan li
38be471140 [LLM] convert_model bug fix (#8274)
* Renamed all bloomz to bloom in ggml/model & utils/convert_util.py
* Add an optional parameter for specifying the model conversion path to avoid running out of disk space
2023-06-06 15:16:42 +08:00
Ruonan Wang
8bd2992a8d LLM: accelerate sample of gptneox and update quantize (#8262)
* update quantize & accelerate sample

* fix style check

* fix style error
2023-06-05 15:36:00 +08:00
Jun Wang
2bc0e7abbb [llm] Add convert_model api (#8244)
* add convert_model api

* change the model_path to input_path

* map int4 to q4_0

* fix blank line

* change bloomz to bloom

* remove default model_family

* change dtype to lowercase first
2023-06-03 10:18:29 +08:00
Yuwen Hu
e290660b20 [LLM] Add .so shared library for Bloom family models (#8258)
* Add .so file downloading for bloom family models

* Support selecting avx2/avx512 .so for bloom
2023-06-02 17:39:40 +08:00
Yina Chen
657ea0ee50 [LLM] Fix linux load libs for NeoX and llama (#8257)
* init

* add license

* fix style
2023-06-02 17:03:17 +08:00
Yuwen Hu
286b010bf1 [LLM] First push for Bloomz pybinding (#8252)
* Initial commit to move bloom pybinding to bigdl-llm

* Revise path for shared library

* Small fix
2023-06-02 14:41:04 +08:00
Junwei Deng
350d31a472 LLM: first push gptneox pybinding (#8234)
* first push gptneox pybinding

* fix

* fix code style and add license

---------

Co-authored-by: binbin <binbin1.deng@intel.com>
2023-06-02 09:28:00 +08:00
binbin Deng
3a9aa23835 LLM: fix and update related license in llama pybinding (#8250) 2023-06-01 17:09:15 +08:00
binbin Deng
e56f24b424 LLM: first push llama pybinding (#8241)
* first push llama binding

* update dll
2023-06-01 10:59:15 +08:00
binbin Deng
8421af51ae LLM: support converting to ggml format (#8235)
* add convert

* fix

* fix

* fix

* try

* test

* update check

* fix

* fix
2023-05-31 15:20:06 +08:00
Ruonan Wang
c890609d1e LLM: Support package/quantize for llama.cpp/redpajama.cpp on Windows (#8236)
* support windows of llama.cpp

* update quantize

* update version of llama.cpp submodule

* add gptneox.dll

* add quantize-gptneox.exe
2023-05-31 14:47:12 +08:00
Pingchuan Ma (Henry)
1f913a6941 [LLM] Add LLM pep8 coding style checking (#8233)
* add LLM pep8 coding style checking

* resolve bugs in testing scripts and revise code style
2023-05-30 15:58:14 +08:00
Ruonan Wang
4638b85f3e [llm] Initial support of package and quantize (#8228)
* first commit of CMakeLists.txt to include llama & gptneox

* initial support of quantize

* update cmake to only consider linux for now

* support quantize interface

* update based on comment
2023-05-26 16:36:46 +08:00
Junwei Deng
ea22416525 LLM: add first round files (#8225) 2023-05-25 11:29:18 +08:00