Commit graph

107 commits

Author SHA1 Message Date
xingyuan li
4f152b4e3a [LLM] Merge the llm.cpp build and the pypi release (#8503)
* checkout llm.cpp to build new binary
* use artifact to get latest built binary files
* rename quantize
* modify all release workflow
2023-07-13 16:34:24 +09:00
Yuwen Hu
bcde8ec83e [LLM] Small fix to MPT Example (#8513) 2023-07-13 14:33:21 +08:00
Zhao Changmin
ba0da17b40 LLM: Support AutoModelForSeq2SeqLM transformer API (#8449)
* LLM: support AutoModelForSeq2SeqLM transformer API
2023-07-13 13:33:51 +08:00
Yishuo Wang
86b5938075 LLM: fix llm pybinding (#8509) 2023-07-13 10:27:08 +08:00
Yuwen Hu
fcc352eee3 [LLM] Add more transformers_int4 examples (MPT) (#8498)
* Update transformers_int4 readme, and initial commit for mpt

* Update example for mpt

* Small fix and recover transformers_int4_pipeline_readme.md for now

* Update based on comments

* Small fix

* Small fix

* Update based on comments
2023-07-13 09:41:16 +08:00
Zhao Changmin
23f6a4c21f LLM: Optimize transformer int4 loading (#8499)
* LLM: Optimize transformer int4 loading
2023-07-12 15:25:42 +08:00
Yishuo Wang
dd3f953288 Support vnni check (#8497) 2023-07-12 10:11:15 +08:00
Xin Qiu
cd7a980ec4 Transformer int4 add qtype, support q4_1 q5_0 q5_1 q8_0 (#8481)
* quant in Q4 5 8

* meet code review

* update readme

* style

* update

* fix error

* fix error

* update

* fix style

* update

* Update README.md

* Add load_in_low_bit
2023-07-12 08:23:08 +08:00
Yishuo Wang
db39d0a6b3 LLM: disable mmap by default for better performance (#8467) 2023-07-11 09:26:26 +08:00
Yuwen Hu
52c6b057d6 Initial LLM Transformers example refactor (#8491) 2023-07-10 17:53:57 +08:00
Junwei Deng
254a7aa3c4 bigdl-llm: add voice-assistant example that are migrated from langchain use-case document (#8468) 2023-07-10 16:51:45 +08:00
Yishuo Wang
98bac815e4 specify numpy version (#8489) 2023-07-10 16:50:16 +08:00
Zhao Changmin
81d655cda9 LLM: transformer int4 save and load (#8462)
* LLM: transformer int4 save and load
2023-07-10 16:34:41 +08:00
binbin Deng
d489775d2c LLM: fix inconsistency between output token number and max_new_token (#8479) 2023-07-07 17:31:05 +08:00
Jason Dai
bcc1eae322 Llm readme update (#8472) 2023-07-06 20:04:04 +08:00
Ruonan Wang
2f77d485d8 Llm: Initial support of langchain transformer int4 API (#8459)
* first commit of transformer int4 and pipeline

* basic examples

temp save for embeddings

support embeddings and docqa example

* fix based on comment

* small fix
2023-07-06 17:50:05 +08:00
binbin Deng
14626fe05b LLM: refactor transformers and langchain class name (#8470) 2023-07-06 17:16:44 +08:00
binbin Deng
70bc8ea8ae LLM: update langchain and cpp-python style API examples (#8456) 2023-07-06 14:36:42 +08:00
Ruonan Wang
64b38e1dc8 llm: benchmark tool for transformers int4 (separate 1st token and rest) (#8460)
* add benchmark utils

* fix

* fix bug and add readme

* hidden latency data
2023-07-06 09:49:52 +08:00
binbin Deng
77808fa124 LLM: fix n_batch in starcoder pybinding (#8461) 2023-07-05 17:06:50 +08:00
Yina Chen
f2bb469847 [WIP] LLm llm-cli chat mode (#8440)
* fix timezone

* temp

* Update linux interactive mode

* modify init text for interactive mode

* meet comments

* update

* win script

* meet comments
2023-07-05 14:04:17 +08:00
binbin Deng
1970bcf14e LLM: add readme for transformer examples (#8444) 2023-07-04 17:25:58 +08:00
binbin Deng
e54e52b438 LLM: fix n_batch in bloom pybinding (#8454) 2023-07-04 15:10:32 +08:00
Yuwen Hu
372c775cb4 [LLM] Change default runner for LLM Linux tests to the ones with AVX512 (#8448)
* Basic change for AVX512 runner

* Remove conda channel and action rename

* Small fix

* Small fix and reduce peak convert disk space

* Define n_threads based on runner status

* Small thread num fix

* Define thread_num for cli

* test

* Add self-hosted label and other small fix
2023-07-04 14:53:03 +08:00
Jason Dai
edf23a95be Update llm readme (#8446) 2023-07-03 16:58:44 +08:00
Jason Dai
a38f927fc0 Update README.md (#8439) 2023-07-03 14:59:55 +08:00
binbin Deng
c956a46c40 LLM: first fix example/transformers (#8438) 2023-07-03 14:13:33 +08:00
Jason Dai
e5b384aaa2 Update README.md (#8437) 2023-07-03 10:54:29 +08:00
Yang Wang
449aea7ffc Optimize transformer int4 loading memory (#8400)
* Optimize transformer int4 loading memory

* move cast to convert

* default setting low_cpu_mem_usage
2023-06-30 20:12:12 -07:00
Jason Dai
2da21163f8 Update llm README.md (#8431) 2023-06-30 19:41:17 +08:00
Junwei Deng
2fd751de7a LLM: add a dev tool for getting glibc/glibcxx requirement (#8399)
* add a dev tool

* pep8 change
2023-06-30 11:09:50 +08:00
binbin Deng
146662bc0d LLM: fix langchain windows failure (#8417) 2023-06-30 09:59:10 +08:00
Yina Chen
6251ad8934 [LLM]Windows unittest (#8356)
* win-unittest

* update

* update

* try llama 7b

* delete llama

* update

* add red-3b

* only test red-3b

* revert

* add langchain

* add dependency

* delete langchain
2023-06-29 14:03:12 +08:00
Yina Chen
783aea3309 [LLM] LLM windows daily test (#8328)
* llm-win-init

* test action

* test

* add types

* update for schtasks

* update pytests

* update

* update

* update doc

* use stable ckpt from ftp instead of the converted model

* download using batch -> manually

* add starcoder test
2023-06-28 15:02:11 +08:00
binbin Deng
ca5a4b6e3a LLM: update bloom and starcoder usage in transformers_int4_pipeline (#8406) 2023-06-28 13:15:50 +08:00
Zhao Changmin
cc76ec809a check out dir (#8395) 2023-06-27 21:28:39 +08:00
Ruonan Wang
4be784a49d LLM: add UT for starcoder (convert, inference) update examples and readme (#8379)
* first commit to add path

* update example and readme

* update path

* fix

* update based on comment
2023-06-27 12:12:11 +08:00
Xin Qiu
e68d631c0a gptq2ggml: support loading safetensors model. (#8401)
* update convert gptq to ggml

* update convert gptq to ggml

* gptq to ggml

* update script

* meet code review

* meet code review
2023-06-27 11:19:33 +08:00
Ruonan Wang
b9eae23c79 LLM: add chatglm-6b example for transformer_int4 usage (#8392)
* add example for chatglm-6b

* fix
2023-06-26 13:46:43 +08:00
binbin Deng
19e19efb4c LLM: raise warning instead of error when use unsupported parameters (#8382) 2023-06-26 13:23:55 +08:00
Shengsheng Huang
c113ecb929 [LLM] langchain bloom, UT's, default parameters (#8357)
* update langchain default parameters to align w/ api

* add ut's for llm and embeddings

* update inference test script to install langchain deps

* update tests workflows

---------

Co-authored-by: leonardozcm <changmin.zhao@intel.com>
2023-06-25 17:38:00 +08:00
Shengsheng Huang
446175cc05 transformer api refactor (#8389)
* transformer api refactor

* fix style

* add huggingface tokenizer usage in example and make ggml tokenizer as option 1 and huggingface tokenizer as option 2

* fix style
2023-06-25 17:15:33 +08:00
Yang Wang
ce6d06eb0a Support directly quantizing huggingface transformers into 4bit format (#8371)
* Support directly quantizing huggingface transformers into 4bit format

* refine example

* license

* fix bias

* address comments

* move to ggml transformers

* fix example

* fix style

* fix style

* address comments

* rename

* change API

* fix style

* add lm head to conversion

* address comments
2023-06-25 16:35:06 +08:00
binbin Deng
03c5fb71a8 LLM: fix ModuleNotFoundError when use llm-cli (#8378) 2023-06-21 15:03:14 +08:00
Ruonan Wang
7296453f07 LLM: support starcoder in llm-cli (#8377)
* support starcoder in cli

* small fix
2023-06-21 14:38:30 +08:00
Ruonan Wang
50af0251e4 LLM: First commit of StarCoder pybinding (#8354)
* first commit of starcoder

* update setup.py and fix style

* add starcoder_cpp, fix style

* fix style

* support windows binary

* update pybinding

* fix style, add avx2 binary

* small fix

* fix style
2023-06-21 13:23:06 +08:00
Yuwen Hu
a7d66b7342 [LLM] README revise for llm_convert (#8374)
* Small readme revise for llm_convert

* Small fix
2023-06-21 10:04:34 +08:00
Yuwen Hu
7ef1c890eb [LLM] Supports GPTQ convert in transformers-like API, and supports folder outfile for llm-convert (#8366)
* Add docstrings to llm_convert

* Small docstrings fix

* Unify outfile type to be a folder path for either gptq or pth model_format

* Supports gptq model input for from_pretrained

* Fix example and readme

* Small fix

* Python style fix

* Bug fix in llm_convert

* Python style check

* Fix based on comments

* Small fix
2023-06-20 17:42:38 +08:00
Zhao Changmin
4ec46afa4f LLM: Align converting GPTQ model API with transformer style (#8365)
* LLM: Align GPTQ API with transformer style
2023-06-20 14:27:41 +08:00
Ruonan Wang
f99d348954 LLM: convert and quantize support for StarCoder (#8359)
* basic support for starcoder

* update from_pretrained

* fix bug and fix style
2023-06-20 13:39:35 +08:00