ipex-llm

Author	SHA1	Message	Date
Yina Chen	a607972c0b	[LLM]LLM windows load -api.dll (#8631 ) * temp * update * revert setup.py	2023-07-31 13:47:20 +08:00
xingyuan li	3361b66449	[LLM] Revert llm-cli to disable selecting executables on Windows (#8630 ) * revert vnni file select * revert setup.py * add model-api.dll	2023-07-31 11:15:44 +09:00
binbin Deng	fb32fefcbe	LLM: support tensor input of native int4 `generate` (#8620 )	2023-07-27 17:59:49 +08:00
Zhao Changmin	5b484ab48d	LLM: Support load_low_bit loading models in shards format (#8612 ) * shards_model --------- Co-authored-by: leonardozcm <leonaordo1997zcm@gmail.com>	2023-07-26 13:30:01 +08:00
Zhao Changmin	af201052db	avoid malloc all missing keys in fp32 (#8600 )	2023-07-25 09:48:51 +08:00
Yuwen Hu	ba42a6da63	[LLM] Set torch_dtype default value to 'auto' for transformers low bit from_pretrained API	2023-07-21 17:55:00 +08:00
Yang Wang	feb3af0567	Optimize transformer int4 memory footprint (#8579 )	2023-07-20 20:22:13 -07:00
Yang Wang	57e880f63a	[LLM] use pytorch linear for large input matrix (#8492 ) * use pytorch linear for large input matrix * only works on server * fix style * optimize memory * first check server * revert * address comments * fix style	2023-07-20 09:54:25 -07:00
Zhao Changmin	e680af45ea	LLM: Optimize Langchain Pipeline (#8561 ) * LLM: Optimize Langchain Pipeline * load in low bit	2023-07-19 17:43:13 +08:00
Zhao Changmin	49d636e295	[LLM] whisper model transformer int4 verification and example (#8511 ) * LLM: transformer api support * va * example * revert * pep8 * pep8	2023-07-19 08:33:20 +08:00
Yina Chen	9a7bc17ca1	[LLM] llm supports vnni link on windows (#8543 ) * support win vnni link * fix style * fix style * use isa_checker * fix * typo * fix * update	2023-07-18 16:43:45 +08:00
Yina Chen	4582b6939d	[LLM]llm gptneox chat (#8527 ) * linux * support win * merge upstream & support vnni lib in chat	2023-07-18 11:17:17 +08:00
Xin Qiu	fccae91461	Add load_low_bit save_load_bit to AutoModelForCausalLM (#8531 ) * transformers save_low_bit load_low_bit * update example and add readme * update * update * update * add ut * update	2023-07-17 15:29:55 +08:00
xingyuan li	e57db777e0	[LLM] Setup.py & llm-cli update for windows vnni binary files (#8537 ) * update setup.py * update llm-cli	2023-07-17 12:28:38 +09:00
Yishuo Wang	6320bf201e	LLM: fix memory access violation (#8519 )	2023-07-13 17:08:08 +08:00
Xin Qiu	90e3d86bce	rename low bit type name (#8512 ) * change qx_0 to sym_intx * update * fix typo * update * fix type * fix style * add python doc * meet code review * fix style	2023-07-13 15:53:31 +08:00
Zhao Changmin	ba0da17b40	LLM: Support AutoModelForSeq2SeqLM transformer API (#8449 ) * LLM: support AutoModelForSeq2SeqLM transformer API	2023-07-13 13:33:51 +08:00
Yishuo Wang	86b5938075	LLM: fix llm pybinding (#8509 )	2023-07-13 10:27:08 +08:00
Zhao Changmin	23f6a4c21f	LLM: Optimize transformer int4 loading (#8499 ) * LLM: Optimize transformer int4 loading	2023-07-12 15:25:42 +08:00
Yishuo Wang	dd3f953288	Support vnni check (#8497 )	2023-07-12 10:11:15 +08:00
Xin Qiu	cd7a980ec4	Transformer int4 add qtype, support q4_1 q5_0 q5_1 q8_0 (#8481 ) * quant in Q4 5 8 * meet code review * update readme * style * update * fix error * fix error * update * fix style * update * Update README.md * Add load_in_low_bit	2023-07-12 08:23:08 +08:00
Yishuo Wang	db39d0a6b3	LLM: disable mmap by default for better performance (#8467 )	2023-07-11 09:26:26 +08:00
Zhao Changmin	81d655cda9	LLM: transformer int4 save and load (#8462 ) * LLM: transformer int4 save and load	2023-07-10 16:34:41 +08:00
binbin Deng	d489775d2c	LLM: fix inconsistency between output token number and `max_new_token` (#8479 )	2023-07-07 17:31:05 +08:00
Ruonan Wang	2f77d485d8	Llm: Initial support of langchain transformer int4 API (#8459 ) * first commit of transformer int4 and pipeline * basic examples temp save for embeddings support embeddings and docqa exaple * fix based on comment * small fix	2023-07-06 17:50:05 +08:00
binbin Deng	14626fe05b	LLM: refactor transformers and langchain class name (#8470 )	2023-07-06 17:16:44 +08:00
binbin Deng	77808fa124	LLM: fix n_batch in starcoder pybinding (#8461 )	2023-07-05 17:06:50 +08:00
Yina Chen	f2bb469847	[WIP] LLm llm-cli chat mode (#8440 ) * fix timezone * temp * Update linux interactive mode * modify init text for interactive mode * meet comments * update * win script * meet comments	2023-07-05 14:04:17 +08:00
binbin Deng	e54e52b438	LLM: fix n_batch in bloom pybinding (#8454 )	2023-07-04 15:10:32 +08:00
Yang Wang	449aea7ffc	Optimize transformer int4 loading memory (#8400 ) * Optimize transformer int4 loading memory * move cast to convert * default settting low_cpu_mem_usage	2023-06-30 20:12:12 -07:00
Zhao Changmin	cc76ec809a	check out dir (#8395 )	2023-06-27 21:28:39 +08:00
Xin Qiu	e68d631c0a	gptq2ggml: support loading safetensors model. (#8401 ) * update convert gptq to ggml * update convert gptq to ggml * gptq to ggml * update script * meet code review * meet code review	2023-06-27 11:19:33 +08:00
binbin Deng	19e19efb4c	LLM: raise warning instead of error when use unsupported parameters (#8382 )	2023-06-26 13:23:55 +08:00
Shengsheng Huang	c113ecb929	[LLM] langchain bloom, UT's, default parameters (#8357 ) * update langchain default parameters to align w/ api * add ut's for llm and embeddings * update inference test script to install langchain deps * update tests workflows --------- Co-authored-by: leonardozcm <changmin.zhao@intel.com>	2023-06-25 17:38:00 +08:00
Shengsheng Huang	446175cc05	transformer api refactor (#8389 ) * transformer api refactor * fix style * add huggingface tokenizer usage in example and make ggml tokenzizer as option 1 and huggingface tokenizer as option 2 * fix style	2023-06-25 17:15:33 +08:00
Yang Wang	ce6d06eb0a	Support directly quantizing huggingface transformers into 4bit format (#8371 ) * Support directly quantizing huggingface transformers into 4bit format * refine example * license * fix bias * address comments * move to ggml transformers * fix example * fix style * fix style * address comments * rename * change API * fix style * add lm head to conversion * address comments	2023-06-25 16:35:06 +08:00
binbin Deng	03c5fb71a8	LLM: fix ModuleNotFoundError when use llm-cli (#8378 )	2023-06-21 15:03:14 +08:00
Ruonan Wang	7296453f07	LLM: support starcoder in llm-cli (#8377 ) * support starcoder in cli * small fix	2023-06-21 14:38:30 +08:00
Ruonan Wang	50af0251e4	LLM: First commit of StarCoder pybinding (#8354 ) * first commit of starcoder * update setup.py and fix style * add starcoder_cpp, fix style * fix style * support windows binary * update pybinding * fix style, add avx2 binary * small fix * fix style	2023-06-21 13:23:06 +08:00
Yuwen Hu	7ef1c890eb	[LLM] Supports GPTQ convert in transfomers-like API, and supports folder outfile for `llm-convert` (#8366 ) * Add docstrings to llm_convert * Small docstrings fix * Unify outfile type to be a folder path for either gptq or pth model_format * Supports gptq model input for from_pretrained * Fix example and readme * Small fix * Python style fix * Bug fix in llm_convert * Python style check * Fix based on comments * Small fix	2023-06-20 17:42:38 +08:00
Zhao Changmin	4ec46afa4f	LLM: Align converting GPTQ model API with transformer style (#8365 ) * LLM: Align GPTQ API with transformer style	2023-06-20 14:27:41 +08:00
Ruonan Wang	f99d348954	LLM: convert and quantize support for StarCoder (#8359 ) * basic support for starcoder * update from_pretrained * fix bug and fix style	2023-06-20 13:39:35 +08:00
binbin Deng	5f4f399ca7	LLM: fix bugs during supporting bloom in langchain (#8362 )	2023-06-20 13:30:37 +08:00
Zhao Changmin	30ac9a70f5	LLM: fix expected 2 blank lines (#8360 )	2023-06-19 18:10:02 +08:00
Zhao Changmin	c256cd136b	LLM: Fix ggml return value (#8358 ) * ggml return original value	2023-06-19 17:02:56 +08:00
Zhao Changmin	d4027d7164	fix typos in llm_convert (#8355 )	2023-06-19 16:17:21 +08:00
Zhao Changmin	4d177ca0a1	LLM: Merge convert pth/gptq model script into one shell script (#8348 ) * convert model in one * model type * license * readme and pep8 * ut path * rename * readme * fix docs * without lines	2023-06-19 11:50:05 +08:00
Ruonan Wang	9daf543e2f	LLM: Update convert of gpenox to sync with new libgptneox.so (#8345 )	2023-06-15 16:28:50 +08:00
Ruonan Wang	f7f4e65788	LLM: support int8 and tmp_path for `from_pretrained` (#8338 )	2023-06-15 14:48:21 +08:00
Ruonan Wang	5094970175	LLM: update `convert_model` to support int8 (#8326 ) * update example and convert_model for int8 * reset example * fix style	2023-06-15 09:25:07 +08:00
binbin Deng	f64e703083	LLM: first add `_tokenize`, `detokenize` and `_generate` for bloom pybinding (#8316 )	2023-06-14 17:29:57 +08:00
Xin Qiu	5576679a92	add convert-gptq-to-ggml.py to bigdl-llama (#8298 )	2023-06-14 14:51:51 +08:00
Ruonan Wang	a6c4b733cb	LLM: Update subprocess to show error message (#8323 ) * update subprocess * fix style	2023-06-13 16:43:37 +08:00
Shengsheng Huang	02c583144c	[LLM] langchain integrations and examples (#8256 ) * langchain intergrations and examples * add licences and rename * add licences * fix license issues and change backbone to model_family * update examples to use model_family param * fix linting * fix code style * exclude langchain integration from stylecheck * update langchain examples and update integrations based on latets changes * update simple llama-cpp-python style API example * remove bloom in README * change default n_threads to 2 and remove redundant code --------- Co-authored-by: leonardozcm <changmin.zhao@intel.com>	2023-06-12 19:22:07 +08:00
xingyuan li	c4028d507c	[LLM] Add unified default value for cli programs (#8310 ) * add unified default value for threads and n_predict	2023-06-12 16:30:27 +08:00
binbin Deng	5d5da7b2c7	LLM: optimize namespace and remove unused import logic (#8302 )	2023-06-09 15:17:49 +08:00
Ruonan Wang	5d0e130605	LLM: fix convert path error of gptneox and bloom on windows (#8304 )	2023-06-09 10:10:19 +08:00
Yina Chen	7bfa0fcdf9	fix style (#8300 )	2023-06-08 16:52:17 +08:00
Yina Chen	637b72f2ad	[LLM] llm transformers api support batch actions (#8288 ) * llm transformers api support batch actions * align with transformer * meet comment	2023-06-08 15:10:08 +08:00
xingyuan li	ea3cf6783e	LLM: Command line wrapper for llama/bloom/gptneox (#8239 ) * add llama/bloom/gptneox wrapper * add readme * upload binary main file	2023-06-08 14:55:22 +08:00
binbin Deng	08bdfce2d8	LLM: avoid unnecessary import torch except converting process (#8297 )	2023-06-08 14:24:58 +08:00
binbin Deng	f9e2bda04a	LLM: add stop words and enhance output for bloom pybinding (#8280 )	2023-06-08 14:06:06 +08:00
Yina Chen	1571ba6425	remove unused import gptneox_cpp (#8293 )	2023-06-08 11:04:47 +08:00
Yina Chen	2c037e892b	fix-transformers-neox (#8285 )	2023-06-07 14:44:43 +08:00
Ruonan Wang	39ad68e786	LLM: enhancements for `convert_model` (#8278 ) * update convert * change output name * add discription for input_path, add check for input_values * basic support for command line * fix style * update based on comment * update based on comment	2023-06-07 13:22:14 +08:00
Junwei Deng	2d14e593f0	LLM: Support `generate(max_new_tokens=...)`, `tokenize` and `decode` for transformers-like API (#8283 ) * first push * fix pep8	2023-06-07 11:50:35 +08:00
Yina Chen	11cd2a07e0	[LLM] llm transformers format interface first part (#8276 ) * llm-transformers-format * update * fix style	2023-06-06 17:17:37 +08:00
Pingchuan Ma (Henry)	a3f353b939	[LLM] add long time loading disclaimer for LLM model converting (#8279 )	2023-06-06 17:15:13 +08:00
Yuwen Hu	64bc123dd3	[LLM] Add transformers-like API from_pretrained (#8271 ) * Init commit for bigdl.llm.transformers.AutoModelForCausalLM * Temp change to avoid name conflicts with external transformers lib * Support downloading model from huggingface * Small python style fix * Change location of transformers to avoid library conflicts * Add return value for converted ggml binary ckpt path for convert_model * Avoid repeated loading of shared library and adding some comments * Small fix * Path type fix anddocstring fix * Small fix * Small fix * Change cache dir to pwd	2023-06-06 17:04:16 +08:00
xingyuan li	38be471140	[LLM] convert_model bug fix (#8274 ) * Renamed all bloomz to bloom in ggml/model & utls/convert_util.py * Add an optional parameter for specific the model conversion path to avoid running out of disk space	2023-06-06 15:16:42 +08:00
Ruonan Wang	8bd2992a8d	LLM: accelerate sample of gptneox and update quantize (#8262 ) * update quantize & accelerate sample * fix style check * fix style error	2023-06-05 15:36:00 +08:00
Jun Wang	2bc0e7abbb	[llm] Add convert_model api (#8244 ) * add convert_model api * change the model_path to input_path * map int4 to q4_0 * fix blank line * change bloomz to bloom * remove default model_family * change dtype to lower first	2023-06-03 10:18:29 +08:00
Yuwen Hu	e290660b20	[LLM] Add so shared library for Bloom family models (#8258 ) * Add so file downloading for bloom family models * Supports selecting of avx2/avx512 so for bloom	2023-06-02 17:39:40 +08:00
Yina Chen	657ea0ee50	[LLM] Fix linux load libs for NeoX and llama (#8257 ) * init * add lisence * fix style	2023-06-02 17:03:17 +08:00
Yuwen Hu	286b010bf1	[LLM] First push for Bloomz pybinding (#8252 ) * Initial commit to move bloom pybinding to bigdl-llm * Revise path for shared library * Small fix	2023-06-02 14:41:04 +08:00
Junwei Deng	350d31a472	LLM: first push gptneox pybinding (#8234 ) * first push gptneox pybinding * fix * fix code style and add license --------- Co-authored-by: binbin <binbin1.deng@intel.com>	2023-06-02 09:28:00 +08:00
binbin Deng	3a9aa23835	LLM: fix and update related license in llama pybinding (#8250 )	2023-06-01 17:09:15 +08:00
binbin Deng	e56f24b424	LLM: first push llama pybinding (#8241 ) * first push llama binding * update dll	2023-06-01 10:59:15 +08:00
binbin Deng	8421af51ae	LLM: support converting to ggml format (#8235 ) * add convert * fix * fix * fix * try * test * update check * fix * fix	2023-05-31 15:20:06 +08:00
Ruonan Wang	c890609d1e	LLM: Support package/quantize for llama.cpp/redpajama.cpp on Windows (#8236 ) * support windows of llama.cpp * update quantize * update version of llama.cp submodule * add gptneox.dll * add quantize-gptneox.exe	2023-05-31 14:47:12 +08:00
Pingchuan Ma (Henry)	1f913a6941	[LLM] Add LLM pep8 coding style checking (#8233 ) * add LLM pep8 coding checking * resolve bugs in testing scripts and code style revision	2023-05-30 15:58:14 +08:00
Ruonan Wang	4638b85f3e	[llm] Initial support of package and quantize (#8228 ) * first commit of CMakeFiles.txt to include llama & gptneox * initial support of quantize * update cmake for only consider linux now * support quantize interface * update based on comment	2023-05-26 16:36:46 +08:00
Junwei Deng	ea22416525	LLM: add first round files (#8225 )	2023-05-25 11:29:18 +08:00

233 commits