ipex-llm

Author	SHA1	Message	Date
Yina Chen	35fdf94031	[LLM]Arc starcoder example (#8814 ) * arc starcoder example init * add log * meet comments	2023-08-28 16:48:00 +08:00
xingyuan li	6a902b892e	[LLM] Add amx build step (#8822 ) * add amx build step	2023-08-28 17:41:18 +09:00
Ruonan Wang	eae92bc7da	llm: quick fix path (#8810 )	2023-08-25 16:02:31 +08:00
Ruonan Wang	0186f3ab2f	llm: update all ARC int4 examples (#8809 ) * update GPU examples * update other examples * fix * update based on comment	2023-08-25 15:26:10 +08:00
Song Jiaming	b8b1b6888b	[LLM] Performance test (#8796 )	2023-08-25 14:31:45 +08:00
Yang Wang	9d0f6a8cce	rename math.py in example to avoid conflict (#8805 )	2023-08-24 21:06:31 -07:00
SONG Ge	d2926c7672	[LLM] Unify Langchain Native and Transformers LLM API (#8752 ) * deprecate BigDLNativeTransformers and add specific LMEmbedding method * deprecate and add LM methods for langchain llms * add native params to native langchain * new imple for embedding * move ut from bigdlnative to casual llm * rename embeddings api and examples update align with usage updating * docqa example hot-fix * add more api docs * add langchain ut for starcoder * support model_kwargs for transformer methods when calling causalLM and add ut * ut fix for transformers embedding * update for langchain causal supporting transformers * remove model_family in readme doc * add model_families params to support more models * update api docs and remove chatglm embeddings for now * remove chatglm embeddings in examples * new refactor for ut to add bloom and transformers llama ut * disable llama transformers embedding ut	2023-08-25 11:14:21 +08:00
binbin Deng	5582872744	LLM: update chatglm example to be more friendly for beginners (#8795 )	2023-08-25 10:55:01 +08:00
Yina Chen	7c37424a63	Fix voice assistant example input error on Linux (#8799 ) * fix linux error * update * remove alsa log	2023-08-25 10:47:27 +08:00
Yang Wang	bf3591e2ff	Optimize chatglm2 for bf16 (#8725 ) * make chatglm works with bf16 * fix style * support chatglm v1 * fix style * fix style * add chatglm2 file	2023-08-24 10:04:25 -07:00
xingyuan li	c94bdd3791	[LLM] Merge windows & linux nightly test (#8756 ) * fix download statement * add check before build wheel * use curl to upload files * windows unittest won't upload converted model * split llm-cli test into windows & linux versions * update tempdir create way * fix nightly converted model name * windows llm-cli starcoder test temply disabled * remove taskset dependency * rename llm_unit_tests_linux to llm_unit_tests	2023-08-23 12:48:41 +09:00
Jason Dai	dcadd09154	Update llm document (#8784 )	2023-08-21 22:34:44 +08:00
Yishuo Wang	611c1fb628	[LLM] change default n_threads of native int4 langchain API (#8779 )	2023-08-21 13:30:12 +08:00
Yishuo Wang	3d1f2b44f8	LLM: change default n_threads of native int4 models (#8776 )	2023-08-18 15:46:19 +08:00
Yishuo Wang	2ba2133613	fix starcoder chinese output (#8773 )	2023-08-18 13:37:02 +08:00
binbin Deng	548f7a6cf7	LLM: update convert of llama family to support llama2-70B (#8747 )	2023-08-18 09:30:35 +08:00
Yina Chen	4afea496ab	support q8_0 (#8765 )	2023-08-17 15:06:36 +08:00
Ruonan Wang	e9aa2bd890	LLM: reduce GPU 1st token latency and update example (#8763 ) * reduce 1st token latency * update example * fix * fix style * update readme of gpu benchmark	2023-08-16 18:01:23 +08:00
binbin Deng	06609d9260	LLM: add qwen example on arc (#8757 )	2023-08-16 17:11:08 +08:00
SONG Ge	f4164e4492	[BigDL LLM] Update readme for unifying transformers API (#8737 ) * update readme doc * fix readthedocs error * update comment * update exception error info * invalidInputError instead * fix readme typo error and remove import error * fix more typo	2023-08-16 14:22:32 +08:00
Song Jiaming	c1f9af6d97	[LLM] chatglm example and transformers low-bit examples (#8751 )	2023-08-16 11:41:44 +08:00
Ruonan Wang	8805186f2f	LLM: add benchmark tool for gpu (#8760 ) * add benchmark tool for gpu * update	2023-08-16 11:22:10 +08:00
binbin Deng	97283c033c	LLM: add falcon example on arc (#8742 )	2023-08-15 17:38:38 +08:00
binbin Deng	8c55911308	LLM: add baichuan-13B on arc example (#8755 )	2023-08-15 15:07:04 +08:00
binbin Deng	be2ae6eb7c	LLM: fix langchain native int4 voiceasistant example (#8750 )	2023-08-14 17:23:33 +08:00
Ruonan Wang	d28ad8f7db	LLM: add whisper example for arc transformer int4 (#8749 ) * add whisper example for arc int4 * fix	2023-08-14 17:05:48 +08:00
Yishuo Wang	77844125f2	[LLM] Support chatglm cache (#8745 )	2023-08-14 15:10:46 +08:00
Ruonan Wang	faaccb64a2	LLM: add chatglm2 example for Arc (#8741 ) * add chatglm2 example * update * fix readme	2023-08-14 10:43:08 +08:00
binbin Deng	b10d7e1adf	LLM: add mpt example on arc (#8723 )	2023-08-14 09:40:01 +08:00
binbin Deng	e9a1afffc5	LLM: add internlm example on arc (#8722 )	2023-08-14 09:39:39 +08:00
SONG Ge	aceea4dc29	[LLM] Unify Transformers and Native API (#8713 ) * re-open pr to run on latest runner * re-add examples and ut * rename ut and move deprecate to warning instead of raising an error info * ut fix	2023-08-11 19:45:47 +08:00
Yishuo Wang	f91035c298	[LLM] fix chatglm native int4 emoji output (#8739 )	2023-08-11 15:38:41 +08:00
binbin Deng	77efcf7b1d	LLM: fix ChatGLM2 native int4 stream output (#8733 )	2023-08-11 14:51:50 +08:00
Ruonan Wang	ca3e59a1dc	LLM: support stop for starcoder native int4 stream (#8734 )	2023-08-11 14:51:30 +08:00
Song Jiaming	e292dfd970	[WIP] LLM transformers api for langchain (#8642 )	2023-08-11 13:32:35 +08:00
Yishuo Wang	3d5a7484a2	[LLM] fix bloom and starcoder memory release (#8728 )	2023-08-11 11:18:19 +08:00
xingyuan li	02ec01cb48	[LLM] Add bigdl-core-xe dependency when installing bigdl-llm[xpu] (#8716 ) * add bigdl-core-xe dependency	2023-08-10 17:41:42 +09:00
Shengsheng Huang	7c56c39e36	Fix GPU examples READ to use bigdl-core-xe (#8714 ) * Update README.md * Update README.md	2023-08-10 12:53:49 +08:00
Yina Chen	6d1ca88aac	add voice assistant example (#8711 )	2023-08-10 12:42:14 +08:00
Song Jiaming	e717e304a6	LLM first example test and template (#8658 )	2023-08-10 10:03:11 +08:00
Ruonan Wang	1a7b698a83	[LLM] support ipex arc int4 & add basic llama2 example (#8700 ) * first support of xpu * make it works on gpu update setup update add GPU llama2 examples add use_optimize flag to disbale optimize for gpu fix style update gpu exmaple readme fix * update example, and update env * fix setup to add cpp files * replace jit with aot to avoid data leak * rename to bigdl-core-xe * update installation in example readme	2023-08-09 22:20:32 +08:00
Jason Dai	d03218674a	Update llm readme (#8703 )	2023-08-09 14:47:26 +08:00
Kai Huang	1b65288bdb	Add api doc for LLM (#8605 ) * api doc initial * update desc	2023-08-08 18:17:16 +08:00
binbin Deng	4c44153584	LLM: add Qwen transformers int4 example (#8699 )	2023-08-08 11:23:09 +08:00
Yishuo Wang	710b9b8982	[LLM] add linux chatglm pybinding binary file (#8698 )	2023-08-08 11:16:30 +08:00
binbin Deng	ea5d7aff5b	LLM: add chatglm native int4 transformers API (#8695 )	2023-08-07 17:52:47 +08:00
Yishuo Wang	6da830cf7e	[LLM] add chaglm pybinding binary file in setup.py (#8692 )	2023-08-07 09:41:03 +08:00
Cengguang Zhang	ebcf75d506	feat: set transformers lib version. (#8683 )	2023-08-04 15:01:59 +08:00
Yishuo Wang	ef08250c21	[LLM] chatglm pybinding support (#8672 )	2023-08-04 14:27:29 +08:00
Yishuo Wang	5837cc424a	[LLM] add chatglm pybinding binary file release (#8677 )	2023-08-04 11:45:27 +08:00
Yang Wang	b6468bac43	optimize chatglm2 long sequence (#8662 ) * add chatglm2 * optimize a little * optimize chatglm long sequence * fix style * address comments and fix style * fix bug	2023-08-03 17:56:24 -07:00
Yang Wang	3407f87075	Fix llama kv cache bug (#8674 )	2023-08-03 17:54:55 -07:00
Yina Chen	59903ea668	llm linux support avx & avx2 (#8669 )	2023-08-03 17:10:59 +08:00
xingyuan li	110cfb5546	[LLM] Remove old windows nightly test code (#8668 ) Remove old Windows nightly test code triggered by task scheduler Add new Windows nightly workflow for nightly testing	2023-08-03 17:12:23 +09:00
xingyuan li	610084e3c0	[LLM] Complete windows unittest (#8611 ) * add windows nightly test workflow * use github runner to run pr test * model load should use lowbit * remove tmp dir after testing	2023-08-03 14:48:42 +09:00
binbin Deng	a15a2516e6	add (#8659 )	2023-08-03 10:12:10 +08:00
Xin Qiu	0714888705	build windows avx dll (#8657 ) * windows avx * add to actions	2023-08-03 02:06:24 +08:00
Yina Chen	119bf6d710	[LLM] Support linux cpp dynamic load .so (#8655 ) * support linux cpp dynamic load .so * update cli	2023-08-02 20:15:45 +08:00
Zhao Changmin	ca998cc6f2	LLM: Mute shape mismatch output (#8601 ) * LLM: Mute shape mismatch output	2023-08-02 16:46:22 +08:00
Zhao Changmin	04c713ef06	LLM: Disable transformer api `pretraining_tp` (#8645 ) * disable pretraining_tp	2023-08-02 11:26:01 +08:00
binbin Deng	6fc31bb4cf	LLM: first update descriptions for ChatGLM transformers int4 example (#8646 )	2023-08-02 11:00:56 +08:00
Yang Wang	cbeae97a26	Optimize Llama Attention to to reduce KV cache memory copy (#8580 ) * Optimize llama attention to reduce KV cache memory copy * fix bug * fix style * remove git * fix style * fix style * fix style * fix tests * move llama attention to another file * revert * fix style * remove jit * fix	2023-08-01 16:37:58 -07:00
binbin Deng	39994738d1	LLM: add chat & stream chat example for ChatGLM2 transformers int4 (#8636 )	2023-08-01 14:57:45 +08:00
xingyuan li	cdfbe652ca	[LLM] Add chatglm support for llm-cli (#8641 ) * add chatglm build * add llm-cli support * update git * install cmake * add ut for chatglm * add files to setup * fix bug cause permission error when sf lack file	2023-08-01 14:30:17 +09:00
Zhao Changmin	d6cbfc6d2c	LLM: Add requirements in whisper example (#8644 ) * LLM: Add requirements in whisper example	2023-08-01 12:07:14 +08:00
Zhao Changmin	3e10260c6d	LLM: llm-convert support chatglm family (#8643 ) * convert chatglm	2023-08-01 11:16:18 +08:00
Yina Chen	a607972c0b	[LLM]LLM windows load -api.dll (#8631 ) * temp * update * revert setup.py	2023-07-31 13:47:20 +08:00
xingyuan li	3361b66449	[LLM] Revert llm-cli to disable selecting executables on Windows (#8630 ) * revert vnni file select * revert setup.py * add model-api.dll	2023-07-31 11:15:44 +09:00
binbin Deng	3dbab9087b	LLM: add llama2-7b native int4 example (#8629 )	2023-07-28 10:56:16 +08:00
binbin Deng	fb32fefcbe	LLM: support tensor input of native int4 `generate` (#8620 )	2023-07-27 17:59:49 +08:00
Zhao Changmin	5b484ab48d	LLM: Support load_low_bit loading models in shards format (#8612 ) * shards_model --------- Co-authored-by: leonardozcm <leonaordo1997zcm@gmail.com>	2023-07-26 13:30:01 +08:00
binbin Deng	fcf8c085e3	LLM: add llama2-13b native int4 example (#8613 )	2023-07-26 10:12:52 +08:00
Song Jiaming	650b82fa6e	[LLM] add CausalLM and Speech UT (#8597 )	2023-07-25 11:22:36 +08:00
Zhao Changmin	af201052db	avoid malloc all missing keys in fp32 (#8600 )	2023-07-25 09:48:51 +08:00
binbin Deng	3f24202e4c	[LLM] Add more transformers int4 example (Llama 2) (#8602 )	2023-07-25 09:21:12 +08:00
Jason Dai	0f8201c730	llm readme update (#8595 )	2023-07-24 09:47:49 +08:00
Yuwen Hu	ba42a6da63	[LLM] Set torch_dtype default value to 'auto' for transformers low bit from_pretrained API	2023-07-21 17:55:00 +08:00
Yuwen Hu	bbde423349	[LLM] Add current Linux UT inference tests to nightly tests (#8578 ) * Add current inference uts to nightly tests * Change test model from chatglm-6b to chatglm2-6b * Add thread num env variable for nightly test * Fix urls * Small fix	2023-07-21 13:26:38 +08:00
Yang Wang	feb3af0567	Optimize transformer int4 memory footprint (#8579 )	2023-07-20 20:22:13 -07:00
Yang Wang	57e880f63a	[LLM] use pytorch linear for large input matrix (#8492 ) * use pytorch linear for large input matrix * only works on server * fix style * optimize memory * first check server * revert * address comments * fix style	2023-07-20 09:54:25 -07:00
Yuwen Hu	6504e31a97	Small fix (#8577 )	2023-07-20 16:37:04 +08:00
Yuwen Hu	2266ca7d2b	[LLM] Small updates to transformers int4 ut (#8574 ) * Small fix to transformers int4 ut * Small fix	2023-07-20 13:20:25 +08:00
xingyuan li	7b8d9c1b0d	[LLM] Add dependency file check in setup.py (#8565 ) * add package file check	2023-07-20 14:20:08 +09:00
Song Jiaming	411d896636	LLM first transformers UT (#8514 ) * ut * transformers api first ut * name * dir issue * use chatglm instead of chatglm2 * omp * set omp in sh * source * taskset * test * test omp * add test	2023-07-20 10:16:27 +08:00
Yuwen Hu	cad78740a7	[LLM] Small fixes to the Whisper transformers INT4 example (#8573 ) * Small fixes to the whisper example * Small fix * Small fix	2023-07-20 10:11:33 +08:00
binbin Deng	7a9fdf74df	[LLM] Add more transformers int4 example (Dolly v2) (#8571 ) * add * add trust_remote_mode	2023-07-19 18:20:16 +08:00
Zhao Changmin	e680af45ea	LLM: Optimize Langchain Pipeline (#8561 ) * LLM: Optimize Langchain Pipeline * load in low bit	2023-07-19 17:43:13 +08:00
Shengsheng Huang	616b7cb0a2	add more langchain examples (#8542 ) * update langchain descriptions * add mathchain example * update readme * update readme	2023-07-19 17:42:18 +08:00
binbin Deng	457571b44e	[LLM] Add more transformers int4 example (InternLM) (#8557 )	2023-07-19 15:15:38 +08:00
xingyuan li	b6510fa054	fix move/download dll step (#8564 )	2023-07-19 12:17:07 +09:00
xingyuan li	c52ed37745	fix starcoder dll name (#8563 )	2023-07-19 11:55:06 +09:00
Zhao Changmin	3dbe3bf18e	transformer_int4 (#8553 )	2023-07-19 08:33:58 +08:00
Zhao Changmin	49d636e295	[LLM] whisper model transformer int4 verification and example (#8511 ) * LLM: transformer api support * va * example * revert * pep8 * pep8	2023-07-19 08:33:20 +08:00
Yina Chen	9a7bc17ca1	[LLM] llm supports vnni link on windows (#8543 ) * support win vnni link * fix style * fix style * use isa_checker * fix * typo * fix * update	2023-07-18 16:43:45 +08:00
Yina Chen	4582b6939d	[LLM]llm gptneox chat (#8527 ) * linux * support win * merge upstream & support vnni lib in chat	2023-07-18 11:17:17 +08:00
Jason Dai	1ebc43b151	Update READMEs (#8554 )	2023-07-18 11:06:06 +08:00
Yuwen Hu	ee70977c07	[LLM] Transformers int4 example small typo fixes (#8550 )	2023-07-17 18:15:32 +08:00
Yuwen Hu	1344f50f75	[LLM] Add more transformers int4 examples (Falcon) (#8546 ) * Initial commit * Add Falcon examples and other small fix * Small fix * Small fix * Update based on comments * Small fix	2023-07-17 17:36:21 +08:00
Yuwen Hu	de772e7a80	Update mpt for prompt tuning (#8547 )	2023-07-17 17:33:54 +08:00
binbin Deng	f1fd746722	[LLM] Add more transformers int4 example (vicuna) (#8544 )	2023-07-17 16:59:55 +08:00
Xin Qiu	fccae91461	Add load_low_bit save_load_bit to AutoModelForCausalLM (#8531 ) * transformers save_low_bit load_low_bit * update example and add readme * update * update * update * add ut * update	2023-07-17 15:29:55 +08:00
binbin Deng	808a64d53a	[LLM] Add more transformers int4 example (starcoder) (#8540 )	2023-07-17 14:41:19 +08:00
xingyuan li	e57db777e0	[LLM] Setup.py & llm-cli update for windows vnni binary files (#8537 ) * update setup.py * update llm-cli	2023-07-17 12:28:38 +09:00
binbin Deng	f56b5ade4c	[LLM] Add more transformers int4 example (chatglm2) (#8539 )	2023-07-14 17:58:33 +08:00
binbin Deng	92d33cf35a	[LLM] Add more transformers int4 example (phoenix) (#8520 )	2023-07-14 17:58:04 +08:00
Yuwen Hu	e0f0def279	Remove unused example for now (#8538 )	2023-07-14 17:32:50 +08:00
binbin Deng	b397e40015	[LLM] Add more transformers int4 example (RedPajama) (#8523 )	2023-07-14 17:30:28 +08:00
Yuwen Hu	7bf3e10415	[LLM] Add more int4 transformers examples (MOSS) (#8532 ) * Add Moss example * Small fix	2023-07-14 16:41:41 +08:00
Yuwen Hu	59b7287ef5	[LLM] Add more transformers int4 example (Baichuan) (#8522 ) * Add example model Baichuan * Small updates to client windows settings * Small refactor * Small fix	2023-07-14 16:41:29 +08:00
Yuwen Hu	ca6e38607c	[LLM] Add more transformers examples (ChatGLM) (#8521 ) * Add example for chatglm v1 and other small fixes * Small fix * Small further fix * Small fix * Update based on comments & updates for client windows recommended settingts * Small fix * Small refactor * Small fix * Small fix * Small fix to dolly v1 * Small fix	2023-07-14 16:41:13 +08:00
xingyuan li	c87853233b	[LLM] Add windows vnni binary build step (#8518 ) * add windows vnni build step * update build info * add download command	2023-07-14 17:24:39 +09:00
Yishuo Wang	6320bf201e	LLM: fix memory access violation (#8519 )	2023-07-13 17:08:08 +08:00
xingyuan li	60c2c0c3dc	Bug fix for merged pr #8503 (#8516 )	2023-07-13 17:26:30 +09:00
Yuwen Hu	349bcb4bae	[LLM] Add more transformers int4 example (Dolly v1) (#8517 ) * Initial commit for dolly v1 * Add example for Dolly v1 and other small fix * Small output updates * Small fix * fix based on comments	2023-07-13 16:13:47 +08:00
Xin Qiu	90e3d86bce	rename low bit type name (#8512 ) * change qx_0 to sym_intx * update * fix typo * update * fix type * fix style * add python doc * meet code review * fix style	2023-07-13 15:53:31 +08:00
xingyuan li	4f152b4e3a	[LLM] Merge the llm.cpp build and the pypi release (#8503 ) * checkout llm.cpp to build new binary * use artifact to get latest built binary files * rename quantize * modify all release workflow	2023-07-13 16:34:24 +09:00
Yuwen Hu	bcde8ec83e	[LLM] Small fix to MPT Example (#8513 )	2023-07-13 14:33:21 +08:00
Zhao Changmin	ba0da17b40	LLM: Support AutoModelForSeq2SeqLM transformer API (#8449 ) * LLM: support AutoModelForSeq2SeqLM transformer API	2023-07-13 13:33:51 +08:00
Yishuo Wang	86b5938075	LLM: fix llm pybinding (#8509 )	2023-07-13 10:27:08 +08:00
Yuwen Hu	fcc352eee3	[LLM] Add more transformers_int4 examples (MPT) (#8498 ) * Update transformers_int4 readme, and initial commit for mpt * Update example for mpt * Small fix and recover transformers_int4_pipeline_readme.md for now * Update based on comments * Small fix * Small fix * Update based on comments	2023-07-13 09:41:16 +08:00
Zhao Changmin	23f6a4c21f	LLM: Optimize transformer int4 loading (#8499 ) * LLM: Optimize transformer int4 loading	2023-07-12 15:25:42 +08:00
Yishuo Wang	dd3f953288	Support vnni check (#8497 )	2023-07-12 10:11:15 +08:00
Xin Qiu	cd7a980ec4	Transformer int4 add qtype, support q4_1 q5_0 q5_1 q8_0 (#8481 ) * quant in Q4 5 8 * meet code review * update readme * style * update * fix error * fix error * update * fix style * update * Update README.md * Add load_in_low_bit	2023-07-12 08:23:08 +08:00
Yishuo Wang	db39d0a6b3	LLM: disable mmap by default for better performance (#8467 )	2023-07-11 09:26:26 +08:00
Yuwen Hu	52c6b057d6	Initial LLM Transformers example refactor (#8491 )	2023-07-10 17:53:57 +08:00
Junwei Deng	254a7aa3c4	bigdl-llm: add voice-assistant example that are migrated from langchain use-case document (#8468 )	2023-07-10 16:51:45 +08:00
Yishuo Wang	98bac815e4	specify numpy version (#8489 )	2023-07-10 16:50:16 +08:00
Zhao Changmin	81d655cda9	LLM: transformer int4 save and load (#8462 ) * LLM: transformer int4 save and load	2023-07-10 16:34:41 +08:00
binbin Deng	d489775d2c	LLM: fix inconsistency between output token number and `max_new_token` (#8479 )	2023-07-07 17:31:05 +08:00
Jason Dai	bcc1eae322	Llm readme update (#8472 )	2023-07-06 20:04:04 +08:00
Ruonan Wang	2f77d485d8	Llm: Initial support of langchain transformer int4 API (#8459 ) * first commit of transformer int4 and pipeline * basic examples temp save for embeddings support embeddings and docqa exaple * fix based on comment * small fix	2023-07-06 17:50:05 +08:00
binbin Deng	14626fe05b	LLM: refactor transformers and langchain class name (#8470 )	2023-07-06 17:16:44 +08:00
binbin Deng	70bc8ea8ae	LLM: update langchain and cpp-python style API examples (#8456 )	2023-07-06 14:36:42 +08:00
Ruonan Wang	64b38e1dc8	llm: benchmark tool for transformers int4 (separate 1st token and rest) (#8460 ) * add benchmark utils * fix * fix bug and add readme * hidden latency data	2023-07-06 09:49:52 +08:00
binbin Deng	77808fa124	LLM: fix n_batch in starcoder pybinding (#8461 )	2023-07-05 17:06:50 +08:00
Yina Chen	f2bb469847	[WIP] LLm llm-cli chat mode (#8440 ) * fix timezone * temp * Update linux interactive mode * modify init text for interactive mode * meet comments * update * win script * meet comments	2023-07-05 14:04:17 +08:00
binbin Deng	1970bcf14e	LLM: add readme for transformer examples (#8444 )	2023-07-04 17:25:58 +08:00
binbin Deng	e54e52b438	LLM: fix n_batch in bloom pybinding (#8454 )	2023-07-04 15:10:32 +08:00
Yuwen Hu	372c775cb4	[LLM] Change default runner for LLM Linux tests to the ones with AVX512 (#8448 ) * Basic change for AVX512 runner * Remove conda channel and action rename * Small fix * Small fix and reduce peak convert disk space * Define n_threads based on runner status * Small thread num fix * Define thread_num for cli * test * Add self-hosted label and other small fix	2023-07-04 14:53:03 +08:00
Jason Dai	edf23a95be	Update llm readme (#8446 )	2023-07-03 16:58:44 +08:00
Jason Dai	a38f927fc0	Update README.md (#8439 )	2023-07-03 14:59:55 +08:00
binbin Deng	c956a46c40	LLM: first fix example/transformers (#8438 )	2023-07-03 14:13:33 +08:00
Jason Dai	e5b384aaa2	Update README.md (#8437 )	2023-07-03 10:54:29 +08:00
Yang Wang	449aea7ffc	Optimize transformer int4 loading memory (#8400 ) * Optimize transformer int4 loading memory * move cast to convert * default settting low_cpu_mem_usage	2023-06-30 20:12:12 -07:00
Jason Dai	2da21163f8	Update llm README.md (#8431 )	2023-06-30 19:41:17 +08:00
Junwei Deng	2fd751de7a	LLM: add a dev tool for getting glibc/glibcxx requirement (#8399 ) * add a dev tool * pep8 change	2023-06-30 11:09:50 +08:00
binbin Deng	146662bc0d	LLM: fix langchain windows failure (#8417 )	2023-06-30 09:59:10 +08:00
Yina Chen	6251ad8934	[LLM]Windows unittest (#8356 ) * win-unittest * update * update * try llama 7b * delete llama * update * add red-3b * only test red-3b * revert * add langchain * add dependency * delete langchain	2023-06-29 14:03:12 +08:00
Yina Chen	783aea3309	[LLM] LLM windows daily test (#8328 ) * llm-win-init * test action * test * add types * update for schtasks * update pytests * update * update * update doc * use stable ckpt from ftp instead of the converted model * download using batch -> manually * add starcoder test	2023-06-28 15:02:11 +08:00
binbin Deng	ca5a4b6e3a	LLM: update bloom and starcoder usage in transformers_int4_pipeline (#8406 )	2023-06-28 13:15:50 +08:00
Zhao Changmin	cc76ec809a	check out dir (#8395 )	2023-06-27 21:28:39 +08:00
Ruonan Wang	4be784a49d	LLM: add UT for starcoder (convert, inference) update examples and readme (#8379 ) * first commit to add path * update example and readme * update path * fix * update based on comment	2023-06-27 12:12:11 +08:00
Xin Qiu	e68d631c0a	gptq2ggml: support loading safetensors model. (#8401 ) * update convert gptq to ggml * update convert gptq to ggml * gptq to ggml * update script * meet code review * meet code review	2023-06-27 11:19:33 +08:00
Ruonan Wang	b9eae23c79	LLM: add chatglm-6b example for transformer_int4 usage (#8392 ) * add example for chatglm-6b * fix	2023-06-26 13:46:43 +08:00
binbin Deng	19e19efb4c	LLM: raise warning instead of error when use unsupported parameters (#8382 )	2023-06-26 13:23:55 +08:00
Shengsheng Huang	c113ecb929	[LLM] langchain bloom, UT's, default parameters (#8357 ) * update langchain default parameters to align w/ api * add ut's for llm and embeddings * update inference test script to install langchain deps * update tests workflows --------- Co-authored-by: leonardozcm <changmin.zhao@intel.com>	2023-06-25 17:38:00 +08:00
Shengsheng Huang	446175cc05	transformer api refactor (#8389 ) * transformer api refactor * fix style * add huggingface tokenizer usage in example and make ggml tokenzizer as option 1 and huggingface tokenizer as option 2 * fix style	2023-06-25 17:15:33 +08:00
Yang Wang	ce6d06eb0a	Support directly quantizing huggingface transformers into 4bit format (#8371 ) * Support directly quantizing huggingface transformers into 4bit format * refine example * license * fix bias * address comments * move to ggml transformers * fix example * fix style * fix style * address comments * rename * change API * fix style * add lm head to conversion * address comments	2023-06-25 16:35:06 +08:00
binbin Deng	03c5fb71a8	LLM: fix ModuleNotFoundError when use llm-cli (#8378 )	2023-06-21 15:03:14 +08:00
Ruonan Wang	7296453f07	LLM: support starcoder in llm-cli (#8377 ) * support starcoder in cli * small fix	2023-06-21 14:38:30 +08:00
Ruonan Wang	50af0251e4	LLM: First commit of StarCoder pybinding (#8354 ) * first commit of starcoder * update setup.py and fix style * add starcoder_cpp, fix style * fix style * support windows binary * update pybinding * fix style, add avx2 binary * small fix * fix style	2023-06-21 13:23:06 +08:00
Yuwen Hu	a7d66b7342	[LLM] README revise for `llm_convert` (#8374 ) * Small readme revise for llm_convert * Small fix	2023-06-21 10:04:34 +08:00
Yuwen Hu	7ef1c890eb	[LLM] Supports GPTQ convert in transfomers-like API, and supports folder outfile for `llm-convert` (#8366 ) * Add docstrings to llm_convert * Small docstrings fix * Unify outfile type to be a folder path for either gptq or pth model_format * Supports gptq model input for from_pretrained * Fix example and readme * Small fix * Python style fix * Bug fix in llm_convert * Python style check * Fix based on comments * Small fix	2023-06-20 17:42:38 +08:00
Zhao Changmin	4ec46afa4f	LLM: Align converting GPTQ model API with transformer style (#8365 ) * LLM: Align GPTQ API with transformer style	2023-06-20 14:27:41 +08:00
Ruonan Wang	f99d348954	LLM: convert and quantize support for StarCoder (#8359 ) * basic support for starcoder * update from_pretrained * fix bug and fix style	2023-06-20 13:39:35 +08:00
binbin Deng	5f4f399ca7	LLM: fix bugs during supporting bloom in langchain (#8362 )	2023-06-20 13:30:37 +08:00
Zhao Changmin	30ac9a70f5	LLM: fix expected 2 blank lines (#8360 )	2023-06-19 18:10:02 +08:00
Zhao Changmin	c256cd136b	LLM: Fix ggml return value (#8358 ) * ggml return original value	2023-06-19 17:02:56 +08:00
Zhao Changmin	d4027d7164	fix typos in llm_convert (#8355 )	2023-06-19 16:17:21 +08:00
Zhao Changmin	4d177ca0a1	LLM: Merge convert pth/gptq model script into one shell script (#8348 ) * convert model in one * model type * license * readme and pep8 * ut path * rename * readme * fix docs * without lines	2023-06-19 11:50:05 +08:00
binbin Deng	ab1a833990	LLM: add basic uts related to inference (#8346 )	2023-06-19 10:25:51 +08:00
Yuwen Hu	1aa33d35d5	[LLM] Refactor LLM Linux tests (#8349 ) * Small name fix * Add convert nightly tests, and for other llm tests, use stable ckpt * Small fix and ftp fix * Small fix * Small fix	2023-06-16 15:22:48 +08:00
Ruonan Wang	9daf543e2f	LLM: Update convert of gpenox to sync with new libgptneox.so (#8345 )	2023-06-15 16:28:50 +08:00
Ruonan Wang	9fda7e34f1	LLM: fix version control (#8342 )	2023-06-15 15:18:50 +08:00
Ruonan Wang	f7f4e65788	LLM: support int8 and tmp_path for `from_pretrained` (#8338 )	2023-06-15 14:48:21 +08:00
Yuwen Hu	b30aa49c4e	[LLM] Add Actions for downloading & converting models (#8320 ) * First push to downloading and converting llm models for testing (Gondolin runner, avx2 for now) * Change yml file name	2023-06-15 13:43:47 +08:00
Ruonan Wang	8840dadd86	LLM: binary file version control on source forge (#8329 ) * support version control for llm based on date * update action	2023-06-15 09:53:27 +08:00
Ruonan Wang	5094970175	LLM: update `convert_model` to support int8 (#8326 ) * update example and convert_model for int8 * reset example * fix style	2023-06-15 09:25:07 +08:00
binbin Deng	f64e703083	LLM: first add `_tokenize`, `detokenize` and `_generate` for bloom pybinding (#8316 )	2023-06-14 17:29:57 +08:00
Xin Qiu	5576679a92	add convert-gptq-to-ggml.py to bigdl-llama (#8298 )	2023-06-14 14:51:51 +08:00
Ruonan Wang	a6c4b733cb	LLM: Update subprocess to show error message (#8323 ) * update subprocess * fix style	2023-06-13 16:43:37 +08:00
Shengsheng Huang	02c583144c	[LLM] langchain integrations and examples (#8256 ) * langchain intergrations and examples * add licences and rename * add licences * fix license issues and change backbone to model_family * update examples to use model_family param * fix linting * fix code style * exclude langchain integration from stylecheck * update langchain examples and update integrations based on latets changes * update simple llama-cpp-python style API example * remove bloom in README * change default n_threads to 2 and remove redundant code --------- Co-authored-by: leonardozcm <changmin.zhao@intel.com>	2023-06-12 19:22:07 +08:00
Yuwen Hu	f83c48280f	[LLM] Unify transformers-like API example for 3 different model families (#8315 ) * Refactor bigdl-llm transformers-like API to unify them * Small fix	2023-06-12 17:20:30 +08:00
xingyuan li	c4028d507c	[LLM] Add unified default value for cli programs (#8310 ) * add unified default value for threads and n_predict	2023-06-12 16:30:27 +08:00
Junwei Deng	f41995051b	LLM: add new readme as first version document (#8296 ) * add new readme * revice * revice * change readme * add python req	2023-06-09 15:52:02 +08:00
Yuwen Hu	c619315131	[LLM] Add examples for `gptneox`, `llama`, and `bloom` family model using transformers-like API (#8286 ) * First push of bigdl-llm example for gptneox model family * Add some args and other small updates * Small updates * Add example for llama family models * Small fix * Small fix * Update for batch_decode api and change default model for llama example * Small fix * Small fix * Small fix * Small model family name fix and add example for bloom * Small fix * Small default prompt fix * Small fix * Change default prompt * Add sample output for inference * Hide example inference time	2023-06-09 15:48:22 +08:00
binbin Deng	5d5da7b2c7	LLM: optimize namespace and remove unused import logic (#8302 )	2023-06-09 15:17:49 +08:00
Ruonan Wang	5d0e130605	LLM: fix convert path error of gptneox and bloom on windows (#8304 )	2023-06-09 10:10:19 +08:00
Yina Chen	7bfa0fcdf9	fix style (#8300 )	2023-06-08 16:52:17 +08:00
Yina Chen	637b72f2ad	[LLM] llm transformers api support batch actions (#8288 ) * llm transformers api support batch actions * align with transformer * meet comment	2023-06-08 15:10:08 +08:00
xingyuan li	ea3cf6783e	LLM: Command line wrapper for llama/bloom/gptneox (#8239 ) * add llama/bloom/gptneox wrapper * add readme * upload binary main file	2023-06-08 14:55:22 +08:00
binbin Deng	08bdfce2d8	LLM: avoid unnecessary import torch except converting process (#8297 )	2023-06-08 14:24:58 +08:00
binbin Deng	f9e2bda04a	LLM: add stop words and enhance output for bloom pybinding (#8280 )	2023-06-08 14:06:06 +08:00
Yina Chen	6990328e5c	[LLM]Add bloom quantize in setup.py (#8295 ) * add bloom quantize in setup.py * fix	2023-06-08 11:18:22 +08:00
Yina Chen	1571ba6425	remove unused import gptneox_cpp (#8293 )	2023-06-08 11:04:47 +08:00
Ruonan Wang	aa91657019	LLM: add bloom dll/exe in setup (#8284 )	2023-06-08 09:28:28 +08:00
Pingchuan Ma (Henry)	773255e009	[LLM] Add dev wheel building and basic UT script for LLM package on Linux (#8264 ) * add wheel build for linux * test fix * test self-hosted runner * test fix * update runner * update runner * update fix * init cicd * init cicd * test conda * update fix * update no need manual python deps * test fix bugs * test fix bugs * test fix bugs * fix bugs	2023-06-08 00:49:57 +08:00
Yina Chen	2c037e892b	fix-transformers-neox (#8285 )	2023-06-07 14:44:43 +08:00
Ruonan Wang	39ad68e786	LLM: enhancements for `convert_model` (#8278 ) * update convert * change output name * add discription for input_path, add check for input_values * basic support for command line * fix style * update based on comment * update based on comment	2023-06-07 13:22:14 +08:00
Junwei Deng	2d14e593f0	LLM: Support `generate(max_new_tokens=...)`, `tokenize` and `decode` for transformers-like API (#8283 ) * first push * fix pep8	2023-06-07 11:50:35 +08:00
Yina Chen	11cd2a07e0	[LLM] llm transformers format interface first part (#8276 ) * llm-transformers-format * update * fix style	2023-06-06 17:17:37 +08:00
Pingchuan Ma (Henry)	a3f353b939	[LLM] add long time loading disclaimer for LLM model converting (#8279 )	2023-06-06 17:15:13 +08:00
Yuwen Hu	64bc123dd3	[LLM] Add transformers-like API from_pretrained (#8271 ) * Init commit for bigdl.llm.transformers.AutoModelForCausalLM * Temp change to avoid name conflicts with external transformers lib * Support downloading model from huggingface * Small python style fix * Change location of transformers to avoid library conflicts * Add return value for converted ggml binary ckpt path for convert_model * Avoid repeated loading of shared library and adding some comments * Small fix * Path type fix anddocstring fix * Small fix * Small fix * Change cache dir to pwd	2023-06-06 17:04:16 +08:00
Pingchuan Ma (Henry)	2ed5842448	[LLM] add convert's python deps for LLM (#8260 ) * add python deps for LLM * update release.sh * change deps group name * update all * fix update * test fix * update	2023-06-06 16:01:17 +08:00
xingyuan li	38be471140	[LLM] convert_model bug fix (#8274 ) * Renamed all bloomz to bloom in ggml/model & utls/convert_util.py * Add an optional parameter for specific the model conversion path to avoid running out of disk space	2023-06-06 15:16:42 +08:00
Ruonan Wang	8bd2992a8d	LLM: accelerate sample of gptneox and update quantize (#8262 ) * update quantize & accelerate sample * fix style check * fix style error	2023-06-05 15:36:00 +08:00
Jun Wang	2bc0e7abbb	[llm] Add convert_model api (#8244 ) * add convert_model api * change the model_path to input_path * map int4 to q4_0 * fix blank line * change bloomz to bloom * remove default model_family * change dtype to lower first	2023-06-03 10:18:29 +08:00
Yuwen Hu	e290660b20	[LLM] Add so shared library for Bloom family models (#8258 ) * Add so file downloading for bloom family models * Supports selecting of avx2/avx512 so for bloom	2023-06-02 17:39:40 +08:00
Pingchuan Ma (Henry)	c48d5f7cff	[LLM] Enable UT workflow logics for LLM (#8243 ) * check push connection * enable UT workflow logics for LLM * test fix * add licenses * test fix according to suggestions * test fix * update changes	2023-06-02 17:06:35 +08:00
Yina Chen	657ea0ee50	[LLM] Fix linux load libs for NeoX and llama (#8257 ) * init * add license * fix style	2023-06-02 17:03:17 +08:00
Yuwen Hu	286b010bf1	[LLM] First push for Bloomz pybinding (#8252 ) * Initial commit to move bloom pybinding to bigdl-llm * Revise path for shared library * Small fix	2023-06-02 14:41:04 +08:00
Yina Chen	91a1528fce	[LLM]Support for linux package (llama, NeoX) & quantize (llama) (#8246 ) * temp * update * update * remove cmake * runtime get platform -> change platform name using sed * update * update * add platform flags(default: current platform) & delete legacy libs & add neox quantize	2023-06-02 13:51:35 +08:00
Junwei Deng	350d31a472	LLM: first push gptneox pybinding (#8234 ) * first push gptneox pybinding * fix * fix code style and add license --------- Co-authored-by: binbin <binbin1.deng@intel.com>	2023-06-02 09:28:00 +08:00
binbin Deng	3a9aa23835	LLM: fix and update related license in llama pybinding (#8250 )	2023-06-01 17:09:15 +08:00
Pingchuan Ma (Henry)	141febec1f	Add dev wheel building script for LLM package on Windows (#8238 ) * Add dev wheel building script for LLM package on Windows * delete conda * delete python version check * minor adjust * wheel name fixed * test check * test fix * change wheel name	2023-06-01 11:55:26 +08:00
binbin Deng	e56f24b424	LLM: first push llama pybinding (#8241 ) * first push llama binding * update dll	2023-06-01 10:59:15 +08:00
Ruonan Wang	3fd716d422	LLM: update setup.py to add a missing data(#8240 )	2023-06-01 10:25:43 +08:00
binbin Deng	8421af51ae	LLM: support converting to ggml format (#8235 ) * add convert * fix * fix * fix * try * test * update check * fix * fix	2023-05-31 15:20:06 +08:00
Ruonan Wang	c890609d1e	LLM: Support package/quantize for llama.cpp/redpajama.cpp on Windows (#8236 ) * support windows of llama.cpp * update quantize * update version of llama.cp submodule * add gptneox.dll * add quantize-gptneox.exe	2023-05-31 14:47:12 +08:00
Pingchuan Ma (Henry)	1f913a6941	[LLM] Add LLM pep8 coding style checking (#8233 ) * add LLM pep8 coding checking * resolve bugs in testing scripts and code style revision	2023-05-30 15:58:14 +08:00
Ruonan Wang	4638b85f3e	[llm] Initial support of package and quantize (#8228 ) * first commit of CMakeFiles.txt to include llama & gptneox * initial support of quantize * update cmake for only consider linux now * support quantize interface * update based on comment	2023-05-26 16:36:46 +08:00
Junwei Deng	ea22416525	LLM: add first round files (#8225 )	2023-05-25 11:29:18 +08:00

1072 commits