ipex-llm

Author	SHA1	Message	Date
Yina Chen	119bf6d710	[LLM] Support linux cpp dynamic load .so (#8655 ) * support linux cpp dynamic load .so * update cli	2023-08-02 20:15:45 +08:00
Zhao Changmin	ca998cc6f2	LLM: Mute shape mismatch output (#8601 ) * LLM: Mute shape mismatch output	2023-08-02 16:46:22 +08:00
Zhao Changmin	04c713ef06	LLM: Disable transformer api `pretraining_tp` (#8645 ) * disable pretraining_tp	2023-08-02 11:26:01 +08:00
binbin Deng	6fc31bb4cf	LLM: first update descriptions for ChatGLM transformers int4 example (#8646 )	2023-08-02 11:00:56 +08:00
Yang Wang	cbeae97a26	Optimize Llama Attention to reduce KV cache memory copy (#8580 ) * Optimize llama attention to reduce KV cache memory copy * fix bug * fix style * remove git * fix style * fix style * fix style * fix tests * move llama attention to another file * revert * fix style * remove jit * fix	2023-08-01 16:37:58 -07:00
binbin Deng	39994738d1	LLM: add chat & stream chat example for ChatGLM2 transformers int4 (#8636 )	2023-08-01 14:57:45 +08:00
xingyuan li	cdfbe652ca	[LLM] Add chatglm support for llm-cli (#8641 ) * add chatglm build * add llm-cli support * update git * install cmake * add ut for chatglm * add files to setup * fix bug cause permission error when sf lack file	2023-08-01 14:30:17 +09:00
Zhao Changmin	d6cbfc6d2c	LLM: Add requirements in whisper example (#8644 ) * LLM: Add requirements in whisper example	2023-08-01 12:07:14 +08:00
Zhao Changmin	3e10260c6d	LLM: llm-convert support chatglm family (#8643 ) * convert chatglm	2023-08-01 11:16:18 +08:00
Yina Chen	a607972c0b	[LLM]LLM windows load -api.dll (#8631 ) * temp * update * revert setup.py	2023-07-31 13:47:20 +08:00
xingyuan li	3361b66449	[LLM] Revert llm-cli to disable selecting executables on Windows (#8630 ) * revert vnni file select * revert setup.py * add model-api.dll	2023-07-31 11:15:44 +09:00
binbin Deng	3dbab9087b	LLM: add llama2-7b native int4 example (#8629 )	2023-07-28 10:56:16 +08:00
binbin Deng	fb32fefcbe	LLM: support tensor input of native int4 `generate` (#8620 )	2023-07-27 17:59:49 +08:00
Zhao Changmin	5b484ab48d	LLM: Support load_low_bit loading models in shards format (#8612 ) * shards_model --------- Co-authored-by: leonardozcm <leonaordo1997zcm@gmail.com>	2023-07-26 13:30:01 +08:00
binbin Deng	fcf8c085e3	LLM: add llama2-13b native int4 example (#8613 )	2023-07-26 10:12:52 +08:00
Song Jiaming	650b82fa6e	[LLM] add CausalLM and Speech UT (#8597 )	2023-07-25 11:22:36 +08:00
Zhao Changmin	af201052db	avoid malloc all missing keys in fp32 (#8600 )	2023-07-25 09:48:51 +08:00
binbin Deng	3f24202e4c	[LLM] Add more transformers int4 example (Llama 2) (#8602 )	2023-07-25 09:21:12 +08:00
Jason Dai	0f8201c730	llm readme update (#8595 )	2023-07-24 09:47:49 +08:00
Yuwen Hu	ba42a6da63	[LLM] Set torch_dtype default value to 'auto' for transformers low bit from_pretrained API	2023-07-21 17:55:00 +08:00
Yuwen Hu	bbde423349	[LLM] Add current Linux UT inference tests to nightly tests (#8578 ) * Add current inference uts to nightly tests * Change test model from chatglm-6b to chatglm2-6b * Add thread num env variable for nightly test * Fix urls * Small fix	2023-07-21 13:26:38 +08:00
Yang Wang	feb3af0567	Optimize transformer int4 memory footprint (#8579 )	2023-07-20 20:22:13 -07:00
Yang Wang	57e880f63a	[LLM] use pytorch linear for large input matrix (#8492 ) * use pytorch linear for large input matrix * only works on server * fix style * optimize memory * first check server * revert * address comments * fix style	2023-07-20 09:54:25 -07:00
Yuwen Hu	6504e31a97	Small fix (#8577 )	2023-07-20 16:37:04 +08:00
Yuwen Hu	2266ca7d2b	[LLM] Small updates to transformers int4 ut (#8574 ) * Small fix to transformers int4 ut * Small fix	2023-07-20 13:20:25 +08:00
xingyuan li	7b8d9c1b0d	[LLM] Add dependency file check in setup.py (#8565 ) * add package file check	2023-07-20 14:20:08 +09:00
Song Jiaming	411d896636	LLM first transformers UT (#8514 ) * ut * transformers api first ut * name * dir issue * use chatglm instead of chatglm2 * omp * set omp in sh * source * taskset * test * test omp * add test	2023-07-20 10:16:27 +08:00
Yuwen Hu	cad78740a7	[LLM] Small fixes to the Whisper transformers INT4 example (#8573 ) * Small fixes to the whisper example * Small fix * Small fix	2023-07-20 10:11:33 +08:00
binbin Deng	7a9fdf74df	[LLM] Add more transformers int4 example (Dolly v2) (#8571 ) * add * add trust_remote_mode	2023-07-19 18:20:16 +08:00
Zhao Changmin	e680af45ea	LLM: Optimize Langchain Pipeline (#8561 ) * LLM: Optimize Langchain Pipeline * load in low bit	2023-07-19 17:43:13 +08:00
Shengsheng Huang	616b7cb0a2	add more langchain examples (#8542 ) * update langchain descriptions * add mathchain example * update readme * update readme	2023-07-19 17:42:18 +08:00
binbin Deng	457571b44e	[LLM] Add more transformers int4 example (InternLM) (#8557 )	2023-07-19 15:15:38 +08:00
xingyuan li	b6510fa054	fix move/download dll step (#8564 )	2023-07-19 12:17:07 +09:00
xingyuan li	c52ed37745	fix starcoder dll name (#8563 )	2023-07-19 11:55:06 +09:00
Zhao Changmin	3dbe3bf18e	transformer_int4 (#8553 )	2023-07-19 08:33:58 +08:00
Zhao Changmin	49d636e295	[LLM] whisper model transformer int4 verification and example (#8511 ) * LLM: transformer api support * va * example * revert * pep8 * pep8	2023-07-19 08:33:20 +08:00
Yina Chen	9a7bc17ca1	[LLM] llm supports vnni link on windows (#8543 ) * support win vnni link * fix style * fix style * use isa_checker * fix * typo * fix * update	2023-07-18 16:43:45 +08:00
Yina Chen	4582b6939d	[LLM]llm gptneox chat (#8527 ) * linux * support win * merge upstream & support vnni lib in chat	2023-07-18 11:17:17 +08:00
Jason Dai	1ebc43b151	Update READMEs (#8554 )	2023-07-18 11:06:06 +08:00
Yuwen Hu	ee70977c07	[LLM] Transformers int4 example small typo fixes (#8550 )	2023-07-17 18:15:32 +08:00
Yuwen Hu	1344f50f75	[LLM] Add more transformers int4 examples (Falcon) (#8546 ) * Initial commit * Add Falcon examples and other small fix * Small fix * Small fix * Update based on comments * Small fix	2023-07-17 17:36:21 +08:00
Yuwen Hu	de772e7a80	Update mpt for prompt tuning (#8547 )	2023-07-17 17:33:54 +08:00
binbin Deng	f1fd746722	[LLM] Add more transformers int4 example (vicuna) (#8544 )	2023-07-17 16:59:55 +08:00
Xin Qiu	fccae91461	Add load_low_bit save_low_bit to AutoModelForCausalLM (#8531 ) * transformers save_low_bit load_low_bit * update example and add readme * update * update * update * add ut * update	2023-07-17 15:29:55 +08:00
binbin Deng	808a64d53a	[LLM] Add more transformers int4 example (starcoder) (#8540 )	2023-07-17 14:41:19 +08:00
xingyuan li	e57db777e0	[LLM] Setup.py & llm-cli update for windows vnni binary files (#8537 ) * update setup.py * update llm-cli	2023-07-17 12:28:38 +09:00
binbin Deng	f56b5ade4c	[LLM] Add more transformers int4 example (chatglm2) (#8539 )	2023-07-14 17:58:33 +08:00
binbin Deng	92d33cf35a	[LLM] Add more transformers int4 example (phoenix) (#8520 )	2023-07-14 17:58:04 +08:00
Yuwen Hu	e0f0def279	Remove unused example for now (#8538 )	2023-07-14 17:32:50 +08:00
binbin Deng	b397e40015	[LLM] Add more transformers int4 example (RedPajama) (#8523 )	2023-07-14 17:30:28 +08:00
Yuwen Hu	7bf3e10415	[LLM] Add more int4 transformers examples (MOSS) (#8532 ) * Add Moss example * Small fix	2023-07-14 16:41:41 +08:00
Yuwen Hu	59b7287ef5	[LLM] Add more transformers int4 example (Baichuan) (#8522 ) * Add example model Baichuan * Small updates to client windows settings * Small refactor * Small fix	2023-07-14 16:41:29 +08:00
Yuwen Hu	ca6e38607c	[LLM] Add more transformers examples (ChatGLM) (#8521 ) * Add example for chatglm v1 and other small fixes * Small fix * Small further fix * Small fix * Update based on comments & updates for client windows recommended settings * Small fix * Small refactor * Small fix * Small fix * Small fix to dolly v1 * Small fix	2023-07-14 16:41:13 +08:00
xingyuan li	c87853233b	[LLM] Add windows vnni binary build step (#8518 ) * add windows vnni build step * update build info * add download command	2023-07-14 17:24:39 +09:00
Yishuo Wang	6320bf201e	LLM: fix memory access violation (#8519 )	2023-07-13 17:08:08 +08:00
xingyuan li	60c2c0c3dc	Bug fix for merged pr #8503 (#8516 )	2023-07-13 17:26:30 +09:00
Yuwen Hu	349bcb4bae	[LLM] Add more transformers int4 example (Dolly v1) (#8517 ) * Initial commit for dolly v1 * Add example for Dolly v1 and other small fix * Small output updates * Small fix * fix based on comments	2023-07-13 16:13:47 +08:00
Xin Qiu	90e3d86bce	rename low bit type name (#8512 ) * change qx_0 to sym_intx * update * fix typo * update * fix type * fix style * add python doc * meet code review * fix style	2023-07-13 15:53:31 +08:00
xingyuan li	4f152b4e3a	[LLM] Merge the llm.cpp build and the pypi release (#8503 ) * checkout llm.cpp to build new binary * use artifact to get latest built binary files * rename quantize * modify all release workflow	2023-07-13 16:34:24 +09:00
Yuwen Hu	bcde8ec83e	[LLM] Small fix to MPT Example (#8513 )	2023-07-13 14:33:21 +08:00
Zhao Changmin	ba0da17b40	LLM: Support AutoModelForSeq2SeqLM transformer API (#8449 ) * LLM: support AutoModelForSeq2SeqLM transformer API	2023-07-13 13:33:51 +08:00
Yishuo Wang	86b5938075	LLM: fix llm pybinding (#8509 )	2023-07-13 10:27:08 +08:00
Yuwen Hu	fcc352eee3	[LLM] Add more transformers_int4 examples (MPT) (#8498 ) * Update transformers_int4 readme, and initial commit for mpt * Update example for mpt * Small fix and recover transformers_int4_pipeline_readme.md for now * Update based on comments * Small fix * Small fix * Update based on comments	2023-07-13 09:41:16 +08:00
Zhao Changmin	23f6a4c21f	LLM: Optimize transformer int4 loading (#8499 ) * LLM: Optimize transformer int4 loading	2023-07-12 15:25:42 +08:00
Yishuo Wang	dd3f953288	Support vnni check (#8497 )	2023-07-12 10:11:15 +08:00
Xin Qiu	cd7a980ec4	Transformer int4 add qtype, support q4_1 q5_0 q5_1 q8_0 (#8481 ) * quant in Q4 5 8 * meet code review * update readme * style * update * fix error * fix error * update * fix style * update * Update README.md * Add load_in_low_bit	2023-07-12 08:23:08 +08:00
Yishuo Wang	db39d0a6b3	LLM: disable mmap by default for better performance (#8467 )	2023-07-11 09:26:26 +08:00
Yuwen Hu	52c6b057d6	Initial LLM Transformers example refactor (#8491 )	2023-07-10 17:53:57 +08:00
Junwei Deng	254a7aa3c4	bigdl-llm: add voice-assistant example that are migrated from langchain use-case document (#8468 )	2023-07-10 16:51:45 +08:00
Yishuo Wang	98bac815e4	specify numpy version (#8489 )	2023-07-10 16:50:16 +08:00
Zhao Changmin	81d655cda9	LLM: transformer int4 save and load (#8462 ) * LLM: transformer int4 save and load	2023-07-10 16:34:41 +08:00
binbin Deng	d489775d2c	LLM: fix inconsistency between output token number and `max_new_token` (#8479 )	2023-07-07 17:31:05 +08:00
Jason Dai	bcc1eae322	Llm readme update (#8472 )	2023-07-06 20:04:04 +08:00
Ruonan Wang	2f77d485d8	Llm: Initial support of langchain transformer int4 API (#8459 ) * first commit of transformer int4 and pipeline * basic examples temp save for embeddings support embeddings and docqa example * fix based on comment * small fix	2023-07-06 17:50:05 +08:00
binbin Deng	14626fe05b	LLM: refactor transformers and langchain class name (#8470 )	2023-07-06 17:16:44 +08:00
binbin Deng	70bc8ea8ae	LLM: update langchain and cpp-python style API examples (#8456 )	2023-07-06 14:36:42 +08:00
Ruonan Wang	64b38e1dc8	llm: benchmark tool for transformers int4 (separate 1st token and rest) (#8460 ) * add benchmark utils * fix * fix bug and add readme * hidden latency data	2023-07-06 09:49:52 +08:00
binbin Deng	77808fa124	LLM: fix n_batch in starcoder pybinding (#8461 )	2023-07-05 17:06:50 +08:00
Yina Chen	f2bb469847	[WIP] LLm llm-cli chat mode (#8440 ) * fix timezone * temp * Update linux interactive mode * modify init text for interactive mode * meet comments * update * win script * meet comments	2023-07-05 14:04:17 +08:00
binbin Deng	1970bcf14e	LLM: add readme for transformer examples (#8444 )	2023-07-04 17:25:58 +08:00
binbin Deng	e54e52b438	LLM: fix n_batch in bloom pybinding (#8454 )	2023-07-04 15:10:32 +08:00
Yuwen Hu	372c775cb4	[LLM] Change default runner for LLM Linux tests to the ones with AVX512 (#8448 ) * Basic change for AVX512 runner * Remove conda channel and action rename * Small fix * Small fix and reduce peak convert disk space * Define n_threads based on runner status * Small thread num fix * Define thread_num for cli * test * Add self-hosted label and other small fix	2023-07-04 14:53:03 +08:00
Jason Dai	edf23a95be	Update llm readme (#8446 )	2023-07-03 16:58:44 +08:00
Jason Dai	a38f927fc0	Update README.md (#8439 )	2023-07-03 14:59:55 +08:00
binbin Deng	c956a46c40	LLM: first fix example/transformers (#8438 )	2023-07-03 14:13:33 +08:00
Jason Dai	e5b384aaa2	Update README.md (#8437 )	2023-07-03 10:54:29 +08:00
Yang Wang	449aea7ffc	Optimize transformer int4 loading memory (#8400 ) * Optimize transformer int4 loading memory * move cast to convert * default setting low_cpu_mem_usage	2023-06-30 20:12:12 -07:00
Jason Dai	2da21163f8	Update llm README.md (#8431 )	2023-06-30 19:41:17 +08:00
Junwei Deng	2fd751de7a	LLM: add a dev tool for getting glibc/glibcxx requirement (#8399 ) * add a dev tool * pep8 change	2023-06-30 11:09:50 +08:00
binbin Deng	146662bc0d	LLM: fix langchain windows failure (#8417 )	2023-06-30 09:59:10 +08:00
Yina Chen	6251ad8934	[LLM]Windows unittest (#8356 ) * win-unittest * update * update * try llama 7b * delete llama * update * add red-3b * only test red-3b * revert * add langchain * add dependency * delete langchain	2023-06-29 14:03:12 +08:00
Yina Chen	783aea3309	[LLM] LLM windows daily test (#8328 ) * llm-win-init * test action * test * add types * update for schtasks * update pytests * update * update * update doc * use stable ckpt from ftp instead of the converted model * download using batch -> manually * add starcoder test	2023-06-28 15:02:11 +08:00
binbin Deng	ca5a4b6e3a	LLM: update bloom and starcoder usage in transformers_int4_pipeline (#8406 )	2023-06-28 13:15:50 +08:00
Zhao Changmin	cc76ec809a	check out dir (#8395 )	2023-06-27 21:28:39 +08:00
Ruonan Wang	4be784a49d	LLM: add UT for starcoder (convert, inference) update examples and readme (#8379 ) * first commit to add path * update example and readme * update path * fix * update based on comment	2023-06-27 12:12:11 +08:00
Xin Qiu	e68d631c0a	gptq2ggml: support loading safetensors model. (#8401 ) * update convert gptq to ggml * update convert gptq to ggml * gptq to ggml * update script * meet code review * meet code review	2023-06-27 11:19:33 +08:00
Ruonan Wang	b9eae23c79	LLM: add chatglm-6b example for transformer_int4 usage (#8392 ) * add example for chatglm-6b * fix	2023-06-26 13:46:43 +08:00
binbin Deng	19e19efb4c	LLM: raise warning instead of error when use unsupported parameters (#8382 )	2023-06-26 13:23:55 +08:00
Shengsheng Huang	c113ecb929	[LLM] langchain bloom, UT's, default parameters (#8357 ) * update langchain default parameters to align w/ api * add ut's for llm and embeddings * update inference test script to install langchain deps * update tests workflows --------- Co-authored-by: leonardozcm <changmin.zhao@intel.com>	2023-06-25 17:38:00 +08:00
Shengsheng Huang	446175cc05	transformer api refactor (#8389 ) * transformer api refactor * fix style * add huggingface tokenizer usage in example and make ggml tokenizer as option 1 and huggingface tokenizer as option 2 * fix style	2023-06-25 17:15:33 +08:00

215 commits