Commit graph

1251 commits

Author SHA1 Message Date
Yishuo Wang
5837cc424a [LLM] add chatglm pybinding binary file release (#8677) 2023-08-04 11:45:27 +08:00
xingyuan li
bc4cdb07c9 Remove conda for llm workflow (#8671) 2023-08-04 12:09:42 +09:00
Yang Wang
b6468bac43 optimize chatglm2 long sequence (#8662)
* add chatglm2

* optimize a little

* optimize chatglm long sequence

* fix style

* address comments and fix style

* fix bug
2023-08-03 17:56:24 -07:00
Yang Wang
3407f87075 Fix llama kv cache bug (#8674) 2023-08-03 17:54:55 -07:00
Yina Chen
59903ea668 llm linux support avx & avx2 (#8669) 2023-08-03 17:10:59 +08:00
xingyuan li
110cfb5546 [LLM] Remove old windows nightly test code (#8668)
Remove old Windows nightly test code triggered by task scheduler
Add new Windows nightly workflow for nightly testing
2023-08-03 17:12:23 +09:00
Yina Chen
bd177ab612 [LLM] llm binary build linux add avx & avx2 (#8665)
* llm add linux avx & avx2 release

* fix name

* update check
2023-08-03 14:38:31 +08:00
xingyuan li
610084e3c0 [LLM] Complete windows unittest (#8611)
* add windows nightly test workflow
* use github runner to run pr test
* model load should use lowbit
* remove tmp dir after testing
2023-08-03 14:48:42 +09:00
binbin Deng
a15a2516e6 add (#8659) 2023-08-03 10:12:10 +08:00
Xin Qiu
0714888705 build windows avx dll (#8657)
* windows avx

* add to actions
2023-08-03 02:06:24 +08:00
Yina Chen
119bf6d710 [LLM] Support linux cpp dynamic load .so (#8655)
* support linux cpp dynamic load .so

* update cli
2023-08-02 20:15:45 +08:00
Zhao Changmin
ca998cc6f2 LLM: Mute shape mismatch output (#8601)
* LLM: Mute shape mismatch output
2023-08-02 16:46:22 +08:00
Yina Chen
15b3adc7ec [LLM] llm linux binary make -> cmake (#8656)
* llm linux make -> cmake

* update

* update
2023-08-02 16:41:54 +08:00
Zhao Changmin
04c713ef06 LLM: Disable transformer api pretraining_tp (#8645)
* disable pretraining_tp
2023-08-02 11:26:01 +08:00
binbin Deng
6fc31bb4cf LLM: first update descriptions for ChatGLM transformers int4 example (#8646) 2023-08-02 11:00:56 +08:00
xingyuan li
769209b7f0 Disable chatglm unittest due to missing instruction (#8650) 2023-08-02 10:28:42 +09:00
Yang Wang
cbeae97a26 Optimize Llama Attention to reduce KV cache memory copy (#8580)
* Optimize llama attention to reduce KV cache memory copy

* fix bug

* fix style

* remove git

* fix style

* fix style

* fix style

* fix tests

* move llama attention to another file

* revert

* fix style

* remove jit

* fix
2023-08-01 16:37:58 -07:00
binbin Deng
39994738d1 LLM: add chat & stream chat example for ChatGLM2 transformers int4 (#8636) 2023-08-01 14:57:45 +08:00
xingyuan li
cdfbe652ca [LLM] Add chatglm support for llm-cli (#8641)
* add chatglm build
* add llm-cli support
* update git
* install cmake
* add ut for chatglm
* add files to setup
* fix bug causing permission error when sf lacks file
2023-08-01 14:30:17 +09:00
Zhao Changmin
d6cbfc6d2c LLM: Add requirements in whisper example (#8644)
* LLM: Add requirements in whisper example
2023-08-01 12:07:14 +08:00
Zhao Changmin
3e10260c6d LLM: llm-convert support chatglm family (#8643)
* convert chatglm
2023-08-01 11:16:18 +08:00
Yina Chen
a607972c0b [LLM] LLM windows load -api.dll (#8631)
* temp

* update

* revert setup.py
2023-07-31 13:47:20 +08:00
xingyuan li
3361b66449 [LLM] Revert llm-cli to disable selecting executables on Windows (#8630)
* revert vnni file select
* revert setup.py
* add model-api.dll
2023-07-31 11:15:44 +09:00
binbin Deng
3dbab9087b LLM: add llama2-7b native int4 example (#8629) 2023-07-28 10:56:16 +08:00
binbin Deng
fb32fefcbe LLM: support tensor input of native int4 generate (#8620) 2023-07-27 17:59:49 +08:00
Zhao Changmin
5b484ab48d LLM: Support load_low_bit loading models in shards format (#8612)
* shards_model

---------

Co-authored-by: leonardozcm <leonaordo1997zcm@gmail.com>
2023-07-26 13:30:01 +08:00
xingyuan li
919791e406 Add `needs` to make sure jobs run in order (#8621) 2023-07-26 14:16:57 +09:00
xingyuan li
e3418d7e61 [LLM] Remove concurrency group for binary build workflow (#8619)
* remove concurrency group for nightly test
2023-07-26 12:15:53 +09:00
binbin Deng
fcf8c085e3 LLM: add llama2-13b native int4 example (#8613) 2023-07-26 10:12:52 +08:00
xingyuan li
a98b3fe961 Fix cancel flag causing nightly builds to fail (#8618) 2023-07-26 11:11:08 +09:00
xingyuan li
7d45233825 fix trigger enable flag (#8616) 2023-07-26 10:53:03 +09:00
Guancheng Fu
07d1aee825 [PPML] add fastchat image for tdx (#8610) 2023-07-25 15:23:41 +08:00
Song Jiaming
650b82fa6e [LLM] add CausalLM and Speech UT (#8597) 2023-07-25 11:22:36 +08:00
xingyuan li
9c897ac7db [LLM] Merge redundant code in workflow (#8596)
* modify workflow concurrency group
* Add build check to avoid repeated compilation
* remove redundant code
2023-07-25 12:12:00 +09:00
Zhao Changmin
af201052db avoid malloc all missing keys in fp32 (#8600) 2023-07-25 09:48:51 +08:00
binbin Deng
3f24202e4c [LLM] Add more transformers int4 example (Llama 2) (#8602) 2023-07-25 09:21:12 +08:00
Jason Dai
0f8201c730 llm readme update (#8595) 2023-07-24 09:47:49 +08:00
Yuwen Hu
ba42a6da63 [LLM] Set torch_dtype default value to 'auto' for transformers low bit from_pretrained API 2023-07-21 17:55:00 +08:00
Yuwen Hu
bbde423349 [LLM] Add current Linux UT inference tests to nightly tests (#8578)
* Add current inference uts to nightly tests

* Change test model from chatglm-6b to chatglm2-6b

* Add thread num env variable for nightly test

* Fix urls

* Small fix
2023-07-21 13:26:38 +08:00
Yang Wang
feb3af0567 Optimize transformer int4 memory footprint (#8579) 2023-07-20 20:22:13 -07:00
Yang Wang
57e880f63a [LLM] use pytorch linear for large input matrix (#8492)
* use pytorch linear for large input matrix

* only works on server

* fix style

* optimize memory

* first check server

* revert

* address comments

* fix style
2023-07-20 09:54:25 -07:00
Yuwen Hu
6504e31a97 Small fix (#8577) 2023-07-20 16:37:04 +08:00
Yuwen Hu
2266ca7d2b [LLM] Small updates to transformers int4 ut (#8574)
* Small fix to transformers int4 ut

* Small fix
2023-07-20 13:20:25 +08:00
xingyuan li
7b8d9c1b0d [LLM] Add dependency file check in setup.py (#8565)
* add package file check
2023-07-20 14:20:08 +09:00
xingyuan li
2eeb653c75 fix llm build workflow misspelling (#8575) 2023-07-20 12:08:54 +09:00
Song Jiaming
411d896636 LLM first transformers UT (#8514)
* ut

* transformers api first ut

* name

* dir issue

* use chatglm instead of chatglm2

* omp

* set omp in sh

* source

* taskset

* test

* test omp

* add test
2023-07-20 10:16:27 +08:00
Yuwen Hu
cad78740a7 [LLM] Small fixes to the Whisper transformers INT4 example (#8573)
* Small fixes to the whisper example

* Small fix

* Small fix
2023-07-20 10:11:33 +08:00
binbin Deng
7a9fdf74df [LLM] Add more transformers int4 example (Dolly v2) (#8571)
* add

* add trust_remote_mode
2023-07-19 18:20:16 +08:00
Zhao Changmin
e680af45ea LLM: Optimize Langchain Pipeline (#8561)
* LLM: Optimize Langchain Pipeline

* load in low bit
2023-07-19 17:43:13 +08:00
Shengsheng Huang
616b7cb0a2 add more langchain examples (#8542)
* update langchain descriptions

* add mathchain example

* update readme

* update readme
2023-07-19 17:42:18 +08:00