ipex-llm

Author	SHA1	Message	Date
xingyuan li	de6c6bb17f	[LLM] Downgrade amx build gcc version and remove avx flag display (#8856 ) * downgrade to gcc 11 * remove avx display	2023-08-31 14:08:13 +09:00
Yang Wang	3b4f4e1c3d	Fix llama attention optimization for XPU (#8855 ) * Fix llama attention optimization fo XPU * fix chatglm2 * fix typo	2023-08-30 21:30:49 -07:00
Shengsheng Huang	7b566bf686	[LLM] add new API for optimize any pytorch models (#8827 ) * add new API for optimize any pytorch models * change test util name * revise API and update UT * fix python style * update ut config, change default value * change defaults, disable ut transcribe	2023-08-30 19:41:53 +08:00
Xin Qiu	8eca982301	windows add env (#8852 )	2023-08-30 15:54:52 +08:00
Zhao Changmin	731916c639	LLM: Enable attempting loading method automatically (#8841 ) * enable auto load method * warning error * logger info --------- Co-authored-by: leonardozcm <leonardozcm@gmail.com>	2023-08-30 15:41:55 +08:00
Yishuo Wang	bba73ec9d2	[LLM] change chatglm native int4 checkpoint name (#8851 )	2023-08-30 15:05:19 +08:00
Yina Chen	55e705a84c	[LLM] Support the rest of AutoXXX classes in Transformers API (#8815 ) * add transformers auto models * fix	2023-08-30 11:16:14 +08:00
Zhao Changmin	887018b0f2	Update ut save&load (#8847 ) Co-authored-by: leonardozcm <leonardozcm@gmail.com>	2023-08-30 10:32:57 +08:00
Yina Chen	3462fd5c96	Add arc gpt-j example (#8840 )	2023-08-30 10:31:24 +08:00
Ruonan Wang	f42c0bad1b	LLM: update GPU doc (#8845 )	2023-08-30 09:24:19 +08:00
Jason Dai	aab7deab1f	Reorganize GPU examples (#8844 )	2023-08-30 08:32:08 +08:00
Yang Wang	a386ad984e	Add Data Center GPU Flex Series to Readme (#8835 ) * Add Data Center GPU Flex Series to Readme * remove * update starcoder	2023-08-29 11:19:09 -07:00
Yishuo Wang	7429ea0606	[LLM] support transformer int4 + amx int4 (#8838 )	2023-08-29 17:27:18 +08:00
Ruonan Wang	ddff7a6f05	Update readme of GPU to specify oneapi version(#8820 )	2023-08-29 13:14:22 +08:00
Zhao Changmin	bb31d4fe80	LLM: Implement hf `low_cpu_mem_usage` with 1xbinary file peak memory on transformer int4 (#8731 ) * 1x peak memory	2023-08-29 09:33:17 +08:00
Yina Chen	35fdf94031	[LLM]Arc starcoder example (#8814 ) * arc starcoder example init * add log * meet comments	2023-08-28 16:48:00 +08:00
xingyuan li	6a902b892e	[LLM] Add amx build step (#8822 ) * add amx build step	2023-08-28 17:41:18 +09:00
Ruonan Wang	eae92bc7da	llm: quick fix path (#8810 )	2023-08-25 16:02:31 +08:00
Ruonan Wang	0186f3ab2f	llm: update all ARC int4 examples (#8809 ) * update GPU examples * update other examples * fix * update based on comment	2023-08-25 15:26:10 +08:00
Song Jiaming	b8b1b6888b	[LLM] Performance test (#8796 )	2023-08-25 14:31:45 +08:00
Yang Wang	9d0f6a8cce	rename math.py in example to avoid conflict (#8805 )	2023-08-24 21:06:31 -07:00
SONG Ge	d2926c7672	[LLM] Unify Langchain Native and Transformers LLM API (#8752 ) * deprecate BigDLNativeTransformers and add specific LMEmbedding method * deprecate and add LM methods for langchain llms * add native params to native langchain * new imple for embedding * move ut from bigdlnative to casual llm * rename embeddings api and examples update align with usage updating * docqa example hot-fix * add more api docs * add langchain ut for starcoder * support model_kwargs for transformer methods when calling causalLM and add ut * ut fix for transformers embedding * update for langchain causal supporting transformers * remove model_family in readme doc * add model_families params to support more models * update api docs and remove chatglm embeddings for now * remove chatglm embeddings in examples * new refactor for ut to add bloom and transformers llama ut * disable llama transformers embedding ut	2023-08-25 11:14:21 +08:00
binbin Deng	5582872744	LLM: update chatglm example to be more friendly for beginners (#8795 )	2023-08-25 10:55:01 +08:00
Yina Chen	7c37424a63	Fix voice assistant example input error on Linux (#8799 ) * fix linux error * update * remove alsa log	2023-08-25 10:47:27 +08:00
Yang Wang	bf3591e2ff	Optimize chatglm2 for bf16 (#8725 ) * make chatglm works with bf16 * fix style * support chatglm v1 * fix style * fix style * add chatglm2 file	2023-08-24 10:04:25 -07:00
xingyuan li	c94bdd3791	[LLM] Merge windows & linux nightly test (#8756 ) * fix download statement * add check before build wheel * use curl to upload files * windows unittest won't upload converted model * split llm-cli test into windows & linux versions * update tempdir create way * fix nightly converted model name * windows llm-cli starcoder test temply disabled * remove taskset dependency * rename llm_unit_tests_linux to llm_unit_tests	2023-08-23 12:48:41 +09:00
Jason Dai	dcadd09154	Update llm document (#8784 )	2023-08-21 22:34:44 +08:00
Yishuo Wang	611c1fb628	[LLM] change default n_threads of native int4 langchain API (#8779 )	2023-08-21 13:30:12 +08:00
Yishuo Wang	3d1f2b44f8	LLM: change default n_threads of native int4 models (#8776 )	2023-08-18 15:46:19 +08:00
Yishuo Wang	2ba2133613	fix starcoder chinese output (#8773 )	2023-08-18 13:37:02 +08:00
binbin Deng	548f7a6cf7	LLM: update convert of llama family to support llama2-70B (#8747 )	2023-08-18 09:30:35 +08:00
Yina Chen	4afea496ab	support q8_0 (#8765 )	2023-08-17 15:06:36 +08:00
Ruonan Wang	e9aa2bd890	LLM: reduce GPU 1st token latency and update example (#8763 ) * reduce 1st token latency * update example * fix * fix style * update readme of gpu benchmark	2023-08-16 18:01:23 +08:00
binbin Deng	06609d9260	LLM: add qwen example on arc (#8757 )	2023-08-16 17:11:08 +08:00
SONG Ge	f4164e4492	[BigDL LLM] Update readme for unifying transformers API (#8737 ) * update readme doc * fix readthedocs error * update comment * update exception error info * invalidInputError instead * fix readme typo error and remove import error * fix more typo	2023-08-16 14:22:32 +08:00
Song Jiaming	c1f9af6d97	[LLM] chatglm example and transformers low-bit examples (#8751 )	2023-08-16 11:41:44 +08:00
Ruonan Wang	8805186f2f	LLM: add benchmark tool for gpu (#8760 ) * add benchmark tool for gpu * update	2023-08-16 11:22:10 +08:00
binbin Deng	97283c033c	LLM: add falcon example on arc (#8742 )	2023-08-15 17:38:38 +08:00
binbin Deng	8c55911308	LLM: add baichuan-13B on arc example (#8755 )	2023-08-15 15:07:04 +08:00
binbin Deng	be2ae6eb7c	LLM: fix langchain native int4 voiceasistant example (#8750 )	2023-08-14 17:23:33 +08:00
Ruonan Wang	d28ad8f7db	LLM: add whisper example for arc transformer int4 (#8749 ) * add whisper example for arc int4 * fix	2023-08-14 17:05:48 +08:00
Yishuo Wang	77844125f2	[LLM] Support chatglm cache (#8745 )	2023-08-14 15:10:46 +08:00
Ruonan Wang	faaccb64a2	LLM: add chatglm2 example for Arc (#8741 ) * add chatglm2 example * update * fix readme	2023-08-14 10:43:08 +08:00
binbin Deng	b10d7e1adf	LLM: add mpt example on arc (#8723 )	2023-08-14 09:40:01 +08:00
binbin Deng	e9a1afffc5	LLM: add internlm example on arc (#8722 )	2023-08-14 09:39:39 +08:00
SONG Ge	aceea4dc29	[LLM] Unify Transformers and Native API (#8713 ) * re-open pr to run on latest runner * re-add examples and ut * rename ut and move deprecate to warning instead of raising an error info * ut fix	2023-08-11 19:45:47 +08:00
Yishuo Wang	f91035c298	[LLM] fix chatglm native int4 emoji output (#8739 )	2023-08-11 15:38:41 +08:00
binbin Deng	77efcf7b1d	LLM: fix ChatGLM2 native int4 stream output (#8733 )	2023-08-11 14:51:50 +08:00
Ruonan Wang	ca3e59a1dc	LLM: support stop for starcoder native int4 stream (#8734 )	2023-08-11 14:51:30 +08:00
Song Jiaming	e292dfd970	[WIP] LLM transformers api for langchain (#8642 )	2023-08-11 13:32:35 +08:00

237 commits