ipex-llm

Author	SHA1	Message	Date
Ruonan Wang	c0797ea232	LLM: update setup to specify bigdl-core-xe version (#8913 )	2023-09-07 15:11:55 +08:00
Ruonan Wang	057e77e229	LLM: update benchmark_utils.py to handle do_sample=True (#8903 )	2023-09-07 14:20:47 +08:00
Yang Wang	c34400e6b0	Use new layout for xpu qlinear (#8896 ) * use new layout for xpu qlinear * fix style	2023-09-06 21:55:33 -07:00
Zhao Changmin	8bc1d8a17c	LLM: Fix discards in `optimize_model` with non-hf models and add openai whisper example (#8877 ) * openai-whisper	2023-09-07 10:35:59 +08:00
Xin Qiu	5d9942a3ca	transformer int4 and native int4's benchmark script for 32 256 1k 2k input (#8871 ) * transformer * move * update * add header * update all-in-one * clean up	2023-09-07 09:49:55 +08:00
Yina Chen	bfc71fbc15	Add known issue in arc voice assistant example (#8902 ) * add known issue in voice assistant example * update cpu	2023-09-07 09:28:26 +08:00
Yuwen Hu	db26c7b84d	[LLM] Update readme gif & image url to the ones hosted on readthedocs (#8900 )	2023-09-06 20:04:17 +08:00
SONG Ge	7a71ced78f	[LLM Docs] Remain API Docs Issues Solution (#8780 ) * langchain readthedocs update * solve langchain.llms.transformersllm issues * langchain.embeddings.transformersembeddings/transfortmersllms issues * update docs for get_num_tokens * add low_bit api doc * add optimizer model api doc * update rst index * fix coomments style * update docs following the comments * update api doc	2023-09-06 16:29:34 +08:00
Xin Qiu	49a39452c6	update benchmark (#8899 )	2023-09-06 15:11:43 +08:00
Kai Huang	4a9ff050a1	Add qlora nf4 (#8782 ) * add nf4 * dequant nf4 * style	2023-09-06 09:39:22 +08:00
xingyuan li	704a896e90	[LLM] Add perf test on xpu for bigdl-llm (#8866 ) * add xpu latency job * update install way * remove duplicated workflow * add perf upload	2023-09-05 17:36:24 +09:00
Zhao Changmin	95271f10e0	LLM: Rename low bit layer (#8875 ) * rename lowbit --------- Co-authored-by: leonardozcm <leonardozcm@gmail.com>	2023-09-05 13:21:12 +08:00
Yina Chen	74a2c2ddf5	Update optimize_model=True in llama2 chatglm2 arc examples (#8878 ) * add optimize_model=True in llama2 chatglm2 examples * add ipex optimize in gpt-j example	2023-09-05 10:35:37 +08:00
Jason Dai	5e58f698cd	Update readthedocs (#8882 )	2023-09-04 15:42:16 +08:00
Song Jiaming	7b3ac66e17	[LLM] auto performance test fix specific settings to template (#8876 )	2023-09-01 15:49:04 +08:00
Yang Wang	242c9d6036	Fix chatglm2 multi-turn streamchat (#8867 )	2023-08-31 22:13:49 -07:00
Song Jiaming	c06f1ca93e	[LLM] auto perf test to output to csv (#8846 )	2023-09-01 10:48:00 +08:00
Zhao Changmin	9c652fbe95	LLM: Whisper long segment recognize example (#8826 ) * LLM: Long segment recognize example	2023-08-31 16:41:25 +08:00
Yishuo Wang	a232c5aa21	[LLM] add protobuf in bigdl-llm dependency (#8861 )	2023-08-31 15:23:31 +08:00
xingyuan li	de6c6bb17f	[LLM] Downgrade amx build gcc version and remove avx flag display (#8856 ) * downgrade to gcc 11 * remove avx display	2023-08-31 14:08:13 +09:00
Yang Wang	3b4f4e1c3d	Fix llama attention optimization for XPU (#8855 ) * Fix llama attention optimization fo XPU * fix chatglm2 * fix typo	2023-08-30 21:30:49 -07:00
Shengsheng Huang	7b566bf686	[LLM] add new API for optimize any pytorch models (#8827 ) * add new API for optimize any pytorch models * change test util name * revise API and update UT * fix python style * update ut config, change default value * change defaults, disable ut transcribe	2023-08-30 19:41:53 +08:00
Xin Qiu	8eca982301	windows add env (#8852 )	2023-08-30 15:54:52 +08:00
Zhao Changmin	731916c639	LLM: Enable attempting loading method automatically (#8841 ) * enable auto load method * warning error * logger info --------- Co-authored-by: leonardozcm <leonardozcm@gmail.com>	2023-08-30 15:41:55 +08:00
Yishuo Wang	bba73ec9d2	[LLM] change chatglm native int4 checkpoint name (#8851 )	2023-08-30 15:05:19 +08:00
Yina Chen	55e705a84c	[LLM] Support the rest of AutoXXX classes in Transformers API (#8815 ) * add transformers auto models * fix	2023-08-30 11:16:14 +08:00
Zhao Changmin	887018b0f2	Update ut save&load (#8847 ) Co-authored-by: leonardozcm <leonardozcm@gmail.com>	2023-08-30 10:32:57 +08:00
Yina Chen	3462fd5c96	Add arc gpt-j example (#8840 )	2023-08-30 10:31:24 +08:00
Ruonan Wang	f42c0bad1b	LLM: update GPU doc (#8845 )	2023-08-30 09:24:19 +08:00
Jason Dai	aab7deab1f	Reorganize GPU examples (#8844 )	2023-08-30 08:32:08 +08:00
Yang Wang	a386ad984e	Add Data Center GPU Flex Series to Readme (#8835 ) * Add Data Center GPU Flex Series to Readme * remove * update starcoder	2023-08-29 11:19:09 -07:00
Yishuo Wang	7429ea0606	[LLM] support transformer int4 + amx int4 (#8838 )	2023-08-29 17:27:18 +08:00
Ruonan Wang	ddff7a6f05	Update readme of GPU to specify oneapi version(#8820 )	2023-08-29 13:14:22 +08:00
Zhao Changmin	bb31d4fe80	LLM: Implement hf `low_cpu_mem_usage` with 1xbinary file peak memory on transformer int4 (#8731 ) * 1x peak memory	2023-08-29 09:33:17 +08:00
Yina Chen	35fdf94031	[LLM]Arc starcoder example (#8814 ) * arc starcoder example init * add log * meet comments	2023-08-28 16:48:00 +08:00
xingyuan li	6a902b892e	[LLM] Add amx build step (#8822 ) * add amx build step	2023-08-28 17:41:18 +09:00
Ruonan Wang	eae92bc7da	llm: quick fix path (#8810 )	2023-08-25 16:02:31 +08:00
Ruonan Wang	0186f3ab2f	llm: update all ARC int4 examples (#8809 ) * update GPU examples * update other examples * fix * update based on comment	2023-08-25 15:26:10 +08:00
Song Jiaming	b8b1b6888b	[LLM] Performance test (#8796 )	2023-08-25 14:31:45 +08:00
Yang Wang	9d0f6a8cce	rename math.py in example to avoid conflict (#8805 )	2023-08-24 21:06:31 -07:00
SONG Ge	d2926c7672	[LLM] Unify Langchain Native and Transformers LLM API (#8752 ) * deprecate BigDLNativeTransformers and add specific LMEmbedding method * deprecate and add LM methods for langchain llms * add native params to native langchain * new imple for embedding * move ut from bigdlnative to casual llm * rename embeddings api and examples update align with usage updating * docqa example hot-fix * add more api docs * add langchain ut for starcoder * support model_kwargs for transformer methods when calling causalLM and add ut * ut fix for transformers embedding * update for langchain causal supporting transformers * remove model_family in readme doc * add model_families params to support more models * update api docs and remove chatglm embeddings for now * remove chatglm embeddings in examples * new refactor for ut to add bloom and transformers llama ut * disable llama transformers embedding ut	2023-08-25 11:14:21 +08:00
binbin Deng	5582872744	LLM: update chatglm example to be more friendly for beginners (#8795 )	2023-08-25 10:55:01 +08:00
Yina Chen	7c37424a63	Fix voice assistant example input error on Linux (#8799 ) * fix linux error * update * remove alsa log	2023-08-25 10:47:27 +08:00
Yang Wang	bf3591e2ff	Optimize chatglm2 for bf16 (#8725 ) * make chatglm works with bf16 * fix style * support chatglm v1 * fix style * fix style * add chatglm2 file	2023-08-24 10:04:25 -07:00
xingyuan li	c94bdd3791	[LLM] Merge windows & linux nightly test (#8756 ) * fix download statement * add check before build wheel * use curl to upload files * windows unittest won't upload converted model * split llm-cli test into windows & linux versions * update tempdir create way * fix nightly converted model name * windows llm-cli starcoder test temply disabled * remove taskset dependency * rename llm_unit_tests_linux to llm_unit_tests	2023-08-23 12:48:41 +09:00
Jason Dai	dcadd09154	Update llm document (#8784 )	2023-08-21 22:34:44 +08:00
Yishuo Wang	611c1fb628	[LLM] change default n_threads of native int4 langchain API (#8779 )	2023-08-21 13:30:12 +08:00
Yishuo Wang	3d1f2b44f8	LLM: change default n_threads of native int4 models (#8776 )	2023-08-18 15:46:19 +08:00
Yishuo Wang	2ba2133613	fix starcoder chinese output (#8773 )	2023-08-18 13:37:02 +08:00
binbin Deng	548f7a6cf7	LLM: update convert of llama family to support llama2-70B (#8747 )	2023-08-18 09:30:35 +08:00

256 commits