ipex-llm

Author	SHA1	Message	Date
SONG Ge	160a1e5ee7	[WIP] Add UT for Mistral Optimized Model (#9248 ) * add ut for mistral model * update * fix model path * upgrade transformers version for mistral model * refactor correctness ut for mustral model * refactor mistral correctness ut * revert test_optimize_model back * remove mistral from test_optimize_model * add to revert transformers version back to 4.31.0	2023-10-25 15:14:17 +08:00
Yang Wang	067c7e8098	Support deepspeed AutoTP (#9230 ) * Support deepspeed * add test script * refactor convert * refine example * refine * refine example * fix style * refine example and adapte latest ipex * fix style	2023-10-24 23:46:28 -07:00
Yining Wang	a6a8afc47e	Add qwen vl CPU example (#9221 ) * eee * add examples on CPU and GPU * fix * fix * optimize model examples * add Qwen-VL-Chat CPU example * Add Qwen-VL CPU example * fix optimize problem * fix error * Have updated, benchmark fix removed from this PR * add generate API example * Change formats in qwen-vl example * Add CPU transformer int4 example for qwen-vl * fix repo-id problem and add Readme * change picture url * Remove unnecessary file --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2023-10-25 13:22:12 +08:00
binbin Deng	f597a9d4f5	LLM: update perf test configuration (#9264 )	2023-10-25 12:35:48 +08:00
binbin Deng	770ac70b00	LLM: add `low_bit` option in benchmark scripts (#9257 )	2023-10-25 10:27:48 +08:00
WeiguangHan	ec9195da42	LLM: using html to visualize the perf result for Arc (#9228 ) * LLM: using html to visualize the perf result for Arc * deploy the html file * add python license * reslove some comments	2023-10-24 18:05:25 +08:00
Jin Qiao	90162264a3	LLM: replace torch.float32 with auto type (#9261 )	2023-10-24 17:12:13 +08:00
SONG Ge	bd5215d75b	[LLM] Reimplement chatglm fuse rms optimization (#9260 ) * re-implement chatglm rope rms * update	2023-10-24 16:35:12 +08:00
dingbaorong	5a2ce421af	add cpu and gpu examples of flan-t5 (#9171 ) * add cpu and gpu examples of flan-t5 * address yuwen's comments * Add explanation why we add modules to not convert * Refine prompt and add a translation example * Add a empty line at the end of files * add examples of flan-t5 using optimize_mdoel api * address bin's comments * address binbin's comments * add flan-t5 in readme	2023-10-24 15:24:01 +08:00
Yining Wang	4a19f50d16	phi-1_5 CPU and GPU examples (#9173 ) * eee * add examples on CPU and GPU * fix * fix * optimize model examples * have updated * Warmup and configs added * Update two tables	2023-10-24 15:08:04 +08:00
SONG Ge	bfc1e2d733	add fused rms optimization for chatglm model (#9256 )	2023-10-24 14:40:58 +08:00
Ruonan Wang	b15656229e	LLM: fix benchmark issue (#9255 )	2023-10-24 14:15:05 +08:00
Guancheng Fu	f37547249d	Refine README/CICD (#9253 )	2023-10-24 12:56:03 +08:00
binbin Deng	db37edae8a	LLM: update langchain api document page (#9222 )	2023-10-24 10:13:41 +08:00
Xin Qiu	0c5055d38c	add position_ids and fuse embedding for falcon (#9242 ) * add position_ids for falcon * add cpu * add cpu * add license	2023-10-24 09:58:20 +08:00
Wang, Jian4	c14a61681b	Add load low-bit in model-serving for reduce EPC (#9239 ) * init load low-bit * fix * fix	2023-10-23 11:28:20 +08:00
Yina Chen	0383306688	Add arc fp8 support (#9232 ) * add fp8 support * add log * fix style	2023-10-20 17:15:07 +08:00
Yang Wang	118249b011	support transformers 4.34+ for llama (#9229 )	2023-10-19 22:36:30 -07:00
Chen, Zhentao	5850241423	correct Readme GPU example and API docstring (#9225 ) * update readme to correct GPU usage * update from_pretrained supported low bit options * fix stype check	2023-10-19 16:08:47 +08:00
WeiguangHan	f87f67ee1c	LLM: arc perf test for some popular models (#9188 )	2023-10-19 15:56:15 +08:00
Yang Wang	b0ddde0410	Fix removing convert dtype bug (#9216 ) * Fix removing convert dtype bug * fix style	2023-10-18 11:24:22 -07:00
Ruonan Wang	942d6418e7	LLM: fix chatglm kv cache (#9215 )	2023-10-18 19:09:53 +08:00
SONG Ge	0765f94770	[LLM] Optimize kv_cache for mistral model family (#9189 ) * add kv_cache optimization for mistral model * kv_cache optimize for mistral * update stylr * update	2023-10-18 15:13:37 +08:00
Ruonan Wang	3555ebc148	LLM: fix wrong length in gptj kv_cache optimization (#9210 ) * fix wrong length in gptj kv cache * update	2023-10-18 14:59:02 +08:00
Shengsheng Huang	6dad8d16df	optimize NormHead for Baichuan2 (#9205 ) * optimize NormHead for Baichuan2 * fix ut and change name * rename functions	2023-10-18 14:05:07 +08:00
Jin Qiao	a3b664ed03	LLM: add GPU More-Data-Types and Save/Load example (#9199 )	2023-10-18 13:13:45 +08:00
WeiguangHan	b9194c5786	LLM: skip some model tests using certain api (#9163 ) * LLM: Skip some model tests using certain api * initialize variable named result	2023-10-18 09:39:27 +08:00
Ruonan Wang	09815f7064	LLM: fix RMSNorm optimization of Baichuan2-13B/Baichuan-13B (#9204 ) * fix rmsnorm of baichuan2-13B * update baichuan1-13B too * fix style	2023-10-17 18:40:34 +08:00
Jin Qiao	d7ce78edf0	LLM: fix portable zip README image link (#9201 ) * LLM: fix portable zip readme img link * LLM: make README first image center align	2023-10-17 16:38:22 +08:00
Cheen Hau, 俊豪	66c2e45634	Add unit tests for optimized model correctness (#9151 ) * Add test to check correctness of optimized model * Refactor optimized model test * Use models in llm-unit-test * Use AutoTokenizer for bloom * Print out each passed test * Remove unused tokenizer from import	2023-10-17 14:46:41 +08:00
Jin Qiao	d946bd7c55	LLM: add CPU More-Data-Types and Save-Load examples (#9179 )	2023-10-17 14:38:52 +08:00
Ruonan Wang	c0497ab41b	LLM: support kv_cache optimization for Qwen-VL-Chat (#9193 ) * dupport qwen_vl_chat * fix style	2023-10-17 13:33:56 +08:00
binbin Deng	1cd9ab15b8	LLM: fix ChatGLMConfig check (#9191 )	2023-10-17 11:52:56 +08:00
Yang Wang	7160afd4d1	Support XPU DDP training and autocast for LowBitMatmul (#9167 ) * support autocast in low bit matmul * Support XPU DDP training * fix amp	2023-10-16 20:47:19 -07:00
Ruonan Wang	77afb8796b	LLM: fix convert of chatglm (#9190 )	2023-10-17 10:48:13 +08:00
dingbaorong	af3b575c7e	expose modules_to_not_convert in optimize_model (#9180 ) * expose modules_to_not_convert in optimize_model * some fixes	2023-10-17 09:50:26 +08:00
Cengguang Zhang	5ca8a851e9	LLM: add fuse optimization for Mistral. (#9184 ) * add fuse optimization for mistral. * fix. * fix * fix style. * fix. * fix error. * fix style. * fix style.	2023-10-16 16:50:31 +08:00
Jiao Wang	49e1381c7f	update rope (#9155 )	2023-10-15 21:51:45 -07:00
Jason Dai	b192a8032c	Update llm-readme (#9176 )	2023-10-16 10:54:52 +08:00
binbin Deng	a164c24746	LLM: add kv_cache optimization for chatglm2-6b-32k (#9165 )	2023-10-16 10:43:15 +08:00
Yang Wang	7a2de00b48	Fixes for xpu Bf16 training (#9156 ) * Support bf16 training * Use a stable transformer version * remove env * fix style	2023-10-14 21:28:59 -07:00
Cengguang Zhang	51a133de56	LLM: add fuse rope and norm optimization for Baichuan. (#9166 ) * add fuse rope optimization. * add rms norm optimization.	2023-10-13 17:36:52 +08:00
Jin Qiao	db7f938fdc	LLM: add replit and starcoder to gpu pytorch model example (#9154 )	2023-10-13 15:44:17 +08:00
Jin Qiao	797b156a0d	LLM: add dolly-v1 and dolly-v2 to gpu pytorch model example (#9153 )	2023-10-13 15:43:35 +08:00
Yishuo Wang	259cbb4126	[LLM] add initial bigdl-llm-init (#9150 )	2023-10-13 15:31:45 +08:00
Cengguang Zhang	433f408081	LLM: Add fuse rope and norm optimization for Aquila. (#9161 ) * add fuse norm optimization. * add fuse rope optimization	2023-10-13 14:18:37 +08:00
SONG Ge	e7aa67e141	[LLM] Add rope optimization for internlm (#9159 ) * add rope and norm optimization for internlm and gptneox * revert gptneox back and split with pr#9155 # * add norm_forward * style fix * update * update	2023-10-13 14:18:28 +08:00
Jin Qiao	f754ab3e60	LLM: add baichuan and baichuan2 to gpu pytorch model example (#9152 )	2023-10-13 13:44:31 +08:00
Ruonan Wang	b8aee7bb1b	LLM: Fix Qwen kv_cache optimization (#9148 ) * first commit * ut pass * accelerate rotate half by using common util function * fix style	2023-10-12 15:49:42 +08:00
binbin Deng	69942d3826	LLM: fix model check before attention optimization (#9149 )	2023-10-12 15:21:51 +08:00
JIN Qiao	1a1ddc4144	LLM: Add Replit CPU and GPU example (#9028 )	2023-10-12 13:42:14 +08:00
JIN Qiao	d74834ff4c	LLM: add gpu pytorch-models example llama2 and chatglm2 (#9142 )	2023-10-12 13:41:48 +08:00
Ruonan Wang	4f34557224	LLM: support num_beams in all-in-one benchmark (#9141 ) * support num_beams * fix	2023-10-12 13:35:12 +08:00
Ruonan Wang	62ac7ae444	LLM: fix inaccurate input / output tokens of current all-in-one benchmark (#9137 ) * first fix * fix all apis * fix	2023-10-11 17:13:34 +08:00
binbin Deng	eb3fb18eb4	LLM: improve PyTorch API doc (#9128 )	2023-10-11 15:03:39 +08:00
binbin Deng	995b0f119f	LLM: update some gpu examples (#9136 )	2023-10-11 14:23:56 +08:00
Ruonan Wang	1c8d5da362	LLM: fix llama tokenizer for all-in-one benchmark (#9129 ) * fix tokenizer for gpu benchmark * fix ipex fp16 * meet code review * fix	2023-10-11 13:39:39 +08:00
binbin Deng	2ad67a18b1	LLM: add mistral examples (#9121 )	2023-10-11 13:38:15 +08:00
Ruonan Wang	1363e666fc	LLM: update benchmark_util.py for beam search (#9126 ) * update reorder_cache * fix	2023-10-11 09:41:53 +08:00
Guoqiong Song	e8c5645067	add LLM example of aquila on GPU (#9056 ) * aquila, dolly-v1, dolly-v2, vacuna	2023-10-10 17:01:35 -07:00
Ruonan Wang	388f688ef3	LLM: update setup.py to add `bigdl-core-xe` package (#9122 )	2023-10-10 15:02:48 +08:00
Zhao Changmin	1709beba5b	LLM: Explicitly close pickle file pointer before removing temporary directory (#9120 ) * fp close	2023-10-10 14:57:23 +08:00
Yuwen Hu	0e09dd926b	[LLM] Fix example test (#9118 ) * Update llm example test link due to example layout change * Add better change detect	2023-10-10 13:24:18 +08:00
Ruonan Wang	ad7d9231f5	LLM: add benchmark script for Max gpu and ipex fp16 gpu (#9112 ) * add pvc bash * meet code review * rename to run-max-gpu.sh	2023-10-10 10:18:41 +08:00
binbin Deng	e4d1457a70	LLM: improve transformers style API doc (#9113 )	2023-10-10 09:31:00 +08:00
Yuwen Hu	65212451cc	[LLM] Small update to performance tests (#9106 ) * small updates to llm performance tests regarding model handling * Small fix	2023-10-09 16:55:25 +08:00
Zhao Changmin	edccfb2ed3	LLM: Check model device type (#9092 ) * check model device	2023-10-09 15:49:15 +08:00
binbin Deng	5e9962b60e	LLM: update example layout (#9046 )	2023-10-09 15:36:39 +08:00
Yina Chen	4c4f8d1663	[LLM]Fix Arc falcon abnormal output issue (#9096 ) * update * update * fix error & style * fix style * update train * to input_seq_size	2023-10-09 15:09:37 +08:00
Zhao Changmin	548e4dd5fe	LLM: Adapt transformers models for `optimize model` SL (#9022 ) * LLM: Adapt transformers model for SL	2023-10-09 11:13:44 +08:00
Ruonan Wang	f64257a093	LLM: basic api support for esimd fp16 (#9067 ) * basic api support for fp16 * fix style * fix * fix error and style * fix style * meet code review * update based on comments	2023-10-09 11:05:17 +08:00
JIN Qiao	65373d2a8b	LLM: adjust portable zip content (#9054 ) * LLM: adjust portable zip content * LLM: adjust portable zip README	2023-10-09 10:51:19 +08:00
Xin Qiu	b3e94a32d4	change log4error import (#9098 )	2023-10-08 09:23:28 +08:00
Kai Huang	78ea7ddb1c	Combine apply_rotary_pos_emb for gpt-neox (#9074 )	2023-10-07 16:27:46 +08:00
Yang Wang	36dd4afd61	Fix llama when rope scaling is not None (#9086 ) * Fix llama when rope scaling is not None * fix style * fix style	2023-10-06 13:27:37 -07:00
Yang Wang	fcb1c618a0	using bigdl-llm fused rope for llama (#9066 ) * optimize llama xpu rope * fix bug * fix style * refine append cache * remove check * do not cache cos sin * remove unnecessary changes * clean up * fix style * check for training	2023-10-06 09:57:29 -07:00
Jiao Wang	aefa5a5bfe	Qwen kv cache (#9079 ) * qwen and aquila * update * update * style	2023-10-05 11:59:17 -07:00
Jiao Wang	d5ca1f32b6	Aquila KV cache optimization (#9080 ) * update * update * style	2023-10-05 11:10:57 -07:00
Yang Wang	88565c76f6	add export merged model example (#9018 ) * add export merged model example * add sources * add script * fix style	2023-10-04 21:18:52 -07:00
Yang Wang	0cd8f1c79c	Use ipex fused rms norm for llama (#9081 ) * also apply rmsnorm * fix cpu	2023-10-04 21:04:55 -07:00
Cengguang Zhang	fb883100e7	LLM: support chatglm-18b convert attention forward in benchmark scripts. (#9072 ) * add chatglm-18b convert. * fix if statement. * fix	2023-09-28 14:04:52 +08:00
Yishuo Wang	6de2189e90	[LLM] fix chatglm main choice (#9073 )	2023-09-28 11:23:37 +08:00
Cengguang Zhang	ad62c58b33	LLM: Enable jemalloc in benchmark scripts. (#9058 ) * enable jemalloc. * fix readme.	2023-09-26 15:37:49 +08:00
Cengguang Zhang	b4a1266ef0	[WIP] LLM: add kv cache support for internlm. (#9036 ) * LLM: add kv cache support for internlm * add internlm apply_rotary_pos_emb * fix. * fix style.	2023-09-25 14:16:59 +08:00
Ruonan Wang	975da86e00	LLM: fix gptneox kv cache (#9044 )	2023-09-25 13:03:57 +08:00
Cengguang Zhang	26213a5829	LLM: Change benchmark bf16 load format. (#9035 ) * LLM: Change benchmark bf16 load format. * comment on bf16 chatglm. * fix.	2023-09-22 17:38:38 +08:00
JinBridge	023555fb1f	LLM: Add one-click installer for Windows (#8999 ) * LLM: init one-click installer for windows * LLM: fix typo in one-click installer readme * LLM: one-click installer try except logic * LLM: one-click installer add dependency * LLM: one-click installer adjust README.md * LLM: one-click installer split README and add zip compress in setup.bat * LLM: one-click installer verified internlm and llama2 and replace gif * LLM: remove one-click installer images * LLM: finetune the one-click installer README.md * LLM: fix typo in one-click installer README.md * LLM: rename one-click installer to protable executable * LLM: rename other places to protable executable * LLM: rename the zip filename to executable * LLM: update .gitignore * LLM: add colorama to setup.bat	2023-09-22 14:46:30 +08:00
Jiao Wang	028a6d9383	MPT model optimize for long sequence (#9020 ) * mpt_long_seq * update * update * update * style * style2 * update	2023-09-21 21:27:23 -07:00
Ruonan Wang	b943d73844	LLM: refactor kv cache (#9030 ) * refactor utils * meet code review; update all models * small fix	2023-09-21 21:28:03 +08:00
Cengguang Zhang	868511cf02	LLM: fix kv cache issue of bloom and falcon. (#9029 )	2023-09-21 18:12:20 +08:00
Ruonan Wang	bf51ec40b2	LLM: Fix empty cache (#9024 ) * fix * fix * update example	2023-09-21 17:16:07 +08:00
Yina Chen	714884414e	fix error (#9025 )	2023-09-21 16:42:11 +08:00
binbin Deng	edb225530b	add bark (#9016 )	2023-09-21 12:24:58 +08:00
SONG Ge	fa47967583	[LLM] Optimize kv_cache for gptj model family (#9010 ) * optimize gptj model family attention * add license and comment for dolly-model * remove xpu mentioned * remove useless info * code sytle * style fix * code style in gptj fix * remove gptj arch * move apply_rotary_pos_emb into utils * kv_seq_length update * use hidden_states instead of query layer to reach batch size	2023-09-21 10:42:08 +08:00
Cengguang Zhang	b3cad7de57	LLM: add bloom kv cache support (#9012 ) * LLM: add bloom kv cache support * fix style.	2023-09-20 21:10:53 +08:00
Kai Huang	156af15d1e	Add NF3 (#9008 ) * add nf3 * grammar	2023-09-20 20:03:07 +08:00
Kai Huang	6981745fe4	Optimize kv_cache for gpt-neox model family (#9015 ) * override gptneox * style * move to utils * revert	2023-09-20 19:59:19 +08:00
JinBridge	48b503c630	LLM: add example of aquila (#9006 ) * LLM: add example of aquila * LLM: replace AquilaChat with Aquila * LLM: shorten prompt of aquila example	2023-09-20 15:52:56 +08:00
Cengguang Zhang	735a17f7b4	LLM: add kv cache to falcon family. (#8995 ) * add kv cache to falcon family. * fix: import error. * refactor * update comments. * add two version falcon attention forward. * fix * fix. * fix. * fix. * fix style. * fix style.	2023-09-20 15:36:30 +08:00
Ruonan Wang	94a7f8917b	LLM: fix optimized kv cache for baichuan-13b (#9009 ) * fix baichuan 13b * fix style * fix * fix style	2023-09-20 15:30:14 +08:00

448 commits