Jiao Wang
d5ca1f32b6
Aquila KV cache optimization ( #9080 )
* update
* update
* style
2023-10-05 11:10:57 -07:00
Jason Dai
7506100bd5
Update readme ( #9084 )
2023-10-05 16:54:09 +08:00
Yang Wang
88565c76f6
add export merged model example ( #9018 )
* add export merged model example
* add sources
* add script
* fix style
2023-10-04 21:18:52 -07:00
Yang Wang
0cd8f1c79c
Use ipex fused rms norm for llama ( #9081 )
* also apply rmsnorm
* fix cpu
2023-10-04 21:04:55 -07:00
Cengguang Zhang
fb883100e7
LLM: support converting chatglm-18b attention forward in benchmark scripts. ( #9072 )
* add chatglm-18b convert.
* fix if statement.
* fix
2023-09-28 14:04:52 +08:00
Yishuo Wang
6de2189e90
[LLM] fix chatglm main choice ( #9073 )
2023-09-28 11:23:37 +08:00
binbin Deng
760183bac6
LLM: update key feature and installation page of document ( #9068 )
2023-09-27 15:44:34 +08:00
Lilac09
c91b2bd574
fix: modify indentation ( #9070 )
* modify Dockerfile
* add README.md
* add README.md
* Modify Dockerfile
* Add bigdl inference cpu image build
* Add bigdl llm cpu image build
* Add bigdl llm cpu image build
* Add bigdl llm cpu image build
* Modify Dockerfile
* Add bigdl inference cpu image build
* Add bigdl inference cpu image build
* Add bigdl llm xpu image build
* manually build
* recover file
* manually build
* recover file
* modify indentation
2023-09-27 14:53:52 +08:00
Cengguang Zhang
ad62c58b33
LLM: Enable jemalloc in benchmark scripts. ( #9058 )
* enable jemalloc.
* fix readme.
2023-09-26 15:37:49 +08:00
Lilac09
ecee02b34d
Add bigdl llm xpu image build ( #9062 )
* modify Dockerfile
* add README.md
* add README.md
* Modify Dockerfile
* Add bigdl inference cpu image build
* Add bigdl llm cpu image build
* Add bigdl llm cpu image build
* Add bigdl llm cpu image build
* Modify Dockerfile
* Add bigdl inference cpu image build
* Add bigdl inference cpu image build
* Add bigdl llm xpu image build
2023-09-26 14:29:03 +08:00
Lilac09
9ac950fa52
Add bigdl llm cpu image build ( #9047 )
* modify Dockerfile
* add README.md
* add README.md
* Modify Dockerfile
* Add bigdl inference cpu image build
* Add bigdl llm cpu image build
* Add bigdl llm cpu image build
* Add bigdl llm cpu image build
2023-09-26 13:22:11 +08:00
Ziteng Zhang
a717352c59
Replace Llama 7b with Llama2-7b in README.md ( #9055 )
* Replace Llama 7b with Llama2-7b in README.md
Need to replace the base model with Llama2-7b, as we are operating on Llama2 here.
* Replace Llama 7b with Llama2-7b in README.md
A Llama 7b in the first line was missed.
* Update architecture graph
---------
Co-authored-by: Heyang Sun <60865256+Uxito-Ada@users.noreply.github.com>
2023-09-26 09:56:46 +08:00
Guancheng Fu
cc84ed70b3
Create serving images ( #9048 )
* Finished & Tested
* Install latest pip from base images
* Add blank line
* Delete unused comment
* fix typos
2023-09-25 15:51:45 +08:00
Cengguang Zhang
b4a1266ef0
[WIP] LLM: add kv cache support for internlm. ( #9036 )
* LLM: add kv cache support for internlm
* add internlm apply_rotary_pos_emb
* fix.
* fix style.
2023-09-25 14:16:59 +08:00
Ruonan Wang
975da86e00
LLM: fix gptneox kv cache ( #9044 )
2023-09-25 13:03:57 +08:00
Heyang Sun
4b843d1dbf
change lora-model output behavior on k8s ( #9038 )
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2023-09-25 09:28:44 +08:00
Cengguang Zhang
26213a5829
LLM: Change benchmark bf16 load format. ( #9035 )
* LLM: Change benchmark bf16 load format.
* comment on bf16 chatglm.
* fix.
2023-09-22 17:38:38 +08:00
JinBridge
023555fb1f
LLM: Add one-click installer for Windows ( #8999 )
* LLM: init one-click installer for windows
* LLM: fix typo in one-click installer readme
* LLM: one-click installer try except logic
* LLM: one-click installer add dependency
* LLM: one-click installer adjust README.md
* LLM: one-click installer split README and add zip compress in setup.bat
* LLM: one-click installer verified internlm and llama2 and replace gif
* LLM: remove one-click installer images
* LLM: finetune the one-click installer README.md
* LLM: fix typo in one-click installer README.md
* LLM: rename one-click installer to portable executable
* LLM: rename other places to portable executable
* LLM: rename the zip filename to executable
* LLM: update .gitignore
* LLM: add colorama to setup.bat
2023-09-22 14:46:30 +08:00
Jiao Wang
028a6d9383
MPT model optimize for long sequence ( #9020 )
* mpt_long_seq
* update
* update
* update
* style
* style2
* update
2023-09-21 21:27:23 -07:00
Lilac09
9126abdf9b
add README.md for bigdl-llm-cpu image ( #9026 )
* modify Dockerfile
* add README.md
* add README.md
2023-09-22 09:03:57 +08:00
Ruonan Wang
b943d73844
LLM: refactor kv cache ( #9030 )
* refactor utils
* meet code review; update all models
* small fix
2023-09-21 21:28:03 +08:00
Cengguang Zhang
868511cf02
LLM: fix kv cache issue of bloom and falcon. ( #9029 )
2023-09-21 18:12:20 +08:00
Ruonan Wang
bf51ec40b2
LLM: Fix empty cache ( #9024 )
* fix
* fix
* update example
2023-09-21 17:16:07 +08:00
Yina Chen
714884414e
fix error ( #9025 )
2023-09-21 16:42:11 +08:00
binbin Deng
edb225530b
add bark ( #9016 )
2023-09-21 12:24:58 +08:00
SONG Ge
fa47967583
[LLM] Optimize kv_cache for gptj model family ( #9010 )
* optimize gptj model family attention
* add license and comment for dolly-model
* remove xpu mentioned
* remove useless info
* code style
* style fix
* code style in gptj fix
* remove gptj arch
* move apply_rotary_pos_emb into utils
* kv_seq_length update
* use hidden_states instead of query layer to get batch size
2023-09-21 10:42:08 +08:00
Guancheng Fu
3913ba4577
add README.md ( #9004 )
2023-09-21 10:32:56 +08:00
Cengguang Zhang
b3cad7de57
LLM: add bloom kv cache support ( #9012 )
* LLM: add bloom kv cache support
* fix style.
2023-09-20 21:10:53 +08:00
Kai Huang
156af15d1e
Add NF3 ( #9008 )
* add nf3
* grammar
2023-09-20 20:03:07 +08:00
Kai Huang
6981745fe4
Optimize kv_cache for gpt-neox model family ( #9015 )
* override gptneox
* style
* move to utils
* revert
2023-09-20 19:59:19 +08:00
JinBridge
48b503c630
LLM: add example of aquila ( #9006 )
* LLM: add example of aquila
* LLM: replace AquilaChat with Aquila
* LLM: shorten prompt of aquila example
2023-09-20 15:52:56 +08:00
Cengguang Zhang
735a17f7b4
LLM: add kv cache to falcon family. ( #8995 )
* add kv cache to falcon family.
* fix: import error.
* refactor
* update comments.
* add two version falcon attention forward.
* fix
* fix.
* fix.
* fix.
* fix style.
* fix style.
2023-09-20 15:36:30 +08:00
Ruonan Wang
94a7f8917b
LLM: fix optimized kv cache for baichuan-13b ( #9009 )
* fix baichuan 13b
* fix style
* fix
* fix style
2023-09-20 15:30:14 +08:00
Yang Wang
c88f6ec457
Experimental XPU QLoRA Finetuning ( #8937 )
* Support xpu finetuning
* support xpu finetuning
* fix style
* fix style
* fix style
* refine example
* add readme
* refine readme
* refine api
* fix fp16
* fix example
* refactor
* fix style
* fix compute type
* add qlora
* refine training args
* fix example
* fix style
* fast path for inference
* address comments
* refine readme
* revert lint
2023-09-19 10:15:44 -07:00
Jason Dai
51518e029d
Update llm readme ( #9005 )
2023-09-19 20:01:33 +08:00
Ruonan Wang
249386261c
LLM: add Baichuan2 cpu example ( #9002 )
* add baichuan2 cpu examples
* add link
* update prompt
2023-09-19 18:08:30 +08:00
Yuwen Hu
c389e1323d
fix xpu performance tests by making sure that latest bigdl-core-xe is installed ( #9001 )
2023-09-19 17:33:30 +08:00
Guancheng Fu
b6c9198d47
Add xpu image for bigdl-llm ( #9003 )
* Add xpu image
* fix
* fix
* fix format
2023-09-19 16:56:22 +08:00
Ruonan Wang
004c45c2be
LLM: Support optimized kv_cache for baichuan family ( #8997 )
* add initial support for baichuan attention
* support baichuan1
* update based on comment
* update based on comment
* support baichuan2
* update link, change how to judge baichuan2
* fix style
* add model parameter for pos emb
* update based on comment
2023-09-19 15:38:54 +08:00
Xin Qiu
37bb0cbf8f
Speed up gpt-j in GPU benchmark ( #9000 )
* Speed up gpt-j in GPU benchmark
* meet code review
2023-09-19 14:22:28 +08:00
Zhao Changmin
2a05581da7
LLM: Apply low_cpu_mem_usage algorithm on optimize_model API ( #8987 )
* low_cpu_mem_usage
2023-09-18 21:41:42 +08:00
Cengguang Zhang
8299b68fea
update readme. ( #8996 )
2023-09-18 17:06:15 +08:00
binbin Deng
c1d25a51a8
LLM: add optimize_model example for bert ( #8975 )
2023-09-18 16:18:35 +08:00
Cengguang Zhang
74338fd291
LLM: add auto torch dtype in benchmark. ( #8981 )
2023-09-18 15:48:25 +08:00
Ruonan Wang
cabe7c0358
LLM: add baichuan2 example for arc ( #8994 )
* add baichuan2 examples
* add link
* small fix
2023-09-18 14:32:27 +08:00
Guancheng Fu
7353882732
add Dockerfile ( #8993 )
2023-09-18 13:25:37 +08:00
binbin Deng
0a552d5bdc
LLM: fix installation on windows ( #8989 )
2023-09-18 11:14:54 +08:00
Xiangyu Tian
52878d3e5f
[PPML] Enable TLS in Attestation API Serving for LLM finetuning ( #8945 )
Add enableTLS flag to enable TLS in Attestation API Serving for LLM finetuning.
2023-09-18 09:32:25 +08:00
Ruonan Wang
32716106e0
update use_cache=True ( #8986 )
2023-09-18 07:59:33 +08:00
Xin Qiu
64ee1d7689
update run_transformer_int4_gpu ( #8983 )
* xpuperf
* update run.py
* clean up
* update
* update
* meet code review
2023-09-15 15:10:04 +08:00