ipex-llm

Author	SHA1	Message	Date
binbin Deng	5d7e044dbc	LLM: add low bit option in deepspeed autotp example (#10382 )	2024-03-12 17:07:09 +08:00
binbin Deng	df3bcc0e65	LLM: remove english_quotes dataset (#10370 )	2024-03-12 16:57:40 +08:00
Zhao Changmin	df2b84f7de	Enable kv cache on arc batch (#10308 )	2024-03-12 16:46:04 +08:00
Lilac09	5809a3f5fe	Add run-hbm.sh & add user guide for spr and hbm (#10357 ) * add run-hbm.sh * add spr and hbm guide * only support quad mode * only support quad mode * update special cases * update special cases	2024-03-12 16:15:27 +08:00
binbin Deng	5d996a5caf	LLM: add benchmark script for deepspeed autotp on gpu (#10380 )	2024-03-12 15:19:57 +08:00
Keyan (Kyrie) Zhang	f9c144dc4c	Fix final logits ut failure (#10377 ) * Fix final logits ut failure * Fix final logits ut failure * Remove Falcon from completion test for now * Remove Falcon from unit test for now	2024-03-12 14:34:01 +08:00
Guancheng Fu	cc4148636d	[FastChat-integration] Add initial implementation for loader (#10323 ) * add initial implementation for loader * add test method for model_loader * data * Refine	2024-03-12 10:54:59 +08:00
WeiguangHan	17bdb1a60b	LLM: add whisper models into nightly test (#10193 ) * LLM: add whisper models into nightly test * small fix * small fix * add more whisper models * test all cases * test specific cases * collect the csv * store the resut * to html * small fix * small test * test all cases * modify whisper_csv_to_html	2024-03-11 20:00:47 +08:00
binbin Deng	dbcfc5c2fa	LLM: fix error of 'AI-ModelScope/phi-2' hosted by ModelScope hub (#10364 )	2024-03-11 16:19:17 +08:00
binbin Deng	fe27a6971c	LLM: update modelscope version (#10367 )	2024-03-11 16:18:27 +08:00
Chen, Zhentao	a425eaabfc	fix from_pretrained when device_map=None (#10361 ) * pr trigger * fix error when device_map=None * fix device_map=None	2024-03-11 16:06:12 +08:00
Yina Chen	d7b765fd3f	serving xpu memory opt (#10358 )	2024-03-11 15:21:22 +08:00
Ruonan Wang	be29833b2b	LLM: fix qwen2 (#10356 )	2024-03-11 09:29:08 +08:00
Zhicun	9026c08633	Fix llamaindex AutoTokenizer bug (#10345 ) * fix tokenizer * fix AutoTokenizer bug * modify code style	2024-03-08 16:24:50 +08:00
Zhicun	2a10b53d73	rename docqa.py->rag.py (#10353 )	2024-03-08 16:07:09 +08:00
Keyan (Kyrie) Zhang	f1825d7408	Add RMSNorm unit test (#10190 )	2024-03-08 15:51:03 +08:00
Shengsheng Huang	370c52090c	Langchain readme (#10348 ) * update langchain readme * update readme * create new README * Update README_nativeint4.md	2024-03-08 14:57:24 +08:00
Keyan (Kyrie) Zhang	7a621a4db0	Fix device_map bug by raise an error when using device_map=xpu (#10340 ) * Fix device_map bug by raise an error when using device_map=xpu * Fix sync error * Fix python style * Use invalidInputError instead of invalidOperationError	2024-03-08 13:38:52 +08:00
Yishuo Wang	1ac193ba02	add rope theta argument (#10343 )	2024-03-07 17:27:19 +08:00
Yuxuan Xia	0c8d3c9830	Add C-Eval HTML report (#10294 ) * Add C-Eval HTML report * Fix C-Eval workflow pr trigger path * Fix C-Eval workflow typos * Add permissions to C-Eval workflow * Fix C-Eval workflow typo * Add pandas dependency * Fix C-Eval workflow typo	2024-03-07 16:44:49 +08:00
Cengguang Zhang	496d18ab6d	LLM: add quantize kv cache support for baichuan 7b and 13b. (#10330 ) * add quantize kv cache for baichuan 7b and 13b. * fix typo. * fix. * fix style. * fix style.	2024-03-07 16:17:38 +08:00
hxsz1997	b7db21414e	Update llamaindex ut (#10338 ) * add test_llamaindex of gpu * add llamaindex gpu tests bash * add llamaindex cpu tests bash * update name of Run LLM langchain GPU test * import llama_index in llamaindex gpu ut * update the dependency of test_llamaindex * add Run LLM llamaindex GPU test * modify import dependency of llamaindex cpu test * add Run LLM llamaindex test * update llama_model_path * delete unused model path * add LLAMA2_7B_ORIGIN_PATH in llamaindex cpu test	2024-03-07 10:06:16 +08:00
ZehuaCao	267de7abc3	fix fschat DEP version error (#10325 )	2024-03-06 16:15:27 +08:00
Yina Chen	9ea499ca68	Optimize speculative decoding PVC memory usage (#10329 ) * optimize memory * update * update * update * support other models * update * fix style	2024-03-06 09:54:21 +08:00
dingbaorong	cc796848ea	fix typos (#10274 ) Co-authored-by: Ariadne <wyn2000330@126.com>	2024-03-05 18:38:22 +08:00
hxsz1997	af11c53473	Add the installation step of postgresql and pgvector on windows in LlamaIndex GPU support (#10328 ) * add the installation of postgresql and pgvector of windows * fix some format	2024-03-05 18:31:19 +08:00
Yishuo Wang	0011ff9f64	optimize bge large performance (#10324 )	2024-03-05 17:06:03 +08:00
Shaojun Liu	178eea5009	upload bigdl-llm wheel to sourceforge for backup (#10321 ) * test: upload to sourceforge * update scripts * revert	2024-03-05 16:36:01 +08:00
Cengguang Zhang	30d009bca7	LLM: support quantized kv cache for Mistral in transformers >=4.36.0 (#10326 ) * support quantize kv for mistral in transformers 4.36 * update mistral support. * fix style.	2024-03-05 16:23:50 +08:00
dingbaorong	1e6f0c6f1a	Add llamaindex gpu example (#10314 ) * add llamaindex example * fix core dump * refine readme * add trouble shooting * refine readme --------- Co-authored-by: Ariadne <wyn2000330@126.com>	2024-03-05 13:36:00 +08:00
dingbaorong	fc7f10cd12	add langchain gpu example (#10277 ) * first draft * fix * add readme for transformer_int4_gpu * fix doc * check device_map * add arc ut test * fix ut test * fix langchain ut * Refine README * fix gpu mem too high * fix ut test --------- Co-authored-by: Ariadne <wyn2000330@126.com>	2024-03-05 13:33:57 +08:00
Yuwen Hu	5dbbe1a826	[LLM] Support for new arc ut runner (#10311 ) * Support for new arc ut runner * Comment unnecessary OMP_NUM_THREADS related settings for arc uts	2024-03-04 18:42:02 +08:00
Yuwen Hu	d45e577d8c	[LLM] Test `load_low_bit` in iGPU perf test on Windows (#10313 )	2024-03-04 18:03:57 +08:00
WeiguangHan	fd81d66047	LLM: Compress some models to save space (#10315 ) * LLM: compress some models to save space * add deleted comments	2024-03-04 17:53:03 +08:00
Shaojun Liu	bab2ee5f9e	update nightly spr perf test (#10178 ) * update nightly spr perf test * update * update runner lable * update * update * update folder * revert	2024-03-04 13:46:33 +08:00
Cengguang Zhang	ab9fc2485f	LLM: add quantize kv support for llama transformer 4.36 (#10298 ) * add quantize kv support for llama transformer 4.36 * fix style. * fix style.	2024-03-04 10:33:35 +08:00
Xin Qiu	58208a5883	Update FAQ document. (#10300 ) * Update install_gpu.md * Update resolve_error.md * Update README.md * Update resolve_error.md * Update README.md * Update resolve_error.md	2024-03-04 08:35:11 +08:00
Yuwen Hu	27d9a14989	[LLM] all-on-one update: memory optimize and streaming output (#10302 ) * Memory saving for continous in-out pair run and add support for streaming output on MTL iGPU * Small fix * Small fix * Add things back	2024-03-01 18:02:30 +08:00
SONG Ge	0ab40917fb	[LLM] Split merged_qk to separated q/k linear (#10299 ) * modify merge_qk_linear to separated q/k linear * update	2024-03-01 16:48:55 +08:00
Yang Wang	f4d7dbcde2	use fused qkv forward in qwen2 (#10185 ) * use fused qkv forward in qwen2 * support both * fix style * fix rope * remove pring * fix style * clean up	2024-03-01 16:46:35 +08:00
Xin Qiu	509e206de0	update doc about gemma random and unreadable output. (#10297 ) * Update install_gpu.md * Update README.md * Update README.md	2024-03-01 15:41:16 +08:00
Wang, Jian4	beb9433cec	LLM: Reduce speculative _ipex_optimize_model memory use (#10281 ) * use tpp * update ipex	2024-03-01 13:48:23 +08:00
Yuwen Hu	f0ff0eebe1	[LLM] Support quantize kv cache for Baichuan2 7B (#10280 ) * Add quatized kv cache framework for Baichuan2 7B * Support quantize kv cache for baichuan2 * Small fix * Fix python style	2024-03-01 13:35:42 +08:00
SONG Ge	273de341d7	hot-fix silu error import (#10292 )	2024-03-01 10:11:37 +08:00
Shengsheng Huang	bcfad555df	revise llamaindex readme (#10283 )	2024-02-29 17:19:23 +08:00
Xin Qiu	232273a1b5	Enable Gemma fused mlp + Gelu (#10276 ) * update llama mlp forward * add all * fix style check * split * update * update * update * fix style	2024-02-29 16:53:24 +08:00
Guancheng Fu	2d930bdca8	Add vLLM bf16 support (#10278 ) * add argument load_in_low_bit * add docs * modify gpu doc * done --------- Co-authored-by: ivy-lv11 <lvzc@lamda.nju.edu.cn>	2024-02-29 16:33:42 +08:00
SONG Ge	13b0bc9075	[LLM] Add quantize_kv optimization for yuan2 model (#10243 ) * add initial quantize_kv support for yuan2 model * fix yuan2 quantize_kv generation * apply fp16 conv layer optimizations * disable mlp for quantize_kv	2024-02-29 16:33:26 +08:00
Zhicun	4e6cc424f1	Add LlamaIndex RAG (#10263 ) * run demo * format code * add llamaindex * add custom LLM with bigdl * update * add readme * begin ut * add unit test * add license * add license * revised * update * modify docs * remove data folder * update * modify prompt * fixed * fixed * fixed	2024-02-29 15:21:19 +08:00
Jin Qiao	5d7243067c	LLM: add Baichuan2-13B-Chat 2048-256 to MTL perf (#10273 )	2024-02-29 13:48:55 +08:00
Ruonan Wang	a9fd20b6ba	LLM: Update qkv fusion for GGUF-IQ2 (#10271 ) * first commit * update mistral * fix transformers==4.36.0 * fix * disable qk for mixtral now * fix style	2024-02-29 12:49:53 +08:00
Jiao Wang	6fb65bb9d2	fix in transformers 4.36 (#10150 )	2024-02-28 18:43:01 -08:00
Shengsheng Huang	43dac97e03	Update README.md (#10260 )	2024-02-29 10:41:14 +08:00
Ruonan Wang	4b08bc1417	LLM: relax batch check of flash atttention by double check attention mask (#10270 ) * relax batch check * fix * fix style	2024-02-29 09:39:55 +08:00
Yina Chen	07f36fbfcc	Fix gptj failed to extend (#10269 )	2024-02-29 09:39:27 +08:00
Yishuo Wang	cccb02dad1	fix baichuan2 13b 2k input (#10267 )	2024-02-28 17:20:20 +08:00
Heyang Sun	7244fd1ba5	Fix Arc StarCoder wrong query_shape when input is long (#10268 ) * Fix Arc StarCoder wrong query_shape when input is long * Update gptbigcode.py	2024-02-28 17:07:08 +08:00
Cengguang Zhang	a4de3095f3	LLM: Support quantize kv cache in mistral. (#10261 ) * init * update quantize kv.	2024-02-28 14:08:08 +08:00
Shengsheng Huang	db0d129226	Revert "Add rwkv example (#9432 )" (#10264 ) This reverts commit `6930422b42`.	2024-02-28 11:48:31 +08:00
Yining Wang	6930422b42	Add rwkv example (#9432 ) * codeshell fix wrong urls * restart runner * add RWKV CPU & GPU example (rwkv-4-world-7b) * restart runner * update submodule * fix runner * runner-test --------- Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>	2024-02-28 11:41:00 +08:00
Keyan (Kyrie) Zhang	59861f73e5	Add Deepseek-6.7B (#9991 ) * Add new example Deepseek * Add new example Deepseek * Add new example Deepseek * Add new example Deepseek * Add new example Deepseek * modify deepseek * modify deepseek * Add verified model in README * Turn cpu_embedding=True in Deepseek example --------- Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>	2024-02-28 11:36:39 +08:00
Yuxuan Xia	2524273198	Update AutoGen README (#10255 ) * Update AutoGen README * Fix AutoGen README typos * Update AutoGen README * Update AutoGen README	2024-02-28 11:34:45 +08:00
Zheng, Yi	2347f611cf	Add cpu and gpu examples of Mamba (#9797 ) * Add mamba cpu example * Add mamba gpu example * Use a smaller model as the example * minor fixes --------- Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>	2024-02-28 11:33:29 +08:00
Zhao Changmin	937e1f7c74	rebase (#9104 ) Co-authored-by: leonardozcm <leonardozcm@gmail.com>	2024-02-28 11:18:21 +08:00
JunX	4833067489	fix GPU example link in README.md (#9533 ) * fix GPU example link in README.md * fix GPU links in llm README.md	2024-02-28 11:13:18 +08:00
Zhicun	308e637d0d	Add DeepSeek-MoE-16B-Chat (#10155 ) * dsmoe-hf add * add dsmoe pytorch * update README * modify comment * remove GPU example * update model name * format code	2024-02-28 10:12:09 +08:00
Guoqiong Song	f4a2e32106	Stream llm example for both GPU and CPU (#9390 )	2024-02-27 15:54:47 -08:00
Yang Wang	c581c6db30	draft mmint4 (#10031 ) change to llm.cpp support transposed format revert implement qkv fuse fix style change to vertically pack change to enable_xetla fix mlp_fusion_check remove comments address comments add some comments fix style	2024-02-27 14:55:16 -08:00
hxsz1997	cba61a2909	Add html report of ppl (#10218 ) * remove include and language option, select the corresponding dataset based on the model name in Run * change the nightly test time * change the nightly test time of harness and ppl * save the ppl result to json file * generate csv file and print table result * generate html * modify the way to get parent folder * update html in parent folder * add llm-ppl-summary and llm-ppl-summary-html * modify echo single result * remove download fp16.csv * change model name of PR * move ppl nightly related files to llm/test folder * reformat * seperate make_table from make_table_and_csv.py * separate make_csv from make_table_and_csv.py * update llm-ppl-html * remove comment * add Download fp16.results	2024-02-27 17:37:08 +08:00
Zhicun	6d60982746	Env script: add license (#10257 ) * env script * update README.md * modify README * modify cpu info output * add env-check.sh * add env-check.bat * add windows * modify bat * add license	2024-02-27 15:29:20 +08:00
Yishuo Wang	b4fa4ab46f	optimize yuan 2.0 again (#10252 )	2024-02-27 14:51:42 +08:00
Zhicun	03b9c4930a	UX: Script to print env info (#10088 ) * env script * update README.md * modify README * modify cpu info output * add env-check.sh * add env-check.bat * add windows * modify bat	2024-02-27 14:45:36 +08:00
Keyan (Kyrie) Zhang	843fe546b0	Add CPU and GPU examples for DeciLM-7B (#9867 ) * Add cpu and gpu examples for DeciLM-7B * Add cpu and gpu examples for DeciLM-7B * Add DeciLM-7B to README table * modify deciLM * modify deciLM * modify deciLM * Add verified model in README * Add cpu_embedding=True	2024-02-27 13:15:49 +08:00
Yuwen Hu	38ae4b372f	Add yuan2-2b to win igpu perf test (#10250 )	2024-02-27 11:08:33 +08:00
Heyang Sun	36a9e88104	Speculative Starcoder on CPU (#10138 ) * Speculative Starcoder on CPU * enable kv-cache pre-allocation * refine codes * refine * fix style * fix style * fix style * refine * refine * Update speculative.py * Update gptbigcode.py * fix style * Update speculative.py * enable mixed-datatype layernorm on top of torch API * adaptive dtype * Update README.md	2024-02-27 09:57:29 +08:00
Yishuo Wang	a47989c860	optimize yuan 2.0 performance (#10244 )	2024-02-26 17:20:10 +08:00
Wang, Jian4	6c74b99a28	LLM: Update qwen readme (#10245 )	2024-02-26 17:03:09 +08:00
hxsz1997	15ad2fd72e	Merge pull request #10226 from zhentaocc/fix_harness Fix harness	2024-02-26 16:49:27 +08:00
Wang, Jian4	f9b75f900b	LLM: Enable qwen target_model ipex (#10232 ) * change order * enable qwen ipex * update qwen example * update * fix style * update	2024-02-26 16:41:12 +08:00
Jin Qiao	3e6d188553	LLM: add baichuan2-13b to mtl perf (#10238 )	2024-02-26 15:55:56 +08:00
Yuwen Hu	e38e29511c	[LLM] Yuan2 MLP and Rotary optimization (#10231 ) * Add optimization for rotary embedding * Add mlp fused optimizatgion * Python style fix * Fix rotary embedding due to logits difference * Small fix	2024-02-26 15:10:08 +08:00
Ziteng Zhang	ea23afc8ec	[LLM]update ipex part in mistral example readme (#10239 ) * update ipex part in mistral example readme	2024-02-26 14:35:20 +08:00
SONG Ge	df2f3885ba	[LLM] Enable kv_cache and forward_qkv optimizations for yuan2 (#10225 ) * add init kv_cache support for yuan2 * add forward qkv in yuan	2024-02-26 11:29:48 +08:00
Xiangyu Tian	85a99e13e8	LLM: Fix ChatGLM3 Speculative Example (#10236 ) Fix ChatGLM3 Speculative Example.	2024-02-26 10:57:28 +08:00
Chen, Zhentao	213ef06691	fix readme	2024-02-24 00:38:08 +08:00
Ruonan Wang	28513f3978	LLM: support fp16 embedding & add mlp fusion for iq2_xxs (#10219 ) * add fp16 embed * small fixes * fix style * fix style * fix comment	2024-02-23 17:26:24 +08:00
Yuwen Hu	eeecd9fc08	Python style fix (#10230 )	2024-02-23 17:21:23 +08:00
Yuwen Hu	e511bbd8f1	[LLM] Add basic optimization framework for Yuan2 (#10227 ) * Add basic optimization framework for Yuan2 * Small fix * Python style fix * Small fix * Small fix	2024-02-23 17:05:00 +08:00
Xin Qiu	8ef5482da2	update Gemma readme (#10229 ) * Update README.md * Update README.md * Update README.md * Update README.md	2024-02-23 16:57:08 +08:00
Chen, Zhentao	6fe5344fa6	separate make_csv from the file	2024-02-23 16:33:38 +08:00
Chen, Zhentao	bfa98666a6	fall back to make_table.py	2024-02-23 16:33:38 +08:00
Ruonan Wang	19260492c7	LLM: fix action/installation error of mpmath (#10223 ) * fix * test * fix * update	2024-02-23 16:14:53 +08:00
Xin Qiu	aabfc06977	add gemma example (#10224 ) * add gemma gpu example * Update README.md * add cpu example * Update README.md * Update README.md * Update generate.py * Update generate.py	2024-02-23 15:20:57 +08:00
yb-peng	a2c1675546	Add CPU and GPU examples for Yuan2-2B-hf (#9946 ) * Add a new CPU example of Yuan2-2B-hf * Add a new CPU generate.py of Yuan2-2B-hf example * Add a new GPU example of Yuan2-2B-hf * Add Yuan2 to README table * In CPU example:1.Use English as default prompt; 2.Provide modified files in yuan2-2B-instruct * In GPU example:1.Use English as default prompt;2.Provide modified files * GPU example:update README * update Yuan2-2B-hf in README table * Add CPU example for Yuan2-2B in Pytorch-Models * Add GPU example for Yuan2-2B in Pytorch-Models * Add license in generate.py; Modify README * In GPU Add license in generate.py; Modify README * In CPU yuan2 modify README * In GPU yuan2 modify README * In CPU yuan2 modify README * In GPU example, updated the readme for Windows GPU supports * In GPU torch example, updated the readme for Windows GPU supports * GPU hf example README modified * GPU example README modified	2024-02-23 14:09:30 +08:00
yb-peng	f1f4094a09	Add CPU and GPU examples of phi-2 (#10014 ) * Add CPU and GPU examples of phi-2 * In GPU hf example, updated the readme for Windows GPU supports * In GPU torch example, updated the readme for Windows GPU supports * update the table in BigDL/README.md * update the table in BigDL/python/llm/README.md	2024-02-23 14:05:53 +08:00
Chen, Zhentao	f315c7f93a	Move harness nightly related files to llm/test folder (#10209 ) * move harness nightly files to test folder * change workflow file path accordingly * use arc01 when pr * fix path * fix fp16 csv path	2024-02-23 11:12:36 +08:00
Xin Qiu	30795bdfbc	Gemma optimization: rms_norm, kv_cache, fused_rope, fused_rope+qkv (#10212 ) * gemma optimization * update * update * fix style * meet code review	2024-02-23 10:07:24 +08:00
Guoqiong Song	63681af97e	falcon for transformers 4.36 (#9960 ) * falcon for transformers 4.36	2024-02-22 17:04:40 -08:00
Jason Dai	84d5f40936	Update README.md (#10213 )	2024-02-22 17:22:59 +08:00
Yina Chen	ce5840a8b7	GPT-J rope optimization on xpu (#10182 ) * optimize * update * fix style & move use_fuse_rope * add ipex version check * fix style * update * fix style * meet comments * address comments * fix style	2024-02-22 16:25:12 +08:00

1058 commits