Commit graph

1085 commits

Jin Qiao
ca372f6dab LLM: add save/load example for ModelScope (#10397)
* LLM: add save/load example for modelscope

* fix according to comments

* move file
2024-03-15 15:17:50 +08:00
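
The commit above adds a save/load example for ModelScope models. As a rough sketch of what that flow looks like with the bigdl-llm API of this period (the model id and the model_hub="modelscope" switch are assumptions, not taken from the commit itself):

```python
from bigdl.llm.transformers import AutoModelForCausalLM

# Load a ModelScope-hosted model and quantize it to 4-bit on the fly;
# model_hub="modelscope" is an assumed switch for the ModelScope hub.
model = AutoModelForCausalLM.from_pretrained(
    "qwen/Qwen-7B-Chat",        # placeholder ModelScope model id
    load_in_4bit=True,
    trust_remote_code=True,
    model_hub="modelscope",
)

# Persist the already-quantized weights so later runs skip conversion...
model.save_low_bit("./qwen-7b-chat-sym-int4")

# ...and reload the low-bit checkpoint directly.
model = AutoModelForCausalLM.load_low_bit("./qwen-7b-chat-sym-int4")
```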
Xin Qiu
24473e331a Qwen2 fp16 sdp (#10427)
* qwen2 sdp and refine

* update

* update

* fix style

* remove use_flash_attention
2024-03-15 13:12:03 +08:00
Kai Huang
1315150e64 Add baichuan2-13b 1k to arc nightly perf (#10406) 2024-03-15 10:29:11 +08:00
Ruonan Wang
b036205be2 LLM: add fp8 sdp for chatglm2/3 (#10411)
* add fp8 sdp for chatglm2

* fix style
2024-03-15 09:38:18 +08:00
Wang, Jian4
fe8976a00f LLM: Support gguf models use low_bit and fix no json(#10408)
* support other models using low_bit

* update readme

* update to add *.json
2024-03-15 09:34:18 +08:00
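
For context, a minimal sketch of loading a GGUF checkpoint with a chosen low-bit format, which this PR extends to more model families; the exact from_gguf signature and the low_bit argument name are assumptions, not confirmed by the commit:

```python
from bigdl.llm.transformers import AutoModelForCausalLM

# from_gguf converts a llama.cpp GGUF file into a transformers-style
# model plus tokenizer; the low_bit keyword (assumed name) picks the
# target low-bit dtype instead of the previous fixed default.
model, tokenizer = AutoModelForCausalLM.from_gguf(
    "llama-2-7b-chat.Q4_0.gguf",  # placeholder path to a GGUF file
    low_bit="sym_int4",
)
```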
Xin Qiu
cda38f85a9 Qwen fp16 sdp (#10401)
* qwen sdp

* fix

* update

* update

* update sdp

* update

* fix style check

* add to origin type
2024-03-15 08:51:50 +08:00
dingbaorong
1c0f7ed3fa add xpu support (#10419) 2024-03-14 17:13:48 +08:00
Heyang Sun
7d29765092 refactor qwen2 forward to enable XPU (#10409)
* refactor qwen2 forward to enable XPU

* Update qwen2.py
2024-03-14 11:03:05 +08:00
Yuxuan Xia
f36224aac4 Fix ceval run.sh (#10410) 2024-03-14 10:57:25 +08:00
ZehuaCao
f66329e35d Fix multiple get_enable_ipex function error (#10400)
* fix multiple get_enable_ipex function error

* remove get_enable_ipex_low_bit function
2024-03-14 10:14:13 +08:00
Kai Huang
76e30d8ec8 Empty cache for lm_head (#10317)
* empty cache

* add comments
2024-03-13 20:31:53 +08:00
Ruonan Wang
2be8bbd236 LLM: add cpp option in setup.py (#10403)
* add llama_cpp option

* meet code review
2024-03-13 20:12:59 +08:00
Ovo233
0dbce53464 LLM: Add decoder/layernorm unit tests (#10211)
* add decoder/layernorm unit tests

* update tests

* delete decoder tests

* address comments

* remove none type check

* restore nonetype checks

* delete nonetype checks; add decoder tests for Llama

* add gc

* deal with tuple output
2024-03-13 19:41:47 +08:00
Yishuo Wang
06a851afa9 support new baichuan model (#10404) 2024-03-13 17:45:50 +08:00
Yuxuan Xia
a90e9b6ec2 Fix C-Eval Workflow (#10359)
* Fix Baichuan2 prompt format

* Fix ceval workflow errors

* Fix ceval workflow error

* Fix ceval error

* Fix ceval error

* Test ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Add ceval dependency test

* Fix ceval

* Fix ceval

* Test full ceval

* Test full ceval

* Fix ceval

* Fix ceval
2024-03-13 17:23:17 +08:00
Yishuo Wang
b268baafd6 use fp8 sdp in llama (#10396) 2024-03-13 16:45:38 +08:00
Xiangyu Tian
60043a3ae8 LLM: Support Baichuan2-13b in BigDL-vLLM (#10398)
Support Baichuan2-13b in BigDL-vLLM.
2024-03-13 16:21:06 +08:00
Xiangyu Tian
e10de2c42d [Fix] LLM: Fix condition check error for speculative decoding on CPU (#10402)
Fix condition check error for speculative decoding on CPU
2024-03-13 16:05:06 +08:00
Keyan (Kyrie) Zhang
f158b49835 [LLM] Recover arc ut test for Falcon (#10385) 2024-03-13 13:31:35 +08:00
Heyang Sun
d72c0fad0d Qwen2 SDPA forward on CPU (#10395)
* Fix Qwen1.5 CPU forward

* Update convert.py

* Update qwen2.py
2024-03-13 13:10:03 +08:00
Yishuo Wang
ca58a69b97 fix arc rms norm UT (#10394) 2024-03-13 13:09:15 +08:00
Wang, Jian4
0193f29411 LLM: Enable gguf float16 and Yuan2 model (#10372)
* enable float16

* add yuan files

* enable yuan

* enable set low_bit on yuan2

* update

* update license

* update generate

* update readme

* update python style

* update
2024-03-13 10:19:18 +08:00
Yina Chen
f5d65203c0 First token lm_head optimization (#10318)
* add lm head linear

* update

* address comments and fix style

* address comment
2024-03-13 10:11:32 +08:00
Keyan (Kyrie) Zhang
7cf01e6ec8 Add LangChain upstream ut test (#10349)
* Add LangChain upstream ut test

* Add LangChain upstream ut test

* Specify version numbers in yml script

* Correct langchain-community version
2024-03-13 09:52:45 +08:00
Xin Qiu
28c4a8cf5c Qwen fused qkv (#10368)
* fused qkv + rope for qwen

* quantized kv cache

* fix

* update qwen

* fixed quantized qkv

* fix

* meet code review

* update split

* convert.py

* extend when not enough kv

* fix
2024-03-12 17:39:00 +08:00
Yishuo Wang
741c2bf1df use new rms norm (#10384) 2024-03-12 17:29:51 +08:00
Xiangyu Tian
0ded0b4b13 LLM: Enable BigDL IPEX optimization for int4 (#10319)
Enable BigDL IPEX optimization for int4
2024-03-12 17:08:50 +08:00
binbin Deng
5d7e044dbc LLM: add low bit option in deepspeed autotp example (#10382) 2024-03-12 17:07:09 +08:00
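A minimal sketch of how a low-bit option typically combines with DeepSpeed AutoTP in bigdl-llm examples of this period; the tp size, argument values, and ordering of the two steps are assumptions based on the commit title, not the example itself:

```python
import deepspeed
import torch
from transformers import AutoModelForCausalLM
from bigdl.llm import optimize_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16)

# Shard the model across ranks with DeepSpeed AutoTP first...
model = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},
    dtype=torch.float16,
    replace_with_kernel_inject=False,
).module

# ...then apply the requested low-bit format to the local shard.
model = optimize_model(model, low_bit="sym_int4")
```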
binbin Deng
df3bcc0e65 LLM: remove english_quotes dataset (#10370) 2024-03-12 16:57:40 +08:00
Zhao Changmin
df2b84f7de Enable kv cache on arc batch (#10308) 2024-03-12 16:46:04 +08:00
Lilac09
5809a3f5fe Add run-hbm.sh & add user guide for spr and hbm (#10357)
* add run-hbm.sh

* add spr and hbm guide

* only support quad mode

* only support quad mode

* update special cases

* update special cases
2024-03-12 16:15:27 +08:00
binbin Deng
5d996a5caf LLM: add benchmark script for deepspeed autotp on gpu (#10380) 2024-03-12 15:19:57 +08:00
Keyan (Kyrie) Zhang
f9c144dc4c Fix final logits ut failure (#10377)
* Fix final logits ut failure

* Fix final logits ut failure

* Remove Falcon from completion test for now

* Remove Falcon from unit test for now
2024-03-12 14:34:01 +08:00
Guancheng Fu
cc4148636d [FastChat-integration] Add initial implementation for loader (#10323)
* add initial implementation for loader

* add test method for model_loader

* data

* Refine
2024-03-12 10:54:59 +08:00
WeiguangHan
17bdb1a60b LLM: add whisper models into nightly test (#10193)
* LLM: add whisper models into nightly test

* small fix

* small fix

* add more whisper models

* test all cases

* test specific cases

* collect the csv

* store the result

* to html

* small fix

* small test

* test all cases

* modify whisper_csv_to_html
2024-03-11 20:00:47 +08:00
binbin Deng
dbcfc5c2fa LLM: fix error of 'AI-ModelScope/phi-2' hosted by ModelScope hub (#10364) 2024-03-11 16:19:17 +08:00
binbin Deng
fe27a6971c LLM: update modelscope version (#10367) 2024-03-11 16:18:27 +08:00
Chen, Zhentao
a425eaabfc fix from_pretrained when device_map=None (#10361)
* pr trigger

* fix error when device_map=None

* fix device_map=None
2024-03-11 16:06:12 +08:00
Yina Chen
d7b765fd3f serving xpu memory opt (#10358) 2024-03-11 15:21:22 +08:00
Ruonan Wang
be29833b2b LLM: fix qwen2 (#10356) 2024-03-11 09:29:08 +08:00
Zhicun
9026c08633 Fix llamaindex AutoTokenizer bug (#10345)
* fix tokenizer

* fix AutoTokenizer bug

* modify code style
2024-03-08 16:24:50 +08:00
Zhicun
2a10b53d73 rename docqa.py->rag.py (#10353) 2024-03-08 16:07:09 +08:00
Keyan (Kyrie) Zhang
f1825d7408 Add RMSNorm unit test (#10190) 2024-03-08 15:51:03 +08:00
Shengsheng Huang
370c52090c Langchain readme (#10348)
* update langchain readme

* update readme

* create new README

* Update README_nativeint4.md
2024-03-08 14:57:24 +08:00
Keyan (Kyrie) Zhang
7a621a4db0 Fix device_map bug by raise an error when using device_map=xpu (#10340)
* Fix device_map bug by raise an error when using device_map=xpu

* Fix sync error

* Fix python style

* Use invalidInputError instead of invalidOperationError
2024-03-08 13:38:52 +08:00
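
The commit above turns a silent failure into an explicit error. A sketch of the intended usage it points users toward, moving the model with .to("xpu") after loading rather than passing device_map="xpu" (placeholder model id):

```python
import intel_extension_for_pytorch as ipex  # registers the "xpu" device
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", load_in_4bit=True)
model = model.to("xpu")  # supported way to target the Intel GPU

# Passing device_map="xpu" to from_pretrained now raises invalidInputError.
```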
Yishuo Wang
1ac193ba02 add rope theta argument (#10343) 2024-03-07 17:27:19 +08:00
Yuxuan Xia
0c8d3c9830 Add C-Eval HTML report (#10294)
* Add C-Eval HTML report

* Fix C-Eval workflow pr trigger path

* Fix C-Eval workflow typos

* Add permissions to C-Eval workflow

* Fix C-Eval workflow typo

* Add pandas dependency

* Fix C-Eval workflow typo
2024-03-07 16:44:49 +08:00
Cengguang Zhang
496d18ab6d LLM: add quantize kv cache support for baichuan 7b and 13b. (#10330)
* add quantize kv cache for baichuan 7b and 13b.

* fix typo.

* fix.

* fix style.

* fix style.
2024-03-07 16:17:38 +08:00
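
A hedged sketch of opting in to the quantized KV cache that this series of commits enables for more models; the BIGDL_QUANTIZE_KV_CACHE variable name is an assumption drawn from bigdl-llm's env-var conventions, not from the commit itself:

```python
import os
os.environ["BIGDL_QUANTIZE_KV_CACHE"] = "1"  # assumed switch; set before loading

from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan2-7B-Chat",  # placeholder model id
    load_in_4bit=True,
    trust_remote_code=True,
).to("xpu")
```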
hxsz1997
b7db21414e Update llamaindex ut (#10338)
* add test_llamaindex of gpu

* add llamaindex gpu tests bash

* add llamaindex cpu tests bash

* update name of Run LLM langchain GPU test

* import llama_index in llamaindex gpu ut

* update the dependency of test_llamaindex

* add Run LLM llamaindex GPU test

* modify import dependency of llamaindex cpu test

* add Run LLM llamaindex test

* update llama_model_path

* delete unused model path

* add LLAMA2_7B_ORIGIN_PATH in llamaindex cpu test
2024-03-07 10:06:16 +08:00
ZehuaCao
267de7abc3 fix fschat DEP version error (#10325) 2024-03-06 16:15:27 +08:00
Yina Chen
9ea499ca68 Optimize speculative decoding PVC memory usage (#10329)
* optimize memory

* update

* update

* update

* support other models

* update

* fix style
2024-03-06 09:54:21 +08:00
dingbaorong
cc796848ea fix typos (#10274)
Co-authored-by: Ariadne <wyn2000330@126.com>
2024-03-05 18:38:22 +08:00
hxsz1997
af11c53473 Add the installation step of postgresql and pgvector on windows in LlamaIndex GPU support (#10328)
* add the installation of postgresql and pgvector on windows

* fix some format
2024-03-05 18:31:19 +08:00
Yishuo Wang
0011ff9f64 optimize bge large performance (#10324) 2024-03-05 17:06:03 +08:00
Shaojun Liu
178eea5009 upload bigdl-llm wheel to sourceforge for backup (#10321)
* test: upload to sourceforge

* update scripts

* revert
2024-03-05 16:36:01 +08:00
Cengguang Zhang
30d009bca7 LLM: support quantized kv cache for Mistral in transformers >=4.36.0 (#10326)
* support quantize kv for mistral in transformers 4.36

* update mistral support.

* fix style.
2024-03-05 16:23:50 +08:00
dingbaorong
1e6f0c6f1a Add llamaindex gpu example (#10314)
* add llamaindex example

* fix core dump

* refine readme

* add trouble shooting

* refine readme

---------

Co-authored-by: Ariadne <wyn2000330@126.com>
2024-03-05 13:36:00 +08:00
dingbaorong
fc7f10cd12 add langchain gpu example (#10277)
* first draft

* fix

* add readme for transformer_int4_gpu

* fix doc

* check device_map

* add arc ut test

* fix ut test

* fix langchain ut

* Refine README

* fix gpu mem too high

* fix ut test

---------

Co-authored-by: Ariadne <wyn2000330@126.com>
2024-03-05 13:33:57 +08:00
Yuwen Hu
5dbbe1a826 [LLM] Support for new arc ut runner (#10311)
* Support for new arc ut runner

* Comment unnecessary OMP_NUM_THREADS related settings for arc uts
2024-03-04 18:42:02 +08:00
Yuwen Hu
d45e577d8c [LLM] Test load_low_bit in iGPU perf test on Windows (#10313) 2024-03-04 18:03:57 +08:00
WeiguangHan
fd81d66047 LLM: Compress some models to save space (#10315)
* LLM: compress some models to save space

* add deleted comments
2024-03-04 17:53:03 +08:00
Shaojun Liu
bab2ee5f9e update nightly spr perf test (#10178)
* update nightly spr perf test

* update

* update runner label

* update

* update

* update folder

* revert
2024-03-04 13:46:33 +08:00
Cengguang Zhang
ab9fc2485f LLM: add quantize kv support for llama transformer 4.36 (#10298)
* add quantize kv support for llama transformer 4.36

* fix style.

* fix style.
2024-03-04 10:33:35 +08:00
Xin Qiu
58208a5883 Update FAQ document. (#10300)
* Update install_gpu.md

* Update resolve_error.md

* Update README.md

* Update resolve_error.md

* Update README.md

* Update resolve_error.md
2024-03-04 08:35:11 +08:00
Yuwen Hu
27d9a14989 [LLM] all-on-one update: memory optimize and streaming output (#10302)
* Memory saving for continuous in-out pair runs and add support for streaming output on MTL iGPU

* Small fix

* Small fix

* Add things back
2024-03-01 18:02:30 +08:00
SONG Ge
0ab40917fb [LLM] Split merged_qk to separated q/k linear (#10299)
* modify merge_qk_linear to separated q/k linear

* update
2024-03-01 16:48:55 +08:00
Yang Wang
f4d7dbcde2 use fused qkv forward in qwen2 (#10185)
* use fused qkv forward in qwen2

* support both

* fix style

* fix rope

* remove print

* fix style

* clean up
2024-03-01 16:46:35 +08:00
Xin Qiu
509e206de0 update doc about gemma random and unreadable output. (#10297)
* Update install_gpu.md

* Update README.md

* Update README.md
2024-03-01 15:41:16 +08:00
Wang, Jian4
beb9433cec LLM: Reduce speculative _ipex_optimize_model memory use (#10281)
* use tpp

* update ipex
2024-03-01 13:48:23 +08:00
Yuwen Hu
f0ff0eebe1 [LLM] Support quantize kv cache for Baichuan2 7B (#10280)
* Add quantized kv cache framework for Baichuan2 7B

* Support quantize kv cache for baichuan2

* Small fix

* Fix python style
2024-03-01 13:35:42 +08:00
SONG Ge
273de341d7 hot-fix silu error import (#10292) 2024-03-01 10:11:37 +08:00
Shengsheng Huang
bcfad555df revise llamaindex readme (#10283) 2024-02-29 17:19:23 +08:00
Xin Qiu
232273a1b5 Enable Gemma fused mlp + Gelu (#10276)
* update llama mlp forward

* add all

* fix style check

* split

* update

* update

* update

* fix style
2024-02-29 16:53:24 +08:00
Guancheng Fu
2d930bdca8 Add vLLM bf16 support (#10278)
* add argument load_in_low_bit

* add docs

* modify gpu doc

* done

---------

Co-authored-by: ivy-lv11 <lvzc@lamda.nju.edu.cn>
2024-02-29 16:33:42 +08:00
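
A sketch of the new load_in_low_bit argument in BigDL's vLLM integration; both the entrypoint module path and the accepted "bf16" value are assumptions based on bigdl-llm's layout at the time, not confirmed by the commit:

```python
# Assumed import path for the BigDL vLLM fork of this period.
from bigdl.llm.vllm.entrypoints.llm import LLM

llm = LLM(model="meta-llama/Llama-2-7b-hf", load_in_low_bit="bf16")
outputs = llm.generate(["What is AI?"])
```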
SONG Ge
13b0bc9075 [LLM] Add quantize_kv optimization for yuan2 model (#10243)
* add initial quantize_kv support for yuan2 model

* fix yuan2 quantize_kv generation

* apply fp16 conv layer optimizations

* disable mlp for quantize_kv
2024-02-29 16:33:26 +08:00
Zhicun
4e6cc424f1 Add LlamaIndex RAG (#10263)
* run demo

* format code

* add llamaindex

* add custom LLM with bigdl

* update

* add readme

* begin ut

* add unit test

* add license

* add license

* revised

* update

* modify docs

* remove data folder

* update

* modify prompt

* fixed

* fixed

* fixed
2024-02-29 15:21:19 +08:00
Jin Qiao
5d7243067c LLM: add Baichuan2-13B-Chat 2048-256 to MTL perf (#10273) 2024-02-29 13:48:55 +08:00
Ruonan Wang
a9fd20b6ba LLM: Update qkv fusion for GGUF-IQ2 (#10271)
* first commit

* update mistral

* fix transformers==4.36.0

* fix

* disable qk for mixtral now

* fix style
2024-02-29 12:49:53 +08:00
Jiao Wang
6fb65bb9d2 fix in transformers 4.36 (#10150) 2024-02-28 18:43:01 -08:00
Shengsheng Huang
43dac97e03 Update README.md (#10260) 2024-02-29 10:41:14 +08:00
Ruonan Wang
4b08bc1417 LLM: relax batch check of flash attention by double-checking attention mask (#10270)
* relax batch check

* fix

* fix style
2024-02-29 09:39:55 +08:00
Yina Chen
07f36fbfcc Fix gptj failed to extend (#10269) 2024-02-29 09:39:27 +08:00
Yishuo Wang
cccb02dad1 fix baichuan2 13b 2k input (#10267) 2024-02-28 17:20:20 +08:00
Heyang Sun
7244fd1ba5 Fix Arc StarCoder wrong query_shape when input is long (#10268)
* Fix Arc StarCoder wrong query_shape when input is long

* Update gptbigcode.py
2024-02-28 17:07:08 +08:00
Cengguang Zhang
a4de3095f3 LLM: Support quantize kv cache in mistral. (#10261)
* init

* update quantize kv.
2024-02-28 14:08:08 +08:00
Shengsheng Huang
db0d129226 Revert "Add rwkv example (#9432)" (#10264)
This reverts commit 6930422b42.
2024-02-28 11:48:31 +08:00
Yining Wang
6930422b42 Add rwkv example (#9432)
* codeshell fix wrong urls

* restart runner

* add RWKV CPU & GPU example (rwkv-4-world-7b)

* restart runner

* update submodule

* fix runner

* runner-test

---------

Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>
2024-02-28 11:41:00 +08:00
Keyan (Kyrie) Zhang
59861f73e5 Add Deepseek-6.7B (#9991)
* Add new example Deepseek

* Add new example Deepseek

* Add new example Deepseek

* Add new example Deepseek

* Add new example Deepseek

* modify deepseek

* modify deepseek

* Add verified model in README

* Turn cpu_embedding=True in Deepseek example

---------

Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>
2024-02-28 11:36:39 +08:00
Yuxuan Xia
2524273198 Update AutoGen README (#10255)
* Update AutoGen README

* Fix AutoGen README typos

* Update AutoGen README

* Update AutoGen README
2024-02-28 11:34:45 +08:00
Zheng, Yi
2347f611cf Add cpu and gpu examples of Mamba (#9797)
* Add mamba cpu example

* Add mamba gpu example

* Use a smaller model as the example

* minor fixes

---------

Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>
2024-02-28 11:33:29 +08:00
Zhao Changmin
937e1f7c74 rebase (#9104)
Co-authored-by: leonardozcm <leonardozcm@gmail.com>
2024-02-28 11:18:21 +08:00
JunX
4833067489 fix GPU example link in README.md (#9533)
* fix GPU example link in README.md

* fix GPU links in llm README.md
2024-02-28 11:13:18 +08:00
Zhicun
308e637d0d Add DeepSeek-MoE-16B-Chat (#10155)
* dsmoe-hf add

* add dsmoe pytorch

* update README

* modify comment

* remove GPU example

* update model name

* format code
2024-02-28 10:12:09 +08:00
Guoqiong Song
f4a2e32106 Stream llm example for both GPU and CPU (#9390) 2024-02-27 15:54:47 -08:00
Yang Wang
c581c6db30 draft mmint4 (#10031)
change to llm.cpp

support transposed format

revert

implement qkv fuse

fix style

change to vertically pack

change to enable_xetla

fix mlp_fusion_check

remove comments

address comments

add some comments

fix style
2024-02-27 14:55:16 -08:00
hxsz1997
cba61a2909 Add html report of ppl (#10218)
* remove include and language options; select the corresponding dataset based on the model name in Run

* change the nightly test time

* change the nightly test time of harness and ppl

* save the ppl result to json file

* generate csv file and print table result

* generate html

* modify the way to get parent folder

* update html in parent folder

* add llm-ppl-summary and llm-ppl-summary-html

* modify echo single result

* remove download fp16.csv

* change model name of PR

* move ppl nightly related files to llm/test folder

* reformat

* separate make_table from make_table_and_csv.py

* separate make_csv from make_table_and_csv.py

* update llm-ppl-html

* remove comment

* add Download fp16.results
2024-02-27 17:37:08 +08:00
Zhicun
6d60982746 Env script: add license (#10257)
* env script

* update README.md

* modify README

* modify cpu info output

* add env-check.sh

* add env-check.bat

* add windows

* modify bat

* add license
2024-02-27 15:29:20 +08:00
Yishuo Wang
b4fa4ab46f optimize yuan 2.0 again (#10252) 2024-02-27 14:51:42 +08:00
Zhicun
03b9c4930a UX: Script to print env info (#10088)
* env script

* update README.md

* modify README

* modify cpu info output

* add env-check.sh

* add env-check.bat

* add windows

* modify bat
2024-02-27 14:45:36 +08:00
Keyan (Kyrie) Zhang
843fe546b0 Add CPU and GPU examples for DeciLM-7B (#9867)
* Add cpu and gpu examples for DeciLM-7B

* Add cpu and gpu examples for DeciLM-7B

* Add DeciLM-7B to README table

* modify deciLM

* modify deciLM

* modify deciLM

* Add verified model in README

* Add cpu_embedding=True
2024-02-27 13:15:49 +08:00