ipex-llm

Author	SHA1	Message	Date
Wenjing Margaret Mao	50dfcaa8fa	Update llm-ppl-evaluation.yml -- Update llm-ppl-evaluation.yml -- Update HTML file: change from ppl/update_in_parent_folder into harness/update_in_parent_folder ppl test and harness test are using the same update_in_parent_folder file. To reduce the repetition, change the ppl update HTML file to the same one under the harness folder and delete the HTML file under the ppl folder.	2024-04-11 07:15:18 +08:00
Cengguang Zhang	4b024b7aac	LLM: optimize chatglm2 8k input. (#10723 ) * LLM: optimize chatglm2 8k input. * rename.	2024-04-10 16:59:06 +08:00
Yuxuan Xia	cd22cb8257	Update Env check Script (#10709 ) * Update env check bash file * Update env-check	2024-04-10 15:06:00 +08:00
Shaojun Liu	29bf28bd6f	Upgrade python to 3.11 in Docker Image (#10718 ) * install python 3.11 for cpu-inference docker image * update xpu-inference dockerfile * update cpu-serving image * update qlora image * update lora image * update document	2024-04-10 14:41:27 +08:00
Qiyuan Gong	b727767f00	Add axolotl v0.3.0 with ipex-llm on Intel GPU (#10717 ) * Add axolotl v0.3.0 support on Intel GPU. * Add finetune example on llama-2-7B with Alpaca dataset.	2024-04-10 14:38:29 +08:00
Shengsheng Huang	0ccd7bfca9	revise quickstart (#10721 )	2024-04-10 14:24:53 +08:00
yb-peng	a81f9e61a6	Revise open_webui_with_ollama_quickstart.md (#10720 )	2024-04-10 14:04:13 +08:00
Wang, Jian4	c9e6d42ad1	LLM: Fix chatglm3-6b-32k error (#10719 ) * fix chatglm3-6b-32k * update style	2024-04-10 11:24:06 +08:00
Keyan (Kyrie) Zhang	585c174e92	Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables (#10707 ) * Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables. * Fix style	2024-04-10 10:48:46 +08:00
Jiao Wang	d1eaea509f	update chatglm readme (#10659 )	2024-04-09 14:24:46 -07:00
Jiao Wang	878a97077b	Fix llava example to support transformers 4.36 (#10614 ) * fix llava example * update	2024-04-09 13:47:07 -07:00
Jiao Wang	1e817926ba	Fix low memory generation example issue in transformers 4.36 (#10702 ) * update cache in low memory generate * update	2024-04-09 09:56:52 -07:00
Shengsheng Huang	6e7da0d92c	small fix in document	2024-04-09 23:04:26 +08:00
Shengsheng Huang	8924dbc3f9	revise open webui quickstart and some indexes (#10715 ) * update readme * update openwebui readme and update index	2024-04-09 22:44:03 +08:00
Yuwen Hu	a0244527aa	Small updates to langchain-chatchat quickstart readme (#10714 )	2024-04-09 19:37:41 +08:00
Yuwen Hu	fde6ab50d0	Further fix to python 3.11 document (#10712 )	2024-04-09 19:13:01 +08:00
yb-peng	447f48499a	Init commit of open-webui quickstart (#10682 ) * init commit of open-webui quickstart * add links into open-webui quickstart * Update open_webui_with_ollama_quickstart.md	2024-04-09 18:21:42 +08:00
Yuwen Hu	97db2492c8	Update setup.py for `bigdl-core-xe-esimd-21` on Windows (#10705 ) * Support bigdl-core-xe-esimd-21 for windows in setup.py * Update setup-llm-env accordingly	2024-04-09 18:21:21 +08:00
Zhicun	b4147a97bb	Fix dtype mismatch error (#10609 ) * fix llama * fix * fix code style * add torch type in model.py --------- Co-authored-by: arda <arda@arda-arc19.sh.intel.com>	2024-04-09 17:50:33 +08:00
Shaojun Liu	f37a1f2a81	Upgrade to python 3.11 (#10711 ) * create conda env with python 3.11 * recommend to use Python 3.11 * update	2024-04-09 17:41:17 +08:00
Yishuo Wang	8f45e22072	fix llama2 (#10710 )	2024-04-09 17:28:37 +08:00
Shaojun Liu	e10040b7f1	upgrade to python 3.11 (#10695 )	2024-04-09 17:04:42 +08:00
Yishuo Wang	e438f941f2	disable rwkv5 fp16 (#10699 )	2024-04-09 16:42:11 +08:00
Cengguang Zhang	6a32216269	LLM: add llama2 8k input example. (#10696 ) * LLM: add llama2-32K example. * refactor name. * fix comments. * add IPEX_LLM_LOW_MEM notes and update sample output.	2024-04-09 16:02:37 +08:00
Wenjing Margaret Mao	289cc99cd6	Update README.md (#10700 ) Edit "summarize the results"	2024-04-09 16:01:12 +08:00
Jason Dai	3e4fbee87c	Update readme & quickstart (#10685 )	2024-04-09 15:59:17 +08:00
Ikko Eltociear Ashimine	39ff586454	docs: update README.md (#10662 ) inital -> initial	2024-04-09 15:55:57 +08:00
Wenjing Margaret Mao	d3116de0db	Update README.md (#10701 ) edit "summarize the results"	2024-04-09 15:50:25 +08:00
Chen, Zhentao	d59e0cce5c	Migrate harness to ipexllm (#10703 ) * migrate to ipexlm * fix workflow * fix run_multi * fix precision map * rename ipexlm to ipexllm * rename bigdl to ipex in comments	2024-04-09 15:48:53 +08:00
yb-peng	8cf26d8d08	Update ollama_quickstart.md (#10708 )	2024-04-09 15:47:41 +08:00
Keyan (Kyrie) Zhang	1e27e08322	Modify example from fp32 to fp16 (#10528 ) * Modify example from fp32 to fp16 * Remove Falcon from fp16 example for now * Remove MPT from fp16 example	2024-04-09 15:45:49 +08:00
binbin Deng	44922bb5c2	LLM: support baichuan2-13b using AutoTP (#10691 )	2024-04-09 14:06:01 +08:00
Yina Chen	c7422712fc	mistral 4.36 use fp16 sdp (#10704 )	2024-04-09 13:50:33 +08:00
Ovo233	dcb2038aad	Enable optimization for sentence_transformers (#10679 ) * enable optimization for sentence_transformers * fix python style check failure	2024-04-09 12:33:46 +08:00
Zhicun	f03c029914	pydantic version>=2.0.0 for llamaindex (#10694 ) * pydantic version * pydantic version * upgrade version	2024-04-09 09:48:42 +08:00
Yang Wang	5a1f446d3c	support fp8 in xetla (#10555 ) * support fp8 in xetla * change name * adjust model file * support convert back to cpu * factor * fix bug * fix style	2024-04-08 13:22:09 -07:00
jenniew	591bae092c	combine english and chinese, remove nan	2024-04-08 19:37:51 +08:00
Cengguang Zhang	7c43ac0164	LLM: optimize llama native sdp for split qkv tensor (#10693 ) * LLM: optimize llama native sdp for split qkv tensor. * fix block real size. * fix comment. * fix style. * refactor.	2024-04-08 17:48:11 +08:00
Xin Qiu	1274cba79b	stablelm fp8 kv cache (#10672 ) * stablelm fp8 kvcache * update * fix * change to fp8 matmul * fix style * fix * fix * meet code review * add comment	2024-04-08 15:16:46 +08:00
Yishuo Wang	65127622aa	fix UT threshold (#10689 )	2024-04-08 14:58:20 +08:00
Cengguang Zhang	c0cd238e40	LLM: support llama2 8k input with w4a16. (#10677 ) * LLM: support llama2 8k input with w4a16. * fix comment and style. * fix style. * fix comments and split tensor to quantized attention forward. * fix style. * refactor name. * fix style. * fix style. * fix style. * refactor checker name. * refactor native sdp split qkv tensor name. * fix style. * fix comment rename variables. * fix co-exist of intermediate results.	2024-04-08 11:43:15 +08:00
Shaojun Liu	db7c5cb78f	update model path for spr perf test (#10687 ) * update model path for spr perf test * revert	2024-04-08 10:21:56 +08:00
Zhicun	321bc69307	Fix llamaindex ut (#10673 ) * fix llamaindex ut * add GPU ut	2024-04-08 09:47:51 +08:00
Keyan (Kyrie) Zhang	a11b708135	Modify the .md link in chatchat readthedoc (#10681 )	2024-04-07 16:33:32 +08:00
yb-peng	2d88bb9b4b	add test api transformer_int4_fp16_gpu (#10627 ) * add test api transformer_int4_fp16_gpu * update config.yaml and README.md in all-in-one * modify run.py in all-in-one * re-order test-api * re-order test-api in config * modify README.md in all-in-one * modify README.md in all-in-one * modify config.yaml --------- Co-authored-by: pengyb2001 <arda@arda-arc21.sh.intel.com> Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>	2024-04-07 15:47:17 +08:00
Wang, Jian4	47cabe8fcc	LLM: Fix no return_last_logit running bigdl_ipex chatglm3 (#10678 ) * fix no return_last_logits * update only for chatglm	2024-04-07 15:27:58 +08:00
Shengsheng Huang	33f90beda0	fix quickstart docs (#10676 )	2024-04-07 14:26:59 +08:00
Wang, Jian4	9ad4b29697	LLM: CPU benchmark using tcmalloc (#10675 )	2024-04-07 14:17:01 +08:00
binbin Deng	d9a1153b4e	LLM: upgrade deepspeed in AutoTP on GPU (#10647 )	2024-04-07 14:05:19 +08:00
Jin Qiao	56dfcb2ade	Migrate portable zip to ipex-llm (#10617 ) * change portable zip prompt to ipex-llm * fix chat with ui * add no proxy	2024-04-07 13:58:58 +08:00

2680 commits