ipex-llm

Author	SHA1	Message	Date
Jean Yu	ab476c7fe2	Eagle Speculative Sampling examples (#11104 ) * Eagle Speculative Sampling examples * rm multi-gpu and ray content * updated README to include Arc A770	2024-05-24 11:13:43 -07:00
Guancheng Fu	fabc395d0d	add langchain vllm interface (#11121 ) * done * fix * fix * add vllm * add langchain vllm exampels * add docs * temp	2024-05-24 17:19:27 +08:00
ZehuaCao	63e95698eb	[LLM]Reopen autotp generate_stream (#11120 ) * reopen autotp generate_stream * fix style error * update	2024-05-24 17:16:14 +08:00
Yishuo Wang	1dc680341b	fix phi-3-vision import (#11129 )	2024-05-24 15:57:15 +08:00
Guancheng Fu	7f772c5a4f	Add half precision for fastchat models (#11130 )	2024-05-24 15:41:14 +08:00
Zhao Changmin	65f4212f89	Fix qwen 14b run into register attention fwd (#11128 ) * fix qwen 14b	2024-05-24 14:45:07 +08:00
Shaojun Liu	373f9e6c79	add ipex-llm-init.bat for Windows (#11082 ) * add ipex-llm-init.bat for Windows * update setup.py	2024-05-24 14:26:25 +08:00
Qiyuan Gong	120a0035ac	Fix type mismatch in eval for Baichuan2 QLora example (#11117 ) * During the evaluation stage, Baichuan2 will raise type mismatch when training with bfloat16. Fix this issue by modifying modeling_baichuan.py. Add doc about how to modify this file.	2024-05-24 14:14:30 +08:00
Yishuo Wang	1db9d9a63b	optimize internlm2 xcomposer agin (#11124 )	2024-05-24 13:44:52 +08:00
Yishuo Wang	9372ce87ce	fix internlm xcomposer2 fp16 (#11123 )	2024-05-24 11:03:31 +08:00
Cengguang Zhang	011b9faa5c	LLM: unify baichuan2-13b alibi mask dtype with model dtype. (#11107 ) * LLM: unify alibi mask dtype. * fix comments.	2024-05-24 10:27:53 +08:00
Jiao Wang	0a06a6e1d4	Update tests for transformers 4.36 (#10858 ) * update unit test * update * update * update * update * update * fix gpu attention test * update * update * update * update * update * update * update example test * replace replit code * update * update * update * update * set safe_serialization false * perf test * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * delete * update * update * update * update * update * update * revert * update	2024-05-24 10:26:38 +08:00
Xiangyu Tian	b3f6faa038	LLM: Add CPU vLLM entrypoint (#11083 ) Add CPU vLLM entrypoint and update CPU vLLM serving example.	2024-05-24 09:16:59 +08:00
Yishuo Wang	797dbc48b8	fix phi-2 and phi-3 convert (#11116 )	2024-05-23 17:37:37 +08:00
Yishuo Wang	37b98a531f	support running internlm xcomposer2 on gpu and add sdp optimization (#11115 )	2024-05-23 17:26:24 +08:00
Zhao Changmin	c5e8b90c8d	Add Qwen register attention implemention (#11110 ) * qwen_register	2024-05-23 17:17:45 +08:00
Yishuo Wang	0e53f20edb	support running internlm-xcomposer2 on cpu (#11111 )	2024-05-23 16:36:09 +08:00
Yuwen Hu	d36b41d59e	Add `setuptools` limitation for `ipex-llm[xpu]` (#11102 ) * Add setuptool limitation for ipex-llm[xpu] * llamaindex option update	2024-05-22 18:20:30 +08:00
Yishuo Wang	cd4dff09ee	support phi-3 vision (#11101 )	2024-05-22 17:43:50 +08:00
Zhao Changmin	15d906a97b	Update linux igpu run script (#11098 ) * update run script	2024-05-22 17:18:07 +08:00
Kai Huang	f63172ef63	Align ppl with llama.cpp (#11055 ) * update script * remove * add header * update readme	2024-05-22 16:43:11 +08:00
Qiyuan Gong	f6c9ffe4dc	Add WANDB_MODE and HF_HUB_OFFLINE to XPU finetune README (#11097 ) * Add WANDB_MODE=offline to avoid multi-GPUs finetune errors. * Add HF_HUB_OFFLINE=1 to avoid Hugging Face related errors.	2024-05-22 15:20:53 +08:00
Shaojun Liu	584439e498	update homepage url for ipex-llm (#11094 ) * update homepage url * Update python version to 3.11 * Update long description	2024-05-22 11:10:44 +08:00
Xin Qiu	71bcd18f44	fix qwen vl (#11090 )	2024-05-21 18:40:29 +08:00
Yishuo Wang	f00625f9a4	refactor qwen2 (#11087 )	2024-05-21 16:53:42 +08:00
Qiyuan Gong	492ed3fd41	Add verified models to GPU finetune README (#11088 ) * Add verified models to GPU finetune README	2024-05-21 15:49:15 +08:00
Qiyuan Gong	1210491748	ChatGLM3, Baichuan2 and Qwen1.5 QLoRA example (#11078 ) * Add chatglm3, qwen15-7b and baichuan-7b QLoRA alpaca example * Remove unnecessary tokenization setting.	2024-05-21 15:29:43 +08:00
ZehuaCao	842d6dfc2d	Further Modify CPU example (#11081 ) * modify CPU example * update	2024-05-21 13:55:47 +08:00
Yishuo Wang	d830a63bb7	refactor qwen (#11074 )	2024-05-20 18:08:37 +08:00
Wang, Jian4	74950a152a	Fix tgi_api_server error file name (#11075 )	2024-05-20 16:48:40 +08:00
Yishuo Wang	4e97047d70	fix baichuan2 13b fp16 (#11071 )	2024-05-20 11:21:20 +08:00
binbin Deng	7170dd9192	Update guide for running qwen with AutoTP (#11065 )	2024-05-20 10:53:17 +08:00
Wang, Jian4	a2e1578fd9	Merge tgi_api_server to main (#11036 ) * init * fix style * speculative can not use benchmark * add tgi server readme	2024-05-20 09:15:03 +08:00
Yishuo Wang	31ce3e0c13	refactor baichuan2-13b (#11064 )	2024-05-17 16:25:30 +08:00
ZehuaCao	56cb992497	LLM: Modify CPU Installation Command for most examples (#11049 ) * init * refine * refine * refine * modify hf-agent example * modify all CPU model example * remove readthedoc modify * replace powershell with cmd * fix repo * fix repo * update * remove comment on windows code block * update * update * update * update --------- Co-authored-by: xiangyuT <xiangyu.tian@intel.com>	2024-05-17 15:52:20 +08:00
Ruonan Wang	f1156e6b20	support gguf_q4k_m / gguf_q4k_s (#10887 ) * initial commit * UPDATE * fix style * fix style * add gguf_q4k_s * update comment * fix	2024-05-17 14:30:09 +08:00
Yishuo Wang	981d668be6	refactor baichuan2-7b (#11062 )	2024-05-17 13:01:34 +08:00
Xiangyu Tian	d963e95363	LLM: Modify CPU Installation Command for documentation (#11042 ) * init * refine * refine * refine * refine comments	2024-05-17 10:14:00 +08:00
Ruonan Wang	3a72e5df8c	disable mlp fusion of fp6 on mtl (#11059 )	2024-05-17 10:10:16 +08:00
SONG Ge	192ae35012	Add support for llama2 quantize_kv with transformers 4.38.0 (#11054 ) * add support for llama2 quantize_kv with transformers 4.38.0 * fix code style * fix code style	2024-05-16 22:23:39 +08:00
SONG Ge	16b2a418be	hotfix native_sdp ut (#11046 ) * hotfix native_sdp * update	2024-05-16 17:15:37 +08:00
Xin Qiu	6be70283b7	fix chatglm run error (#11045 ) * fix chatglm * update * fix style	2024-05-16 15:39:18 +08:00
Yishuo Wang	8cae897643	use new rope in phi3 (#11047 )	2024-05-16 15:12:35 +08:00
Jin Qiao	9a96af4232	Remove oneAPI pip install command in related examples (#11030 ) * Remove pip install command in windows installation guide * fix chatglm3 installation guide * Fix gemma cpu example * Apply on other examples * fix	2024-05-16 10:46:29 +08:00
Xiangyu Tian	612a365479	LLM: Install CPU version torch with extras [all] (#10868 ) Modify setup.py to install CPU version torch with extras [all]	2024-05-16 10:39:55 +08:00
Yishuo Wang	59df750326	Use new sdp again (#11025 )	2024-05-16 09:33:34 +08:00
SONG Ge	9942a4ba69	[WIP] Support llama2 with transformers==4.38.0 (#11024 ) * support llama2 with transformers==4.38.0 * add supprot for quantize_qkv * add original support for 4.38.0 now * code style fix	2024-05-15 18:07:00 +08:00
Yina Chen	686f6038a8	Support fp6 save & load (#11034 )	2024-05-15 17:52:02 +08:00
Ruonan Wang	ac384e0f45	add fp6 mlp fusion (#11032 ) * add fp6 fusion * add qkv fusion for fp6 * remove qkv first	2024-05-15 17:42:50 +08:00
Wang, Jian4	2084ebe4ee	Enable fastchat benchmark latency (#11017 ) * enable fastchat benchmark * add readme * update readme * update	2024-05-15 14:52:09 +08:00

1 2 3 4 5 ...

1364 commits