Commit graph

3492 commits

Author SHA1 Message Date
Ch1y0q
17c23cd759
add llama3.2 GPU example (#12137)
* add llama3.2 GPU example

* change prompt format reference url

* update

* add Meta-Llama-3.2-1B-Instruct sample output

* update wording
2024-09-29 14:41:54 +08:00
Yuwen Hu
f71b38a994
Update MiniCPM_V_26 GPU example with save & load (#12127) 2024-09-26 17:40:22 +08:00
Yishuo Wang
669ff1a97b
fix sd1.5 (#12129) 2024-09-26 17:15:16 +08:00
Yishuo Wang
a266528719
optimize llama 3.2 rope (#12128) 2024-09-26 16:08:10 +08:00
Yishuo Wang
584c3489e7
add basic support for llama3.2 (#12125) 2024-09-26 15:46:19 +08:00
Yishuo Wang
66f419f8b7
fix qwen2 vl (#12126) 2024-09-26 15:44:02 +08:00
Ch1y0q
2ea13d502f
Add minicpm3 gpu example (#12114)
* add minicpm3 gpu example

* update GPU example

* update

---------

Co-authored-by: Huang, Xinshengzi <xinshengzi.huang@intel.com>
2024-09-26 13:51:37 +08:00
Yishuo Wang
77af9bc5fa
support passing None to low_bit in optimize_model (#12121) 2024-09-26 11:09:35 +08:00
Yishuo Wang
47e0b83cbf
optimize sd 1.5 (#12119) 2024-09-25 15:45:13 +08:00
Jin, Qiao
2bedb17be7
Add Qwen2.5 NPU Example (#12110)
* Add Qwen2.5 NPU Example

* fix

* Merge qwen2.py and qwen2.5.py into qwen.py

* Fix description
2024-09-25 15:20:03 +08:00
Shaojun Liu
657889e3e4
use English prompt by default (#12115) 2024-09-24 17:40:50 +08:00
Yishuo Wang
5d63aef60b
optimize qwen2 vl again (#12109) 2024-09-23 13:22:01 +08:00
Ruonan Wang
03bd01c99c
optimize npu qwen2 (#12107) 2024-09-20 19:46:16 +08:00
Jinhe
02399021d6
add npu load_low_bit api in all-in-one benchmark (#12103) 2024-09-20 17:56:08 +08:00
Yuwen Hu
47a9597f24
Add missing link for Qwen2.5 to CN-ZH readme (#12106) 2024-09-20 17:30:30 +08:00
Yishuo Wang
9239fd4f12
add basic support and optimization for qwen2-vl (#12104) 2024-09-20 17:23:06 +08:00
Yuwen Hu
828fa01ad3
[NPU] Add mixed_precision for Qwen2 7B (#12098)
* Add mixed_precision argument to control whether to use INT8 lm_head for Qwen2-7B-Instruct

* Small fix

* Fixed on load low bit with mixed precision

* Small fix

* Update example accordingly

* Update for default prompt

* Update base on comments

* Final fix
2024-09-20 16:36:21 +08:00
Ch1y0q
2269768e71
add internvl2 example (#12102)
* add internvl2 example

* add to README.md

* update

* add link to zh-CN readme
2024-09-20 16:31:54 +08:00
joan726
ad1fe77fe6
Add language switching (#12096) 2024-09-20 16:05:20 +08:00
Ruonan Wang
09b8c80d9d
update code for NPU qwen2 (#12094)
* update code

* fix
2024-09-20 15:58:32 +08:00
Jin, Qiao
db7500bfd4
Add Qwen2.5 GPU example (#12101)
* Add Qwen2.5 GPU example

* fix end line

* fix description
2024-09-20 15:55:57 +08:00
Guancheng Fu
b36359e2ab
Fix xpu serving image oneccl (#12100) 2024-09-20 15:25:41 +08:00
Yishuo Wang
54b973c744
fix ipex_llm import in transformers 4.45 (#12099) 2024-09-20 15:24:59 +08:00
Guancheng Fu
a6cbc01911
Use new oneccl for ipex-llm serving image (#12097) 2024-09-20 14:52:49 +08:00
Shaojun Liu
1295898830
update vllm_online_benchmark script to support long input (#12095)
* update vllm_online_benchmark script to support long input

* update guide
2024-09-20 14:18:30 +08:00
Ch1y0q
9650bf616a
add transpose_value_cache for NPU benchmark (#12092)
* add `transpose_value_cache`

* update

* update
2024-09-19 18:45:05 +08:00
Yuwen Hu
f7fb3c896c
Update lm_head optimization for Qwen2 7B (#12090) 2024-09-18 17:02:02 +08:00
Xu, Shuo
ee33b93464
Longbench: NV code to ipex-llm (#11662)
* add nv longbench

* LongBench: NV code to ipex-llm

* amend

* add more models support

* amend

* optimize LongBench's user experience

* amend

* amend

* fix typo

* amend

* remove cuda related information & add a readme

* add license to python scripts & polish the readme

* amend

* amend

---------

Co-authored-by: cyita <yitastudy@gmail.com>
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2024-09-18 15:55:14 +08:00
Wang, Jian4
40e463c66b
Enable vllm load gptq model (#12083)
* enable vllm load gptq model

* update

* update

* update

* update style
2024-09-18 14:41:00 +08:00
Xiangyu Tian
c2774e1a43
Update oneccl to 0.0.3 in serving-xpu image (#12088) 2024-09-18 14:29:17 +08:00
Ruonan Wang
081af41def
[NPU] Optimize Qwen2 lm_head to use INT4 (#12072)
* temp save

* update

* fix

* fix

* Split lm_head into 7 parts & remove int8 for lm_head when sym_int4

* Simlify and add condition to code

* Small fix

* refactor some code

* fix style

* fix style

* fix style

* fix

* fix

* temp save

* refactor

* fix style

* further refactor

* simplify code

* meet code review

* fix style

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-09-14 15:26:46 +08:00
joan726
18714ceac7
Update README.md (#12084)
Modify vLLM related links
2024-09-14 15:24:08 +08:00
Ch1y0q
b4b8c3e495
add lowbit_path for generate.py, fix npu_model (#12077)
* add `lowbit_path` for `generate.py`, fix `npu_model`

* update `README.md`
2024-09-13 17:28:05 +08:00
Wang, Jian4
d703e4f127
Enable vllm multimodal minicpm-v-2-6 (#12074)
* enable minicpm-v-2-6

* add image_url readme
2024-09-13 13:28:35 +08:00
Ruonan Wang
a767438546
fix typo (#12076)
* fix typo

* fix
2024-09-13 11:44:42 +08:00
Ruonan Wang
3f0b24ae2b
update cpp quickstart (#12075)
* update cpp quickstart

* fix style
2024-09-13 11:35:32 +08:00
Shaojun Liu
9b4fee8b5b
disable nightly release for finetune images (#12070) 2024-09-12 15:10:50 +08:00
Shaojun Liu
beb876665d
pin gradio version to fix connection error (#12069) 2024-09-12 14:36:09 +08:00
Ruonan Wang
48d9092b5a
upgrade OneAPI version for cpp Windows (#12063)
* update version

* update quickstart
2024-09-12 11:12:12 +08:00
Jinhe
e78e45ee01
update NPU readme: run conhost as administrator (#12066) 2024-09-11 17:54:04 +08:00
Jinhe
4ca330da15
Fix NPU load error message and add minicpm npu lowbit feat (#12064)
* fix npu_model raise sym_int4 error

* add load_lowbit

* remove print&perf
2024-09-11 16:56:35 +08:00
Jinhe
32e8362da7
added minicpm cpu examples (#12027)
* minicpm cpu examples

* add link for minicpm-2
2024-09-11 15:51:21 +08:00
Ruonan Wang
a0c73c26d8
clean NPU code (#12060)
* clean code

* remove time.perf_counter()
2024-09-11 15:10:35 +08:00
Wang, Jian4
c75f3dd874
vllm no padding glm4 to avoid nan error (#12062)
* no padding glm4

* add codegeex
2024-09-11 13:44:40 +08:00
Chu,Youcheng
649390c464
fix: textual and env variable adjustment (#12038) 2024-09-11 13:38:01 +08:00
Yuwen Hu
c94032f97e
Try to fix llamaindex ut again (#12061) 2024-09-11 12:11:04 +08:00
Shaojun Liu
7e1e51d91a
Update vllm setting (#12059)
* revert

* update

* update

* update
2024-09-11 11:45:08 +08:00
Wang, Jian4
30a8680645
Update for vllm one card padding (#12058) 2024-09-11 10:52:55 +08:00
Zijie Li
c5fdfde1bd
fix npu-model prompt (#12057) 2024-09-11 10:06:45 +08:00
Yuwen Hu
94dade9aca
Fix UT of ipex_llm.llamaindex (#12055) 2024-09-11 09:58:43 +08:00