ipex-llm

Author	SHA1	Message	Date
Guancheng Fu	b36359e2ab	Fix xpu serving image oneccl (#12100 )	2024-09-20 15:25:41 +08:00
Yishuo Wang	54b973c744	fix ipex_llm import in transformers 4.45 (#12099 )	2024-09-20 15:24:59 +08:00
Guancheng Fu	a6cbc01911	Use new oneccl for ipex-llm serving image (#12097 )	2024-09-20 14:52:49 +08:00
Shaojun Liu	1295898830	update vllm_online_benchmark script to support long input (#12095 ) * update vllm_online_benchmark script to support long input * update guide	2024-09-20 14:18:30 +08:00
Ch1y0q	9650bf616a	add `transpose_value_cache` for NPU benchmark (#12092 ) * add `transpose_value_cache` * update * update	2024-09-19 18:45:05 +08:00
Yuwen Hu	f7fb3c896c	Update lm_head optimization for Qwen2 7B (#12090 )	2024-09-18 17:02:02 +08:00
Xu, Shuo	ee33b93464	Longbench: NV code to ipex-llm (#11662 ) * add nv longbench * LongBench: NV code to ipex-llm * ammend * add more models support * ammend * optimize LongBench's user experience * ammend * ammend * fix typo * ammend * remove cuda related information & add a readme * add license to python scripts & polish the readme * ammend * ammend --------- Co-authored-by: cyita <yitastudy@gmail.com> Co-authored-by: ATMxsp01 <shou.xu@intel.com> Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>	2024-09-18 15:55:14 +08:00
Wang, Jian4	40e463c66b	Enable vllm load gptq model (#12083 ) * enable vllm load gptq model * update * update * update * update style	2024-09-18 14:41:00 +08:00
Xiangyu Tian	c2774e1a43	Update oneccl to 0.0.3 in serving-xpu image (#12088 )	2024-09-18 14:29:17 +08:00
Ruonan Wang	081af41def	[NPU] Optimize Qwen2 lm_head to use INT4 (#12072 ) * temp save * update * fix * fix * Split lm_head into 7 parts & remove int8 for lm_head when sym_int4 * Simlify and add condition to code * Small fix * refactor some code * fix style * fix style * fix style * fix * fix * temp sav e * refactor * fix style * further refactor * simplify code * meet code review * fix style --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-09-14 15:26:46 +08:00
joan726	18714ceac7	Update README.md (#12084 ) Modify vLLM related links	2024-09-14 15:24:08 +08:00
Ch1y0q	b4b8c3e495	add `lowbit_path` for `generate.py`, fix `npu_model` (#12077 ) * add `lowbit_path` for `generate.py`, fix `npu_model` * update `README.md`	2024-09-13 17:28:05 +08:00
Wang, Jian4	d703e4f127	Enable vllm multimodal minicpm-v-2-6 (#12074 ) * enable minicpm-v-2-6 * add image_url readme	2024-09-13 13:28:35 +08:00
Ruonan Wang	a767438546	fix typo (#12076 ) * fix typo * fix	2024-09-13 11:44:42 +08:00
Ruonan Wang	3f0b24ae2b	update cpp quickstart (#12075 ) * update cpp quickstart * fix style	2024-09-13 11:35:32 +08:00
Shaojun Liu	9b4fee8b5b	disable nightly release for finetune images (#12070 )	2024-09-12 15:10:50 +08:00
Shaojun Liu	beb876665d	pin gradio version to fix connection error (#12069 )	2024-09-12 14:36:09 +08:00
Ruonan Wang	48d9092b5a	upgrade OneAPI version for cpp Windows (#12063 ) * update version * update quickstart	2024-09-12 11:12:12 +08:00
Jinhe	e78e45ee01	update NPU readme: run conhost as administrator (#12066 )	2024-09-11 17:54:04 +08:00
Jinhe	4ca330da15	Fix NPU load error message and add minicpm npu lowbit feat (#12064 ) * fix npu_model raise sym_int4 error * add load_lowbit * remove print&perf	2024-09-11 16:56:35 +08:00
Jinhe	32e8362da7	added minicpm cpu examples (#12027 ) * minicpm cpu examples * add link for minicpm-2	2024-09-11 15:51:21 +08:00
Ruonan Wang	a0c73c26d8	clean NPU code (#12060 ) * clean code * remove time.perf_counter()	2024-09-11 15:10:35 +08:00
Wang, Jian4	c75f3dd874	vllm no padding glm4 to avoid nan error (#12062 ) * no padding glm4 * add codegeex	2024-09-11 13:44:40 +08:00
Chu,Youcheng	649390c464	fix: textual and env variable adjustment (#12038 )	2024-09-11 13:38:01 +08:00
Yuwen Hu	c94032f97e	Try to fix llamaindex ut again (#12061 )	2024-09-11 12:11:04 +08:00
Shaojun Liu	7e1e51d91a	Update vllm setting (#12059 ) * revert * update * update * update	2024-09-11 11:45:08 +08:00
Wang, Jian4	30a8680645	Update for vllm one card padding (#12058 )	2024-09-11 10:52:55 +08:00
Zijie Li	c5fdfde1bd	fix npu-model prompt (#12057 )	2024-09-11 10:06:45 +08:00
Yuwen Hu	94dade9aca	Fix UT of ipex_llm.llamaindex (#12055 )	2024-09-11 09:58:43 +08:00
Shaojun Liu	52863dd567	fix vllm_online_benchmark.py (#12056 )	2024-09-11 09:45:30 +08:00
Yishuo Wang	d8c044e79d	optimize minicpm3 kv cache (#12052 )	2024-09-10 16:51:21 +08:00
Wang, Jian4	5d3ab16a80	Add vllm glm and baichuan padding (#12053 )	2024-09-10 15:57:28 +08:00
Guancheng Fu	69c8d36f16	Switching from vLLM v0.3.3 to vLLM 0.5.4 (#12042 ) * Enable single card sync engine * enable ipex-llm optimizations for vllm * enable optimizations for lm_head * Fix chatglm multi-reference problem * Remove duplicate layer * LLM: Update vLLM to v0.5.4 (#11746) * Enable single card sync engine * enable ipex-llm optimizations for vllm * enable optimizations for lm_head * Fix chatglm multi-reference problem * update 0.5.4 api_server * add dockerfile * fix * fix * refine * fix --------- Co-authored-by: gc-fu <guancheng.fu@intel.com> * Add vllm-0.5.4 Dockerfile (#11838) * Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957) * Fix vLLM not convert issues (#11817) (#11918) * Fix not convert issues * refine Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com> * Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969) * init * update mlp forward * fix minicpm error in vllm 0.5.4 * fix dependabot alerts (#12008) * Update 0.5.4 dockerfile (#12021) * Add vllm awq loading logic (#11987) * [ADD] Add vllm awq loading logic * [FIX] fix the module.linear_method path * [FIX] fix quant_config path error * Enable Qwen padding mlp to 256 to support batch_forward (#12030) * Enable padding mlp * padding to 256 * update style * Install 27191 runtime in 0.5.4 docker image (#12040) * fix rebase error * fix rebase error * vLLM: format for 0.5.4 rebase (#12043) * format * Update model_convert.py * Fix serving docker related modifications (#12046) * Fix undesired modifications (#12048) * fix * Refine offline_inference arguments --------- Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com> Co-authored-by: Jun Wang <thoughts.times@gmail.com> Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com> Co-authored-by: liu-shaojun <johnssalyn@outlook.com> Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>	2024-09-10 15:37:43 +08:00
Ch1y0q	73a4360f3f	update lowbit path for baichuan2, qwen2, `generate.py` (#12051 ) * update lowbit path for baichuan2, qwen2, `generate.py` * update readme	2024-09-10 15:35:24 +08:00
Ruonan Wang	dc4af02b2a	Fix qwen2 1.5B NPU load error (#12049 )	2024-09-10 14:41:18 +08:00
Yishuo Wang	abc370728c	optimize minicpm3 again (#12047 )	2024-09-10 14:19:57 +08:00
Ch1y0q	f0061a9916	remove local import os to fix Baichuan NPU load issue (#12044 )	2024-09-10 14:13:24 +08:00
Ruonan Wang	640998edea	update inter_pp of qwen2 (#12041 )	2024-09-10 10:34:17 +08:00
Yishuo Wang	048b4590aa	add basic minicpm3 optimization (#12039 )	2024-09-09 17:25:08 +08:00
Chu,Youcheng	16c658e732	LLM: add known issues to harness evaluation (#12036 ) * feat: add known issue to harness * fix: resolve comments * fix: small fixes	2024-09-09 14:15:42 +08:00
Yishuo Wang	6cedb601e4	remove some useless code (#12035 )	2024-09-06 17:51:08 +08:00
binbin Deng	d2e1b9aaff	Add input padding during prefill for qwen2-7b (#12033 )	2024-09-06 16:39:59 +08:00
Yuwen Hu	f61b1785fb	Small update to NPU example readme (#12034 ) * Small update to NPU example readme * Small fix	2024-09-06 15:54:23 +08:00
Ruonan Wang	0d04531ae0	update NPU readme of Qwen2 (#12032 ) * update readme * update broadcast	2024-09-06 15:02:39 +08:00
Yang Wang	58555bd9de	Optimize broadcast for npu llama (#12028 )	2024-09-06 13:28:20 +08:00
Shaojun Liu	e5581e6ded	Select the Appropriate APT Repository Based on CPU Type (#12023 )	2024-09-05 17:06:07 +08:00
binbin Deng	5b18bb3c4a	Add recommend version for mtl npu (#12024 )	2024-09-05 16:28:53 +08:00
binbin Deng	845e5dc89e	Support lm_head of minicpm-2b on NPU (#12019 )	2024-09-05 16:19:22 +08:00
Ch1y0q	820f8a4554	add `--lowbit-path` option for NPU llama example (#12020 ) * add option" `--lowbit-path` * add descriptions in `README.md` and formatting * Update llama.py	2024-09-05 15:31:01 +08:00
Guoqiong Song	8803242f5c	fix llama on cpu (#12018 )	2024-09-04 19:17:54 -07:00

3621 commits