ipex-llm

Author	SHA1	Message	Date
Yuwen Hu	8d94752c4b	Ollama portable zip QuickStart updates regarding more tips (#12905 ) * Update for select multiple GPUs * Update Ollama portable zip quickstarts regarding more tips * Small fix	2025-02-28 15:10:56 +08:00
Yishuo Wang	39e360fe9d	add grouped topk optimization for moonlight (#12903 )	2025-02-28 13:25:56 +08:00
Xin Qiu	e946127613	glm 4v 1st sdp for vision (#12904 ) * glm4v 1st sdp * update glm4v example * meet code review * fix style	2025-02-28 13:23:27 +08:00
Shaojun Liu	5c100ac105	Add ENTRYPOINT to Dockerfile to auto-start vllm service on container launch (for CVTE customer) (#12901 ) * Add ENTRYPOINT to Dockerfile to auto-start service on container launch (for CVTE client) * Update start-vllm-service.sh * Update README.md * Update README.md * Update start-vllm-service.sh * Update README.md	2025-02-27 17:33:58 +08:00
Yishuo Wang	be1f073866	add fuse moe optimization for moonlight (#12898 )	2025-02-27 09:15:24 +08:00
Jason Dai	ad65e2b03a	Update README.md (#12900 )	2025-02-27 08:30:06 +08:00
Yishuo Wang	5faba06409	simple optimization for moonlight moe decoding forward (#12891 )	2025-02-25 16:18:27 +08:00
Xiangyu Tian	ae9f5320da	vLLM CPU: Fix Triton Version to Resolve Related Error (#12893 )	2025-02-25 15:00:41 +08:00
Yishuo Wang	ab3fc66eb7	optimize attention part of moonlight-14B-A3B (#12886 )	2025-02-25 09:38:13 +08:00
Shaojun Liu	dd30d12cb6	Fix serving-cpu image: setuptools-scm requires setuptools>=61 (#12876 ) * setuptools-scm requires setuptools>=61 * Update Dockerfile * Update Dockerfile * Update Dockerfile	2025-02-25 09:10:14 +08:00
Yuwen Hu	06694ba61a	Further fix portable zip file link (#12885 )	2025-02-24 18:06:57 +08:00
Yuwen Hu	671ddfd847	Update wrong file name for portable zip quickstart (#12883 )	2025-02-24 17:52:09 +08:00
Yuwen Hu	a9c8e73a77	Update llama.cpp Prerequisites guide regarding oneAPI 2025.0 (#12881 ) * Update llama.cpp Prerequisites guide regarding oneAPI 2025.0 * Update based on comments * Small fix * Small fix	2025-02-24 16:32:23 +08:00
Wang, Jian4	4f2f92afa3	Update inference-cpp docker (#12882 ) * remove unused run.py * add WORKDIR /llm	2025-02-24 14:32:44 +08:00
Yishuo Wang	3f6ecce508	support using xgrammar to get json output (#12870 )	2025-02-24 14:10:58 +08:00
Shaojun Liu	afad979168	Add Apache 2.0 License Information in Dockerfile to Comply with OSPDT Requirements (#12878 ) * ospdt: add Header for Dockerfile * OSPDT: add Header for Dockerfile * OSPDT: add Header for Dockerfile * OSPDT: add Header for Dockerfile	2025-02-24 14:00:46 +08:00
Guancheng Fu	02ec313eab	Update README.md (#12877 )	2025-02-24 09:59:17 +08:00
Shaojun Liu	10400abfb7	Fix CodeQL workflow (#12875 ) * Update codeql.yml * Update codeql.yml	2025-02-24 09:16:54 +08:00
Xu, Shuo	1e00bed001	Add GPU example for Janus-Pro (#12869 ) * Add example for Janus-Pro * Update model link * Fixes * Fixes --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com> Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2025-02-21 18:36:50 +08:00
Yuwen Hu	21d6a78be0	Update Ollama portable zip QuickStart to fit new version (#12871 ) * Update ollama portable zip quickstart * Update demo images	2025-02-21 17:54:14 +08:00
Wang, Jian4	3ea5389a99	Fix vllm api_server v1/models error (#12867 )	2025-02-21 11:08:29 +08:00
binbin Deng	8077850452	[NPU GGUF] Add simple example (#12853 )	2025-02-21 09:58:00 +08:00
Wang, Jian4	348dc8056d	Fix vllm gptq awq error (#12863 ) * fix gptq awq error * fix python style	2025-02-20 16:27:23 +08:00
Yuwen Hu	a488981f3f	Ollama portable zip QuickStart tiny fix (#12862 ) * Tiny fix to ollama portable zip quickstart * Tiny fix	2025-02-20 14:11:12 +08:00
Yuwen Hu	0f2706be42	Update CN Ollama portable zip QuickStart for troubleshooting & tips (#12860 ) * Small fix for english version * Update CN ollama portable zip quickstart for troubleshooting & tips * Small fix	2025-02-20 11:32:06 +08:00
Jason Dai	38a682adb1	Update Readme (#12855 )	2025-02-19 19:55:29 +08:00
Guancheng Fu	4eed0c7d99	initial implementation for low_bit_loader vLLM (#12838 ) * initial * add logic for handling tensor parallel models * fix * Add some comments * add doc * fix done	2025-02-19 19:45:34 +08:00
Xin Qiu	c81b7fc003	Add Portable zip Linux QuickStart (#12849 ) * linux doc * update * Update ollama_portablze_zip_quickstart.md * Update ollama_portablze_zip_quickstart.md * Update ollama_portablze_zip_quickstart.zh-CN.md * Update ollama_portablze_zip_quickstart.md * meet code review * update * Add tips & troubleshooting sections for both Linux & Windows * Rebase * Fix based on comments * Small fix * Fix img * Update table for linux * Small fix --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2025-02-19 19:13:55 +08:00
Xiangyu Tian	b26409d53f	R1 Hybrid: Add Benchmark for DeepSeek R1 transformers example (#12854 ) * init * fix * update * update * fix * fix	2025-02-19 18:33:21 +08:00
SONG Ge	5d041f9ebf	Add latest models list in ollama quickstart (#12850 ) * Add latest models list on ollama quickstart * update oneapi version description * move models list to ollama_portable_zip doc * update CN readme	2025-02-19 18:29:43 +08:00
Yishuo Wang	aee2db30f9	update sdp support (#12847 )	2025-02-19 12:07:00 +08:00
Xiangyu Tian	93c10be762	LLM: Support hybrid convert for DeepSeek V3/R1 (#12834 ) LLM: Support hybrid convert for DeepSeek V3/R1	2025-02-19 11:31:19 +08:00
Yuwen Hu	637543e135	Update Ollama portable zip QuickStart with troubleshooting (#12846 ) * Update ollama portable zip quickstart with runtime configurations * Small fix * Update based on comments * Small fix * Small fix	2025-02-19 11:04:03 +08:00
binbin Deng	bde8acc303	[NPU] Update doc of gguf support (#12837 )	2025-02-19 10:46:35 +08:00
Wang, Jian4	e1809a6295	Update multimodal on vllm 0.6.6 (#12816 ) * add glm4v and minicpmv example * fix	2025-02-19 10:04:42 +08:00
Xiangyu Tian	09150b6058	Initiate CPU-XPU Hybrid Inference for DeepSeek-R1 (#12832 ) Initiate CPU-XPU Hybrid Inference for DeepSeek-R1 with DeepseekV3Attention and DeepseekV3MLP to XPU	2025-02-18 13:34:14 +08:00
Xiangyu Tian	09ed96082b	Add DeepSeek V3/R1 CPU example (#12836 ) Add DeepSeek V3/R1 CPU example for bf16 model	2025-02-18 12:45:49 +08:00
Yishuo Wang	8418450300	optimize minicpm-o's tts part (#12833 )	2025-02-17 14:53:37 +08:00
Shaojun Liu	f7b5a093a7	Merge CPU & XPU Dockerfiles with Serving Images and Refactor (#12815 ) * Update Dockerfile * Update Dockerfile * Ensure scripts are executable * Update Dockerfile * Update Dockerfile * Update Dockerfile * Update Dockerfile * Update Dockerfile * Update Dockerfile * update * Update Dockerfile * remove inference-cpu and inference-xpu * update README	2025-02-17 14:23:22 +08:00
Jason Dai	eaec64baca	Update README.md (#12826 )	2025-02-14 21:20:57 +08:00
joan726	59e8e1e91e	Added ollama_portablze_zip_quickstart.zh-CN.md (#12822 )	2025-02-14 18:54:12 +08:00
Jason Dai	a09552e59a	Update ollama quickstart (#12823 )	2025-02-14 09:55:48 +08:00
Yuwen Hu	f67986021c	Update download link for Ollama portable zip QuickStart (#12821 ) * Update download link for Ollama portable zip quickstart * Update based on comments	2025-02-13 17:48:02 +08:00
Jason Dai	16e63cbc18	Update readme (#12820 )	2025-02-13 14:26:04 +08:00
Yuwen Hu	68414afcb9	Add initial QuickStart for Ollama portable zip (#12817 ) * Add initial quickstart for Ollama portable zip * Small fix * Fixed based on comments * Small fix * Add demo image for run ollama * Update download link	2025-02-13 13:18:14 +08:00
Wang, Jian4	1083fe5508	Reenable pp and lightweight-serving serving on 0.6.6 (#12814 ) * reenable pp and lightweight serving on 066 * update readme * update * update tag	2025-02-13 10:16:00 +08:00
Guancheng Fu	af693425f1	Upgrade to vLLM 0.6.6 (#12796 ) * init * update engine init * fix serving load_in_low_bit problem * temp * temp * temp * temp * temp * fix * fixed * done * fix * fix all arguments * fix * fix throughput script * fix * fix * use official ipex-llm * Fix readme * fix --------- Co-authored-by: hzjane <a1015616934@qq.com>	2025-02-12 16:47:51 +08:00
Yishuo Wang	f8ab833f74	support and optimize janus pro (#12813 )	2025-02-12 15:07:24 +08:00
Shaojun Liu	bd815a4d96	Update the base image of inference-cpp image to oneapi 2025.0.2 (#12802 ) * Update Dockerfile * Update Dockerfile * Update Dockerfile * Update Dockerfile * Update Dockerfile * Update Dockerfile * Update Dockerfile * Update Dockerfile * Update Dockerfile * Update Dockerfile	2025-02-12 14:15:08 +08:00
Yishuo Wang	73cfe293fa	add basic support for Baichuan-M1-14B-Instruct (#12808 )	2025-02-11 17:27:42 +08:00

4030 commits