Commit graph

  • cd109bb061
    Gemma QLoRA example (#12969) Heyang Sun 2025-03-14 14:27:51 +0800
  • 8bc41c13ab
    Support PyTorch 2.6 with Arrow Lake-H AOT on Windows (#12967) Yuwen Hu 2025-03-13 15:29:47 +0800
  • c8a0462507
    Add vllm api_server input output log (#12962) Wang, Jian4 2025-03-12 20:58:04 +0800
  • 3941f322c5
    Update issue templates Jason Dai 2025-03-11 08:54:15 +0800
  • d0e443e893
    Update issue templates Jason Dai 2025-03-11 08:53:01 +0800
  • 6a2d87e40f
    add --entrypoint /bin/bash (#12957) Shaojun Liu 2025-03-10 10:10:27 +0800
  • 2a8f624f4b
    Update README (#12956) Jason Dai 2025-03-09 09:04:13 +0800
  • 5ee09b4b28
    [NPU] Small update about zip doc (#12951) binbin Deng 2025-03-07 15:22:14 +0800
  • 015a4c8c43
    Add CPU and GPU Frequency Locking Instructions to Documentation (#12947) Shaojun Liu 2025-03-07 09:20:40 +0800
  • cb3c4b26ad
    Update llamacpp_portable_zip_gpu_quickstart.md (#12945) Jason Dai 2025-03-06 11:58:11 +0800
  • 1432c5d9a0
    Update llamacpp_portable_zip_gpu_quickstart (#12941) Jason Dai 2025-03-06 10:01:56 +0800
  • 32480cc8ed
    Update llamacpp_portable_zip_gpu_quickstart (#12940) Jason Dai 2025-03-06 08:42:18 +0800
  • 975cf5f21f
    Update README.md (#12939) Jason Dai 2025-03-06 08:04:27 +0800
  • eccb5b817e
    Add llamacpp_portable_zip_gpu_quickstart.zh-CN.md (#12930) joan726 2025-03-05 14:55:44 +0800
  • 7c0c77cce3
    Tiny fixes (#12936) Yuwen Hu 2025-03-05 14:55:26 +0800
  • 68a770745b
    Add moonlight GPU example (#12929) Yuwen Hu 2025-03-05 11:31:14 +0800
  • 33da3a3cb7
    Update llama cpp portable zip quickstart (#12928) Xin Qiu 2025-03-05 09:22:10 +0800
  • de09590ca3
    Update llamacpp_portable_zip_gpu_quickstart.md (#12932) Jason Dai 2025-03-05 07:59:32 +0800
  • 69edc8b6f6
    Update quickstart (#12927) Jason Dai 2025-03-04 15:34:52 +0800
  • 0b5079833c
    llama.cpp portable Zip for Linux quickstart (#12923) Qiyuan Gong 2025-03-04 14:50:21 +0800
  • 091ab2bd59
    [NPU] Add troubleshooting in portable zip doc (#12924) binbin Deng 2025-03-04 10:41:39 +0800
  • b2d676f1c6
    Further update Ollama portable zip quickstart (#12921) Yuwen Hu 2025-03-03 18:07:57 +0800
  • f81d89d908
    Remove Unnecessary --privileged Flag While Keeping It for WSL Users (#12920) Shaojun Liu 2025-03-03 11:11:42 +0800
  • 7810b8fb49
    OSPDT: update dockerfile header (#12908) Shaojun Liu 2025-03-03 09:59:11 +0800
  • b6f33d5c4d
    optimize moonlight again (#12909) Yishuo Wang 2025-03-03 09:21:15 +0800
  • 35e5fa851c
    Update README.md (#12911) Jason Dai 2025-02-28 17:55:45 +0800
  • 8351f6c455
    [NPU] Add QuickStart for llama.cpp NPU portable zip (#12899) binbin Deng 2025-02-28 17:19:18 +0800
  • 029480f4a8
    llama cpp portable zip Quickstart (#12894) Xin Qiu 2025-02-28 15:45:11 +0800
  • 443cb5d4e0
    Update Janus-Pro GPU example (#12906) Yuwen Hu 2025-02-28 15:39:03 +0800
  • 8d94752c4b
    Ollama portable zip QuickStart updates regarding more tips (#12905) Yuwen Hu 2025-02-28 15:10:56 +0800
  • 39e360fe9d
    add grouped topk optimization for moonlight (#12903) Yishuo Wang 2025-02-28 13:25:56 +0800
  • e946127613
    glm 4v 1st sdp for vision (#12904) Xin Qiu 2025-02-28 13:23:27 +0800
  • 5c100ac105
    Add ENTRYPOINT to Dockerfile to auto-start vllm service on container launch (for CVTE customer) (#12901) Shaojun Liu 2025-02-27 17:33:58 +0800
  • be1f073866
    add fuse moe optimization for moonlight (#12898) Yishuo Wang 2025-02-27 09:15:24 +0800
  • ad65e2b03a
    Update README.md (#12900) Jason Dai 2025-02-27 08:30:06 +0800
  • 5faba06409
    simple optimization for moonlight moe decoding forward (#12891) Yishuo Wang 2025-02-25 16:18:27 +0800
  • ae9f5320da
    vLLM CPU: Fix Triton Version to Resolve Related Error (#12893) Xiangyu Tian 2025-02-25 15:00:41 +0800
  • ab3fc66eb7
    optimize attention part of moonlight-14B-A3B (#12886) Yishuo Wang 2025-02-25 09:38:13 +0800
  • dd30d12cb6
    Fix serving-cpu image: setuptools-scm requires setuptools>=61 (#12876) Shaojun Liu 2025-02-25 09:10:14 +0800
  • 06694ba61a
    Further fix portable zip file link (#12885) Yuwen Hu 2025-02-24 18:06:57 +0800
  • 671ddfd847
    Update wrong file name for portable zip quickstart (#12883) Yuwen Hu 2025-02-24 17:52:09 +0800
  • a9c8e73a77
    Update llama.cpp Prerequisites guide regarding oneAPI 2025.0 (#12881) Yuwen Hu 2025-02-24 16:32:23 +0800
  • 4f2f92afa3
    Update inference-cpp docker (#12882) Wang, Jian4 2025-02-24 14:32:44 +0800
  • 3f6ecce508
    support using xgrammar to get json output (#12870) Yishuo Wang 2025-02-24 14:10:58 +0800
  • afad979168
    Add Apache 2.0 License Information in Dockerfile to Comply with OSPDT Requirements (#12878) Shaojun Liu 2025-02-24 14:00:46 +0800
  • 02ec313eab
    Update README.md (#12877) Guancheng Fu 2025-02-24 09:59:17 +0800
  • 10400abfb7
    Fix CodeQL workflow (#12875) Shaojun Liu 2025-02-24 09:16:54 +0800
  • 1e00bed001
    Add GPU example for Janus-Pro (#12869) Xu, Shuo 2025-02-21 18:36:50 +0800
  • 21d6a78be0
    Update Ollama portable zip QuickStart to fit new version (#12871) Yuwen Hu 2025-02-21 17:54:14 +0800
  • 3ea5389a99
    Fix vllm api_server v1/models error (#12867) Wang, Jian4 2025-02-21 11:08:29 +0800
  • 8077850452
    [NPU GGUF] Add simple example (#12853) binbin Deng 2025-02-21 09:58:00 +0800
  • 348dc8056d
    Fix vllm gptq awq error (#12863) Wang, Jian4 2025-02-20 16:27:23 +0800
  • a488981f3f
    Ollama portable zip QuickStart tiny fix (#12862) Yuwen Hu 2025-02-20 14:11:12 +0800
  • 0f2706be42
    Update CN Ollama portable zip QuickStart for troubleshooting & tips (#12860) Yuwen Hu 2025-02-20 11:32:06 +0800
  • 38a682adb1
    Update Readme (#12855) Jason Dai 2025-02-19 19:55:29 +0800
  • 4eed0c7d99
    initial implementation for low_bit_loader vLLM (#12838) Guancheng Fu 2025-02-19 19:45:34 +0800
  • c81b7fc003
    Add Portable zip Linux QuickStart (#12849) Xin Qiu 2025-02-19 19:13:55 +0800
  • b26409d53f
    R1 Hybrid: Add Benchmark for DeepSeek R1 transformers example (#12854) Xiangyu Tian 2025-02-19 18:33:21 +0800
  • 5d041f9ebf
    Add latest models list in ollama quickstart (#12850) SONG Ge 2025-02-19 18:29:43 +0800
  • aee2db30f9
    update sdp support (#12847) Yishuo Wang 2025-02-19 12:07:00 +0800
  • 93c10be762
    LLM: Support hybrid convert for DeepSeek V3/R1 (#12834) Xiangyu Tian 2025-02-19 11:31:19 +0800
  • 637543e135
    Update Ollama portable zip QuickStart with troubleshooting (#12846) Yuwen Hu 2025-02-19 11:04:03 +0800
  • bde8acc303
    [NPU] Update doc of gguf support (#12837) binbin Deng 2025-02-19 10:46:35 +0800
  • e1809a6295
    Update multimodal on vllm 0.6.6 (#12816) Wang, Jian4 2025-02-19 10:04:42 +0800
  • 09150b6058
    Initiate CPU-XPU Hybrid Inference for DeepSeek-R1 (#12832) Xiangyu Tian 2025-02-18 13:34:14 +0800
  • 09ed96082b
    Add DeepSeek V3/R1 CPU example (#12836) Xiangyu Tian 2025-02-18 12:45:49 +0800
  • 8418450300
    optimize minicpm-o's tts part (#12833) Yishuo Wang 2025-02-17 14:53:37 +0800
  • f7b5a093a7
    Merge CPU & XPU Dockerfiles with Serving Images and Refactor (#12815) Shaojun Liu 2025-02-17 14:23:22 +0800
  • eaec64baca
    Update README.md (#12826) Jason Dai 2025-02-14 21:20:57 +0800
  • 59e8e1e91e
    Added ollama_portablze_zip_quickstart.zh-CN.md (#12822) joan726 2025-02-14 18:54:12 +0800
  • a09552e59a
    Update ollama quickstart (#12823) Jason Dai 2025-02-14 09:55:48 +0800
  • f67986021c
    Update download link for Ollama portable zip QuickStart (#12821) Yuwen Hu 2025-02-13 17:48:02 +0800
  • 16e63cbc18
    Update readme (#12820) Jason Dai 2025-02-13 14:26:04 +0800
  • 68414afcb9
    Add initial QuickStart for Ollama portable zip (#12817) Yuwen Hu 2025-02-13 13:18:14 +0800
  • 1083fe5508
    Reenable pp and lightweight-serving serving on 0.6.6 (#12814) Wang, Jian4 2025-02-13 10:16:00 +0800
  • af693425f1
    Upgrade to vLLM 0.6.6 (#12796) Guancheng Fu 2025-02-12 16:47:51 +0800
  • f8ab833f74
    support and optimize janus pro (#12813) Yishuo Wang 2025-02-12 15:07:24 +0800
  • bd815a4d96
    Update the base image of inference-cpp image to oneapi 2025.0.2 (#12802) Shaojun Liu 2025-02-12 14:15:08 +0800
  • 73cfe293fa
    add basic support for Baichuan-M1-14B-Instruct (#12808) Yishuo Wang 2025-02-11 17:27:42 +0800
  • d093b75aa0
    [NPU] Update driver installation in QuickStart (#12807) binbin Deng 2025-02-11 15:49:21 +0800
  • b70ad902b4
    Fix ipex-llm CPU linear dtype not match (#12805) Xiangyu Tian 2025-02-11 10:34:44 +0800
  • 2701a9d1e3
    Remove Migrated Workflows to Avoid Duplication and Confusion (#12801) Shaojun Liu 2025-02-10 14:58:08 +0800
  • eb2df5ed70
    common.h -> npu/npu_common.h (#12800) Yina Chen 2025-02-10 08:38:22 +0200
  • e4ceb722b6
    fix qwen2 vl (#12798) Yishuo Wang 2025-02-10 13:25:53 +0800
  • 3fee838b14
    [NPU] Fix of c++ convert example (#12797) binbin Deng 2025-02-10 11:17:58 +0800
  • 468d3f22fc
    Rename NPU public example to llm-cli (#12790) Kai Huang 2025-02-08 10:19:59 +0800
  • e90a9ad196
    [NPU] Support non-const parameter for decoder layers when keep_ir=True (#12789) Ruonan Wang 2025-02-08 09:58:42 +0800
  • 8aea5319bb
    update more lora example (#12785) Yishuo Wang 2025-02-08 09:46:48 +0800
  • fd28cf1672
    Upgrade ipex-llm[cpp] to oneAPI 2025.0 on Windows (#12778) Yuwen Hu 2025-02-07 18:29:34 +0800
  • ca1d7b7c2c
    [NPU] Support qwen models with cos_sin_input=True (#12788) binbin Deng 2025-02-07 16:41:13 +0800
  • 6ff7faa781
    [NPU] Update deepseek support in python examples and quickstart (#12786) binbin Deng 2025-02-07 11:25:16 +0800
  • b4f2be2b09
    [NPU] Update C++ example to add DeepSeek-R1 (#12787) Ruonan Wang 2025-02-07 11:23:34 +0800
  • d0d9c9d636
    remove load_in_8bit usage as it is not supported a long time ago (#12779) Yishuo Wang 2025-02-07 11:21:29 +0800
  • 9e9b6c9f2b
    Fix cpu serving docker image (#12783) Xiangyu Tian 2025-02-07 11:12:42 +0800
  • b4c9e23f73
    fix galore and peft finetune example (#12776) Yishuo Wang 2025-02-06 16:36:13 +0800
  • c0d6b282b8
    fix lisa finetune example (#12775) Yishuo Wang 2025-02-06 16:35:43 +0800
  • 2e5f2e5dda
    fix dpo finetune (#12774) Yishuo Wang 2025-02-06 16:35:21 +0800
  • 9697197f3e
    fix qlora finetune example (#12769) Yishuo Wang 2025-02-06 11:18:28 +0800
  • 094a25b740
    [NPU] Expose parameter to control blob / IR save logic (#12767) Ruonan Wang 2025-02-06 10:07:45 +0800
  • 9c0daf6396
    Fix readme links (#12771) Jason Dai 2025-02-05 19:24:25 +0800