Commit graph

3856 commits

Author SHA1 Message Date
Yishuo Wang
ccf618ff4a
Remove all ipex usage (#12666) 2025-01-08 10:31:18 +08:00
logicat
0534d7254f
Update docker_cpp_xpu_quickstart.md (#12667) 2025-01-08 09:56:56 +08:00
Yuwen Hu
5db6f9dcde
Add option with PyTorch 2.6 RC version for testing purposes (#12668)
* Add option with PyTorch 2.6 RC version for testing purposes

* Small update
2025-01-07 18:28:55 +08:00
Yishuo Wang
f9ee7898c8
fix onednn dependency bug (#12665) 2025-01-07 16:26:56 +08:00
Yishuo Wang
29ad5c449e
refactor codegeex to remove ipex kernel usage (#12664) 2025-01-07 16:17:40 +08:00
Yuwen Hu
525b0ee991
[NPU] Tiny fixes on examples (#12661) 2025-01-07 14:30:38 +08:00
Yuwen Hu
ebdf19fa7e
[NPU] Further fix saving of generation config (#12657)
* Further fix saving of generation config

* Fix based on comments

* Small fix
2025-01-07 13:53:54 +08:00
Yuwen Hu
381d448ee2
[NPU] Example & Quickstart updates (#12650)
* Remove model with optimize_model=False in NPU verified models tables, and remove related example

* Remove experimental in run optimized model section title

* Unify model table order & example cmd

* Move embedding example to separate folder & update quickstart example link

* Add Quickstart reference in main NPU readme

* Small fix

* Small fix

* Move save/load examples under NPU/HF-Transformers-AutoModels

* Add low-bit and polish arguments for LLM Python examples

* Small fix

* Add low-bit and polish arguments for Multi-Model examples

* Polish argument for Embedding models

* Polish argument for LLM CPP examples

* Add low-bit and polish argument for Save-Load examples

* Add accuracy tuning tips for examples

* Update NPU quickstart accuracy tuning with low-bit optimizations

* Add save/load section to quickstart

* Update CPP example sample output to EN

* Add installation regarding cmake for CPP examples

* Small fix

* Small fix

* Small fix

* Small fix

* Small fix

* Small fix

* Unify max prompt length to 512

* Change recommended low-bit for Qwen2.5-3B-Instruct to asym_int4

* Update based on comments

* Small fix
2025-01-07 13:52:41 +08:00
Yishuo Wang
ddc0ef3993
refactor device check and remove cohere/mixtral support (#12659) 2025-01-07 11:15:51 +08:00
Yishuo Wang
ea65e4fecc
remove falcon support and related UT (#12656) 2025-01-07 09:26:00 +08:00
Yina Chen
fae73eee79
[NPU] Support save npu quantized model without npu dependency (#12647)
* support save awq

* load quantized model & save npu compiled model

* fix style

* update

* fix dll load issue

* update error message

* fix style
2025-01-06 18:06:22 +08:00
Yishuo Wang
502461d836
remove unnecessary ipex kernel usage (#12649) 2025-01-03 16:45:24 +08:00
Yishuo Wang
9f8b134889
add ipex-llm custom kernel registration (#12648) 2025-01-03 16:45:04 +08:00
binbin Deng
0b377100c5
Add guide for save-load usage (#12498) 2025-01-03 16:30:15 +08:00
Wang, Jian4
6711a48a36
Enable internvl2-8b on vllm (#12645) 2025-01-03 14:49:36 +08:00
Zijie Li
8fd2dcba86
Add benchmark_util for transformers >= 4.47.0 (#12644) 2025-01-03 10:48:29 +08:00
SONG Ge
550fa01649
[Doc] Update ipex-llm ollama troubleshooting for v0.4.6 (#12642)
* update ollama v0.4.6 troubleshooting

* update chinese ollama-doc
2025-01-02 17:28:54 +08:00
Yina Chen
8e5328e9b4
add disable opts for awq (#12641) 2025-01-02 15:45:22 +08:00
Xu, Shuo
62318964fa
Update llama example information (#12640)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2025-01-02 13:48:39 +08:00
Yishuo Wang
81211fd010
remove unused code (#12635) 2025-01-02 13:31:09 +08:00
binbin Deng
534566e290
[NPU] Support minicpm-v with python cpp backend (#12637) 2025-01-02 11:13:15 +08:00
Yishuo Wang
f289f68d57
small fix (#12634) 2024-12-30 17:14:25 +08:00
Yishuo Wang
2d08155513
remove bmm, which is only required in ipex 2.0 (#12630) 2024-12-27 17:28:57 +08:00
binbin Deng
f17ccfa61a
[NPU] Fix save-load usage of minicpm models (#12628) 2024-12-27 15:56:46 +08:00
Yishuo Wang
c72a5db757
remove unused code again (#12624) 2024-12-27 14:17:11 +08:00
binbin Deng
46eeab4479
[NPU] Fix regression caused by layer_norm change (#12627) 2024-12-27 14:08:49 +08:00
Ruonan Wang
90f6709486
Remove pipeline examples (#12626) 2024-12-27 13:42:28 +08:00
Zijie Li
5f04ed7254
[NPU] Update prompt format for baichuan2-pipeline (#12625) 2024-12-27 11:30:54 +08:00
Yishuo Wang
34dbdb8ee3
small fix (#12623) 2024-12-27 10:19:27 +08:00
Xu, Shuo
55ce091242
Add GLM4-Edge-V GPU example (#12596)
* Add GLM4-Edge-V examples

* polish readme

* revert wrong changes

* polish readme

* polish readme

* little polish in reference info and indent

* Small fix and sample output updates

* Update main readme

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-12-27 09:40:29 +08:00
binbin Deng
796ee571a5
[NPU doc] Update verified platforms (#12621) 2024-12-26 17:39:13 +08:00
Ruonan Wang
bbdbbb0d88
[NPU] Compatible with other third-party models like auto-round (#12620)
* support third party model

* simplify code

* fix style

* fix sym int4 GW

* code refactor

* fix
2024-12-26 17:25:18 +08:00
Yishuo Wang
a9abde0b5d
support passing attn_scale to sdpa (#12619) 2024-12-26 16:58:09 +08:00
Shaojun Liu
40a7d2b4f0
Consolidated C-Eval Benchmark Guide for Single-GPU and Multi-GPU Environments (#12618)
* run c-eval on multi-GPUs

* Update README.md
2024-12-26 15:23:32 +08:00
Zijie Li
ccc4055058
[NPU] Update prompt format for baichuan2 (#12615)
* Update baichuan2.py

* style fix
2024-12-26 11:41:37 +08:00
Yishuo Wang
1604b4ead8
small fix (#12616) 2024-12-26 11:35:12 +08:00
Ruonan Wang
d841e1dc0d
[NPU] update convert script based on latest usage (#12617) 2024-12-26 11:23:04 +08:00
Xu, Shuo
ef585d3360
Polish Readme for ModelScope-related examples (#12603) 2024-12-26 10:52:47 +08:00
Shaojun Liu
28737c250c
Update Dockerfile (#12585) 2024-12-26 10:20:52 +08:00
Yishuo Wang
a596f1ae5f
remove bigdl-llm test to fix langchain UT (#12613) 2024-12-26 10:17:25 +08:00
Ruonan Wang
9e895f04ec
[NPU] fix npu save (#12614)
* fix npu save

* update
2024-12-26 09:21:16 +08:00
Mingqi Hu
0477fe6480
[docs] Update doc for latest open webui: 0.4.8 (#12591)
* Update open webui doc

* Resolve comments
2024-12-26 09:18:20 +08:00
Yishuo Wang
6249c1e373
rewrite llama optimization (#12609) 2024-12-25 17:04:32 +08:00
Yishuo Wang
5f5ac8a856
fix llama related import (#12611) 2024-12-25 16:23:52 +08:00
Jason Dai
54b1d7d333
Update README.zh-CN.md (#12610) 2024-12-25 15:38:59 +08:00
Yishuo Wang
4e6b9d804f
add compresskv back for mistral (#12607)
* add compresskv back for mistral

* fix

* fix
2024-12-25 11:06:08 +08:00
joan726
9c9800be31
Update README.zh-CN.md (#12570) 2024-12-24 20:32:36 +08:00
Yishuo Wang
4135b895b3
refactor chatglm2, internlm, stablelm and qwen (#12604) 2024-12-24 18:18:00 +08:00
Yishuo Wang
073f936c37
refactor mistral and phi3 (#12605) 2024-12-24 17:52:32 +08:00
binbin Deng
45f8f72a28
[NPU] Fix minicpm on MTL (#12599) 2024-12-24 15:37:56 +08:00