3898 commits

Author SHA1 Message Date
Ruonan Wang
78cca0a68c
[NPU] update llm-npu-cli example (#12729)
* update cli example

* add license

* rename

* update readme sample output
2025-01-22 09:59:27 +08:00
Jason Dai
7e29edcc4b
Update Readme (#12730) 2025-01-22 08:43:32 +08:00
Yishuo Wang
6789e5d92f
small fix (#12727) 2025-01-21 17:27:18 +08:00
Jason Dai
412bfd6644
Update readme (#12724) 2025-01-21 10:59:14 +08:00
Wang, Jian4
716d4fe563
Add vllm 0.6.2 vision offline example (#12721)
* add vision offline example

* add to docker
2025-01-21 09:58:01 +08:00
Yishuo Wang
085974e307
fix nf4 to cpu (#12722) 2025-01-21 09:23:22 +08:00
Yuwen Hu
9aa4be8ced
Update runtime configuration on MTL (#12720) 2025-01-20 11:06:37 +08:00
Yishuo Wang
bda87c21eb
add support and optimization for minicpmo audio part (#12716) 2025-01-16 16:39:00 +08:00
Shaojun Liu
53aae24616
Add note about enabling Resizable BAR in BIOS for GPU setup (#12715) 2025-01-16 16:22:35 +08:00
Yuwen Hu
534e0e6774
Update dependency for PyTorch 2.6 RC support for woq int4 (#12714) 2025-01-16 15:51:57 +08:00
Zhao Changmin
54d6328b3c
woq int4 fwd (#12711) 2025-01-16 15:48:05 +08:00
Yishuo Wang
b62734748f
add support and optimization for minicpmo vision part (#12713) 2025-01-16 14:51:00 +08:00
Yuwen Hu
c52bdff76b
Update Deepseek coder GPU example (#12712)
* Update Deepseek coder GPU example

* Fix based on comment
2025-01-16 14:05:31 +08:00
Yuwen Hu
9d65dcd7ef
Fix deepseek coder with linear rope type support on GPU (#12709)
* Fix deepseek coder with linear rope type

* Style fix

* Move to optimize_pre

* Small fix

* Small fix

* Small fix to not affect other cases

* Style fixes

* Update function name

* Small fix

* Small fix

* Small fix

* Fix for low transformers version first

* Style fix

* Small fix
2025-01-15 21:12:34 +08:00
binbin Deng
36bf3d8e29
[NPU doc] Update ARL product in QuickStart (#12708) 2025-01-15 15:57:06 +08:00
Cengguang Zhang
9930351112
LLM: add new qtype woq_int4 to temporarily support gemm int4. (#12706)
This PR adds a temporary qtype, woq_int4, to avoid affecting other qtypes and models.

Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2025-01-15 14:41:33 +08:00
Yuwen Hu
6d03d06ebb
Change runtime configurations for perf test on Windows (#12705)
* Change runtime configurations for perf test on Windows

* Small fix
2025-01-14 17:54:57 +08:00
Xu, Shuo
350fae285d
Add Qwen2-VL HF GPU example with ModelScope Support (#12606)
* Add qwen2-vl example

* complete generate.py & readme

* improve lint style

* update 1-6

* update main readme

* Format and other small fixes

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2025-01-13 15:42:04 +08:00
Yuwen Hu
a1da7908b9
Fix name device is not found bug (#12703) 2025-01-13 10:11:02 +08:00
SONG Ge
e2d58f733e
Update ollama v0.5.1 document (#12699)
* Update ollama document version and known issue
2025-01-10 18:04:49 +08:00
Yishuo Wang
db9db51e2c
fix lnl perf (#12700) 2025-01-10 18:00:58 +08:00
Yuwen Hu
4bf93c66e8
Support install from source for PyTorch 2.6 RC in UT (#12697)
* Support install from source for PyTorch 2.6 RC in UT

* Remove expecttest
2025-01-10 16:44:18 +08:00
binbin Deng
da8bcb7db1
[NPU] fix load logic of glm-edge models (#12698) 2025-01-10 16:08:37 +08:00
joan726
584c1c5373
Update B580 CN doc (#12695) 2025-01-10 11:20:47 +08:00
Jason Dai
cbb8e2a2d5
Update documents (#12693) 2025-01-10 10:47:11 +08:00
Yishuo Wang
f8dc408888
fix user issue (#12692) 2025-01-10 10:18:47 +08:00
Yishuo Wang
68857494a5
refactor to simplify following upgrade 2 (#12685) 2025-01-10 09:29:03 +08:00
Shaojun Liu
2673792de6
Update Dockerfile (#12688) 2025-01-10 09:01:29 +08:00
Jason Dai
f9b29a4f56
Update B580 doc (#12691) 2025-01-10 08:59:35 +08:00
joan726
66d4385cc9
Update B580 CN Doc (#12686) 2025-01-09 19:10:57 +08:00
Yuwen Hu
c24741584d
Support PyTorch 2.6 RC perf test on Windows (#12683) 2025-01-09 18:17:23 +08:00
Yishuo Wang
7234c9b27b
update quantize kv cache condition (#12681) 2025-01-09 15:23:04 +08:00
Yuwen Hu
5d8081afbc
Remove dummy model from performance tests (#12682) 2025-01-09 14:50:17 +08:00
Yishuo Wang
1ec40cd09e
refactor to simplify following upgrade (#12680) 2025-01-09 13:34:30 +08:00
Jason Dai
aa9e70a347
Update B580 Doc (#12678) 2025-01-08 22:36:48 +08:00
Jason Dai
c6f57ad6ed
Update README.md (#12677) 2025-01-08 21:55:52 +08:00
Jason Dai
2321e8d60c
Update README.md (#12676) 2025-01-08 21:54:31 +08:00
Yishuo Wang
5c24276fc4
fix custom kernel registration (#12674) 2025-01-08 17:39:17 +08:00
Yishuo Wang
a22a8c21bb
small fix and remove unused code about ipex (#12671) 2025-01-08 17:39:04 +08:00
Yishuo Wang
c11f5f0fcd
also convert SdpaAttention in optimize_model (#12673) 2025-01-08 16:48:03 +08:00
Shaojun Liu
2c23ce2553
Create a BattleMage QuickStart (#12663)
* Create bmg_quickstart.md

* Update bmg_quickstart.md

* Clarify IPEX-LLM package installation based on use case

* Update bmg_quickstart.md

* Update bmg_quickstart.md
2025-01-08 14:58:37 +08:00
Yishuo Wang
7dd156d292
small fix and add comment (#12670) 2025-01-08 10:56:50 +08:00
Yishuo Wang
ccf618ff4a
Remove all ipex usage (#12666) 2025-01-08 10:31:18 +08:00
logicat
0534d7254f
Update docker_cpp_xpu_quickstart.md (#12667) 2025-01-08 09:56:56 +08:00
Yuwen Hu
5db6f9dcde
Add option with PyTorch 2.6 RC version for testing purposes (#12668)
* Add option with PyTorch 2.6 RC version for testing purposes

* Small update
2025-01-07 18:28:55 +08:00
Yishuo Wang
f9ee7898c8
fix onednn dependency bug (#12665) 2025-01-07 16:26:56 +08:00
Yishuo Wang
29ad5c449e
refactor codegeex to remove ipex kernel usage (#12664) 2025-01-07 16:17:40 +08:00
Yuwen Hu
525b0ee991
[NPU] Tiny fixes on examples (#12661) 2025-01-07 14:30:38 +08:00
Yuwen Hu
ebdf19fa7e
[NPU] Further fix saving of generation config (#12657)
* Further fix saving of generation config

* Fix based on comments

* Small fix
2025-01-07 13:53:54 +08:00
Yuwen Hu
381d448ee2
[NPU] Example & Quickstart updates (#12650)
* Remove model with optimize_model=False in NPU verified models tables, and remove related example

* Remove experimental in run optimized model section title

* Unify model table order & example cmd

* Move embedding example to separate folder & update quickstart example link

* Add Quickstart reference in main NPU readme

* Small fix

* Small fix

* Move save/load examples under NPU/HF-Transformers-AutoModels

* Add low-bit and polish arguments for LLM Python examples

* Small fix

* Add low-bit and polish arguments for Multi-Model examples

* Polish argument for Embedding models

* Polish argument for LLM CPP examples

* Add low-bit and polish argument for Save-Load examples

* Add accuracy tuning tips for examples

* Update NPU quickstart accuracy tuning with low-bit optimizations

* Add save/load section to quickstart

* Update CPP example sample output to EN

* Add installation regarding cmake for CPP examples

* Small fix

* Small fix

* Small fix

* Small fix

* Small fix

* Small fix

* Unify max prompt length to 512

* Change recommended low-bit for Qwen2.5-3B-Instruct to asym_int4

* Update based on comments

* Small fix
2025-01-07 13:52:41 +08:00