Yuwen Hu
4bf93c66e8
Support install from source for PyTorch 2.6 RC in UT ( #12697 )
...
* Support install from source for PyTorch 2.6 RC in UT
* Remove expecttest
2025-01-10 16:44:18 +08:00
binbin Deng
da8bcb7db1
[NPU] fix load logic of glm-edge models ( #12698 )
2025-01-10 16:08:37 +08:00
joan726
584c1c5373
Update B580 CN doc ( #12695 )
2025-01-10 11:20:47 +08:00
Jason Dai
cbb8e2a2d5
Update documents ( #12693 )
2025-01-10 10:47:11 +08:00
Yishuo Wang
f8dc408888
fix user issue ( #12692 )
2025-01-10 10:18:47 +08:00
Yishuo Wang
68857494a5
refactor to simplify following upgrade 2 ( #12685 )
2025-01-10 09:29:03 +08:00
Shaojun Liu
2673792de6
Update Dockerfile ( #12688 )
2025-01-10 09:01:29 +08:00
Jason Dai
f9b29a4f56
Update B580 doc ( #12691 )
2025-01-10 08:59:35 +08:00
joan726
66d4385cc9
Update B580 CN Doc ( #12686 )
2025-01-09 19:10:57 +08:00
Yuwen Hu
c24741584d
Support PyTorch 2.6 RC perf test on Windows ( #12683 )
2025-01-09 18:17:23 +08:00
Yishuo Wang
7234c9b27b
update quantize kv cache condition ( #12681 )
2025-01-09 15:23:04 +08:00
Yuwen Hu
5d8081afbc
Remove dummy model from performance tests ( #12682 )
2025-01-09 14:50:17 +08:00
Yishuo Wang
1ec40cd09e
refactor to simplify following upgrade ( #12680 )
2025-01-09 13:34:30 +08:00
Jason Dai
aa9e70a347
Update B580 Doc ( #12678 )
2025-01-08 22:36:48 +08:00
Jason Dai
c6f57ad6ed
Update README.md ( #12677 )
2025-01-08 21:55:52 +08:00
Jason Dai
2321e8d60c
Update README.md ( #12676 )
2025-01-08 21:54:31 +08:00
Yishuo Wang
5c24276fc4
fix custom kernel registration ( #12674 )
2025-01-08 17:39:17 +08:00
Yishuo Wang
a22a8c21bb
small fix and remove unused code about ipex ( #12671 )
2025-01-08 17:39:04 +08:00
Yishuo Wang
c11f5f0fcd
also convert SdpaAttention in optimize_model ( #12673 )
2025-01-08 16:48:03 +08:00
Shaojun Liu
2c23ce2553
Create a BattleMage QuickStart ( #12663 )
...
* Create bmg_quickstart.md
* Update bmg_quickstart.md
* Clarify IPEX-LLM package installation based on use case
* Update bmg_quickstart.md
* Update bmg_quickstart.md
2025-01-08 14:58:37 +08:00
Yishuo Wang
7dd156d292
small fix and add comment ( #12670 )
2025-01-08 10:56:50 +08:00
Yishuo Wang
ccf618ff4a
Remove all ipex usage ( #12666 )
2025-01-08 10:31:18 +08:00
logicat
0534d7254f
Update docker_cpp_xpu_quickstart.md ( #12667 )
2025-01-08 09:56:56 +08:00
Yuwen Hu
5db6f9dcde
Add option with PyTorch 2.6 RC version for testing purposes ( #12668 )
...
* Add option with PyTorch 2.6 RC version for testing purposes
* Small update
2025-01-07 18:28:55 +08:00
Yishuo Wang
f9ee7898c8
fix onednn dependency bug ( #12665 )
2025-01-07 16:26:56 +08:00
Yishuo Wang
29ad5c449e
refactor codegeex to remove ipex kernel usage ( #12664 )
2025-01-07 16:17:40 +08:00
Yuwen Hu
525b0ee991
[NPU] Tiny fixes on examples ( #12661 )
2025-01-07 14:30:38 +08:00
Yuwen Hu
ebdf19fa7e
[NPU] Further fix saving of generation config ( #12657 )
...
* Further fix saving of generation config
* Fix based on comments
* Small fix
2025-01-07 13:53:54 +08:00
Yuwen Hu
381d448ee2
[NPU] Example & Quickstart updates ( #12650 )
...
* Remove model with optimize_model=False in NPU verified models tables, and remove related example
* Remove experimental in run optimized model section title
* Unify model table order & example cmd
* Move embedding example to separate folder & update quickstart example link
* Add Quickstart reference in main NPU readme
* Small fix
* Small fix
* Move save/load examples under NPU/HF-Transformers-AutoModels
* Add low-bit and polish arguments for LLM Python examples
* Small fix
* Add low-bit and polish arguments for Multi-Model examples
* Polish argument for Embedding models
* Polish argument for LLM CPP examples
* Add low-bit and polish argument for Save-Load examples
* Add accuracy tuning tips for examples
* Update NPU quickstart accuracy tuning with low-bit optimizations
* Add save/load section to quickstart
* Update CPP example sample output to EN
* Add installation regarding cmake for CPP examples
* Small fix
* Small fix
* Small fix
* Small fix
* Small fix
* Small fix
* Unify max prompt length to 512
* Change recommended low-bit for Qwen2.5-3B-Instruct to asym_int4
* Update based on comments
* Small fix
2025-01-07 13:52:41 +08:00
Yishuo Wang
ddc0ef3993
refactor device check and remove cohere/mixtral support ( #12659 )
2025-01-07 11:15:51 +08:00
Yishuo Wang
ea65e4fecc
remove falcon support and related UT ( #12656 )
2025-01-07 09:26:00 +08:00
Yina Chen
fae73eee79
[NPU] Support save npu quantized model without npu dependency ( #12647 )
...
* support save awq
* load quantized model & save npu compiled model
* fix style
* update
* fix dll load issue
* update error message
* fix style
2025-01-06 18:06:22 +08:00
Yishuo Wang
502461d836
remove unnecessary ipex kernel usage ( #12649 )
2025-01-03 16:45:24 +08:00
Yishuo Wang
9f8b134889
add ipex-llm custom kernel registration ( #12648 )
2025-01-03 16:45:04 +08:00
binbin Deng
0b377100c5
Add guide for save-load usage ( #12498 )
2025-01-03 16:30:15 +08:00
Wang, Jian4
6711a48a36
Enable internvl2-8b on vllm ( #12645 )
2025-01-03 14:49:36 +08:00
Zijie Li
8fd2dcba86
Add benchmark_util for transformers >= 4.47.0 ( #12644 )
2025-01-03 10:48:29 +08:00
SONG Ge
550fa01649
[Doc] Update ipex-llm ollama troubleshooting for v0.4.6 ( #12642 )
...
* update ollama v0.4.6 troubleshooting
* update Chinese ollama-doc
2025-01-02 17:28:54 +08:00
Yina Chen
8e5328e9b4
add disable opts for awq ( #12641 )
2025-01-02 15:45:22 +08:00
Xu, Shuo
62318964fa
Update llama example information ( #12640 )
...
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2025-01-02 13:48:39 +08:00
Yishuo Wang
81211fd010
remove unused code ( #12635 )
2025-01-02 13:31:09 +08:00
binbin Deng
534566e290
[NPU] Support minicpm-v with python cpp backend ( #12637 )
2025-01-02 11:13:15 +08:00
Yishuo Wang
f289f68d57
small fix ( #12634 )
2024-12-30 17:14:25 +08:00
Yishuo Wang
2d08155513
remove bmm, which is only required in ipex 2.0 ( #12630 )
2024-12-27 17:28:57 +08:00
binbin Deng
f17ccfa61a
[NPU] Fix save-load usage of minicpm models ( #12628 )
2024-12-27 15:56:46 +08:00
Yishuo Wang
c72a5db757
remove unused code again ( #12624 )
2024-12-27 14:17:11 +08:00
binbin Deng
46eeab4479
[NPU] Fix regression caused by layer_norm change ( #12627 )
2024-12-27 14:08:49 +08:00
Ruonan Wang
90f6709486
remove pipeline examples ( #12626 )
2024-12-27 13:42:28 +08:00
Zijie Li
5f04ed7254
[NPU] Update prompt format for baichuan2-pipeline ( #12625 )
2024-12-27 11:30:54 +08:00
Yishuo Wang
34dbdb8ee3
small fix ( #12623 )
2024-12-27 10:19:27 +08:00