Xu, Shuo
b0338c5529
Add --modelscope option for glm-v4, MiniCPM-V-2_6, glm-edge and internvl2 ( #12583 )
...
* Add --modelscope option for glm-v4 and MiniCPM-V-2_6
* glm-edge
* minicpm-v-2_6: don't use model_hub=modelscope when using lowbit; internvl2
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-12-20 13:54:17 +08:00
Yishuo Wang
f3b5fad3be
refactor qwen2 and llama3 ( #12587 )
2024-12-20 13:25:25 +08:00
Xu, Shuo
47da3c999f
Add --modelscope in GPU examples for minicpm, minicpm3, baichuan2 ( #12564 )
...
* Add --modelscope for more models
* minicpm
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-12-19 17:25:46 +08:00
Yishuo Wang
3eeb02f1be
support Megrez-3B-Omni ( #12582 )
2024-12-19 17:23:01 +08:00
binbin Deng
4e7e988f70
[NPU] Fix MTL and ARL support ( #12580 )
2024-12-19 16:55:30 +08:00
Yishuo Wang
80f2fdc37b
optimize new minicpm model ( #12579 )
2024-12-19 14:22:47 +08:00
Yishuo Wang
4540424271
optimize siglip attention again ( #12578 )
2024-12-19 13:40:48 +08:00
Yishuo Wang
e0921f80c1
padding mask on torch side ( #12577 )
2024-12-19 10:53:02 +08:00
Xu, Shuo
47e90a362f
Add --modelscope in GPU examples for glm4, codegeex2, qwen2 and qwen2.5 ( #12561 )
...
* Add --modelscope for more models
* improve readme
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-12-19 10:00:39 +08:00
Yishuo Wang
e2ae42929a
small fix ( #12573 )
2024-12-18 15:48:22 +08:00
Yishuo Wang
a4eb561f36
optimize siglip attention on arc ( #12569 )
2024-12-18 14:19:43 +08:00
Zijie Li
1a2ab12876
[NPU] support asym_int4 for minicpm ( #12567 )
2024-12-18 10:55:35 +08:00
Yuwen Hu
6278cafc25
Add setuptools as a basic dependency ( #12563 )
...
* Add setuptools as a basic dependency
* Remove unnecessary requirements of setuptools in example/unit/nightly tests
2024-12-17 16:56:41 +08:00
Zijie Li
fcb474820d
[NPU] support asym_int4 for llama ( #12556 )
...
* add llama-imatrix
* fix bugs in llama.py
* style fix
2024-12-17 14:01:17 +08:00
Yishuo Wang
a608f26cc8
use new fused layer norm ( #12553 )
2024-12-17 13:52:35 +08:00
binbin Deng
680ea7e4a8
[NPU doc] Update configuration for different platforms ( #12554 )
2024-12-17 10:15:09 +08:00
Xu, Shuo
ccc18eefb5
Add Modelscope option for chatglm3 on GPU ( #12545 )
...
* Add Modelscope option for GPU model chatglm3
* Update readme
* Update readme
* Update readme
* Update readme
* format update
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-12-16 20:00:37 +08:00
Yishuo Wang
5ae0006103
remove old rope usage ( #12552 )
2024-12-16 15:59:36 +08:00
Chu,Youcheng
a86487c539
Add GLM-Edge GPU example ( #12483 )
...
* feat: initial commit
* generate.py and README updates
* Update link for main readme
* Update based on comments
* Small fix
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-12-16 14:39:19 +08:00
Jun Wang
0b953e61ef
[REFINE] graphmode code ( #12540 )
2024-12-16 09:17:01 +08:00
binbin Deng
caf15cc5ef
[NPU] Add IPEX_LLM_NPU_MTL to enable support on mtl ( #12543 )
2024-12-13 17:01:13 +08:00
Yishuo Wang
c090d167dc
remove old rope usage ( #12544 )
2024-12-13 16:54:58 +08:00
binbin Deng
d20a968ce2
[NPU] Fix generate example ( #12541 )
2024-12-13 14:07:24 +08:00
Yishuo Wang
15219944b8
optimize glm edge again ( #12539 )
2024-12-13 13:52:39 +08:00
binbin Deng
6596c18489
[NPU] Modify IPEX_LLM_NPU_DISABLE_COMPILE_OPT setting for long input ( #12537 )
2024-12-13 13:49:56 +08:00
Ruonan Wang
7cc01fdc86
[NPU] further fix of new_value_states ( #12538 )
2024-12-13 13:42:00 +08:00
Heyang Sun
fa261b8af1
torch 2.3 inference docker ( #12517 )
...
* torch 2.3 inference docker
* Update README.md
* add convert code
* rename image
* remove 2.1 and add graph example
* Update README.md
2024-12-13 10:47:04 +08:00
binbin Deng
f36c23664f
[NPU] Fix abnormal output with latest driver ( #12530 )
2024-12-12 17:56:30 +08:00
Yishuo Wang
ffce86d69f
add basic glm-edge-v support ( #12533 )
2024-12-12 17:25:48 +08:00
Yishuo Wang
3e0823d2ae
add basic glm-edge support ( #12531 )
2024-12-12 16:02:22 +08:00
Yuwen Hu
dbaf4abcb3
[NPU] Update C++ example with repetition_penalty & update Python code accordingly ( #12528 )
...
* Update c++ npu examples with repetition penalty
* Fit python with updated C++ API
* Style fix
* Small fix
* Small fix
2024-12-12 13:42:55 +08:00
Shaojun Liu
2cce89691a
Enable use_batch_forward Optimization on Battlemage GPU ( #12516 )
...
* Update get_xpu_device_type() to support bmg
* enable use_batch_forward for bmg
* Update low_bit_linear.py
* Update utils.py
* use batch kernel for fp8e5
2024-12-12 12:44:36 +08:00
binbin Deng
6fc27da9c1
[NPU] Update glm-edge support in docs ( #12529 )
2024-12-12 11:14:09 +08:00
binbin Deng
509bdb4661
[NPU] Fix minicpm-2B error ( #12527 )
2024-12-11 16:49:32 +08:00
Xu, Shuo
fd9cf767ed
All-in-one Benchmark run.py: Ignore error if import BenchmarkWrapper failed. ( #12526 )
2024-12-11 16:20:55 +08:00
Ruonan Wang
41ef4974ab
[NPU] fix transpose_value = False for NPU optimize_model=True ( #12525 )
2024-12-11 15:51:39 +08:00
Ruonan Wang
588bfa24dc
support hqq ( #12518 )
...
* support
* fix
2024-12-11 15:43:02 +08:00
Yuwen Hu
68f2873bd3
[NPU] Support repetition penalty for simple generate, Python (cpp backend) ( #12522 )
...
* Initial support of repetition penalty on NPU (cpp backend) for simple generate
* Bug fix for generation config and others
* Remove unnecessary print and style fix
* Remove unnecessary print
* Fix based on comments
2024-12-11 14:55:25 +08:00
Yishuo Wang
77404d2a63
support new model ( #12523 )
2024-12-11 13:41:15 +08:00
binbin Deng
ea55235cbd
[NPU] Support glm-edge models ( #12511 )
2024-12-09 14:06:27 +08:00
binbin Deng
12c78978dd
[NPU C++] Update example with conversation mode support ( #12510 )
2024-12-06 12:46:37 +08:00
Yuwen Hu
0918d3baca
[NPU] Fix hf generate with save/load generation config for Python (cpp backend) ( #12509 )
...
* Fix hf generate with save/load generation config
* Small fix
* Fix based on comments
2024-12-05 19:19:58 +08:00
Ruonan Wang
49ab8974fa
[NPU] initial support of asym_int4_rtn ( #12484 )
...
* initial support of q4_1
* fix
* fix
* update
* update min to Z1
* update
* fix
* update
* fix style
* fix
* support qwen2 optimize_model=True mp version
* temp save
* fix
* fix style
* replace min with zero
* support split linear for q4_1
* fix lm_head with mixed_precision=True
* fix style
* revert test code
* add down proj back for q4_0
* remove print
2024-12-05 17:40:36 +08:00
Jinhe
5e1416c9aa
fix readme for npu cpp examples and llama.cpp ( #12505 )
...
* fix cpp readme
* fix cpp readme
* fix cpp readme
2024-12-05 12:32:42 +08:00
binbin Deng
f56a111aa2
[NPU] Fix load-low-bit benchmark script ( #12502 )
2024-12-05 10:01:32 +08:00
Yuwen Hu
84f1c4ad57
Small fix for NPU Python cpp simple generate regarding eos tokens ( #12501 )
2024-12-04 18:54:06 +08:00
Kai Huang
d8b14a6305
Update save/load comments ( #12500 )
2024-12-04 18:51:38 +08:00
Kai Huang
b89ea1b0cf
Support save/load model for hf generate ( #12499 )
...
* change dummy model
* style
* meet review
2024-12-04 18:26:39 +08:00
Kai Huang
7d27f134dd
Fix hf generate for llama3.2 ( #12497 )
...
* fix kv condition
* meet review
2024-12-04 17:54:40 +08:00
Chu,Youcheng
ffa9a9e1b3
Update streaming in npu examples ( #12495 )
...
* feat: add streaming
* Update readme accordingly
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-12-04 17:51:10 +08:00