| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Zhao Changmin | cf8eb7b128 | Init NPU quantize method and support q8_0_rtn (#11452) — * q8_0_rtn * fix float point | 2024-07-01 13:45:07 +08:00 |
| Yishuo Wang | ca0e69c3a7 | optimize npu llama perf again (#11431) | 2024-06-26 10:52:54 +08:00 |
| Yishuo Wang | 9f6e5b4fba | optimize llama npu perf (#11426) | 2024-06-25 17:43:20 +08:00 |
| Yishuo Wang | a5e7d93242 | Add initial save/load low bit support for NPU (now only fp16 is supported) (#11359) | 2024-06-20 10:49:39 +08:00 |
| Yishuo Wang | ae7b662ed2 | add fp16 NPU Linear support and fix intel_npu_acceleration_library version 1.0 support (#11352) | 2024-06-19 09:14:59 +08:00 |
| Yishuo Wang | 83082e5cc7 | add initial support for intel npu acceleration library (#11347) | 2024-06-18 16:07:16 +08:00 |
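The q8_0_rtn commit refers to 8-bit round-to-nearest (RTN) quantization with one scale per block of values. As background, here is a minimal sketch of how blockwise Q8_0-style RTN quantization generally works; the function names, block size, and all details below are illustrative assumptions, not the repository's actual implementation:

```python
import numpy as np

def q8_0_rtn(x, block_size=32):
    """Quantize a 1-D float array to int8 with round-to-nearest (RTN),
    one fp32 scale per block -- an illustrative sketch of the general
    Q8_0 scheme, not the code behind commit cf8eb7b128."""
    x = np.asarray(x, dtype=np.float32)
    assert x.size % block_size == 0, "pad input to a multiple of block_size"
    blocks = x.reshape(-1, block_size)
    # One scale per block: map the block's largest magnitude to 127.
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # all-zero block: avoid division by zero
    # Round to nearest integer, clamp to the symmetric int8 range.
    q = np.clip(np.rint(blocks / scales), -127, 127).astype(np.int8)
    return q, scales.astype(np.float32)

def dequantize(q, scales):
    """Reconstruct approximate floats: x_hat = q * scale (per block)."""
    return (q.astype(np.float32) * scales).reshape(-1)

# Round-trip a small weight vector and measure the quantization error.
weights = np.linspace(-1.0, 1.0, 64, dtype=np.float32)
q, s = q8_0_rtn(weights)
max_err = np.abs(dequantize(q, s) - weights).max()
```

With a per-block scale of `max|x| / 127`, the round-trip error of RTN is bounded by half a quantization step per element, which is why it serves as a cheap baseline quantization method that needs no calibration data.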