Commit graph

7 commits

Author SHA1 Message Date
Zhao Changmin
6a0134a9b2
support q4_0_rtn (#11477)
* q4_0_rtn
2024-07-02 16:57:02 +08:00
Zhao Changmin
cf8eb7b128
Init NPU quantize method and support q8_0_rtn (#11452)
* q8_0_rtn

* fix float point
2024-07-01 13:45:07 +08:00
Yishuo Wang
ca0e69c3a7
optimize npu llama perf again (#11431) 2024-06-26 10:52:54 +08:00
Yishuo Wang
9f6e5b4fba
optimize llama npu perf (#11426) 2024-06-25 17:43:20 +08:00
Yishuo Wang
a5e7d93242
Add initial save/load low bit support for NPU(now only fp16 is supported) (#11359) 2024-06-20 10:49:39 +08:00
Yishuo Wang
ae7b662ed2
add fp16 NPU Linear support and fix intel_npu_acceleration_library version 1.0 support (#11352) 2024-06-19 09:14:59 +08:00
Yishuo Wang
83082e5cc7
add initial support for intel npu acceleration library (#11347) 2024-06-18 16:07:16 +08:00