Commit graph

10 commits

commit 23d3acdc77 by binbin Deng, 2024-08-13 14:41:36 +08:00
    Add experimental support of fused decoder layer for llama2 (#11768)
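
The "fused decoder layer" in the commit above refers to running a whole llama2 decoder layer (attention plus MLP) as one compiled unit, so the runtime dispatches a single graph per layer instead of many small ops. Below is a minimal, hypothetical sketch of that idea only; it is not the ipex-llm implementation. `FusedDecoderLayer` is an invented name, LayerNorm stands in for llama2's RMSNorm, and rotary embeddings and the KV cache are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedDecoderLayer(nn.Module):
    """Toy llama2-style decoder layer written as ONE module so the whole
    attention + MLP pipeline can be compiled and dispatched as a single
    graph (the essence of "fused")."""

    def __init__(self, hidden: int, intermediate: int, heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(hidden)   # stand-in for RMSNorm
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(hidden)
        self.gate = nn.Linear(hidden, intermediate, bias=False)
        self.up = nn.Linear(hidden, intermediate, bias=False)
        self.down = nn.Linear(intermediate, hidden, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                       # attention residual
        h = self.norm2(x)
        # SwiGLU-style MLP with residual, as in llama2
        return x + self.down(F.silu(self.gate(h)) * self.up(h))

layer = FusedDecoderLayer(hidden=256, intermediate=512, heads=8)
fused = torch.compile(layer)             # compile the layer as one graph;
out = fused(torch.randn(1, 16, 256))     # an NPU backend would lower it to the device
```
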
commit f7e957aaf9 by Zhao Changmin, 2024-07-05 15:45:26 +08:00
    Clean npu dtype branch (#11515)
    * clean branch
    * create_npu_kernels

commit 57b8adb189 by Zhao Changmin, 2024-07-04 17:15:34 +08:00
    [WIP] Support npu load_low_bit method (#11502)
    * npu_load_low_bit
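
This commit (together with the earlier save/load work in #11359 below) lets already-quantized weights be persisted so later loads skip the conversion step. A usage sketch, assuming ipex-llm's documented NPU API (`AutoModelForCausalLM` from `ipex_llm.transformers.npu_model` with `save_low_bit`/`load_low_bit`); the model id, output directory, and low-bit format are placeholders, and exact keyword arguments vary by version:

```python
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

# First run: quantize while loading, then persist the low-bit weights.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",   # placeholder model id
    load_in_low_bit="sym_int4",        # low-bit format to convert to
    trust_remote_code=True,
)
model.save_low_bit("./llama2-7b-npu-int4")

# Later runs: load the saved low-bit checkpoint and skip re-quantization.
model = AutoModelForCausalLM.load_low_bit(
    "./llama2-7b-npu-int4", trust_remote_code=True
)
```
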
commit 6a0134a9b2 by Zhao Changmin, 2024-07-02 16:57:02 +08:00
    support q4_0_rtn (#11477)
    * q4_0_rtn

commit cf8eb7b128 by Zhao Changmin, 2024-07-01 13:45:07 +08:00
    Init NPU quantize method and support q8_0_rtn (#11452)
    * q8_0_rtn
    * fix float point
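
The q4_0_rtn and q8_0_rtn commits above use round-to-nearest (RTN) quantization: weights are split into fixed-size blocks, each block gets one scale, and every value is simply rounded to the nearest representable integer, with no calibration data. A numerics-only sketch of the q8_0 case (32-element blocks, one scale, signed 8-bit values; q4_0 is analogous with a 4-bit range); the actual block layout on the NPU is an implementation detail not shown here:

```python
import numpy as np

def quantize_q8_0_rtn(x: np.ndarray, block_size: int = 32):
    """RTN quantization into q8_0-style blocks: each block of 32 floats
    is stored as one scale plus 32 signed 8-bit integers."""
    x = x.reshape(-1, block_size)
    # Per-block scale chosen so the largest magnitude maps to +/-127.
    scales = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0                            # avoid division by zero
    q = np.clip(np.round(x / scales), -127, 127).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize_q8_0(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scales.astype(np.float32)

weights = np.random.randn(4096).astype(np.float32)
q, s = quantize_q8_0_rtn(weights)
err = np.abs(dequantize_q8_0(q, s).ravel() - weights).max()
print(f"max abs round-trip error: {err:.5f}")
```

RTN is the cheapest post-training scheme: its only error source is the per-block rounding step, which is why it can run at model-load time with no extra data.
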
commit ca0e69c3a7 by Yishuo Wang, 2024-06-26 10:52:54 +08:00
    optimize npu llama perf again (#11431)

commit 9f6e5b4fba by Yishuo Wang, 2024-06-25 17:43:20 +08:00
    optimize llama npu perf (#11426)

commit a5e7d93242 by Yishuo Wang, 2024-06-20 10:49:39 +08:00
    Add initial save/load low bit support for NPU (now only fp16 is supported) (#11359)

commit ae7b662ed2 by Yishuo Wang, 2024-06-19 09:14:59 +08:00
    add fp16 NPU Linear support and fix intel_npu_acceleration_library version 1.0 support (#11352)

commit 83082e5cc7 by Yishuo Wang, 2024-06-18 16:07:16 +08:00
    add initial support for intel npu acceleration library (#11347)
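
The two earliest commits route supported layers (such as fp16 Linear) through Intel's intel-npu-acceleration-library. A sketch of the library's documented `compile()` entry point, assuming the library is installed and an Intel NPU is present; dtype support and exact keyword arguments differ across library versions, which is why #11352 specifically fixes version 1.0 compatibility:

```python
import torch
import intel_npu_acceleration_library

# Any torch model; Linear layers are among the modules the library offloads.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.GELU(),
    torch.nn.Linear(512, 512),
)

# compile() rewrites supported modules to run on the NPU; dtype selects the
# on-device precision (fp16 here, matching the commit above).
npu_model = intel_npu_acceleration_library.compile(model, dtype=torch.float16)
out = npu_model(torch.randn(1, 512))
```
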