Zhao Changmin | cf8eb7b128 | Init NPU quantize method and support q8_0_rtn (#11452) * q8_0_rtn * fix float point | 2024-07-01 13:45:07 +08:00
Yishuo Wang | 319a3b36b2 | fix npu llama2 (#11471) | 2024-07-01 10:14:11 +08:00
Yishuo Wang | 029ff15d28 | optimize npu llama2 first token performance (#11451) | 2024-06-27 17:37:33 +08:00
Yishuo Wang | f89ca23748 | optimize npu llama2 perf again (#11445) | 2024-06-27 15:13:42 +08:00
Yishuo Wang | ca0e69c3a7 | optimize npu llama perf again (#11431) | 2024-06-26 10:52:54 +08:00
Yishuo Wang | 9f6e5b4fba | optimize llama npu perf (#11426) | 2024-06-25 17:43:20 +08:00