Yishuo Wang | 1a8bab172e | add minicpm 1B/2B npu support (#11507) | 2024-07-04 16:31:04 +08:00
Yishuo Wang | bb0a84044b | add qwen2 npu support (#11504) | 2024-07-04 11:01:25 +08:00
Yishuo Wang | ec3a912ab6 | optimize npu llama long context performance (#11478) | 2024-07-01 16:49:23 +08:00
Zhao Changmin | cf8eb7b128 | Init NPU quantize method and support q8_0_rtn (#11452): q8_0_rtn; fix float point | 2024-07-01 13:45:07 +08:00
Yishuo Wang | 319a3b36b2 | fix npu llama2 (#11471) | 2024-07-01 10:14:11 +08:00
Yishuo Wang | 029ff15d28 | optimize npu llama2 first token performance (#11451) | 2024-06-27 17:37:33 +08:00
Yishuo Wang | f89ca23748 | optimize npu llama2 perf again (#11445) | 2024-06-27 15:13:42 +08:00
Yishuo Wang | ca0e69c3a7 | optimize npu llama perf again (#11431) | 2024-06-26 10:52:54 +08:00
Yishuo Wang | 9f6e5b4fba | optimize llama npu perf (#11426) | 2024-06-25 17:43:20 +08:00