Yang Wang | 51bcac1229 | follow up on experimental support of fused decoder layer for llama2 (#11785): clean up and support transpose value cache; refine; fix style; fix style | 2024-08-13 18:53:55 -07:00
binbin Deng | 23d3acdc77 | Add experimental support of fused decoder layer for llama2 (#11768) | 2024-08-13 14:41:36 +08:00
Jin, Qiao | 05989ad0f9 | Update npu example and all in one benckmark (#11766) | 2024-08-12 16:46:46 +08:00
Jin, Qiao | a44ab32153 | Switch to conhost when running on NPU (#11687) | 2024-07-30 17:08:06 +08:00
Zhao Changmin | 06745e5742 | Add npu benchmark all-in-one script (#11571): npu benchmark | 2024-07-15 10:42:37 +08:00
Zhao Changmin | b9c66994a5 | add npu sdp (#11562) | 2024-07-11 16:57:35 +08:00
Zhao Changmin | 3c16c9f725 | Optimize baichuan on NPU (#11548): baichuan_npu | 2024-07-10 13:18:48 +08:00
Zhao Changmin | 76a5802acf | update NPU examples (#11540): update NPU examples | 2024-07-09 17:19:42 +08:00