Yishuo Wang
|
46ba962168
|
use new quantize kv (#10888)
|
2024-04-26 14:42:17 +08:00 |
|
Cengguang Zhang
|
3e2662c87e
|
LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)
|
2024-04-16 09:32:30 +08:00 |
|
Keyan (Kyrie) Zhang
|
585c174e92
|
Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables (#10707)
* Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables.
* Fix style
|
2024-04-10 10:48:46 +08:00 |
|
Xin Qiu
|
1274cba79b
|
stablelm fp8 kv cache (#10672)
* stablelm fp8 kvcache
* update
* fix
* change to fp8 matmul
* fix style
* fix
* fix
* meet code review
* add comment
|
2024-04-08 15:16:46 +08:00 |
|
Xin Qiu
|
4c3e493b2d
|
fix stablelm2 1.6b (#10656)
* fix stablelm2 1.6b
* meet code review
|
2024-04-03 22:15:32 +08:00 |
|
Xin Qiu
|
3a9ab8f1ae
|
fix stablelm logits diff (#10636)
* fix logits diff
* Small fixes
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
|
2024-04-03 15:08:12 +08:00 |
|
Yuwen Hu
|
fd384ddfb8
|
Optimize StableLM (#10619)
* Initial commit for stablelm optimizations
* Small style fix
* add dependency
* Add mlp optimizations
* Small fix
* add attention forward
* Remove quantize kv for now as head_dim=80
* Add merged qkv
* fix lisence
* Python style fix
---------
Co-authored-by: qiuxin2012 <qiuxin2012cs@gmail.com>
|
2024-04-02 18:58:38 +08:00 |
|