| Author | Commit | Message | Date |
|---|---|---|---|
| Yishuo Wang | 46ba962168 | use new quantize kv (#10888) | 2024-04-26 14:42:17 +08:00 |
| Xin Qiu | e764f9b1b1 | Disable fast fused rope on UHD (#10780)<br>* use decoding fast path<br>* update<br>* update<br>* cleanup | 2024-04-18 10:03:53 +08:00 |
| Cengguang Zhang | 3e2662c87e | LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) | 2024-04-16 09:32:30 +08:00 |
| Keyan (Kyrie) Zhang | 585c174e92 | Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables (#10707)<br>* Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables.<br>* Fix style | 2024-04-10 10:48:46 +08:00 |
| Yang Wang | 5a1f446d3c | support fp8 in xetla (#10555)<br>* support fp8 in xetla<br>* change name<br>* adjust model file<br>* support convert back to cpu<br>* factor<br>* fix bug<br>* fix style | 2024-04-08 13:22:09 -07:00 |
| Xin Qiu | 1dd40b429c | enable fp4 fused mlp and qkv (#10531)<br>* enable fp4 fused mlp and qkv<br>* update qwen<br>* update qwen2 | 2024-03-26 08:34:00 +08:00 |
| Wang, Jian4 | 9df70d95eb | Refactor bigdl.llm to ipex_llm (#24)<br>* Rename bigdl/llm to ipex_llm<br>* rm python/llm/src/bigdl<br>* from bigdl.llm to from ipex_llm | 2024-03-22 15:41:21 +08:00 |