14 commits

Author SHA1 Message Date
Yishuo Wang
170e3d65e0
use new sdp and fp32 sdp (#11007) 2024-05-14 14:29:18 +08:00
Cengguang Zhang
cfed76b2ed
LLM: add long-context support for Qwen1.5-7B/Baichuan2-7B/Mistral-7B. (#10937)
* LLM: add split tensor support for baichuan2-7b and qwen1.5-7b.

* fix style.

* fix style.

* fix style.

* add support for mistral and fix condition threshold.

* fix style.

* fix comments.
2024-05-10 16:40:15 +08:00
Yishuo Wang
d884c62dc4
remove new_layout parameter (#10906) 2024-04-29 10:31:50 +08:00
Yina Chen
dc27b3bc35
Use sdp when rest token seq_len > 1 in llama & mistral (for lookup & spec) (#10790)
* update sdp condition

* update

* fix

* update & test llama

* mistral

* fix style

* update

* fix style

* remove pvc constrain

* update ds on arc

* fix style
2024-04-24 17:24:01 +08:00
Xin Qiu
e764f9b1b1
Disable fast fused rope on UHD (#10780)
* use decoding fast path

* update

* update

* cleanup
2024-04-18 10:03:53 +08:00
Cengguang Zhang
3e2662c87e
LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) 2024-04-16 09:32:30 +08:00
Yishuo Wang
8086554d33
use new fp16 sdp in llama and mistral (#10734) 2024-04-12 10:49:02 +08:00
Keyan (Kyrie) Zhang
585c174e92
Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables (#10707)
* Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables.

* Fix style
2024-04-10 10:48:46 +08:00
Yina Chen
c7422712fc
mistral 4.36 use fp16 sdp (#10704) 2024-04-09 13:50:33 +08:00
Yang Wang
5a1f446d3c
support fp8 in xetla (#10555)
* support fp8 in xetla

* change name

* adjust model file

* support convert back to cpu

* factor

* fix bug

* fix style
2024-04-08 13:22:09 -07:00
Jiao Wang
69bdbf5806
Fix vllm print error message issue (#10664)
* update chatglm readme

* Add condition to invalidInputError

* update

* update

* style
2024-04-05 15:08:13 -07:00
binbin Deng
0a3e4e788f
LLM: fix mistral hidden_size setting for deepspeed autotp (#10527) 2024-03-26 10:55:44 +08:00
Xin Qiu
1dd40b429c
enable fp4 fused mlp and qkv (#10531)
* enable fp4 fused mlp and qkv

* update qwen

* update qwen2
2024-03-26 08:34:00 +08:00
Wang, Jian4
9df70d95eb
Refactor bigdl.llm to ipex_llm (#24)
* Rename bigdl/llm to ipex_llm

* rm python/llm/src/bigdl

* from bigdl.llm to from ipex_llm
2024-03-22 15:41:21 +08:00
Renamed from python/llm/src/bigdl/llm/transformers/models/mistral.py