18 commits

Author SHA1 Message Date
Yuwen Hu
a69395f31f
Support performance mode of GLM4 model (#12401)
* Initial support of prepare generation args for transformers 4.45

* Small fix to chatglm4 model optimization

* Small fix

* fix glm4 position id

* fix glm4 error

* Small change in condition & fix based on comments

* Style fixes

---------

Co-authored-by: cyita <yitastudy@gmail.com>
2024-11-18 18:46:52 +08:00
Yina Chen
cc27321441
support chatglm4 in lookup (#11855) 2024-08-21 15:53:17 +08:00
Yina Chen
4b9c57cc60
Support compress kv with lookahead (#11752)
* support compress kv with lookahead

* enough kv miss param
2024-08-09 17:39:57 +08:00
Zhao Changmin
b7948671de
[WIP] Add look up table in 1st token stage (#11193)
* lookuptb
2024-06-07 10:51:05 +08:00
Wang, Jian4
3209d6b057
Fix speculative llama3 no stop error (#10963)
* fix normal

* add eos_tokens_id on sp and add list if

* update

* no none
2024-05-08 17:09:47 +08:00
Wang, Jian4
0e0bd309e2
LLM: Enable Speculative on Fastchat (#10909)
* init

* enable streamer

* update

* update

* remove deprecated

* update

* update

* add gpu example
2024-05-06 10:06:20 +08:00
ZehuaCao
92ea54b512
Fix speculative decoding bug (#10855) 2024-04-23 14:28:31 +08:00
Yina Chen
3daad242b8
Fix No module named 'transformers.cache_utils' with transformers < 4.36 (#10835)
* update sdp condition

* update

* fix

* fix 431 error

* revert sdp & style fix

* fix

* meet comments
2024-04-22 14:05:50 +08:00
Yishuo Wang
57edf2033c
fix lookahead with transformers >= 4.36 (#10808) 2024-04-19 16:24:56 +08:00
Ovo233
1a885020ee
Updated importing of top_k_top_p_filtering for transformers>=4.39.0 (#10794)
* In transformers>=4.39.0, the top_k_top_p_filtering function has been deprecated and moved to the Hugging Face package trl. Thus, for versions >= 4.39.0, import this function from trl.
2024-04-19 15:34:39 +08:00
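
The version switch described in commit 1a885020ee can be sketched roughly as below. This is an illustrative sketch, not the code from the commit: the helper names (version_tuple, filtering_module_name, load_top_k_top_p_filtering) are hypothetical, and the module path trl.core is an assumption, since the commit message only says the function moved to trl. The version is compared as a tuple of integers because a plain string comparison would rank "4.9.1" above "4.39.0".

```python
import importlib
import re


def version_tuple(version_string):
    """Return the leading numeric components of a version string,
    padded to three entries, e.g. '4.39.0.dev0' -> (4, 39, 0)."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def filtering_module_name(transformers_version):
    """Pick the module expected to provide top_k_top_p_filtering.

    Assumption: for transformers >= 4.39.0 the function lives in
    'trl.core'; the commit message names only the trl package.
    """
    if version_tuple(transformers_version) >= (4, 39, 0):
        return "trl.core"
    return "transformers"


def load_top_k_top_p_filtering(transformers_version):
    """Import and return the function from whichever module applies.

    Requires the chosen package to be installed; raises ImportError
    otherwise.
    """
    module = importlib.import_module(filtering_module_name(transformers_version))
    return module.top_k_top_p_filtering
```

In real use the version string would come from transformers.__version__, for example load_top_k_top_p_filtering(transformers.__version__). Pre-release suffixes such as "rc1" are ignored by this simple parser, so a release candidate of 4.39.0 is treated as 4.39.0.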
Yina Chen
ea5b373a97
Add lookahead GPU example (#10785)
* Add lookahead example

* fix style & attn mask

* fix typo

* address comments
2024-04-17 17:41:55 +08:00
Yina Chen
766fe45222
Fix spec error caused by lookup pr (#10777)
* Fix spec error

* remove

* fix style
2024-04-17 11:27:35 +08:00
Yina Chen
899d392e2f
Support prompt lookup in ipex-llm (#10768)
* lookup init

* add lookup

* fix style

* remove redundant code

* change param name

* fix style
2024-04-16 16:52:38 +08:00
Wang, Jian4
47cabe8fcc
LLM: Fix no return_last_logit running bigdl_ipex chatglm3 (#10678)
* fix no return_last_logits

* update only for chatglm
2024-04-07 15:27:58 +08:00
ZehuaCao
52a2135d83
Replace ipex with ipex-llm (#10554)
* fix ipex with ipex_llm

* fix ipex with ipex_llm

* update

* update

* update

* update

* update

* update

* update

* update
2024-03-28 13:54:40 +08:00
Xiangyu Tian
51d34ca68e
Fix wrong import in speculative (#10562) 2024-03-27 18:21:07 +08:00
Xiangyu Tian
11550d3f25
LLM: Add length check for IPEX-CPU speculative decoding (#10529)
Add length check for IPEX-CPU speculative decoding.
2024-03-26 17:47:10 +08:00
Wang, Jian4
9df70d95eb
Refactor bigdl.llm to ipex_llm (#24)
* Rename bigdl/llm to ipex_llm

* rm python/llm/src/bigdl

* from bigdl.llm to from ipex_llm
2024-03-22 15:41:21 +08:00
Renamed from python/llm/src/bigdl/llm/transformers/speculative.py