Commit graph

17 commits

Yishuo Wang  ddc0ef3993  2025-01-07 11:15:51 +08:00
Refactor device check and remove cohere/mixtral support (#12659)

Yuwen Hu  8fdc36c140  2024-11-21 16:21:35 +08:00
Optimize with new batch kernel when batch_size=1 on LNL (#12419)
* Add use batch kernel condition for LNL
* Fix for other device judgement
* Fix based on comment

Yuwen Hu  a69395f31f  2024-11-18 18:46:52 +08:00
Support performance mode of GLM4 model (#12401)
* Initial support of prepare generation args for transformers 445
* Small fix to chatglm4 model optimization
* Small fix
* Fix glm4 position id
* Fix glm4 error
* Small change in condition & fix based on comments
* Style fixes
Co-authored-by: cyita <yitastudy@gmail.com>

Yuwen Hu  6eb55653ba  2024-09-03 17:46:16 +08:00
Performance mode strategy update for input_embeds input (#11997)

Yuwen Hu  659d15defc  2024-09-02 19:09:12 +08:00
Fix wrong attention mask and garbage output for inputs_embeds inputs during lookup generation (#11989)
* Fix garbage output for inputs_embeds inputs during lookup generation
* Fix on sliding windows
* Simplify code

Yuwen Hu  c1d07bc626  2024-08-26 19:33:31 +08:00
Support streaming for lookup generation (#11922)
* Support streaming for lookup generation
* Small update
* Style fixes
* Add original generate fallback for batch inference and beam search; support input length threshold judgement for direct input with input_ids
* Fix lookup stream generate with eos token
* Small fixes
* Small fix
* Index fix
* Small fix

Yuwen Hu  24c279e0ae  2024-08-23 20:49:15 +08:00
Update IPEX_LLM_PERFORMANCE_MODE with input length threshold (#11908)
* Update IPEX_LLM_PERFORMANCE_MODE with input length threshold
* Update based on comments, and add judgement for inputs_embeds
* Fix for benchmarking purposes
* Update based on comments
* Small fix

Yuwen Hu  420ce7d164  2024-08-22 18:55:59 +08:00
Fix non-stop at eos token problem for lookup generation (#11896)
* Fix non-stop by eos_token_id problem for lookup
* Small fix
* Add judgement when generation_config.eos_token_id is None
* Fix based on comments

Yuwen Hu  356281cb80  2024-08-14 14:39:34 +08:00
Further all-in-one benchmark update for continuation task (#11784)
* Further update prompt for continuation task, and disable lookup candidate update strategy on MTL
* Style fix

Ruonan Wang  8db34057b4  2024-08-12 17:19:12 +08:00
Optimize lookahead init time (#11769)

Ruonan Wang  66fe2ee464  2024-08-09 19:04:09 +08:00
Initial support of IPEX_LLM_PERFORMANCE_MODE (#11754)
* Add perf mode
* Update
* Fix style

Zhao Changmin  b7948671de  2024-06-07 10:51:05 +08:00
[WIP] Add lookup table in 1st token stage (#11193)
* lookuptb

hxsz1997  93d40ab127  2024-05-15 14:48:05 +08:00
Update lookahead strategy (#11021)
* Update lookahead strategy
* Remove lines
* Fix python style check

Yina Chen  015d07a58f  2024-04-28 17:21:00 +08:00
Fix lookahead sample error & add update strategy (#10894)
* Fix sample error & add update strategy
* Add MTL config
* Fix style
* Remove print

Yina Chen  3daad242b8  2024-04-22 14:05:50 +08:00
Fix "No module named 'transformers.cache_utils'" with transformers < 4.36 (#10835)
* Update sdp condition
* Update
* Fix
* Fix 431 error
* Revert sdp & style fix
* Fix
* Meet comments

Yina Chen  ea5b373a97  2024-04-17 17:41:55 +08:00
Add lookahead GPU example (#10785)
* Add lookahead example
* Fix style & attn mask
* Fix typo
* Address comments

Yina Chen  899d392e2f  2024-04-16 16:52:38 +08:00
Support prompt lookup in ipex-llm (#10768)
* Lookup init
* Add lookup
* Fix style
* Remove redundant code
* Change param name
* Fix style