Yishuo Wang
ddc0ef3993
refactor device check and remove cohere/mixtral support ( #12659 )
2025-01-07 11:15:51 +08:00
Yuwen Hu
8fdc36c140
Optimize with new batch kernel when batch_size=1 on LNL ( #12419 )
* Add use batch kernel condition for LNL
* Fix for other device judgement
* Fix based on comment
2024-11-21 16:21:35 +08:00
Yuwen Hu
a69395f31f
Support performance mode of GLM4 model ( #12401 )
* Initial support of prepare generation args for transformers 445
* Small fix to chatglm4 model optimization
* Small fix
* fix glm4 position id
* fix glm4 error
* Small change in condition & fix based on comments
* Style fixes
---------
Co-authored-by: cyita <yitastudy@gmail.com>
2024-11-18 18:46:52 +08:00
Yuwen Hu
6eb55653ba
Performance mode strategy update for input_embeds input ( #11997 )
2024-09-03 17:46:16 +08:00
Yuwen Hu
659d15defc
Fix wrong attention mask and garbage output for inputs_embeds inputs during lookup generation ( #11989 )
* Fix garbage output for input_embeds inputs during lookup generation
* Fix on sliding windows
* Simplify code
2024-09-02 19:09:12 +08:00
Yuwen Hu
c1d07bc626
Support streaming for lookup generation ( #11922 )
* Support streaming for lookup generation
* Small update
* Style fixes
* Add original generate fallback for batch inference and beam search; support input length threshold judgement for direct input with input_ids
* Fix lookup stream generate with eos token
* Small fixes
* Small fix
* index fix
* Small fix
2024-08-26 19:33:31 +08:00
Yuwen Hu
24c279e0ae
Update IPEX_LLM_PERFORMANCE_MODE with input length threshold ( #11908 )
* Update IPEX_LLM_PERFORMANCE_MODE with input length threshold
* Update based on comments, and add judgement for inputs_embeds
* Fix for benchmarking purposes
* Update based on comments
* Small fix
2024-08-23 20:49:15 +08:00
Yuwen Hu
420ce7d164
Fix non-stop at eos token problem for lookup generation ( #11896 )
* Fix non-stop by eos_token_id problem for lookup
* Small fix
* Add judgement when generation_config.eos_token_id is None
* Fix based on comments
2024-08-22 18:55:59 +08:00
Yuwen Hu
356281cb80
Further all-in-one benchmark update continuation task ( #11784 )
* Further update prompt for continuation task, and disable lookup candidate update strategy on MTL
* style fix
2024-08-14 14:39:34 +08:00
Ruonan Wang
8db34057b4
optimize lookahead init time ( #11769 )
2024-08-12 17:19:12 +08:00
Ruonan Wang
66fe2ee464
initial support of IPEX_LLM_PERFORMANCE_MODE ( #11754 )
* add perf mode
* update
* fix style
2024-08-09 19:04:09 +08:00
Zhao Changmin
b7948671de
[WIP] Add look up table in 1st token stage ( #11193 )
* lookuptb
2024-06-07 10:51:05 +08:00
hxsz1997
93d40ab127
Update lookahead strategy ( #11021 )
* update lookahead strategy
* remove lines
* fix python style check
2024-05-15 14:48:05 +08:00
Yina Chen
015d07a58f
Fix lookahead sample error & add update strategy ( #10894 )
* Fix sample error & add update strategy
* add mtl config
* fix style
* remove print
2024-04-28 17:21:00 +08:00
Yina Chen
3daad242b8
Fix No module named 'transformers.cache_utils' with transformers < 4.36 ( #10835 )
* update sdp condition
* update
* fix
* fix 431 error
* revert sdp & style fix
* fix
* meet comments
2024-04-22 14:05:50 +08:00
Yina Chen
ea5b373a97
Add lookahead GPU example ( #10785 )
* Add lookahead example
* fix style & attn mask
* fix typo
* address comments
2024-04-17 17:41:55 +08:00
Yina Chen
899d392e2f
Support prompt lookup in ipex-llm ( #10768 )
* lookup init
* add lookup
* fix style
* remove redundant code
* change param name
* fix style
2024-04-16 16:52:38 +08:00