binbin Deng
01099f08ee
Revert prefill logic of qwen2-7b ( #11992 )
2024-09-03 14:45:01 +08:00
Yuwen Hu
659d15defc
Fix wrong attention mask and garbage output for inputs_embeds inputs during lookup generation ( #11989 )
...
* Fix garbage output for input_embeds inputs during lookup generation
* Fix on sliding windows
* Simplify code
2024-09-02 19:09:12 +08:00
binbin Deng
2f3d1bd0ec
hotfix qwen2-7b weight setting ( #11991 )
2024-09-02 18:11:08 +08:00
binbin Deng
a40ea7038d
Fix AttributeError of qwen2-1.5B ( #11990 )
2024-09-02 17:55:10 +08:00
Yang Wang
c48817bd43
Support Qwen2-7b MLP in int4 and transpose_value_cache=True ( #11968 )
2024-09-02 14:37:44 +08:00
Ruonan Wang
573c20bae6
fix npu lm_head cpu condition ( #11976 )
...
* fix
* fix
* fix
* fix stype
* fix style
* fix style
2024-08-30 17:11:26 +08:00
Ruonan Wang
60aa1a2c0f
Initial NPU support for MiniCPM-V-2_6 ( #11966 )
...
* initial pr
* update npu model
* fix
* fix kv cache type
* fix
* small fix
* fix style
* fix model id
* change inter_pp=4
* address comment
* fix
* fix style
* fix
* rebase
2024-08-30 16:34:35 +08:00
SONG Ge
158289d205
[NPU] Add initial support for minicpm-llama-v2.5 ( #11962 )
...
* add initial support for minicpm-llama-v2.5
* update impl
* add minicpm-llama3-v2.5 example
2024-08-30 16:00:33 +08:00
binbin Deng
cd077881f1
Disable lm head ( #11972 )
2024-08-30 11:05:18 +08:00
Wang, Jian4
7d103417b8
Fix glm4-9b-chat nan error on vllm 0.3.3 ( #11970 )
...
* fix nan value
* update
2024-08-30 09:50:18 +08:00
Yang Wang
fbf088f61e
remove obselete npu code ( #11967 )
2024-08-29 14:16:44 -07:00
Yuwen Hu
a9e485eb1b
Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer ( #11963 )
...
* Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer
* Style fixes
2024-08-29 19:22:09 +08:00
Yina Chen
882f4a5ff7
Add lnl npu driver recommend version and enable cpu_lm_head on llama3 ( #11952 )
...
* update lnl npu driver version and enable cpu_lm_head on llama3
* update
* fix style
* typo
* address comments
* update
* add qwen2-7b
2024-08-29 15:01:18 +08:00
binbin Deng
71f03dcc39
Support qwen2-7b with fused decoderlayer optimization on NPU ( #11912 )
2024-08-29 13:34:20 +08:00
Jiao Wang
63ac5f64bb
Refactor NPU baichuan multiple-process ( #11945 )
...
* update
* add baichuan mp
* clean
* refactor
* merge
* style
* update
* update
2024-08-28 11:33:40 -07:00
SONG Ge
5ca7390082
[NPU] Add minicpm-2b support for npu multi-processing ( #11949 )
...
* add minicpm-2b support
* update example for minicpm-2b
* add LNL NPU driver requirement in readme
2024-08-28 18:08:49 +08:00
Yishuo Wang
0fbb10259a
use sdp_causal to reduce internvl2-4b memory usage if set environment variable ( #11953 )
2024-08-28 17:35:05 +08:00
Guancheng Fu
0a7bd274e2
Add vllm awq loading logic ( #11950 )
...
* add vllm awq loading logic
* fix
* refine
2024-08-28 16:46:18 +08:00
Yina Chen
b38fb67bec
[NPU] lm head to cpu ( #11943 )
...
* lm head to cpu
* qwen2
* mv logic and add param to disable cpu_lm_head
* use env and lm_head opt to mp file
* fix
* update
* remove print
2024-08-28 16:34:07 +08:00
binbin Deng
bec00e2015
Improve baichuan2 NPU performance ( #11942 )
2024-08-27 18:37:08 +08:00
Zijie Li
90f692937d
Update npu baichuan2 ( #11939 )
2024-08-27 16:56:26 +08:00
Jiao Wang
b4b6ddf73c
NPU Baichuan2 Multi- Process example ( #11928 )
2024-08-27 15:25:49 +08:00
SONG Ge
e211a5b076
update minicpm to meet latest refactor ( #11937 )
2024-08-27 15:08:01 +08:00
binbin Deng
7c8c9a0670
Update benchmark script for NPU ( #11932 )
2024-08-27 14:41:14 +08:00
Zijie Li
6c3eb1e1e8
refactor from_pretrained API for NPU ( #11927 )
2024-08-27 09:50:30 +08:00
Xiangyu Tian
7ca557aada
LLM: Fix vLLM CPU convert error ( #11926 )
2024-08-27 09:22:19 +08:00
Yuwen Hu
c1d07bc626
Support streaming for lookup generation ( #11922 )
...
* Support streaming for lookup generation
* Small update
* Style fixes
* Add origin generate full back for batch inference and beam search; support input length threshold judgement for directly input with input_ids
* Fix lookup stream generate with eos token
* Small fixes
* Small fix
* index fix
* Small fix
2024-08-26 19:33:31 +08:00
SONG Ge
019f725d4d
[NPU] Add support for running mp minicpm model on npu ( #11909 )
...
* add initial support for npu minicpm mp
* fix minicpm-1b abnormal output error
2024-08-26 17:52:55 +08:00
Yuwen Hu
24c279e0ae
Update IPEX_LLM_PERFORMANCE_MODE with input length threshold ( #11908 )
...
* Update IPEX_LLM_PERFORMANCE_MODE with input length threshold
* Update based on comments. And and judgement for inputs_embeds
* Fix for benchmarking purposes
* Update based on comments
* Small fix
2024-08-23 20:49:15 +08:00
binbin Deng
303a090a6b
Add lm_head optimization on NPU ( #11903 )
2024-08-23 15:51:07 +08:00
Yina Chen
23631cd357
disable lm_head opt for baichuan2-13b ( #11905 )
2024-08-23 15:39:47 +08:00
hxsz1997
650e6e6ce4
Merge pull request #11891 from hxsz1997/baichuan2-compresskv
...
Add compress_kv for Baichuan2
2024-08-23 06:09:58 +03:00
Ruonan Wang
4a61f7d20d
update mlp of llama ( #11897 )
...
* update mlp of llama
* relax threshold of mlp test
* revert code
2024-08-22 20:34:53 +08:00
Yuwen Hu
420ce7d164
Fix non-stop at eos token problem for lookup generation ( #11896 )
...
* Fix non-stop by eos_token_id problem for lookup
* Small fix
* Add judgement when generation_config.eos_token_id is None
* Fix based on comments
2024-08-22 18:55:59 +08:00
Huang, Xinshengzi
4cf03d6212
update baichuan-7b
2024-08-22 18:16:33 +08:00
Guancheng Fu
278b191dc1
Fix optimize lm head error ( #11899 )
2024-08-22 17:45:26 +08:00
Wang, Jian4
5c4ed00593
Add lightweight-serving whisper asr example ( #11847 )
...
* add asr init
* update for pp
* update style
* update readme
* update reamde
2024-08-22 15:46:28 +08:00
Huang, Xinshengzi
eb1e65f8a9
add comment
2024-08-22 15:14:47 +08:00
Huang, Xinshengzi
a2be3d7501
add comment of compress kv in attention forward
2024-08-22 15:11:55 +08:00
Huang, Xinshengzi
ce7de77085
add comment of change in model forward
2024-08-22 14:29:27 +08:00
Huang, Xinshengzi
42398a0045
add comment
2024-08-22 13:17:13 +08:00
Huang, Xinshengzi
48a827aa07
fix typos
2024-08-22 11:35:47 +08:00
Huang, Xinshengzi
8a5df93de2
fix typos
2024-08-22 11:33:07 +08:00
Huang, Xinshengzi
01ed397e7a
fix typos
2024-08-22 11:31:25 +08:00
Huang, Xinshengzi
c6ed1c412d
fix typos
2024-08-22 11:26:49 +08:00
Huang, Xinshengzi
2a0aa9271b
fix typos
2024-08-22 11:23:22 +08:00
Huang, Xinshengzi
4adadddbbc
fix typos
2024-08-22 11:12:23 +08:00
Huang, Xinshengzi
6a5ca17afc
fix typoes
2024-08-22 11:09:58 +08:00
binbin Deng
72a7bf624b
Support qwen2-1.5b with fused decoderlayer optimization on NPU ( #11888 )
2024-08-22 11:09:12 +08:00
Huang, Xinshengzi
6bb9035788
fix typos
2024-08-22 11:08:48 +08:00