Commit graph

1798 commits

Author SHA1 Message Date
Ruonan Wang
573c20bae6
fix npu lm_head cpu condition (#11976)
* fix

* fix

* fix

* fix style

* fix style

* fix style
2024-08-30 17:11:26 +08:00
Ruonan Wang
60aa1a2c0f
Initial NPU support for MiniCPM-V-2_6 (#11966)
* initial pr

* update npu model

* fix

* fix kv cache type

* fix

* small fix

* fix style

* fix model id

* change inter_pp=4

* address comment

* fix

* fix style

* fix

* rebase
2024-08-30 16:34:35 +08:00
SONG Ge
158289d205
[NPU] Add initial support for minicpm-llama-v2.5 (#11962)
* add initial support for minicpm-llama-v2.5

* update impl

* add minicpm-llama3-v2.5 example
2024-08-30 16:00:33 +08:00
Chu,Youcheng
ae7302a654
add gptq option for ppl test (#11921)
* feat: add gptq for ppl

* fix: add an empty line

* fix: add an empty line

* fix: remove an empty line

* Resolve comments

* Resolve comments

* Resolve comments
2024-08-30 13:43:48 +08:00
binbin Deng
cd077881f1
Disable lm head (#11972) 2024-08-30 11:05:18 +08:00
Wang, Jian4
7d103417b8
Fix glm4-9b-chat nan error on vllm 0.3.3 (#11970)
* fix nan value

* update
2024-08-30 09:50:18 +08:00
Yang Wang
fbf088f61e
remove obsolete npu code (#11967) 2024-08-29 14:16:44 -07:00
Yuwen Hu
a9e485eb1b
Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer (#11963)
* Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer

* Style fixes
2024-08-29 19:22:09 +08:00
Yuwen Hu
2e49e1f8e9
Further fix for MiniCPM-V-2_6 example (#11965) 2024-08-29 19:14:13 +08:00
Jason Dai
431affd0a0
Update README.md (#11964) 2024-08-29 18:56:35 +08:00
binbin Deng
14b2c8dc32
Update qwen2-7b example script (#11961) 2024-08-29 18:25:17 +08:00
Yuwen Hu
7abe17d6f7
Update MiniCPM-V-2_6 Example (#11958)
* Update example scripts regarding warmup, stream generate, modules to not convert, etc.

* Update readme accordingly

* Fix based on comments

* Small fix

* Remove n_predict
2024-08-29 18:23:48 +08:00
Yina Chen
5f7ff76ea5
update troubleshooting (#11960) 2024-08-29 17:44:22 +08:00
Yina Chen
882f4a5ff7
Add lnl npu driver recommended version and enable cpu_lm_head on llama3 (#11952)
* update lnl npu driver version and enable cpu_lm_head on llama3

* update

* fix style

* typo

* address comments

* update

* add qwen2-7b
2024-08-29 15:01:18 +08:00
binbin Deng
71f03dcc39
Support qwen2-7b with fused decoderlayer optimization on NPU (#11912) 2024-08-29 13:34:20 +08:00
Jiao Wang
63ac5f64bb
Refactor NPU baichuan multiple-process (#11945)
* update

* add baichuan mp

* clean

* refactor

* merge

* style

* update

* update
2024-08-28 11:33:40 -07:00
SONG Ge
5ca7390082
[NPU] Add minicpm-2b support for npu multi-processing (#11949)
* add minicpm-2b support

* update example for minicpm-2b

* add LNL NPU driver requirement in readme
2024-08-28 18:08:49 +08:00
Yishuo Wang
0fbb10259a
use sdp_causal to reduce internvl2-4b memory usage if environment variable is set (#11953) 2024-08-28 17:35:05 +08:00
Guancheng Fu
0a7bd274e2
Add vllm awq loading logic (#11950)
* add vllm awq loading logic

* fix

* refine
2024-08-28 16:46:18 +08:00
Yina Chen
b38fb67bec
[NPU] lm head to cpu (#11943)
* lm head to cpu

* qwen2

* mv logic and add param to disable cpu_lm_head

* use env and lm_head opt to mp file

* fix

* update

* remove print
2024-08-28 16:34:07 +08:00
hxsz1997
e23549f63f
Update llamaindex examples (#11940)
* modify rag.py

* update readme of gpu example

* update llamaindex cpu example and readme

* add llamaindex doc

* update note style

* import before instantiating IpexLLMEmbedding

* update index in readme

* update links

* update link

* update related links
2024-08-28 14:03:44 +08:00
binbin Deng
bec00e2015
Improve baichuan2 NPU performance (#11942) 2024-08-27 18:37:08 +08:00
Zijie Li
90f692937d
Update npu baichuan2 (#11939) 2024-08-27 16:56:26 +08:00
binbin Deng
7f7f6c89f5
Quick fix benchmark script (#11938) 2024-08-27 15:29:27 +08:00
Jiao Wang
b4b6ddf73c
NPU Baichuan2 Multi-Process example (#11928) 2024-08-27 15:25:49 +08:00
SONG Ge
e211a5b076
update minicpm to meet latest refactor (#11937) 2024-08-27 15:08:01 +08:00
SONG Ge
a81a329a5f
[NPU] Add example for NPU multi-processing minicpm-1b model (#11935)
* add minicpm example
2024-08-27 14:57:46 +08:00
binbin Deng
7c8c9a0670
Update benchmark script for NPU (#11932) 2024-08-27 14:41:14 +08:00
Ch1y0q
730d9ec811
Add Qwen2-audio example (#11835)
* add draft for qwen2-audio

* update example for `Qwen2-Audio`

* update

* update

* add warmup
2024-08-27 13:35:24 +08:00
Shaojun Liu
b11b28e9a9
update CORE_XE_VERSION to 2.6.0 (#11929) 2024-08-27 13:10:13 +08:00
Yina Chen
e246f1e258
update llama3 npu example (#11933) 2024-08-27 13:03:18 +08:00
binbin Deng
14dddfc0d6
Update NPU example readme (#11931) 2024-08-27 12:44:58 +08:00
Zijie Li
6c3eb1e1e8
refactor from_pretrained API for NPU (#11927) 2024-08-27 09:50:30 +08:00
Xiangyu Tian
7ca557aada
LLM: Fix vLLM CPU convert error (#11926) 2024-08-27 09:22:19 +08:00
Yuwen Hu
c1d07bc626
Support streaming for lookup generation (#11922)
* Support streaming for lookup generation

* Small update

* Style fixes

* Add original generate fallback for batch inference and beam search; support input length threshold judgement for direct input with input_ids

* Fix lookup stream generate with eos token

* Small fixes

* Small fix

* index fix

* Small fix
2024-08-26 19:33:31 +08:00
Yuwen Hu
a0bbd8e28d
All-in-one benchmark update regarding performance mode for input length threshold (#11920)
* All-in-one benchmark update regarding performance mode for input length threshold

* typo fix
2024-08-26 18:52:13 +08:00
SONG Ge
019f725d4d
[NPU] Add support for running mp minicpm model on npu (#11909)
* add initial support for npu minicpm mp

* fix minicpm-1b abnormal output error
2024-08-26 17:52:55 +08:00
binbin Deng
dd303776cf
Add troubleshooting about transpose value setting 2024-08-26 16:06:32 +08:00
Yuwen Hu
24c279e0ae
Update IPEX_LLM_PERFORMANCE_MODE with input length threshold (#11908)
* Update IPEX_LLM_PERFORMANCE_MODE with input length threshold

* Update based on comments, and add judgement for inputs_embeds

* Fix for benchmarking purposes

* Update based on comments

* Small fix
2024-08-23 20:49:15 +08:00
binbin Deng
303a090a6b
Add lm_head optimization on NPU (#11903) 2024-08-23 15:51:07 +08:00
Yina Chen
23631cd357
disable lm_head opt for baichuan2-13b (#11905) 2024-08-23 15:39:47 +08:00
hxsz1997
650e6e6ce4
Merge pull request #11891 from hxsz1997/baichuan2-compresskv
Add compress_kv for Baichuan2
2024-08-23 06:09:58 +03:00
Ruonan Wang
4a61f7d20d
update mlp of llama (#11897)
* update mlp of llama

* relax threshold of mlp test

* revert code
2024-08-22 20:34:53 +08:00
Yuwen Hu
420ce7d164
Fix non-stop at eos token problem for lookup generation (#11896)
* Fix non-stop by eos_token_id problem for lookup

* Small fix

* Add judgement when generation_config.eos_token_id is None

* Fix based on comments
2024-08-22 18:55:59 +08:00
Huang, Xinshengzi
4cf03d6212
update baichuan-7b 2024-08-22 18:16:33 +08:00
Zijie Li
794abe2ce8
update npu-readme (#11900) 2024-08-22 17:49:35 +08:00
Guancheng Fu
278b191dc1
Fix optimize lm head error (#11899) 2024-08-22 17:45:26 +08:00
Shaojun Liu
c5b51d41fb
Update pypi tag to 2.2.0.dev0 (#11895) 2024-08-22 16:48:09 +08:00
Jinhe
18662dca1c
change 5 pytorch/huggingface models to fp16 (#11894) 2024-08-22 16:12:09 +08:00
Wang, Jian4
5c4ed00593
Add lightweight-serving whisper asr example (#11847)
* add asr init

* update for pp

* update style

* update readme

* update readme
2024-08-22 15:46:28 +08:00