Commit graph

563 commits

Author SHA1 Message Date
Jin, Qiao
65e281bb29
Add MiniCPM-V cpu example (#11975)
* Add MiniCPM-V cpu example

* fix

* fix

* fix

* fix
2024-09-02 10:17:57 +08:00
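For context, a minimal sketch of what a MiniCPM-V CPU example in this repo typically looks like; the model id, image path, and the `chat` call (which follows the MiniCPM-V model card and varies across model versions) are illustrative, not taken verbatim from the PR.

```python
# Hedged sketch, not the exact script from #11975: load MiniCPM-V on CPU
# with ipex-llm low-bit optimization and call the model-card-style chat API.
from PIL import Image
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModel  # ipex-llm drop-in AutoModel

model_path = "openbmb/MiniCPM-V"  # illustrative model id
model = AutoModel.from_pretrained(model_path,
                                  load_in_4bit=True,  # low-bit weights on CPU
                                  trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

image = Image.open("demo.jpg").convert("RGB")  # placeholder image path
msgs = [{"role": "user", "content": "What is in this image?"}]
# chat() follows the MiniCPM-V model card; its signature differs by version
res = model.chat(image=image, msgs=msgs, tokenizer=tokenizer, sampling=True)
print(res)
```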
Ruonan Wang
79978e6f36
update npu multimodal readme (#11979)
* update npu readme of multimodal

* small fix

* meet comment
2024-08-30 19:02:06 +08:00
Ruonan Wang
4811a490ef
small fix (#11978)
* fix

* meet comment
2024-08-30 17:55:15 +08:00
Ruonan Wang
573c20bae6
fix npu lm_head cpu condition (#11976)
* fix

* fix

* fix

* fix style

* fix style

* fix style
2024-08-30 17:11:26 +08:00
Ruonan Wang
60aa1a2c0f
Initial NPU support for MiniCPM-V-2_6 (#11966)
* initial pr

* update npu model

* fix

* fix kv cache type

* fix

* small fix

* fix style

* fix model id

* change inter_pp=4

* address comment

* fix

* fix style

* fix

* rebase
2024-08-30 16:34:35 +08:00
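A hedged sketch of what loading MiniCPM-V-2_6 through the NPU `AutoModel` looks like; the keyword arguments mirror the repo's other NPU examples, and the values are illustrative (the PR itself settles on `inter_pp=4`).

```python
# Hedged sketch of the NPU loading path for MiniCPM-V-2_6; exact kwargs
# and values are illustrative, following the repo's other NPU examples.
import torch
from ipex_llm.transformers.npu_model import AutoModel

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-V-2_6",
    torch_dtype=torch.float16,
    trust_remote_code=True,
    attn_implementation="eager",
    load_in_low_bit="sym_int4",  # low-bit weights for the NPU
    optimize_model=True,
    max_output_len=1024,
    max_prompt_len=512,
    inter_pp=4,                  # pipeline stages, per this PR
)
```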
SONG Ge
158289d205
[NPU] Add initial support for minicpm-llama3-v2.5 (#11962)
* add initial support for minicpm-llama3-v2.5

* update impl

* add minicpm-llama3-v2.5 example
2024-08-30 16:00:33 +08:00
binbin Deng
cd077881f1
Disable lm head (#11972) 2024-08-30 11:05:18 +08:00
Yuwen Hu
2e49e1f8e9
Further fix for MiniCPM-V-2_6 example (#11965) 2024-08-29 19:14:13 +08:00
Jason Dai
431affd0a0
Update README.md (#11964) 2024-08-29 18:56:35 +08:00
binbin Deng
14b2c8dc32
Update qwen2-7b example script (#11961) 2024-08-29 18:25:17 +08:00
Yuwen Hu
7abe17d6f7
Update MiniCPM-V-2_6 Example (#11958)
* Update example scripts regarding warmup, stream generate, modules to not convert, etc.

* Update readme accordingly

* Fix based on comments

* Small fix

* Remove n_predict
2024-08-29 18:23:48 +08:00
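The three patterns this PR names translate roughly to the sketch below; the module names (`vpm`, `resampler`) are assumed from the repo's other MiniCPM-V examples, and a text-only prompt is used for brevity.

```python
# Hedged sketch: skip low-bit conversion for the vision modules, warm up
# once so the first timed run does not pay one-time costs, then stream.
from transformers import AutoTokenizer, TextStreamer
from ipex_llm.transformers import AutoModel

model_path = "openbmb/MiniCPM-V-2_6"  # illustrative
model = AutoModel.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True,
    modules_to_not_convert=["vpm", "resampler"],  # keep vision parts unconverted
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("Describe the image.", return_tensors="pt")
model.generate(**inputs, max_new_tokens=1)  # warmup pass
streamer = TextStreamer(tokenizer, skip_prompt=True)
model.generate(**inputs, max_new_tokens=64, streamer=streamer)  # stream output
```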
Yina Chen
5f7ff76ea5
update troubleshooting (#11960) 2024-08-29 17:44:22 +08:00
Yina Chen
882f4a5ff7
Add lnl npu driver recommend version and enable cpu_lm_head on llama3 (#11952)
* update lnl npu driver version and enable cpu_lm_head on llama3

* update

* fix style

* typo

* address comments

* update

* add qwen2-7b
2024-08-29 15:01:18 +08:00
binbin Deng
71f03dcc39
Support qwen2-7b with fused decoder layer optimization on NPU (#11912) 2024-08-29 13:34:20 +08:00
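A hedged sketch of how a qwen2-7b NPU example is typically invoked in this repo; the kwargs follow the other NPU examples here, and the values are illustrative.

```python
# Hedged sketch of loading qwen2-7b on the NPU with the fused decoder
# layer path; values are illustrative, not taken verbatim from #11912.
import torch
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",
    torch_dtype=torch.float16,
    attn_implementation="eager",
    load_in_low_bit="sym_int4",
    optimize_model=True,
    max_output_len=1024,
    max_prompt_len=512,          # 512 also avoids an MTL compile error (#11876)
    transpose_value_cache=True,  # see the transpose-value troubleshooting entry
)
```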
SONG Ge
5ca7390082
[NPU] Add minicpm-2b support for npu multi-processing (#11949)
* add minicpm-2b support

* update example for minicpm-2b

* add LNL NPU driver requirement in readme
2024-08-28 18:08:49 +08:00
hxsz1997
e23549f63f
Update llamaindex examples (#11940)
* modify rag.py

* update readme of gpu example

* update llamaindex cpu example and readme

* add llamaindex doc

* update note style

* import before instantiating IpexLLMEmbedding

* update index in readme

* update links

* update link

* update related links
2024-08-28 14:03:44 +08:00
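The "import before instantiating IpexLLMEmbedding" note translates to something like the sketch below; the embedding model name and device are illustrative.

```python
# Hedged sketch of the llamaindex integration: the commit moves the
# IpexLLMEmbedding import ahead of instantiation, roughly as follows.
from llama_index.embeddings.ipex_llm import IpexLLMEmbedding

embedding = IpexLLMEmbedding(
    model_name="BAAI/bge-small-en-v1.5",  # illustrative embedding model
    device="xpu",                         # or "cpu" for the CPU example
)
vector = embedding.get_text_embedding("ipex-llm makes local RAG fast")
```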
Zijie Li
90f692937d
Update npu baichuan2 (#11939) 2024-08-27 16:56:26 +08:00
Jiao Wang
b4b6ddf73c
NPU Baichuan2 Multi-Process example (#11928) 2024-08-27 15:25:49 +08:00
SONG Ge
a81a329a5f
[NPU] Add example for NPU multi-processing minicpm-1b model (#11935)
* add minicpm example
2024-08-27 14:57:46 +08:00
Ch1y0q
730d9ec811
Add Qwen2-audio example (#11835)
* add draft for qwen2-audio

* update example for `Qwen2-Audio`

* update

* update

* add warmup
2024-08-27 13:35:24 +08:00
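A rough, hedged sketch of a Qwen2-Audio pipeline with ipex-llm's generic `optimize_model()` hook and the warmup pass this PR adds; the processor call and audio-token prompt follow the Qwen2-Audio model card and are assumptions here, not the repo's exact script.

```python
# Hedged sketch (assumed API usage, not the repo's exact script):
import numpy as np
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration
from ipex_llm import optimize_model

model_id = "Qwen/Qwen2-Audio-7B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2AudioForConditionalGeneration.from_pretrained(model_id)
model = optimize_model(model)  # generic ipex-llm low-bit optimization

audio = np.zeros(16000, dtype=np.float32)  # 1 s of silence as placeholder
inputs = processor(text="<|audio_bos|><|AUDIO|><|audio_eos|>Transcribe this.",
                   audios=audio, return_tensors="pt")
model.generate(**inputs, max_new_tokens=1)  # warmup, as the PR adds
output = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```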
Yina Chen
e246f1e258
update llama3 npu example (#11933) 2024-08-27 13:03:18 +08:00
binbin Deng
14dddfc0d6
Update NPU example readme (#11931) 2024-08-27 12:44:58 +08:00
Zijie Li
6c3eb1e1e8
refactor from_pretrained API for NPU (#11927) 2024-08-27 09:50:30 +08:00
binbin Deng
dd303776cf
Add troubleshooting about transpose value setting 2024-08-26 16:06:32 +08:00
Zijie Li
794abe2ce8
update npu-readme (#11900) 2024-08-22 17:49:35 +08:00
Jinhe
18662dca1c
change 5 pytorch/huggingface models to fp16 (#11894) 2024-08-22 16:12:09 +08:00
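The fp16 switch these examples make is essentially the pattern below; the model id is illustrative.

```python
# Hedged sketch of the fp16 pattern: cast the low-bit model to half
# precision before moving it to the Intel GPU ("xpu") device.
from ipex_llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # illustrative model id
    load_in_4bit=True, trust_remote_code=True)
model = model.half().to("xpu")  # fp16 activations on the GPU
```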
Wang, Jian4
5c4ed00593
Add lightweight-serving whisper asr example (#11847)
* add asr init

* update for pp

* update style

* update readme

* update readme
2024-08-22 15:46:28 +08:00
Jinhe
a8e2573421
added tokenization file for codegeex2-6b in pytorch-models (#11875)
* added tokenization file

* tokenization file readme update

* optional
2024-08-22 14:37:56 +08:00
binbin Deng
72a7bf624b
Support qwen2-1.5b with fused decoder layer optimization on NPU (#11888) 2024-08-22 11:09:12 +08:00
Zijie Li
bdbe995b01
Update README.md (#11889)
Set the datasets version to 2.16.1 and remove the transformers version requirement.
2024-08-22 09:40:16 +08:00
SONG Ge
8c5c7f32dd
Update doc for running npu generate example with ipex-llm[npu] (#11876)
* update doc for running npu generate example with ipex-llm[npu]

* switch max_prompt_len to 512 to fix compile error on mtl
2024-08-21 13:45:29 +08:00
Jinhe
3ee194d983
Pytorch models transformers version update (#11860)
* yi sync

* delete 4.34 constraint

* delete 4.34 constraint

* delete 4.31 constraint

* delete 4.34 constraint

* delete 4.35 constraint

* added <=4.33.3 constraint

* added <=4.33.3 constraint

* switched to Chinese prompt
2024-08-20 18:01:42 +08:00
Yuwen Hu
5e8286f72d
Update ipex-llm default transformers version to 4.37.0 (#11859)
* Update default transformers version to 4.37.0

* Add dependency requirements for qwen and qwen-vl

* Temp fix transformers version for these not yet verified models

* Skip qwen test in UT for now as it requires transformers<4.37.0
2024-08-20 17:37:58 +08:00
SONG Ge
5b83493b1a
Add ipex-llm npu option in setup.py (#11858)
* add ipex-llm npu release

* update example doc

* meet latest release changes
2024-08-20 17:29:49 +08:00
Heyang Sun
ee6852c915
Fix typo (#11862) 2024-08-20 16:38:11 +08:00
SONG Ge
7380823f3f
Update Llama2 multi-processes example (#11852)
* update llama2 multi-processes examples

* update

* update readme

* update
2024-08-19 19:49:01 +08:00
Yang Wang
99b05ba1dc
separate prefill into a process (#11787)
* separate prefill into a process

* using model.share_memory()

* might work

* worked

* use long prompt

* refactor

* cleanup

* fix bug

* clean up

* changeable inter- and intra-process stages

* refactor

* add max output len

* fix npu_model changes that may break generate

* fix npu_model generate import error

* fix generate forward error

---------

Co-authored-by: sgwhat <ge.song@intel.com>
2024-08-19 17:53:36 +08:00
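The idea behind this PR (a dedicated prefill process reusing weights shared via `model.share_memory()`) can be illustrated with a self-contained PyTorch sketch; the queue protocol and the stand-in model below are inventions for illustration, not the repo's implementation.

```python
# Minimal sketch of the "prefill in its own process" pattern: weights are
# moved to shared memory so both processes reuse the same tensors.
import torch
import torch.multiprocessing as mp
from torch import nn

def prefill_worker(model, in_q, out_q):
    # Dedicated prefill process: consume prompts, produce activations.
    while True:
        x = in_q.get()
        if x is None:  # shutdown signal
            break
        with torch.no_grad():
            out_q.put(model(x))

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    model = nn.Linear(16, 16)  # stand-in for the real decoder stack
    model.share_memory()       # share weights across processes
    in_q, out_q = mp.Queue(), mp.Queue()
    p = mp.Process(target=prefill_worker, args=(model, in_q, out_q))
    p.start()
    in_q.put(torch.randn(1, 16))
    print(out_q.get().shape)   # torch.Size([1, 16])
    in_q.put(None)
    p.join()
```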
Jinhe
da3d7a3a53
delete transformers version requirement (#11845)
* delete transformers version requirement

* delete transformers version requirement
2024-08-19 17:53:02 +08:00
Jinhe
e07a55665c
Codegeex2 tokenization fix (#11831)
* updated tokenizer file

* updated tokenizer file

* updated tokenizer file

* updated tokenizer file

* new folder
2024-08-16 15:48:47 +08:00
Jinhe
adfbb9124a
Reorganize MiniCPM-V-2_6 example & update other MiniCPM-V-2 examples (#11815)
* model to fp16 & 2_6 reorganize

* revisions

* revisions

* half

* deleted transformers version requirements

* deleted transformers version requirements

---------

Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>
2024-08-16 14:48:56 +08:00
Chu,Youcheng
f463268e36
fix: add run oneAPI instruction for the example of codeshell (#11828)
* fix: delete ipex extension import in ppl wikitext evaluation

* feat: add mixed_precision argument on ppl wikitext evaluation

* fix: delete mix_precision command in perplexity evaluation for wikitext

* fix: remove fp16 mixed-precision argument

* fix: Add a space.

* fix: add run oneAPI instruction for the example of codeshell

* fix: textual adjustments

* fix: Textual adjustment

---------

Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-16 14:29:06 +08:00
Ch1y0q
447c8ed324
update transformers version for replit-code-v1-3b, `internlm2-chat-… (#11811)
* update transformers version for `replit-code-v1-3b`, `internlm2-chat-7b` and mistral

* remove for default transformers version
2024-08-15 16:40:48 +08:00
Jinhe
2fbbb51e71
transformers==4.37, yi & yuan2 & vicuna (#11805)
* transformers==4.37

* added yi model

* added yi model

* xxxx

* delete prompt template

* / and delete
2024-08-15 15:39:24 +08:00
Jinhe
f43da2d455
deletion of specification of transformers version (#11808) 2024-08-15 15:23:32 +08:00
Jinhe
d8d887edd2
added minicpm-v-2_6 (#11794) 2024-08-14 16:23:44 +08:00
Yang Wang
51bcac1229
follow up on experimental support of fused decoder layer for llama2 (#11785)
* clean up and support transpose value cache

* refine

* fix style

* fix style
2024-08-13 18:53:55 -07:00
Heyang Sun
70c828b87c
deepspeed zero3 QLoRA finetuning (#11625)
* deepspeed zero3 QLoRA finetuning

* Update convert.py

* Update low_bit_linear.py

* Update utils.py

* Update qlora_finetune_llama2_13b_arch_2_card.sh

* Update low_bit_linear.py

* Update alpaca_qlora_finetuning.py

* Update low_bit_linear.py

* Update utils.py

* Update convert.py

* Update alpaca_qlora_finetuning.py

* Update alpaca_qlora_finetuning.py

* Update low_bit_linear.py

* Update deepspeed_zero3.json

* Update qlora_finetune_llama2_13b_arch_2_card.sh

* Update low_bit_linear.py

* Update low_bit_linear.py

* Update utils.py

* fix style

* fix style

* Update alpaca_qlora_finetuning.py

* Update qlora_finetune_llama2_13b_arch_2_card.sh

* Update convert.py

* Update low_bit_linear.py

* Update model.py

* Update alpaca_qlora_finetuning.py

* Update low_bit_linear.py

* Update low_bit_linear.py

* Update low_bit_linear.py
2024-08-13 16:15:29 +08:00
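For orientation, a hedged sketch of what a ZeRO stage-3 configuration like the `deepspeed_zero3.json` touched here usually contains; the keys are standard DeepSpeed options, the values illustrative.

```python
# Hedged sketch of a ZeRO stage-3 DeepSpeed config, expressed as the dict
# that deepspeed.initialize(config=...) accepts; values are illustrative.
ds_config = {
    "zero_optimization": {
        "stage": 3,                          # partition params, grads, optimizer
        "offload_param": {"device": "cpu"},  # keep sharded params on CPU
        "offload_optimizer": {"device": "cpu"},
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
}
```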
binbin Deng
23d3acdc77
Add experimental support of fused decoder layer for llama2 (#11768) 2024-08-13 14:41:36 +08:00
Jin, Qiao
c28b3389e6
Update npu multimodal example (#11773) 2024-08-13 14:14:59 +08:00
Ruonan Wang
8db34057b4
optimize lookahead init time (#11769) 2024-08-12 17:19:12 +08:00