Yina Chen
882f4a5ff7
Add lnl npu driver recommend version and enable cpu_lm_head on llama3 ( #11952 )
...
* update lnl npu driver version and enable cpu_lm_head on llama3
* update
* fix style
* typo
* address comments
* update
* add qwen2-7b
2024-08-29 15:01:18 +08:00
binbin Deng
71f03dcc39
Support qwen2-7b with fused decoderlayer optimization on NPU ( #11912 )
2024-08-29 13:34:20 +08:00
SONG Ge
5ca7390082
[NPU] Add minicpm-2b support for npu multi-processing ( #11949 )
...
* add minicpm-2b support
* update example for minicpm-2b
* add LNL NPU driver requirement in readme
2024-08-28 18:08:49 +08:00
hxsz1997
e23549f63f
Update llamaindex examples ( #11940 )
...
* modify rag.py
* update readme of gpu example
* update llamaindex cpu example and readme
* add llamaindex doc
* update note style
* import before instancing IpexLLMEmbedding
* update index in readme
* update links
* update link
* update related links
2024-08-28 14:03:44 +08:00
Zijie Li
90f692937d
Update npu baichuan2 ( #11939 )
2024-08-27 16:56:26 +08:00
Jiao Wang
b4b6ddf73c
NPU Baichuan2 Multi-Process example ( #11928 )
2024-08-27 15:25:49 +08:00
SONG Ge
a81a329a5f
[NPU] Add example for NPU multi-processing minicpm-1b model ( #11935 )
...
* add minicpm example
2024-08-27 14:57:46 +08:00
Ch1y0q
730d9ec811
Add Qwen2-audio example ( #11835 )
...
* add draft for qwen2-audio
* update example for `Qwen2-Audio`
* update
* update
* add warmup
2024-08-27 13:35:24 +08:00
Yina Chen
e246f1e258
update llama3 npu example ( #11933 )
2024-08-27 13:03:18 +08:00
binbin Deng
14dddfc0d6
Update NPU example readme ( #11931 )
2024-08-27 12:44:58 +08:00
Zijie Li
6c3eb1e1e8
refactor from_pretrained API for NPU ( #11927 )
2024-08-27 09:50:30 +08:00
binbin Deng
dd303776cf
Add troubleshooting about transpose value setting
2024-08-26 16:06:32 +08:00
Zijie Li
794abe2ce8
update npu-readme ( #11900 )
2024-08-22 17:49:35 +08:00
Jinhe
18662dca1c
change 5 pytorch/huggingface models to fp16 ( #11894 )
2024-08-22 16:12:09 +08:00
Wang, Jian4
5c4ed00593
Add lightweight-serving whisper asr example ( #11847 )
...
* add asr init
* update for pp
* update style
* update readme
* update readme
2024-08-22 15:46:28 +08:00
Jinhe
a8e2573421
added tokenization file for codegeex2-6b in pytorch-models ( #11875 )
...
* added tokenization file
* tokenization file readme update
* optional
2024-08-22 14:37:56 +08:00
binbin Deng
72a7bf624b
Support qwen2-1.5b with fused decoderlayer optimization on NPU ( #11888 )
2024-08-22 11:09:12 +08:00
Zijie Li
bdbe995b01
Update README.md ( #11889 )
...
Set datasets version to 2.16.1. Clear out the transformers version requirement.
2024-08-22 09:40:16 +08:00
SONG Ge
8c5c7f32dd
Update doc for running npu generate example with ipex-llm[npu] ( #11876 )
...
* update doc for running npu generate example with ipex-llm[npu]
* switch max_prompt_len to 512 to fix compile error on mtl
2024-08-21 13:45:29 +08:00
Jinhe
3ee194d983
Pytorch models transformers version update ( #11860 )
...
* yi sync
* delete 4.34 constraint
* delete 4.34 constraint
* delete 4.31 constraint
* delete 4.34 constraint
* delete 4.35 constraint
* added <=4.33.3 constraint
* added <=4.33.3 constraint
* switched to chinese prompt
2024-08-20 18:01:42 +08:00
Yuwen Hu
5e8286f72d
Update ipex-llm default transformers version to 4.37.0 ( #11859 )
...
* Update default transformers version to 4.37.0
* Add dependency requirements for qwen and qwen-vl
* Temp fix transformers version for these not yet verified models
* Skip qwen test in UT for now as it requires transformers<4.37.0
2024-08-20 17:37:58 +08:00
SONG Ge
5b83493b1a
Add ipex-llm npu option in setup.py ( #11858 )
...
* add ipex-llm npu release
* update example doc
* meet latest release changes
2024-08-20 17:29:49 +08:00
Heyang Sun
ee6852c915
Fix typo ( #11862 )
2024-08-20 16:38:11 +08:00
SONG Ge
7380823f3f
Update Llama2 multi-processes example ( #11852 )
...
* update llama2 multi-processes examples
* update
* update readme
* update
2024-08-19 19:49:01 +08:00
Yang Wang
99b05ba1dc
separate prefill into a process ( #11787 )
...
* separate prefill into a process
* using model.share_memory()
* might work
* worked
* use long prompt
* refactor
* cleanup
* fix bug
* clean up
* changeable inter and intra process stages
* refactor
* add max output len
* fix npu_model changes that may cause generate down
* fix npu_model generate import error
* fix generate forward error
---------
Co-authored-by: sgwhat <ge.song@intel.com>
2024-08-19 17:53:36 +08:00
Jinhe
da3d7a3a53
delete transformers version requirement ( #11845 )
...
* delete transformers version requirement
* delete transformers version requirement
2024-08-19 17:53:02 +08:00
Jinhe
e07a55665c
Codegeex2 tokenization fix ( #11831 )
...
* updated tokenizer file
* updated tokenizer file
* updated tokenizer file
* updated tokenizer file
* new folder
2024-08-16 15:48:47 +08:00
Jinhe
adfbb9124a
Reorganize MiniCPM-V-2_6 example & update other MiniCPM-V-2 examples ( #11815 )
...
* model to fp16 & 2_6 reorganize
* revisions
* revisions
* half
* deleted transformer version requirements
* deleted transformer version requirements
---------
Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>
2024-08-16 14:48:56 +08:00
Chu,Youcheng
f463268e36
fix: add run oneAPI instruction for the example of codeshell ( #11828 )
...
* fix: delete ipex extension import in ppl wikitext evaluation
* feat: add mixed_precision argument on ppl wikitext evaluation
* fix: delete mix_precision command in perplex evaluation for wikitext
* fix: remove fp16 mixed-precision argument
* fix: Add a space.
* fix: add run oneAPI instruction for the example of codeshell
* fix: textual adjustments
* fix: Textual adjustment
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-16 14:29:06 +08:00
Ch1y0q
447c8ed324
update transformers version for replit-code-v1-3b, `internlm2-chat-… ( #11811 )
...
* update transformers version for `replit-code-v1-3b`, `internlm2-chat-7b` and mistral
* remove for default transformers version
2024-08-15 16:40:48 +08:00
Jinhe
2fbbb51e71
transformers==4.37, yi & yuan2 & vicuna ( #11805 )
...
* transformers==4.37
* added yi model
* added yi model
* xxxx
* delete prompt template
* / and delete
2024-08-15 15:39:24 +08:00
Jinhe
f43da2d455
deletion of specification of transformers version ( #11808 )
2024-08-15 15:23:32 +08:00
Jinhe
d8d887edd2
added minicpm-v-2_6 ( #11794 )
2024-08-14 16:23:44 +08:00
Yang Wang
51bcac1229
follow up on experimental support of fused decoder layer for llama2 ( #11785 )
...
* clean up and support transpose value cache
* refine
* fix style
* fix style
2024-08-13 18:53:55 -07:00
Heyang Sun
70c828b87c
deepspeed zero3 QLoRA finetuning ( #11625 )
...
* deepspeed zero3 QLoRA finetuning
* Update convert.py
* Update low_bit_linear.py
* Update utils.py
* Update qlora_finetune_llama2_13b_arch_2_card.sh
* Update low_bit_linear.py
* Update alpaca_qlora_finetuning.py
* Update low_bit_linear.py
* Update utils.py
* Update convert.py
* Update alpaca_qlora_finetuning.py
* Update alpaca_qlora_finetuning.py
* Update low_bit_linear.py
* Update deepspeed_zero3.json
* Update qlora_finetune_llama2_13b_arch_2_card.sh
* Update low_bit_linear.py
* Update low_bit_linear.py
* Update utils.py
* fix style
* fix style
* Update alpaca_qlora_finetuning.py
* Update qlora_finetune_llama2_13b_arch_2_card.sh
* Update convert.py
* Update low_bit_linear.py
* Update model.py
* Update alpaca_qlora_finetuning.py
* Update low_bit_linear.py
* Update low_bit_linear.py
* Update low_bit_linear.py
2024-08-13 16:15:29 +08:00
binbin Deng
23d3acdc77
Add experimental support of fused decoder layer for llama2 ( #11768 )
2024-08-13 14:41:36 +08:00
Jin, Qiao
c28b3389e6
Update npu multimodal example ( #11773 )
2024-08-13 14:14:59 +08:00
Ruonan Wang
8db34057b4
optimize lookahead init time ( #11769 )
2024-08-12 17:19:12 +08:00
Jin, Qiao
05989ad0f9
Update npu example and all-in-one benchmark ( #11766 )
2024-08-12 16:46:46 +08:00
Ruonan Wang
7e917d6cfb
fix gptq of llama ( #11749 )
...
* fix gptq of llama
* small fix
2024-08-09 16:39:25 +08:00
Shaojun Liu
107f7aafd0
enable inference mode for deepspeed tp serving ( #11742 )
2024-08-08 14:38:30 +08:00
Zijie Li
9e65cf00b3
Add openai-whisper pytorch gpu ( #11736 )
...
* Add openai-whisper pytorch gpu
* Update README.md
* Update README.md
* fix typo
* fix names update readme
* Update README.md
2024-08-08 12:32:59 +08:00
Jinhe
d0c89fb715
updated llama.cpp and ollama quickstart ( #11732 )
...
* updated llama.cpp and ollama quickstart.md
* added qwen2-1.5B sample output
* revision on quickstart updates
* revision on quickstart updates
* revision on qwen2 readme
* added 2 troubleshoots
* troubleshoot revision
2024-08-08 11:04:01 +08:00
Ch1y0q
4676af2054
add gemma2 example ( #11724 )
...
* add `gemma2`
* update `transformers` version
* update `README.md`
2024-08-06 21:17:50 +08:00
Jin, Qiao
11650b6f81
upgrade glm-4v example transformers version ( #11719 )
2024-08-06 14:55:09 +08:00
Jin, Qiao
7f241133da
Add MiniCPM-Llama3-V-2_5 GPU example ( #11693 )
...
* Add MiniCPM-Llama3-V-2_5 GPU example
* fix
2024-08-06 10:22:41 +08:00
Jin, Qiao
808d9a7bae
Add MiniCPM-V-2 GPU example ( #11699 )
...
* Add MiniCPM-V-2 GPU example
* add example in README.md
* add example in README.md
2024-08-06 10:22:33 +08:00
Zijie Li
8fb36b9f4a
add new benchmark_util.py ( #11713 )
...
* add new benchmark_util.py
2024-08-05 16:18:48 +08:00
Wang, Jian4
493cbd9a36
Support lightweight-serving with internlm-xcomposer2-vl-7b multimodal input ( #11703 )
...
* init image_list
* enable internlm-xcomposer2 image input
* update style
* add readme
* update model
* update readme
2024-08-05 09:36:04 +08:00
Qiyuan Gong
762ad49362
Add RANK_WAIT_TIME into DeepSpeed-AutoTP to avoid CPU memory OOM ( #11704 )
...
* DeepSpeed-AutoTP starts multiple processes that each load the model and convert it in CPU memory. If the model or rank_num is large, this leads to OOM. Add RANK_WAIT_TIME to reduce memory usage by controlling model reading parallelism.
2024-08-01 18:16:21 +08:00
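The RANK_WAIT_TIME change in #11704 is described only in prose, so here is a minimal sketch of the general idea: delay each rank's model load so that fewer ranks hold a full copy in CPU memory at the same time. The log does not show how the example actually reads or applies the variable. The per-rank scaling, the unit (seconds), the `LOCAL_RANK` lookup, and the helper names `rank_wait_seconds` and `staggered_load` are all assumptions made for illustration.

```python
import os
import time


def rank_wait_seconds(rank: int, wait_time: float) -> float:
    """Delay before a rank starts loading; rank 0 starts immediately.

    Assumption: the delay grows linearly with the rank index.
    """
    return max(0.0, rank * wait_time)


def staggered_load(load_fn, rank=None, wait_time=None, sleep=time.sleep):
    """Call load_fn() after a rank-dependent delay.

    Assumptions: the launcher exports LOCAL_RANK, and RANK_WAIT_TIME is a
    number of seconds. Both default to 0 when unset, which disables the delay.
    """
    if rank is None:
        rank = int(os.environ.get("LOCAL_RANK", "0"))
    if wait_time is None:
        wait_time = float(os.environ.get("RANK_WAIT_TIME", "0"))
    delay = rank_wait_seconds(rank, wait_time)
    if delay > 0:
        sleep(delay)  # let lower ranks finish loading and converting first
    return load_fn()
```

A caller would wrap its own loading code, for example `model = staggered_load(lambda: load_my_model(path))`, where `load_my_model` stands in for whatever loader the script already uses. The trade-off is longer startup in exchange for a lower peak in CPU memory.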
Zijie Li
5079ed9e06
Add Llama3.1 example ( #11689 )
...
* Add Llama3.1 example
Add Llama3.1 example for Linux arc and Windows MTL
* Changes made to adjust compatibilities
transformers changed to 4.43.1
* Update index.rst
* Update README.md
* Update index.rst
* Update index.rst
* Update index.rst
2024-07-31 10:53:30 +08:00
Jin, Qiao
6e3ce28173
Upgrade glm-4 example transformers version ( #11659 )
...
* upgrade glm-4 example transformers version
* move pip install in one line
2024-07-31 10:24:50 +08:00
Jin, Qiao
a44ab32153
Switch to conhost when running on NPU ( #11687 )
2024-07-30 17:08:06 +08:00
Guoqiong Song
336dfc04b1
fix 1482 ( #11661 )
...
Co-authored-by: rnwang04 <ruonan1.wang@intel.com>
2024-07-26 12:39:09 -07:00
Wang, Jian4
23681fbf5c
Support codegeex4-9b for lightweight-serving ( #11648 )
...
* add options, support prompt and not return end_token
* enable openai parameter
* set do_sample None and update style
2024-07-26 09:41:03 +08:00
Wang, Jian4
1eed0635f2
Add lightweight serving and support tgi parameter ( #11600 )
...
* init tgi request
* update openai api
* update for pp
* update and add readme
* add to docker
* add start bash
* update
* update
* update
2024-07-19 13:15:56 +08:00
Guoqiong Song
380717f50d
fix gemma for 4.41 ( #11531 )
...
* fix gemma for 4.41
2024-07-18 15:02:50 -07:00
Guoqiong Song
5a6211fd56
fix minicpm for transformers>=4.39 ( #11533 )
...
* fix minicpm for transformers>=4.39
2024-07-18 15:01:57 -07:00
Guoqiong Song
bfcdc35b04
phi-3 on "transformers>=4.37.0,<=4.42.3" ( #11534 )
2024-07-17 17:19:57 -07:00
Guoqiong Song
d64711900a
Fix cohere model on transformers>=4.41 ( #11575 )
...
* fix cohere model for 4-41
2024-07-17 17:18:59 -07:00
Guoqiong Song
5b6eb85b85
phi model readme ( #11595 )
...
Co-authored-by: rnwang04 <ruonan1.wang@intel.com>
2024-07-17 17:18:34 -07:00
Wang, Jian4
9c15abf825
Refactor fastapi-serving and add one card serving ( #11581 )
...
* init fastapi-serving one card
* mv api code to source
* update worker
* update for style-check
* add worker
* update bash
* update
* update worker name and add readme
* rename update
* rename to fastapi
2024-07-17 11:12:43 +08:00
Heyang Sun
365adad59f
Support LoRA ChatGLM with Alpaca Dataset ( #11580 )
...
* Support LoRA ChatGLM with Alpaca Dataset
* refine
* fix
* add 2-card alpaca
2024-07-16 15:40:02 +08:00
Ch1y0q
50cf563a71
Add example: MiniCPM-V ( #11570 )
2024-07-15 10:55:48 +08:00
Zhao Changmin
06745e5742
Add npu benchmark all-in-one script ( #11571 )
...
* npu benchmark
2024-07-15 10:42:37 +08:00
Xiangyu Tian
0981b72275
Fix /generate_stream api in Pipeline Parallel FastAPI ( #11569 )
2024-07-12 13:19:42 +08:00
Zhao Changmin
b9c66994a5
add npu sdp ( #11562 )
2024-07-11 16:57:35 +08:00
binbin Deng
2b8ad8731e
Support pipeline parallel for glm-4v ( #11545 )
2024-07-11 16:06:06 +08:00
Xiangyu Tian
7f5111a998
LLM: Refine start script for Pipeline Parallel Serving ( #11557 )
...
Refine start script and readme for Pipeline Parallel Serving
2024-07-11 15:45:27 +08:00
Zhao Changmin
105e124752
optimize phi3-v encoder npu performance and add multimodal example ( #11553 )
...
* phi3-v
* readme
2024-07-11 13:59:14 +08:00
Zhao Changmin
3c16c9f725
Optimize baichuan on NPU ( #11548 )
...
* baichuan_npu
2024-07-10 13:18:48 +08:00
Zhao Changmin
76a5802acf
update NPU examples ( #11540 )
...
* update NPU examples
2024-07-09 17:19:42 +08:00
Jason Dai
099486afb7
Update README.md ( #11530 )
2024-07-08 20:18:41 +08:00
binbin Deng
66f6ffe4b2
Update GPU HF-Transformers example structure ( #11526 )
2024-07-08 17:58:06 +08:00
Xiangyu Tian
7d8bc83415
LLM: Partial Prefilling for Pipeline Parallel Serving ( #11457 )
...
LLM: Partial Prefilling for Pipeline Parallel Serving
2024-07-05 13:10:35 +08:00
binbin Deng
60de428b37
Support pipeline parallel for qwen-vl ( #11503 )
2024-07-04 18:03:57 +08:00
Wang, Jian4
61c36ba085
Add pp_serving verified models ( #11498 )
...
* add verified models
* update
* verify large model
* update commend
2024-07-03 14:57:09 +08:00
binbin Deng
9274282ef7
Support pipeline parallel for glm-4-9b-chat ( #11463 )
2024-07-03 14:25:28 +08:00
Wang, Jian4
4390e7dc49
Fix codegeex2 transformers version ( #11487 )
2024-07-02 15:09:28 +08:00
Heyang Sun
913e750b01
fix non-string deepspeed config path bug ( #11476 )
...
* fix non-string deepspeed config path bug
* Update lora_finetune_chatglm.py
2024-07-01 15:53:50 +08:00
Yishuo Wang
319a3b36b2
fix npu llama2 ( #11471 )
2024-07-01 10:14:11 +08:00
Heyang Sun
07362ffffc
ChatGLM3-6B LoRA Fine-tuning Demo ( #11450 )
...
* ChatGLM3-6B LoRA Fine-tuning Demo
* refine
* refine
* add 2-card deepspeed
* refine format
* add mpi4py and deepspeed install
2024-07-01 09:18:39 +08:00
Xiangyu Tian
fd933c92d8
Fix: Correct num_requests in benchmark for Pipeline Parallel Serving ( #11462 )
2024-06-28 16:10:51 +08:00
binbin Deng
987017ef47
Update pipeline parallel serving for more model support ( #11428 )
2024-06-27 18:21:01 +08:00
Yishuo Wang
cf0f5c4322
change npu document ( #11446 )
2024-06-27 13:59:59 +08:00
binbin Deng
508c364a79
Add precision option in PP inference examples ( #11440 )
2024-06-27 09:24:27 +08:00
Shaojun Liu
ab9f7f3ac5
FIX: Qwen1.5-GPTQ-Int4 inference error ( #11432 )
...
* merge_qkv if quant_method is 'gptq'
* fix python style checks
* refactor
* update GPU example
2024-06-26 15:36:22 +08:00
Jiao Wang
40fa23560e
Fix LLAVA example on CPU ( #11271 )
...
* update
* update
* update
* update
2024-06-25 20:04:59 -07:00
binbin Deng
e473b8d946
Add more qwen1.5 and qwen2 support for pipeline parallel inference ( #11423 )
2024-06-25 15:49:32 +08:00
Yishuo Wang
3b23de684a
update npu examples ( #11422 )
2024-06-25 13:32:53 +08:00
Xiangyu Tian
8ddae22cfb
LLM: Refactor Pipeline-Parallel-FastAPI example ( #11319 )
...
Initially Refactor for Pipeline-Parallel-FastAPI example
2024-06-25 13:30:36 +08:00
SONG Ge
34c15d3a10
update pp document ( #11421 )
2024-06-25 10:17:20 +08:00
Heyang Sun
c985912ee3
Add Deepspeed LoRA dependencies in document ( #11410 )
2024-06-24 15:29:59 +08:00
SONG Ge
0c67639539
Add more examples for pipeline parallel inference ( #11372 )
...
* add more model examples for pipeline parallel inference
* add mixtral and vicuna models
* add yi model and past_kv support for chatglm family
* add docs
* doc update
* add license
* update
2024-06-21 17:55:16 +08:00
ivy-lv11
21fc781fce
Add GLM-4V example ( #11343 )
...
* add example
* modify
* modify
* add line
* add
* add link and replace with phi-3-vision template
* fix generate options
* fix
* fix
---------
Co-authored-by: jinbridge <2635480475@qq.com>
2024-06-21 12:54:31 +08:00
binbin Deng
4ba82191f2
Support PP inference for chatglm3 ( #11375 )
2024-06-21 09:59:01 +08:00
Zijie Li
ae452688c2
Add NPU HF example ( #11358 )
2024-06-19 18:07:28 +08:00
Heyang Sun
67a1e05876
Remove zero3 context manager from LoRA ( #11346 )
2024-06-18 17:24:43 +08:00
Shaojun Liu
694912698e
Upgrade scikit-learn to 1.5.0 to fix dependabot issue ( #11349 )
2024-06-18 15:47:25 +08:00
Heyang Sun
00f322d8ee
Finetune ChatGLM with Deepspeed Zero3 LoRA ( #11314 )
...
* Finetune ChatGLM with Deepspeed Zero3 LoRA
* add deepspeed zero3 config
* rename config
* remove offload_param
* add save_checkpoint parameter
* Update lora_deepspeed_zero3_finetune_chatglm3_6b_arc_2_card.sh
* refine
2024-06-18 12:31:26 +08:00