binbin Deng
66f6ffe4b2
Update GPU HF-Transformers example structure ( #11526 )
2024-07-08 17:58:06 +08:00
Xiangyu Tian
7d8bc83415
LLM: Partial Prefilling for Pipeline Parallel Serving ( #11457 )
...
LLM: Partial Prefilling for Pipeline Parallel Serving
2024-07-05 13:10:35 +08:00
binbin Deng
60de428b37
Support pipeline parallel for qwen-vl ( #11503 )
2024-07-04 18:03:57 +08:00
Wang, Jian4
61c36ba085
Add pp_serving verified models ( #11498 )
...
* add verified models
* update
* verify large model
* update commend
2024-07-03 14:57:09 +08:00
binbin Deng
9274282ef7
Support pipeline parallel for glm-4-9b-chat ( #11463 )
2024-07-03 14:25:28 +08:00
Wang, Jian4
4390e7dc49
Fix codegeex2 transformers version ( #11487 )
2024-07-02 15:09:28 +08:00
Heyang Sun
913e750b01
fix non-string deepspeed config path bug ( #11476 )
...
* fix non-string deepspeed config path bug
* Update lora_finetune_chatglm.py
2024-07-01 15:53:50 +08:00
Yishuo Wang
319a3b36b2
fix npu llama2 ( #11471 )
2024-07-01 10:14:11 +08:00
Heyang Sun
07362ffffc
ChatGLM3-6B LoRA Fine-tuning Demo ( #11450 )
...
* ChatGLM3-6B LoRA Fine-tuning Demo
* refine
* refine
* add 2-card deepspeed
* refine format
* add mpi4py and deepspeed install
2024-07-01 09:18:39 +08:00
Xiangyu Tian
fd933c92d8
Fix: Correct num_requests in benchmark for Pipeline Parallel Serving ( #11462 )
2024-06-28 16:10:51 +08:00
binbin Deng
987017ef47
Update pipeline parallel serving for more model support ( #11428 )
2024-06-27 18:21:01 +08:00
Yishuo Wang
cf0f5c4322
change npu document ( #11446 )
2024-06-27 13:59:59 +08:00
binbin Deng
508c364a79
Add precision option in PP inference examples ( #11440 )
2024-06-27 09:24:27 +08:00
Shaojun Liu
ab9f7f3ac5
FIX: Qwen1.5-GPTQ-Int4 inference error ( #11432 )
...
* merge_qkv if quant_method is 'gptq'
* fix python style checks
* refactor
* update GPU example
2024-06-26 15:36:22 +08:00
Jiao Wang
40fa23560e
Fix LLAVA example on CPU ( #11271 )
...
* update
* update
* update
* update
2024-06-25 20:04:59 -07:00
binbin Deng
e473b8d946
Add more qwen1.5 and qwen2 support for pipeline parallel inference ( #11423 )
2024-06-25 15:49:32 +08:00
Yishuo Wang
3b23de684a
update npu examples ( #11422 )
2024-06-25 13:32:53 +08:00
Xiangyu Tian
8ddae22cfb
LLM: Refactor Pipeline-Parallel-FastAPI example ( #11319 )
...
Initial refactor of the Pipeline-Parallel-FastAPI example
2024-06-25 13:30:36 +08:00
SONG Ge
34c15d3a10
update pp document ( #11421 )
2024-06-25 10:17:20 +08:00
Heyang Sun
c985912ee3
Add Deepspeed LoRA dependencies in document ( #11410 )
2024-06-24 15:29:59 +08:00
SONG Ge
0c67639539
Add more examples for pipeline parallel inference ( #11372 )
...
* add more model examples for pipeline parallel inference
* add mixtral and vicuna models
* add yi model and past_kv support for chatglm family
* add docs
* doc update
* add license
* update
2024-06-21 17:55:16 +08:00
ivy-lv11
21fc781fce
Add GLM-4V example ( #11343 )
...
* add example
* modify
* modify
* add line
* add
* add link and replace with phi-3-vision template
* fix generate options
* fix
* fix
---------
Co-authored-by: jinbridge <2635480475@qq.com>
2024-06-21 12:54:31 +08:00
binbin Deng
4ba82191f2
Support PP inference for chatglm3 ( #11375 )
2024-06-21 09:59:01 +08:00
Zijie Li
ae452688c2
Add NPU HF example ( #11358 )
2024-06-19 18:07:28 +08:00
Heyang Sun
67a1e05876
Remove zero3 context manager from LoRA ( #11346 )
2024-06-18 17:24:43 +08:00
Shaojun Liu
694912698e
Upgrade scikit-learn to 1.5.0 to fix dependabot issue ( #11349 )
2024-06-18 15:47:25 +08:00
Heyang Sun
00f322d8ee
Finetune ChatGLM with Deepspeed Zero3 LoRA ( #11314 )
...
* Finetune ChatGLM with Deepspeed Zero3 LoRA
* add deepspeed zero3 config
* rename config
* remove offload_param
* add save_checkpoint parameter
* Update lora_deepspeed_zero3_finetune_chatglm3_6b_arc_2_card.sh
* refine
2024-06-18 12:31:26 +08:00
binbin Deng
e50c890e1f
Support finishing PP inference once eos_token_id is found ( #11336 )
2024-06-18 09:55:40 +08:00
Qiyuan Gong
de4bb97b4f
Remove accelerate 0.23.0 install command in readme and docker ( #11333 )
...
* ipex-llm's accelerate has been upgraded to 0.23.0. Remove accelerate 0.23.0 install command in README and docker.
2024-06-17 17:52:12 +08:00
SONG Ge
ef4b6519fb
Add phi-3 model support for pipeline parallel inference ( #11334 )
...
* add phi-3 model support
* add phi3 example
2024-06-17 17:44:24 +08:00
SONG Ge
be00380f1a
Fix pipeline parallel inference past_key_value error in Baichuan ( #11318 )
...
* fix past_key_value error
* add baichuan2 example
* fix style
* update doc
* add script link in doc
* fix import error
* update
2024-06-17 09:29:32 +08:00
Xiangyu Tian
4359ab3172
LLM: Add /generate_stream endpoint for Pipeline-Parallel-FastAPI example ( #11187 )
...
Add /generate_stream and OpenAI-formatted endpoint for Pipeline-Parallel-FastAPI example
2024-06-14 15:15:32 +08:00
Jin Qiao
0e7a31a09c
ChatGLM Examples Restructure regarding Installation Steps ( #11285 )
...
* merge install step in glm examples
* fix section
* fix section
* fix tiktoken
2024-06-14 12:37:05 +08:00
binbin Deng
60cb1dac7c
Support PP for qwen1.5 ( #11300 )
2024-06-13 17:35:24 +08:00
binbin Deng
f97cce2642
Fix import error of ds autotp ( #11307 )
2024-06-13 16:22:52 +08:00
binbin Deng
220151e2a1
Refactor pipeline parallel multi-stage implementation ( #11286 )
2024-06-13 10:00:23 +08:00
ivy-lv11
e7a4e2296f
Add Stable Diffusion examples on GPU and CPU ( #11166 )
...
* add sdxl and lcm-lora
* readme
* modify
* add cpu
* add license
* modify
* add file
2024-06-12 16:33:25 +08:00
Jin Qiao
f224e98297
Add GLM-4 CPU example ( #11223 )
...
* Add GLM-4 example
* add tiktoken dependency
* fix
* fix
2024-06-12 15:30:51 +08:00
Zijie Li
40fc8704c4
Add GPU example for GLM-4 ( #11267 )
...
* Add GPU example for GLM-4
* Update streamchat.py
* Fix pretrained arguments
Fix pretrained arguments in generate and streamchat.py
* Update Readme
Update install tiktoken required for GLM-4
* Update comments in generate.py
2024-06-12 14:29:50 +08:00
Wang, Jian4
6f2684e5c9
Update pp llama.py to save memory ( #11233 )
2024-06-07 13:18:16 +08:00
Zijie Li
7b753dc8ca
Update sample output for HF Qwen2 GPU and CPU ( #11257 )
2024-06-07 11:36:22 +08:00
Yuwen Hu
8c36b5bdde
Add qwen2 example ( #11252 )
...
* Add GPU example for Qwen2
* Update comments in README
* Update README for Qwen2 GPU example
* Add CPU example for Qwen2
Sample Output under README pending
* Update generate.py and README for CPU Qwen2
* Update GPU example for Qwen2
* Small update
* Small fix
* Add Qwen2 table
* Update README for Qwen2 CPU and GPU
Update sample output under README
---------
Co-authored-by: Zijie Li <michael20001122@gmail.com>
2024-06-07 10:29:33 +08:00
Shaojun Liu
85df5e7699
fix nightly perf test ( #11251 )
2024-06-07 09:33:14 +08:00
Guoqiong Song
09c6780d0c
phi-2 transformers 4.37 ( #11161 )
...
* phi-2 transformers 4.37
2024-06-05 13:36:41 -07:00
Zijie Li
bfa1367149
Add CPU and GPU example for MiniCPM ( #11202 )
...
* Change installation address
Change former address: "https://docs.conda.io/en/latest/miniconda.html# " to new address: "https://conda-forge.org/download/ " for 63 occurrences under python\llm\example
* Change Prompt
Change "Anaconda Prompt" to "Miniforge Prompt" for 1 occurrence
* Create and update model minicpm
* Update model minicpm
Update model minicpm under GPU/PyTorch-Models
* Update readme and generate.py
change "prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False)" and delete "pip install transformers==4.37.0"
* Update comments for minicpm GPU
Update comments for generate.py at minicpm GPU
* Add CPU example for MiniCPM
* Update minicpm README for CPU
* Update README for MiniCPM and Llama3
* Update Readme for Llama3 CPU Pytorch
* Update and fix comments for MiniCPM
2024-06-05 18:09:53 +08:00
Yuwen Hu
af96579c76
Update installation guide for pipeline parallel inference ( #11224 )
...
* Update installation guide for pipeline parallel inference
* Small fix
* further fix
* Small fix
* Small fix
* Update based on comments
* Small fix
* Small fix
* Small fix
2024-06-05 17:54:29 +08:00
Xiangyu Tian
ac3d53ff5d
LLM: Fix vLLM CPU version error ( #11206 )
...
Fix vLLM CPU version error
2024-06-04 19:10:23 +08:00
Qiyuan Gong
ce3f08b25a
Fix IPEX auto importer ( #11192 )
...
* Fix ipex auto importer with Python builtins.
* Raise errors if the user imports ipex manually before importing ipex_llm. Do nothing if they import ipex after importing ipex_llm.
* Remove import ipex in examples.
2024-06-04 16:57:18 +08:00
Xiangyu Tian
f02f097002
Fix vLLM version in CPU/vLLM-Serving example README ( #11201 )
2024-06-04 15:56:55 +08:00
Zijie Li
a644e9409b
Miniconda/Anaconda -> Miniforge update in examples ( #11194 )
...
* Change installation address
Change former address: "https://docs.conda.io/en/latest/miniconda.html# " to new address: "https://conda-forge.org/download/ " for 63 occurrences under python\llm\example
* Change Prompt
Change "Anaconda Prompt" to "Miniforge Prompt" for 1 occurrence
2024-06-04 10:14:02 +08:00