Commit graph

436 commits

Author SHA1 Message Date
Shaojun Liu
85df5e7699
fix nightly perf test (#11251) 2024-06-07 09:33:14 +08:00
Guoqiong Song
09c6780d0c
phi-2 transformers 4.37 (#11161)
* phi-2 transformers 4.37
2024-06-05 13:36:41 -07:00
Zijie Li
bfa1367149
Add CPU and GPU example for MiniCPM (#11202)
* Change installation address

Change former address: "https://docs.conda.io/en/latest/miniconda.html#" to new address: "https://conda-forge.org/download/" for 63 occurrences under python\llm\example

* Change Prompt

Change "Anaconda Prompt" to "Miniforge Prompt" for 1 occurrence

* Create and update model minicpm

* Update model minicpm

Update model minicpm under GPU/PyTorch-Models

* Update readme and generate.py

change "prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False)" and delete "pip install transformers==4.37.0
"

* Update comments for minicpm GPU

Update comments for generate.py at minicpm GPU

* Add CPU example for MiniCPM

* Update minicpm README for CPU

* Update README for MiniCPM and Llama3

* Update Readme for Llama3 CPU Pytorch

* Update and fix comments for MiniCPM
2024-06-05 18:09:53 +08:00
Yuwen Hu
af96579c76
Update installation guide for pipeline parallel inference (#11224)
* Update installation guide for pipeline parallel inference

* Small fix

* further fix

* Small fix

* Small fix

* Update based on comments

* Small fix

* Small fix

* Small fix
2024-06-05 17:54:29 +08:00
Xiangyu Tian
ac3d53ff5d
LLM: Fix vLLM CPU version error (#11206)
Fix vLLM CPU version error
2024-06-04 19:10:23 +08:00
Qiyuan Gong
ce3f08b25a
Fix IPEX auto importer (#11192)
* Fix ipex auto importer with Python builtins.
* Raise errors if the user imports ipex manually before importing ipex_llm. Do nothing if they import ipex after importing ipex_llm.
* Remove import ipex in examples.
2024-06-04 16:57:18 +08:00
Xiangyu Tian
f02f097002
Fix vLLM verion in CPU/vLLM-Serving example README (#11201) 2024-06-04 15:56:55 +08:00
Zijie Li
a644e9409b
Miniconda/Anaconda -> Miniforge update in examples (#11194)
* Change installation address

Change former address: "https://docs.conda.io/en/latest/miniconda.html#" to new address: "https://conda-forge.org/download/" for 63 occurrences under python\llm\example

* Change Prompt

Change "Anaconda Prompt" to "Miniforge Prompt" for 1 occurrence
2024-06-04 10:14:02 +08:00
Qiyuan Gong
15a6205790
Fix LoRA tokenizer for Llama and chatglm (#11186)
* Set pad_token to eos_token if it's None. Otherwise, use model config.
2024-06-03 15:35:38 +08:00
Shaojun Liu
401013a630
Remove chatglm_C Module to Eliminate LGPL Dependency (#11178)
* remove chatglm_C.**.pyd to solve ngsolve weak copyright vunl

* fix style check error

* remove chatglm native int4 from langchain
2024-05-31 17:03:11 +08:00
Wang, Jian4
c0f1be6aea
Fix pp logic (#11175)
* only send no none batch and rank1-n sending first

* always send first
2024-05-30 16:40:59 +08:00
Jin Qiao
dcbf4d3d0a
Add phi-3-vision example (#11156)
* Add phi-3-vision example (HF-Automodels)

* fix

* fix

* fix

* Add phi-3-vision CPU example (HF-Automodels)

* add in readme

* fix

* fix

* fix

* fix

* use fp8 for gpu example

* remove eval
2024-05-30 10:02:47 +08:00
Jiao Wang
93146b9433
Reconstruct Speculative Decoding example directory (#11136)
* update

* update

* update
2024-05-29 13:15:27 -07:00
Xiangyu Tian
2299698b45
Refine Pipeline Parallel FastAPI example (#11168) 2024-05-29 17:16:50 +08:00
Wang, Jian4
8e25de1126
LLM: Add codegeex2 example (#11143)
* add codegeex example

* update

* update cpu

* add GPU

* add gpu

* update readme
2024-05-29 10:00:26 +08:00
ZehuaCao
751e1a4e29
Fix concurrent issue in autoTP streming. (#11150)
* add benchmark test

* update
2024-05-29 08:22:38 +08:00
SONG Ge
33852bd23e
Refactor pipeline parallel device config (#11149)
* refactor pipeline parallel device config

* meet comments

* update example

* add warnings and update code doc
2024-05-28 16:52:46 +08:00
Xiangyu Tian
b44cf405e2
Refine Pipeline-Parallel-Fastapi example README (#11155) 2024-05-28 15:18:21 +08:00
Xiangyu Tian
5c8ccf0ba9
LLM: Add Pipeline-Parallel-FastAPI example (#10917)
Add multi-stage Pipeline-Parallel-FastAPI example

---------

Co-authored-by: hzjane <a1015616934@qq.com>
2024-05-27 14:46:29 +08:00
Ruonan Wang
d550af957a
fix security issue of eagle (#11140)
* fix security issue of eagle

* small fix
2024-05-27 10:15:28 +08:00
Jean Yu
ab476c7fe2
Eagle Speculative Sampling examples (#11104)
* Eagle Speculative Sampling examples

* rm multi-gpu and ray content

* updated README to include Arc A770
2024-05-24 11:13:43 -07:00
Guancheng Fu
fabc395d0d
add langchain vllm interface (#11121)
* done

* fix

* fix

* add vllm

* add langchain vllm exampels

* add docs

* temp
2024-05-24 17:19:27 +08:00
ZehuaCao
63e95698eb
[LLM]Reopen autotp generate_stream (#11120)
* reopen autotp generate_stream

* fix style error

* update
2024-05-24 17:16:14 +08:00
Qiyuan Gong
120a0035ac
Fix type mismatch in eval for Baichuan2 QLora example (#11117)
* During the evaluation stage, Baichuan2 will raise type mismatch when training with bfloat16. Fix this issue by modifying modeling_baichuan.py. Add doc about how to modify this file.
2024-05-24 14:14:30 +08:00
Xiangyu Tian
b3f6faa038
LLM: Add CPU vLLM entrypoint (#11083)
Add CPU vLLM entrypoint and update CPU vLLM serving example.
2024-05-24 09:16:59 +08:00
Qiyuan Gong
f6c9ffe4dc
Add WANDB_MODE and HF_HUB_OFFLINE to XPU finetune README (#11097)
* Add WANDB_MODE=offline to avoid multi-GPUs finetune errors.
* Add HF_HUB_OFFLINE=1 to avoid Hugging Face related errors.
2024-05-22 15:20:53 +08:00
Qiyuan Gong
492ed3fd41
Add verified models to GPU finetune README (#11088)
* Add verified models to GPU finetune README
2024-05-21 15:49:15 +08:00
Qiyuan Gong
1210491748
ChatGLM3, Baichuan2 and Qwen1.5 QLoRA example (#11078)
* Add chatglm3, qwen15-7b and baichuan-7b QLoRA alpaca example
* Remove unnecessary tokenization setting.
2024-05-21 15:29:43 +08:00
ZehuaCao
842d6dfc2d
Further Modify CPU example (#11081)
* modify CPU example

* update
2024-05-21 13:55:47 +08:00
binbin Deng
7170dd9192
Update guide for running qwen with AutoTP (#11065) 2024-05-20 10:53:17 +08:00
ZehuaCao
56cb992497
LLM: Modify CPU Installation Command for most examples (#11049)
* init

* refine

* refine

* refine

* modify hf-agent example

* modify all CPU model example

* remove readthedoc modify

* replace powershell with cmd

* fix repo

* fix repo

* update

* remove comment on windows code block

* update

* update

* update

* update

---------

Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
2024-05-17 15:52:20 +08:00
Xiangyu Tian
d963e95363
LLM: Modify CPU Installation Command for documentation (#11042)
* init

* refine

* refine

* refine

* refine comments
2024-05-17 10:14:00 +08:00
Jin Qiao
9a96af4232
Remove oneAPI pip install command in related examples (#11030)
* Remove pip install command in windows installation guide

* fix chatglm3 installation guide

* Fix gemma cpu example

* Apply on other examples

* fix
2024-05-16 10:46:29 +08:00
Wang, Jian4
d9f71f1f53
Update benchmark util for example using (#11027)
* mv benchmark_util.py to utils/

* remove

* update
2024-05-15 14:16:35 +08:00
binbin Deng
4053a6ef94
Update environment variable setting in AutoTP with arc (#11018) 2024-05-15 10:23:58 +08:00
Ziteng Zhang
7d3791c819
[LLM] Add llama3 alpaca qlora example (#11011)
* Add llama3 finetune example based on alpaca qlora example
2024-05-15 09:17:32 +08:00
Qiyuan Gong
c957ea3831
Add axolotl main support and axolotl Llama-3-8B QLoRA example (#10984)
* Support axolotl main (796a085).
* Add axolotl Llama-3-8B QLoRA example.
* Change `sequence_len` to 256 for alpaca, and revert `lora_r` value.
* Add example to quick_start.
2024-05-14 13:43:59 +08:00
Wang, Jian4
f4c615b1ee
Add cohere example (#10954)
* add link first

* add_cpu_example

* add GPU example
2024-05-08 17:19:59 +08:00
Wang, Jian4
3209d6b057
Fix spculative llama3 no stop error (#10963)
* fix normal

* add eos_tokens_id on sp and add list if

* update

* no none
2024-05-08 17:09:47 +08:00
Xiangyu Tian
02870dc385
LLM: Refine README of AutoTP-FastAPI example (#10960) 2024-05-08 16:55:23 +08:00
Xin Qiu
5973d6c753
make gemma's output better (#10943) 2024-05-08 14:27:51 +08:00
Qiyuan Gong
164e6957af
Refine axolotl quickstart (#10957)
* Add default accelerate config for axolotl quickstart.
* Fix requirement link.
* Upgrade peft to 0.10.0 in requirement.
2024-05-08 09:34:02 +08:00
Qiyuan Gong
c11170b96f
Upgrade Peft to 0.10.0 in finetune examples and docker (#10930)
* Upgrade Peft to 0.10.0 in finetune examples.
* Upgrade Peft to 0.10.0 in docker.
2024-05-07 15:12:26 +08:00
Qiyuan Gong
d7ca5d935b
Upgrade Peft version to 0.10.0 for LLM finetune (#10886)
* Upgrade Peft version to 0.10.0
* Upgrade Peft version in ARC unit test and HF-Peft example.
2024-05-07 15:09:14 +08:00
hxsz1997
245c7348bc
Add codegemma example (#10884)
* add codegemma example in GPU/HF-Transformers-AutoModels/

* add README of codegemma example in GPU/HF-Transformers-AutoModels/

* add codegemma example in GPU/PyTorch-Models/

* add readme of codegemma example in GPU/PyTorch-Models/

* add codegemma example in CPU/HF-Transformers-AutoModels/

* add readme of codegemma example in CPU/HF-Transformers-AutoModels/

* add codegemma example in CPU/PyTorch-Models/

* add readme of codegemma example in CPU/PyTorch-Models/

* fix typos

* fix filename typo

* add codegemma in tables

* add comments of lm_head

* remove comments of use_cache
2024-05-07 13:35:42 +08:00
Xiangyu Tian
13a44cdacb
LLM: Refine Deepspped-AutoTP-FastAPI example (#10916) 2024-05-07 09:37:31 +08:00
Wang, Jian4
1de878bee1
LLM: Fix speculative llama3 long input error (#10934) 2024-05-07 09:25:20 +08:00
Guancheng Fu
2c64754eb0
Add vLLM to ipex-llm serving image (#10807)
* add vllm

* done

* doc work

* fix done

* temp

* add docs

* format

* add start-fastchat-service.sh

* fix
2024-04-29 17:25:42 +08:00
Jin Qiao
1f876fd837
Add example for phi-3 (#10881)
* Add example for phi-3

* add in readme and index

* fix

* fix

* fix

* fix indent

* fix
2024-04-29 16:43:55 +08:00
Xiangyu Tian
3d4950b0f0
LLM: Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example (#10876)
Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example.
2024-04-26 13:24:28 +08:00