Yuwen Hu
8c36b5bdde
Add qwen2 example ( #11252 )
* Add GPU example for Qwen2
* Update comments in README
* Update README for Qwen2 GPU example
* Add CPU example for Qwen2
Sample Output under README pending
* Update generate.py and README for CPU Qwen2
* Update GPU example for Qwen2
* Small update
* Small fix
* Add Qwen2 table
* Update README for Qwen2 CPU and GPU
Update sample output under README
---------
Co-authored-by: Zijie Li <michael20001122@gmail.com>
2024-06-07 10:29:33 +08:00
Shaojun Liu
85df5e7699
fix nightly perf test ( #11251 )
2024-06-07 09:33:14 +08:00
Guoqiong Song
09c6780d0c
phi-2 transformers 4.37 ( #11161 )
* phi-2 transformers 4.37
2024-06-05 13:36:41 -07:00
Zijie Li
bfa1367149
Add CPU and GPU example for MiniCPM ( #11202 )
* Change installation address
Change former address: "https://docs.conda.io/en/latest/miniconda.html#" to new address: "https://conda-forge.org/download/" for 63 occurrences under python/llm/example
* Change Prompt
Change "Anaconda Prompt" to "Miniforge Prompt" for 1 occurrence
* Create and update model minicpm
* Update model minicpm
Update model minicpm under GPU/PyTorch-Models
* Update readme and generate.py
change "prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False)" and delete "pip install transformers==4.37.0"
* Update comments for minicpm GPU
Update comments for generate.py at minicpm GPU
* Add CPU example for MiniCPM
* Update minicpm README for CPU
* Update README for MiniCPM and Llama3
* Update README for Llama3 CPU PyTorch
* Update and fix comments for MiniCPM
2024-06-05 18:09:53 +08:00
Yuwen Hu
af96579c76
Update installation guide for pipeline parallel inference ( #11224 )
* Update installation guide for pipeline parallel inference
* Small fix
* further fix
* Small fix
* Small fix
* Update based on comments
* Small fix
* Small fix
* Small fix
2024-06-05 17:54:29 +08:00
Xiangyu Tian
ac3d53ff5d
LLM: Fix vLLM CPU version error ( #11206 )
Fix vLLM CPU version error
2024-06-04 19:10:23 +08:00
Qiyuan Gong
ce3f08b25a
Fix IPEX auto importer ( #11192 )
* Fix ipex auto importer with Python builtins.
* Raise errors if the user imports ipex manually before importing ipex_llm. Do nothing if they import ipex after importing ipex_llm.
* Remove import ipex in examples.
2024-06-04 16:57:18 +08:00
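The import-order rule described in this commit can be sketched as a minimal guard (a hypothetical helper for illustration; the actual check inside ipex_llm may differ, though `intel_extension_for_pytorch` is ipex's real module name):

```python
import sys

def check_ipex_import_order():
    # If the user imported ipex manually before ipex_llm, raise an error;
    # importing ipex after ipex_llm is fine, so nothing else is needed here.
    if "intel_extension_for_pytorch" in sys.modules:
        raise RuntimeError(
            "Please import ipex_llm before intel_extension_for_pytorch (ipex)."
        )
```

Run at `import ipex_llm` time, a check like this rejects the one ordering that breaks the auto importer while leaving the reverse ordering untouched.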
Xiangyu Tian
f02f097002
Fix vLLM version in CPU/vLLM-Serving example README ( #11201 )
2024-06-04 15:56:55 +08:00
Zijie Li
a644e9409b
Miniconda/Anaconda -> Miniforge update in examples ( #11194 )
* Change installation address
Change former address: "https://docs.conda.io/en/latest/miniconda.html#" to new address: "https://conda-forge.org/download/" for 63 occurrences under python/llm/example
* Change Prompt
Change "Anaconda Prompt" to "Miniforge Prompt" for 1 occurrence
2024-06-04 10:14:02 +08:00
Qiyuan Gong
15a6205790
Fix LoRA tokenizer for Llama and chatglm ( #11186 )
* Set pad_token to eos_token if it's None. Otherwise, use model config.
2024-06-03 15:35:38 +08:00
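The fallback in the commit body can be sketched like this (a stand-in tokenizer class is used for illustration; real code would receive a Hugging Face tokenizer, and any pad token coming from the model config would be applied upstream):

```python
class FakeTokenizer:
    """Hypothetical stand-in for a Hugging Face tokenizer."""
    def __init__(self, pad_token=None, eos_token="</s>"):
        self.pad_token = pad_token
        self.eos_token = eos_token

def ensure_pad_token(tokenizer):
    # Set pad_token to eos_token if it's None; otherwise keep whatever the
    # tokenizer (or model config) already provides.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer
```

This keeps LoRA training from failing on models such as Llama whose tokenizers ship without a pad token.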
Shaojun Liu
401013a630
Remove chatglm_C Module to Eliminate LGPL Dependency ( #11178 )
* remove chatglm_C.**.pyd to solve ngsolve weak copyleft vuln
* fix style check error
* remove chatglm native int4 from langchain
2024-05-31 17:03:11 +08:00
Wang, Jian4
c0f1be6aea
Fix pp logic ( #11175 )
* only send non-None batches, with ranks 1-n sending first
* always send first
2024-05-30 16:40:59 +08:00
Jin Qiao
dcbf4d3d0a
Add phi-3-vision example ( #11156 )
* Add phi-3-vision example (HF-Automodels)
* fix
* fix
* fix
* Add phi-3-vision CPU example (HF-Automodels)
* add in readme
* fix
* fix
* fix
* fix
* use fp8 for gpu example
* remove eval
2024-05-30 10:02:47 +08:00
Jiao Wang
93146b9433
Reconstruct Speculative Decoding example directory ( #11136 )
* update
* update
* update
2024-05-29 13:15:27 -07:00
Xiangyu Tian
2299698b45
Refine Pipeline Parallel FastAPI example ( #11168 )
2024-05-29 17:16:50 +08:00
Wang, Jian4
8e25de1126
LLM: Add codegeex2 example ( #11143 )
* add codegeex example
* update
* update cpu
* add GPU
* add gpu
* update readme
2024-05-29 10:00:26 +08:00
ZehuaCao
751e1a4e29
Fix concurrent issue in autoTP streaming. ( #11150 )
* add benchmark test
* update
2024-05-29 08:22:38 +08:00
SONG Ge
33852bd23e
Refactor pipeline parallel device config ( #11149 )
* refactor pipeline parallel device config
* meet comments
* update example
* add warnings and update code doc
2024-05-28 16:52:46 +08:00
Xiangyu Tian
b44cf405e2
Refine Pipeline-Parallel-Fastapi example README ( #11155 )
2024-05-28 15:18:21 +08:00
Xiangyu Tian
5c8ccf0ba9
LLM: Add Pipeline-Parallel-FastAPI example ( #10917 )
Add multi-stage Pipeline-Parallel-FastAPI example
---------
Co-authored-by: hzjane <a1015616934@qq.com>
2024-05-27 14:46:29 +08:00
Ruonan Wang
d550af957a
fix security issue of eagle ( #11140 )
* fix security issue of eagle
* small fix
2024-05-27 10:15:28 +08:00
Jean Yu
ab476c7fe2
Eagle Speculative Sampling examples ( #11104 )
* Eagle Speculative Sampling examples
* rm multi-gpu and ray content
* updated README to include Arc A770
2024-05-24 11:13:43 -07:00
Guancheng Fu
fabc395d0d
add langchain vllm interface ( #11121 )
* done
* fix
* fix
* add vllm
* add langchain vllm examples
* add docs
* temp
2024-05-24 17:19:27 +08:00
ZehuaCao
63e95698eb
[LLM] Reopen autotp generate_stream ( #11120 )
* reopen autotp generate_stream
* fix style error
* update
2024-05-24 17:16:14 +08:00
Qiyuan Gong
120a0035ac
Fix type mismatch in eval for Baichuan2 QLora example ( #11117 )
* During the evaluation stage, Baichuan2 raises a type mismatch when trained with bfloat16. Fix this by modifying modeling_baichuan.py, and add a doc on how to modify the file.
2024-05-24 14:14:30 +08:00
Xiangyu Tian
b3f6faa038
LLM: Add CPU vLLM entrypoint ( #11083 )
Add CPU vLLM entrypoint and update CPU vLLM serving example.
2024-05-24 09:16:59 +08:00
Qiyuan Gong
f6c9ffe4dc
Add WANDB_MODE and HF_HUB_OFFLINE to XPU finetune README ( #11097 )
* Add WANDB_MODE=offline to avoid multi-GPUs finetune errors.
* Add HF_HUB_OFFLINE=1 to avoid Hugging Face related errors.
2024-05-22 15:20:53 +08:00
Qiyuan Gong
492ed3fd41
Add verified models to GPU finetune README ( #11088 )
* Add verified models to GPU finetune README
2024-05-21 15:49:15 +08:00
Qiyuan Gong
1210491748
ChatGLM3, Baichuan2 and Qwen1.5 QLoRA example ( #11078 )
* Add chatglm3, qwen15-7b and baichuan-7b QLoRA alpaca example
* Remove unnecessary tokenization setting.
2024-05-21 15:29:43 +08:00
ZehuaCao
842d6dfc2d
Further Modify CPU example ( #11081 )
* modify CPU example
* update
2024-05-21 13:55:47 +08:00
binbin Deng
7170dd9192
Update guide for running qwen with AutoTP ( #11065 )
2024-05-20 10:53:17 +08:00
ZehuaCao
56cb992497
LLM: Modify CPU Installation Command for most examples ( #11049 )
* init
* refine
* refine
* refine
* modify hf-agent example
* modify all CPU model example
* remove readthedoc modify
* replace powershell with cmd
* fix repo
* fix repo
* update
* remove comment on windows code block
* update
* update
* update
* update
---------
Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
2024-05-17 15:52:20 +08:00
Xiangyu Tian
d963e95363
LLM: Modify CPU Installation Command for documentation ( #11042 )
* init
* refine
* refine
* refine
* refine comments
2024-05-17 10:14:00 +08:00
Jin Qiao
9a96af4232
Remove oneAPI pip install command in related examples ( #11030 )
* Remove pip install command in windows installation guide
* fix chatglm3 installation guide
* Fix gemma cpu example
* Apply on other examples
* fix
2024-05-16 10:46:29 +08:00
Wang, Jian4
d9f71f1f53
Update benchmark util for example using ( #11027 )
* mv benchmark_util.py to utils/
* remove
* update
2024-05-15 14:16:35 +08:00
binbin Deng
4053a6ef94
Update environment variable setting in AutoTP with arc ( #11018 )
2024-05-15 10:23:58 +08:00
Ziteng Zhang
7d3791c819
[LLM] Add llama3 alpaca qlora example ( #11011 )
* Add llama3 finetune example based on alpaca qlora example
2024-05-15 09:17:32 +08:00
Qiyuan Gong
c957ea3831
Add axolotl main support and axolotl Llama-3-8B QLoRA example ( #10984 )
* Support axolotl main (796a085).
* Add axolotl Llama-3-8B QLoRA example.
* Change `sequence_len` to 256 for alpaca, and revert `lora_r` value.
* Add example to quick_start.
2024-05-14 13:43:59 +08:00
Wang, Jian4
f4c615b1ee
Add cohere example ( #10954 )
* add link first
* add_cpu_example
* add GPU example
2024-05-08 17:19:59 +08:00
Wang, Jian4
3209d6b057
Fix speculative llama3 no-stop error ( #10963 )
* fix normal
* add eos_token_id on sp and add list check
* update
* no none
2024-05-08 17:09:47 +08:00
Xiangyu Tian
02870dc385
LLM: Refine README of AutoTP-FastAPI example ( #10960 )
2024-05-08 16:55:23 +08:00
Xin Qiu
5973d6c753
make gemma's output better ( #10943 )
2024-05-08 14:27:51 +08:00
Qiyuan Gong
164e6957af
Refine axolotl quickstart ( #10957 )
* Add default accelerate config for axolotl quickstart.
* Fix requirement link.
* Upgrade peft to 0.10.0 in requirement.
2024-05-08 09:34:02 +08:00
Qiyuan Gong
c11170b96f
Upgrade Peft to 0.10.0 in finetune examples and docker ( #10930 )
* Upgrade Peft to 0.10.0 in finetune examples.
* Upgrade Peft to 0.10.0 in docker.
2024-05-07 15:12:26 +08:00
Qiyuan Gong
d7ca5d935b
Upgrade Peft version to 0.10.0 for LLM finetune ( #10886 )
* Upgrade Peft version to 0.10.0
* Upgrade Peft version in ARC unit test and HF-Peft example.
2024-05-07 15:09:14 +08:00
hxsz1997
245c7348bc
Add codegemma example ( #10884 )
* add codegemma example in GPU/HF-Transformers-AutoModels/
* add README of codegemma example in GPU/HF-Transformers-AutoModels/
* add codegemma example in GPU/PyTorch-Models/
* add readme of codegemma example in GPU/PyTorch-Models/
* add codegemma example in CPU/HF-Transformers-AutoModels/
* add readme of codegemma example in CPU/HF-Transformers-AutoModels/
* add codegemma example in CPU/PyTorch-Models/
* add readme of codegemma example in CPU/PyTorch-Models/
* fix typos
* fix filename typo
* add codegemma in tables
* add comments of lm_head
* remove comments of use_cache
2024-05-07 13:35:42 +08:00
Xiangyu Tian
13a44cdacb
LLM: Refine Deepspeed-AutoTP-FastAPI example ( #10916 )
2024-05-07 09:37:31 +08:00
Wang, Jian4
1de878bee1
LLM: Fix speculative llama3 long input error ( #10934 )
2024-05-07 09:25:20 +08:00
Guancheng Fu
2c64754eb0
Add vLLM to ipex-llm serving image ( #10807 )
* add vllm
* done
* doc work
* fix done
* temp
* add docs
* format
* add start-fastchat-service.sh
* fix
2024-04-29 17:25:42 +08:00
Jin Qiao
1f876fd837
Add example for phi-3 ( #10881 )
* Add example for phi-3
* add in readme and index
* fix
* fix
* fix
* fix indent
* fix
2024-04-29 16:43:55 +08:00