Commit graph

427 commits

Each entry below lists: author · SHA1 · commit message · date
Shaojun Liu
401013a630
Remove chatglm_C Module to Eliminate LGPL Dependency (#11178)
* remove chatglm_C.**.pyd to resolve the ngsolve weak-copyleft (LGPL) vulnerability

* fix style check error

* remove chatglm native int4 from langchain
2024-05-31 17:03:11 +08:00
Wang, Jian4
c0f1be6aea
Fix pp logic (#11175)
* only send non-None batches, and have ranks 1..n send first

* always send first
2024-05-30 16:40:59 +08:00
Jin Qiao
dcbf4d3d0a
Add phi-3-vision example (#11156)
* Add phi-3-vision example (HF-Automodels)

* fix

* fix

* fix

* Add phi-3-vision CPU example (HF-Automodels)

* add in readme

* fix

* fix

* fix

* fix

* use fp8 for gpu example

* remove eval
2024-05-30 10:02:47 +08:00
Jiao Wang
93146b9433
Reconstruct Speculative Decoding example directory (#11136)
* update

* update

* update
2024-05-29 13:15:27 -07:00
Xiangyu Tian
2299698b45
Refine Pipeline Parallel FastAPI example (#11168) 2024-05-29 17:16:50 +08:00
Wang, Jian4
8e25de1126
LLM: Add codegeex2 example (#11143)
* add codegeex example

* update

* update cpu

* add GPU

* add gpu

* update readme
2024-05-29 10:00:26 +08:00
ZehuaCao
751e1a4e29
Fix concurrency issue in AutoTP streaming. (#11150)
* add benchmark test

* update
2024-05-29 08:22:38 +08:00
SONG Ge
33852bd23e
Refactor pipeline parallel device config (#11149)
* refactor pipeline parallel device config

* meet comments

* update example

* add warnings and update code doc
2024-05-28 16:52:46 +08:00
Xiangyu Tian
b44cf405e2
Refine Pipeline-Parallel-FastAPI example README (#11155) 2024-05-28 15:18:21 +08:00
Xiangyu Tian
5c8ccf0ba9
LLM: Add Pipeline-Parallel-FastAPI example (#10917)
Add multi-stage Pipeline-Parallel-FastAPI example

---------

Co-authored-by: hzjane <a1015616934@qq.com>
2024-05-27 14:46:29 +08:00
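For orientation, below is a minimal single-process sketch of the kind of FastAPI generation endpoint such an example exposes. The model id, request schema, and endpoint path are illustrative assumptions; the actual example additionally wires the model across pipeline-parallel stages, which is not shown here.

```python
# Minimal FastAPI text-generation endpoint (single process, no pipeline
# parallelism). All names below are illustrative, not the example's code.
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float16)

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 32

@app.post("/generate")
def generate(req: GenerateRequest):
    inputs = tokenizer(req.prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=req.max_new_tokens)
    return {"text": tokenizer.decode(output[0], skip_special_tokens=True)}
```

Run with `uvicorn server:app` (assuming the file is saved as server.py).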
Ruonan Wang
d550af957a
fix security issue of eagle (#11140)
* fix security issue of eagle

* small fix
2024-05-27 10:15:28 +08:00
Jean Yu
ab476c7fe2
Eagle Speculative Sampling examples (#11104)
* Eagle Speculative Sampling examples

* rm multi-gpu and ray content

* updated README to include Arc A770
2024-05-24 11:13:43 -07:00
Guancheng Fu
fabc395d0d
add langchain vllm interface (#11121)
* done

* fix

* fix

* add vllm

* add langchain vllm examples

* add docs

* temp
2024-05-24 17:19:27 +08:00
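The ipex-llm LangChain vLLM interface added here is not reproduced in this log; as a general illustration of the pattern, upstream LangChain's community vLLM wrapper is used like this (the model id is a placeholder):

```python
# Upstream langchain_community vLLM wrapper, shown for illustration only;
# the interface added in this PR lives in ipex-llm and may differ.
from langchain_community.llms import VLLM

llm = VLLM(
    model="facebook/opt-125m",  # placeholder model
    max_new_tokens=64,
    temperature=0.8,
)
print(llm.invoke("What is the capital of France?"))
```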
ZehuaCao
63e95698eb
[LLM] Reopen AutoTP generate_stream (#11120)
* reopen autotp generate_stream

* fix style error

* update
2024-05-24 17:16:14 +08:00
Qiyuan Gong
120a0035ac
Fix type mismatch in eval for Baichuan2 QLora example (#11117)
* During the evaluation stage, Baichuan2 raises a type mismatch when training with bfloat16. Fix this issue by modifying modeling_baichuan.py, and add a doc describing how to modify this file.
2024-05-24 14:14:30 +08:00
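The modeling_baichuan.py change itself is not shown in this log. As a hypothetical sketch of the kind of fix a bfloat16 type mismatch usually needs, tensors created in float32 are cast to the model's working dtype before they are combined:

```python
# Hypothetical illustration only; not the actual modeling_baichuan.py patch.
import torch

def add_attention_mask(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Mixing float32 and bfloat16 here can promote or error depending on the
    # op; casting the mask to the scores' dtype keeps everything in one dtype.
    return scores + mask.to(scores.dtype)
```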
Xiangyu Tian
b3f6faa038
LLM: Add CPU vLLM entrypoint (#11083)
Add CPU vLLM entrypoint and update CPU vLLM serving example.
2024-05-24 09:16:59 +08:00
Qiyuan Gong
f6c9ffe4dc
Add WANDB_MODE and HF_HUB_OFFLINE to XPU finetune README (#11097)
* Add WANDB_MODE=offline to avoid multi-GPU finetune errors.
* Add HF_HUB_OFFLINE=1 to avoid Hugging Face-related errors.
2024-05-22 15:20:53 +08:00
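The same settings can also be applied from Python, provided they are set before any related import:

```python
# Equivalent to the README's environment settings; set these before importing
# wandb or any Hugging Face library so they take effect.
import os

os.environ["WANDB_MODE"] = "offline"  # avoid multi-GPU finetune errors from W&B
os.environ["HF_HUB_OFFLINE"] = "1"    # avoid Hugging Face Hub network errors
```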
Qiyuan Gong
492ed3fd41
Add verified models to GPU finetune README (#11088)
* Add verified models to GPU finetune README
2024-05-21 15:49:15 +08:00
Qiyuan Gong
1210491748
ChatGLM3, Baichuan2 and Qwen1.5 QLoRA example (#11078)
* Add chatglm3, qwen1.5-7b and baichuan2-7b QLoRA alpaca examples
* Remove unnecessary tokenization setting.
2024-05-21 15:29:43 +08:00
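For orientation, a generic QLoRA-style adapter setup with PEFT looks like the sketch below; the rank, dropout, and target modules are illustrative assumptions, not the example's actual hyperparameters:

```python
# Generic PEFT LoRA setup, for illustration; hyperparameters are assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
lora_config = LoraConfig(
    r=8,                                  # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],   # assumed for ChatGLM-style blocks
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only adapter weights are trainable
```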
ZehuaCao
842d6dfc2d
Further Modify CPU example (#11081)
* modify CPU example

* update
2024-05-21 13:55:47 +08:00
binbin Deng
7170dd9192
Update guide for running qwen with AutoTP (#11065) 2024-05-20 10:53:17 +08:00
ZehuaCao
56cb992497
LLM: Modify CPU Installation Command for most examples (#11049)
* init

* refine

* refine

* refine

* modify hf-agent example

* modify all CPU model example

* remove readthedoc modify

* replace powershell with cmd

* fix repo

* fix repo

* update

* remove comment on windows code block

* update

* update

* update

* update

---------

Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
2024-05-17 15:52:20 +08:00
Xiangyu Tian
d963e95363
LLM: Modify CPU Installation Command for documentation (#11042)
* init

* refine

* refine

* refine

* refine comments
2024-05-17 10:14:00 +08:00
Jin Qiao
9a96af4232
Remove oneAPI pip install command in related examples (#11030)
* Remove pip install command in windows installation guide

* fix chatglm3 installation guide

* Fix gemma cpu example

* Apply on other examples

* fix
2024-05-16 10:46:29 +08:00
Wang, Jian4
d9f71f1f53
Update benchmark util for example using (#11027)
* mv benchmark_util.py to utils/

* remove

* update
2024-05-15 14:16:35 +08:00
binbin Deng
4053a6ef94
Update environment variable setting in AutoTP with arc (#11018) 2024-05-15 10:23:58 +08:00
Ziteng Zhang
7d3791c819
[LLM] Add llama3 alpaca qlora example (#11011)
* Add llama3 finetune example based on alpaca qlora example
2024-05-15 09:17:32 +08:00
Qiyuan Gong
c957ea3831
Add axolotl main support and axolotl Llama-3-8B QLoRA example (#10984)
* Support axolotl main (796a085).
* Add axolotl Llama-3-8B QLoRA example.
* Change `sequence_len` to 256 for alpaca, and revert `lora_r` value.
* Add example to quick_start.
2024-05-14 13:43:59 +08:00
Wang, Jian4
f4c615b1ee
Add cohere example (#10954)
* add link first

* add_cpu_example

* add GPU example
2024-05-08 17:19:59 +08:00
Wang, Jian4
3209d6b057
Fix speculative llama3 no-stop error (#10963)
* fix normal

* add eos_token_id on the speculative path and add a list check

* update

* no none
2024-05-08 17:09:47 +08:00
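Llama 3 ends a turn with <|eot_id|> rather than the default EOS token, so generation keeps running unless both ids are passed as stop tokens. The commit's exact change is not shown here; the general Hugging Face pattern is:

```python
# General pattern for stopping Llama 3 generation; not the commit's code.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# generate() accepts a list of eos ids, so both terminators stop decoding.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]
inputs = tokenizer("Tell me a joke.", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64, eos_token_id=terminators)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```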
Xiangyu Tian
02870dc385
LLM: Refine README of AutoTP-FastAPI example (#10960) 2024-05-08 16:55:23 +08:00
Xin Qiu
5973d6c753
make gemma's output better (#10943) 2024-05-08 14:27:51 +08:00
Qiyuan Gong
164e6957af
Refine axolotl quickstart (#10957)
* Add default accelerate config for axolotl quickstart.
* Fix requirement link.
* Upgrade peft to 0.10.0 in requirement.
2024-05-08 09:34:02 +08:00
Qiyuan Gong
c11170b96f
Upgrade Peft to 0.10.0 in finetune examples and docker (#10930)
* Upgrade Peft to 0.10.0 in finetune examples.
* Upgrade Peft to 0.10.0 in docker.
2024-05-07 15:12:26 +08:00
Qiyuan Gong
d7ca5d935b
Upgrade Peft version to 0.10.0 for LLM finetune (#10886)
* Upgrade Peft version to 0.10.0
* Upgrade Peft version in ARC unit test and HF-Peft example.
2024-05-07 15:09:14 +08:00
hxsz1997
245c7348bc
Add codegemma example (#10884)
* add codegemma example in GPU/HF-Transformers-AutoModels/

* add README of codegemma example in GPU/HF-Transformers-AutoModels/

* add codegemma example in GPU/PyTorch-Models/

* add readme of codegemma example in GPU/PyTorch-Models/

* add codegemma example in CPU/HF-Transformers-AutoModels/

* add readme of codegemma example in CPU/HF-Transformers-AutoModels/

* add codegemma example in CPU/PyTorch-Models/

* add readme of codegemma example in CPU/PyTorch-Models/

* fix typos

* fix filename typo

* add codegemma in tables

* add comments of lm_head

* remove comments of use_cache
2024-05-07 13:35:42 +08:00
Xiangyu Tian
13a44cdacb
LLM: Refine Deepspeed-AutoTP-FastAPI example (#10916) 2024-05-07 09:37:31 +08:00
Wang, Jian4
1de878bee1
LLM: Fix speculative llama3 long input error (#10934) 2024-05-07 09:25:20 +08:00
Guancheng Fu
2c64754eb0
Add vLLM to ipex-llm serving image (#10807)
* add vllm

* done

* doc work

* fix done

* temp

* add docs

* format

* add start-fastchat-service.sh

* fix
2024-04-29 17:25:42 +08:00
Jin Qiao
1f876fd837
Add example for phi-3 (#10881)
* Add example for phi-3

* add in readme and index

* fix

* fix

* fix

* fix indent

* fix
2024-04-29 16:43:55 +08:00
Xiangyu Tian
3d4950b0f0
LLM: Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example (#10876)
Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example.
2024-04-26 13:24:28 +08:00
Yang Wang
1ce8d7bcd9
Support the desc_act feature in GPTQ model (#10851)
* support act_order

* update versions

* fix style

* fix bug

* clean up
2024-04-24 10:17:13 -07:00
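desc_act (also called act_order) makes GPTQ quantize weight columns in order of decreasing activation magnitude, improving accuracy at some speed cost. How this PR handles such checkpoints is not shown in this log; a generic Hugging Face illustration of the flag:

```python
# Generic illustration of the desc_act/act_order flag via Hugging Face's
# GPTQConfig; the checkpoint id is a placeholder, and ipex-llm's own
# handling of desc_act checkpoints may differ.
from transformers import AutoModelForCausalLM, GPTQConfig

quant_config = GPTQConfig(bits=4, desc_act=True)  # enable act_order
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # placeholder GPTQ checkpoint
    quantization_config=quant_config,
)
```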
binbin Deng
fabf54e052
LLM: make pipeline parallel inference example more common (#10786) 2024-04-24 09:28:52 +08:00
hxsz1997
328b1a1de9
Fix the no-stop issue of llama3 examples (#10860)
* fix no-stop issue in GPU/HF-Transformers-AutoModels

* fix no-stop issue in GPU/PyTorch-Models/Model/llama3

* fix no-stop issue in CPU/HF-Transformers-AutoModels/Model/llama3

* fix no-stop issue in CPU/PyTorch-Models/Model/llama3

* update the output in readme

* update format

* add reference

* update prompt format

* update output format in readme

* update example output in readme
2024-04-23 19:10:09 +08:00
ZehuaCao
36eb8b2e96
Add llama3 speculative example (#10856)
* Initial llama3 speculative example

* update README

* update README

* update README
2024-04-23 17:03:54 +08:00
ZehuaCao
92ea54b512
Fix speculative decoding bug (#10855) 2024-04-23 14:28:31 +08:00
Wang, Jian4
18c032652d
LLM: Add mixtral speculative CPU example (#10830)
* init mixtral sp example

* use different prompt_format

* update output

* update
2024-04-23 10:05:51 +08:00
Qiyuan Gong
5494aa55f6
Downgrade datasets in axolotl example (#10849)
* Downgrade datasets to 2.15.0 to address the axolotl prepare issue https://github.com/OpenAccess-AI-Collective/axolotl/issues/1544

Thanks to @kwaa for providing the solution in https://github.com/intel-analytics/ipex-llm/issues/10821#issuecomment-2068861571
2024-04-23 09:41:58 +08:00
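After applying the pin with `pip install datasets==2.15.0`, one quick way to confirm it took effect:

```python
# Sanity check that the pinned datasets version from the commit is installed.
from importlib.metadata import version

assert version("datasets") == "2.15.0", "datasets must be pinned to 2.15.0"
```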
Guancheng Fu
47bd5f504c
[vLLM] Remove vllm-v1, refactor v2 (#10842)
* remove vllm-v1

* fix format
2024-04-22 17:51:32 +08:00
Wang, Jian4
23c6a52fb0
LLM: Fix ipex torchscript=True error (#10832)
* remove

* update

* remove torchscript
2024-04-22 15:53:09 +08:00