Shaojun Liu
401013a630
Remove chatglm_C Module to Eliminate LGPL Dependency ( #11178 )
...
* remove chatglm_C.**.pyd to solve ngsolve weak copyright vunl
* fix style check error
* remove chatglm native int4 from langchain
2024-05-31 17:03:11 +08:00
Wang, Jian4
c0f1be6aea
Fix pp logic ( #11175 )
...
* only send no none batch and rank1-n sending first
* always send first
2024-05-30 16:40:59 +08:00
Jin Qiao
dcbf4d3d0a
Add phi-3-vision example ( #11156 )
...
* Add phi-3-vision example (HF-Automodels)
* fix
* fix
* fix
* Add phi-3-vision CPU example (HF-Automodels)
* add in readme
* fix
* fix
* fix
* fix
* use fp8 for gpu example
* remove eval
2024-05-30 10:02:47 +08:00
Jiao Wang
93146b9433
Reconstruct Speculative Decoding example directory ( #11136 )
...
* update
* update
* update
2024-05-29 13:15:27 -07:00
Xiangyu Tian
2299698b45
Refine Pipeline Parallel FastAPI example ( #11168 )
2024-05-29 17:16:50 +08:00
Wang, Jian4
8e25de1126
LLM: Add codegeex2 example ( #11143 )
...
* add codegeex example
* update
* update cpu
* add GPU
* add gpu
* update readme
2024-05-29 10:00:26 +08:00
ZehuaCao
751e1a4e29
Fix concurrent issue in autoTP streming. ( #11150 )
...
* add benchmark test
* update
2024-05-29 08:22:38 +08:00
SONG Ge
33852bd23e
Refactor pipeline parallel device config ( #11149 )
...
* refactor pipeline parallel device config
* meet comments
* update example
* add warnings and update code doc
2024-05-28 16:52:46 +08:00
Xiangyu Tian
b44cf405e2
Refine Pipeline-Parallel-Fastapi example README ( #11155 )
2024-05-28 15:18:21 +08:00
Xiangyu Tian
5c8ccf0ba9
LLM: Add Pipeline-Parallel-FastAPI example ( #10917 )
...
Add multi-stage Pipeline-Parallel-FastAPI example
---------
Co-authored-by: hzjane <a1015616934@qq.com>
2024-05-27 14:46:29 +08:00
Ruonan Wang
d550af957a
fix security issue of eagle ( #11140 )
...
* fix security issue of eagle
* small fix
2024-05-27 10:15:28 +08:00
Jean Yu
ab476c7fe2
Eagle Speculative Sampling examples ( #11104 )
...
* Eagle Speculative Sampling examples
* rm multi-gpu and ray content
* updated README to include Arc A770
2024-05-24 11:13:43 -07:00
Guancheng Fu
fabc395d0d
add langchain vllm interface ( #11121 )
...
* done
* fix
* fix
* add vllm
* add langchain vllm exampels
* add docs
* temp
2024-05-24 17:19:27 +08:00
ZehuaCao
63e95698eb
[LLM]Reopen autotp generate_stream ( #11120 )
...
* reopen autotp generate_stream
* fix style error
* update
2024-05-24 17:16:14 +08:00
Qiyuan Gong
120a0035ac
Fix type mismatch in eval for Baichuan2 QLora example ( #11117 )
...
* During the evaluation stage, Baichuan2 will raise type mismatch when training with bfloat16. Fix this issue by modifying modeling_baichuan.py. Add doc about how to modify this file.
2024-05-24 14:14:30 +08:00
Xiangyu Tian
b3f6faa038
LLM: Add CPU vLLM entrypoint ( #11083 )
...
Add CPU vLLM entrypoint and update CPU vLLM serving example.
2024-05-24 09:16:59 +08:00
Qiyuan Gong
f6c9ffe4dc
Add WANDB_MODE and HF_HUB_OFFLINE to XPU finetune README ( #11097 )
...
* Add WANDB_MODE=offline to avoid multi-GPUs finetune errors.
* Add HF_HUB_OFFLINE=1 to avoid Hugging Face related errors.
2024-05-22 15:20:53 +08:00
Qiyuan Gong
492ed3fd41
Add verified models to GPU finetune README ( #11088 )
...
* Add verified models to GPU finetune README
2024-05-21 15:49:15 +08:00
Qiyuan Gong
1210491748
ChatGLM3, Baichuan2 and Qwen1.5 QLoRA example ( #11078 )
...
* Add chatglm3, qwen15-7b and baichuan-7b QLoRA alpaca example
* Remove unnecessary tokenization setting.
2024-05-21 15:29:43 +08:00
ZehuaCao
842d6dfc2d
Further Modify CPU example ( #11081 )
...
* modify CPU example
* update
2024-05-21 13:55:47 +08:00
binbin Deng
7170dd9192
Update guide for running qwen with AutoTP ( #11065 )
2024-05-20 10:53:17 +08:00
ZehuaCao
56cb992497
LLM: Modify CPU Installation Command for most examples ( #11049 )
...
* init
* refine
* refine
* refine
* modify hf-agent example
* modify all CPU model example
* remove readthedoc modify
* replace powershell with cmd
* fix repo
* fix repo
* update
* remove comment on windows code block
* update
* update
* update
* update
---------
Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
2024-05-17 15:52:20 +08:00
Xiangyu Tian
d963e95363
LLM: Modify CPU Installation Command for documentation ( #11042 )
...
* init
* refine
* refine
* refine
* refine comments
2024-05-17 10:14:00 +08:00
Jin Qiao
9a96af4232
Remove oneAPI pip install command in related examples ( #11030 )
...
* Remove pip install command in windows installation guide
* fix chatglm3 installation guide
* Fix gemma cpu example
* Apply on other examples
* fix
2024-05-16 10:46:29 +08:00
Wang, Jian4
d9f71f1f53
Update benchmark util for example using ( #11027 )
...
* mv benchmark_util.py to utils/
* remove
* update
2024-05-15 14:16:35 +08:00
binbin Deng
4053a6ef94
Update environment variable setting in AutoTP with arc ( #11018 )
2024-05-15 10:23:58 +08:00
Ziteng Zhang
7d3791c819
[LLM] Add llama3 alpaca qlora example ( #11011 )
...
* Add llama3 finetune example based on alpaca qlora example
2024-05-15 09:17:32 +08:00
Qiyuan Gong
c957ea3831
Add axolotl main support and axolotl Llama-3-8B QLoRA example ( #10984 )
...
* Support axolotl main (796a085).
* Add axolotl Llama-3-8B QLoRA example.
* Change `sequence_len` to 256 for alpaca, and revert `lora_r` value.
* Add example to quick_start.
2024-05-14 13:43:59 +08:00
Wang, Jian4
f4c615b1ee
Add cohere example ( #10954 )
...
* add link first
* add_cpu_example
* add GPU example
2024-05-08 17:19:59 +08:00
Wang, Jian4
3209d6b057
Fix spculative llama3 no stop error ( #10963 )
...
* fix normal
* add eos_tokens_id on sp and add list if
* update
* no none
2024-05-08 17:09:47 +08:00
Xiangyu Tian
02870dc385
LLM: Refine README of AutoTP-FastAPI example ( #10960 )
2024-05-08 16:55:23 +08:00
Xin Qiu
5973d6c753
make gemma's output better ( #10943 )
2024-05-08 14:27:51 +08:00
Qiyuan Gong
164e6957af
Refine axolotl quickstart ( #10957 )
...
* Add default accelerate config for axolotl quickstart.
* Fix requirement link.
* Upgrade peft to 0.10.0 in requirement.
2024-05-08 09:34:02 +08:00
Qiyuan Gong
c11170b96f
Upgrade Peft to 0.10.0 in finetune examples and docker ( #10930 )
...
* Upgrade Peft to 0.10.0 in finetune examples.
* Upgrade Peft to 0.10.0 in docker.
2024-05-07 15:12:26 +08:00
Qiyuan Gong
d7ca5d935b
Upgrade Peft version to 0.10.0 for LLM finetune ( #10886 )
...
* Upgrade Peft version to 0.10.0
* Upgrade Peft version in ARC unit test and HF-Peft example.
2024-05-07 15:09:14 +08:00
hxsz1997
245c7348bc
Add codegemma example ( #10884 )
...
* add codegemma example in GPU/HF-Transformers-AutoModels/
* add README of codegemma example in GPU/HF-Transformers-AutoModels/
* add codegemma example in GPU/PyTorch-Models/
* add readme of codegemma example in GPU/PyTorch-Models/
* add codegemma example in CPU/HF-Transformers-AutoModels/
* add readme of codegemma example in CPU/HF-Transformers-AutoModels/
* add codegemma example in CPU/PyTorch-Models/
* add readme of codegemma example in CPU/PyTorch-Models/
* fix typos
* fix filename typo
* add codegemma in tables
* add comments of lm_head
* remove comments of use_cache
2024-05-07 13:35:42 +08:00
Xiangyu Tian
13a44cdacb
LLM: Refine Deepspped-AutoTP-FastAPI example ( #10916 )
2024-05-07 09:37:31 +08:00
Wang, Jian4
1de878bee1
LLM: Fix speculative llama3 long input error ( #10934 )
2024-05-07 09:25:20 +08:00
Guancheng Fu
2c64754eb0
Add vLLM to ipex-llm serving image ( #10807 )
...
* add vllm
* done
* doc work
* fix done
* temp
* add docs
* format
* add start-fastchat-service.sh
* fix
2024-04-29 17:25:42 +08:00
Jin Qiao
1f876fd837
Add example for phi-3 ( #10881 )
...
* Add example for phi-3
* add in readme and index
* fix
* fix
* fix
* fix indent
* fix
2024-04-29 16:43:55 +08:00
Xiangyu Tian
3d4950b0f0
LLM: Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example ( #10876 )
...
Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example.
2024-04-26 13:24:28 +08:00
Yang Wang
1ce8d7bcd9
Support the desc_act feature in GPTQ model ( #10851 )
...
* support act_order
* update versions
* fix style
* fix bug
* clean up
2024-04-24 10:17:13 -07:00
binbin Deng
fabf54e052
LLM: make pipeline parallel inference example more common ( #10786 )
2024-04-24 09:28:52 +08:00
hxsz1997
328b1a1de9
Fix the not stop issue of llama3 examples ( #10860 )
...
* fix not stop issue in GPU/HF-Transformers-AutoModels
* fix not stop issue in GPU/PyTorch-Models/Model/llama3
* fix not stop issue in CPU/HF-Transformers-AutoModels/Model/llama3
* fix not stop issue in CPU/PyTorch-Models/Model/llama3
* update the output in readme
* update format
* add reference
* update prompt format
* update output format in readme
* update example output in readme
2024-04-23 19:10:09 +08:00
ZehuaCao
36eb8b2e96
Add llama3 speculative example ( #10856 )
...
* Initial llama3 speculative example
* update README
* update README
* update README
2024-04-23 17:03:54 +08:00
ZehuaCao
92ea54b512
Fix speculative decoding bug ( #10855 )
2024-04-23 14:28:31 +08:00
Wang, Jian4
18c032652d
LLM: Add mixtral speculative CPU example ( #10830 )
...
* init mixtral sp example
* use different prompt_format
* update output
* update
2024-04-23 10:05:51 +08:00
Qiyuan Gong
5494aa55f6
Downgrade datasets in axolotl example ( #10849 )
...
* Downgrade datasets to 2.15.0 to address axolotl prepare issue https://github.com/OpenAccess-AI-Collective/axolotl/issues/1544
Tks to @kwaa for providing the solution in https://github.com/intel-analytics/ipex-llm/issues/10821#issuecomment-2068861571
2024-04-23 09:41:58 +08:00
Guancheng Fu
47bd5f504c
[vLLM]Remove vllm-v1, refactor v2 ( #10842 )
...
* remove vllm-v1
* fix format
2024-04-22 17:51:32 +08:00
Wang, Jian4
23c6a52fb0
LLM: Fix ipex torchscript=True error ( #10832 )
...
* remove
* update
* remove torchscript
2024-04-22 15:53:09 +08:00