Commit graph

1397 commits

Author SHA1 Message Date
Yina Chen
711fa0199e
Fix fp6k phi3 ppl core dump (#11204) 2024-06-04 16:44:27 +08:00
Xiangyu Tian
f02f097002
Fix vLLM verion in CPU/vLLM-Serving example README (#11201) 2024-06-04 15:56:55 +08:00
Yishuo Wang
6454655dcc
use sdp in baichuan2 13b (#11198) 2024-06-04 15:39:00 +08:00
Yishuo Wang
d90cd977d0
refactor stablelm (#11195) 2024-06-04 13:14:43 +08:00
Zijie Li
a644e9409b
Miniconda/Anaconda -> Miniforge update in examples (#11194)
* Change installation address

Change former address: "https://docs.conda.io/en/latest/miniconda.html#" to new address: "https://conda-forge.org/download/" for 63 occurrences under python\llm\example

* Change Prompt

Change "Anaconda Prompt" to "Miniforge Prompt" for 1 occurrence
2024-06-04 10:14:02 +08:00
Xin Qiu
5f13700c9f
optimize Minicpm (#11189)
* minicpm optimize

* update
2024-06-03 18:28:29 +08:00
Qiyuan Gong
15a6205790
Fix LoRA tokenizer for Llama and chatglm (#11186)
* Set pad_token to eos_token if it's None. Otherwise, use model config.
2024-06-03 15:35:38 +08:00
Cengguang Zhang
3eb13ccd8c
LLM: fix input length condition in deepspeed all-in-one benchmark. (#11185) 2024-06-03 10:05:43 +08:00
Shaojun Liu
401013a630
Remove chatglm_C Module to Eliminate LGPL Dependency (#11178)
* remove chatglm_C.**.pyd to solve ngsolve weak copyright vunl

* fix style check error

* remove chatglm native int4 from langchain
2024-05-31 17:03:11 +08:00
Ruonan Wang
50b5f4476f
update q4k convert (#11179) 2024-05-31 11:36:53 +08:00
Wang, Jian4
c0f1be6aea
Fix pp logic (#11175)
* only send no none batch and rank1-n sending first

* always send first
2024-05-30 16:40:59 +08:00
ZehuaCao
4127b99ed6
Fix null pointer dereferences error. (#11125)
* delete unused function on tgi_server

* update

* update

* fix style
2024-05-30 16:16:10 +08:00
Guancheng Fu
50ee004ac7
Fix vllm condition (#11169)
* add use-vllm

* done

* fix style

* fix done
2024-05-30 15:23:17 +08:00
Jin Qiao
dcbf4d3d0a
Add phi-3-vision example (#11156)
* Add phi-3-vision example (HF-Automodels)

* fix

* fix

* fix

* Add phi-3-vision CPU example (HF-Automodels)

* add in readme

* fix

* fix

* fix

* fix

* use fp8 for gpu example

* remove eval
2024-05-30 10:02:47 +08:00
Jiao Wang
93146b9433
Reconstruct Speculative Decoding example directory (#11136)
* update

* update

* update
2024-05-29 13:15:27 -07:00
Xiangyu Tian
2299698b45
Refine Pipeline Parallel FastAPI example (#11168) 2024-05-29 17:16:50 +08:00
Ruonan Wang
9bfbf78bf4
update api usage of xe_batch & fp16 (#11164)
* update api usage

* update setup.py
2024-05-29 15:15:14 +08:00
Yina Chen
e29e2f1c78
Support new fp8 e4m3 (#11158) 2024-05-29 14:27:14 +08:00
Wang, Jian4
8e25de1126
LLM: Add codegeex2 example (#11143)
* add codegeex example

* update

* update cpu

* add GPU

* add gpu

* update readme
2024-05-29 10:00:26 +08:00
ZehuaCao
751e1a4e29
Fix concurrent issue in autoTP streming. (#11150)
* add benchmark test

* update
2024-05-29 08:22:38 +08:00
Yishuo Wang
bc5008f0d5
disable sdp_causal in phi-3 to fix overflow (#11157) 2024-05-28 17:25:53 +08:00
SONG Ge
33852bd23e
Refactor pipeline parallel device config (#11149)
* refactor pipeline parallel device config

* meet comments

* update example

* add warnings and update code doc
2024-05-28 16:52:46 +08:00
hxsz1997
62b2d8af6b
Add lookahead in all-in-one (#11142)
* add lookahead in allinone

* delete save to csv in run_transformer_int4_gpu

* change lookup to lookahead

* fix the error of add model.peak_memory

* Set transformer_int4_gpu as the default option

* add comment of transformer_int4_fp16_lookahead_gpu
2024-05-28 15:39:58 +08:00
Xiangyu Tian
b44cf405e2
Refine Pipeline-Parallel-Fastapi example README (#11155) 2024-05-28 15:18:21 +08:00
Yishuo Wang
d307622797
fix first token sdp with batch (#11153) 2024-05-28 15:03:06 +08:00
Yina Chen
3464440839
fix qwen import error (#11154) 2024-05-28 14:50:12 +08:00
Jin Qiao
25b6402315
Add Windows GPU unit test (#11050)
* Add Windows GPU UT

* temporarily remove ut on arc

* retry

* retry

* retry

* fix

* retry

* retry

* fix

* retry

* retry

* retry

* retry

* retry

* retry

* retry

* retry

* retry

* retry

* retry

* retry

* retry

* fix

* retry

* retry

* retry

* retry

* retry

* retry

* merge into single workflow

* retry inference test

* retry

* retrigger

* try to fix inference test

* retry

* retry

* retry

* retry

* retry

* retry

* retry

* retry

* retry

* retry

* retry

* check lower_bound

* retry

* retry

* try example test

* try fix example test

* retry

* fix

* seperate function into shell script

* remove cygpath

* try remove all cygpath

* retry

* retry

* Revert "try remove all cygpath"

This reverts commit 7ceeff3e48f08429062ecef548c1a3ad3488756f.

* Revert "retry"

This reverts commit 40ea2457843bff6991b8db24316cde5de1d35418.

* Revert "retry"

This reverts commit 817d0db3e5aec3bd449d3deaf4fb01d3ecfdc8a3.

* enable ut

* fix

* retrigger

* retrigger

* update download url

* fix

* fix

* retry

* add comment

* fix
2024-05-28 13:29:47 +08:00
Yina Chen
b6b70d1ba0
Divide core-xe packages (#11131)
* temp

* add batch

* fix style

* update package name

* fix style

* add workflow

* use temp version to run uts

* trigger performance test

* trigger win igpu perf

* revert workflow & setup
2024-05-28 12:00:18 +08:00
binbin Deng
c9168b85b7
Fix error during merging adapter (#11145) 2024-05-27 19:41:42 +08:00
Guancheng Fu
daf7b1cd56
[Docker] Fix image using two cards error (#11144)
* fix all

* done
2024-05-27 16:20:13 +08:00
Xiangyu Tian
5c8ccf0ba9
LLM: Add Pipeline-Parallel-FastAPI example (#10917)
Add multi-stage Pipeline-Parallel-FastAPI example

---------

Co-authored-by: hzjane <a1015616934@qq.com>
2024-05-27 14:46:29 +08:00
Ruonan Wang
d550af957a
fix security issue of eagle (#11140)
* fix security issue of eagle

* small fix
2024-05-27 10:15:28 +08:00
binbin Deng
367de141f2
Fix mixtral-8x7b with transformers=4.37.0 (#11132) 2024-05-27 09:50:54 +08:00
Jean Yu
ab476c7fe2
Eagle Speculative Sampling examples (#11104)
* Eagle Speculative Sampling examples

* rm multi-gpu and ray content

* updated README to include Arc A770
2024-05-24 11:13:43 -07:00
Guancheng Fu
fabc395d0d
add langchain vllm interface (#11121)
* done

* fix

* fix

* add vllm

* add langchain vllm exampels

* add docs

* temp
2024-05-24 17:19:27 +08:00
ZehuaCao
63e95698eb
[LLM]Reopen autotp generate_stream (#11120)
* reopen autotp generate_stream

* fix style error

* update
2024-05-24 17:16:14 +08:00
Yishuo Wang
1dc680341b
fix phi-3-vision import (#11129) 2024-05-24 15:57:15 +08:00
Guancheng Fu
7f772c5a4f
Add half precision for fastchat models (#11130) 2024-05-24 15:41:14 +08:00
Zhao Changmin
65f4212f89
Fix qwen 14b run into register attention fwd (#11128)
* fix qwen 14b
2024-05-24 14:45:07 +08:00
Shaojun Liu
373f9e6c79
add ipex-llm-init.bat for Windows (#11082)
* add ipex-llm-init.bat for Windows

* update setup.py
2024-05-24 14:26:25 +08:00
Qiyuan Gong
120a0035ac
Fix type mismatch in eval for Baichuan2 QLora example (#11117)
* During the evaluation stage, Baichuan2 will raise type mismatch when training with bfloat16. Fix this issue by modifying modeling_baichuan.py. Add doc about how to modify this file.
2024-05-24 14:14:30 +08:00
Yishuo Wang
1db9d9a63b
optimize internlm2 xcomposer agin (#11124) 2024-05-24 13:44:52 +08:00
Yishuo Wang
9372ce87ce
fix internlm xcomposer2 fp16 (#11123) 2024-05-24 11:03:31 +08:00
Cengguang Zhang
011b9faa5c
LLM: unify baichuan2-13b alibi mask dtype with model dtype. (#11107)
* LLM: unify alibi mask dtype.

* fix comments.
2024-05-24 10:27:53 +08:00
Jiao Wang
0a06a6e1d4
Update tests for transformers 4.36 (#10858)
* update unit test

* update

* update

* update

* update

* update

* fix gpu attention test

* update

* update

* update

* update

* update

* update

* update example test

* replace replit code

* update

* update

* update

* update

* set safe_serialization false

* perf test

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* delete

* update

* update

* update

* update

* update

* update

* revert

* update
2024-05-24 10:26:38 +08:00
Xiangyu Tian
b3f6faa038
LLM: Add CPU vLLM entrypoint (#11083)
Add CPU vLLM entrypoint and update CPU vLLM serving example.
2024-05-24 09:16:59 +08:00
Yishuo Wang
797dbc48b8
fix phi-2 and phi-3 convert (#11116) 2024-05-23 17:37:37 +08:00
Yishuo Wang
37b98a531f
support running internlm xcomposer2 on gpu and add sdp optimization (#11115) 2024-05-23 17:26:24 +08:00
Zhao Changmin
c5e8b90c8d
Add Qwen register attention implemention (#11110)
* qwen_register
2024-05-23 17:17:45 +08:00
Yishuo Wang
0e53f20edb
support running internlm-xcomposer2 on cpu (#11111) 2024-05-23 16:36:09 +08:00