Wang, Jian4
8e25de1126
LLM: Add codegeex2 example ( #11143 )
...
* add codegeex example
* update
* update cpu
* add GPU
* add gpu
* update readme
2024-05-29 10:00:26 +08:00
ZehuaCao
751e1a4e29
Fix concurrent issue in autoTP streming. ( #11150 )
...
* add benchmark test
* update
2024-05-29 08:22:38 +08:00
Jason Dai
7cc43aa67a
Update readme ( #11160 )
2024-05-28 21:16:36 +08:00
Yishuo Wang
bc5008f0d5
disable sdp_causal in phi-3 to fix overflow ( #11157 )
2024-05-28 17:25:53 +08:00
SONG Ge
33852bd23e
Refactor pipeline parallel device config ( #11149 )
...
* refactor pipeline parallel device config
* meet comments
* update example
* add warnings and update code doc
2024-05-28 16:52:46 +08:00
hxsz1997
62b2d8af6b
Add lookahead in all-in-one ( #11142 )
...
* add lookahead in allinone
* delete save to csv in run_transformer_int4_gpu
* change lookup to lookahead
* fix the error of add model.peak_memory
* Set transformer_int4_gpu as the default option
* add comment of transformer_int4_fp16_lookahead_gpu
2024-05-28 15:39:58 +08:00
Ruonan Wang
83bd9cb681
add new version for cpp quickstart and keep an old version ( #11151 )
...
* add new version
* meet review
2024-05-28 15:29:34 +08:00
Xiangyu Tian
b44cf405e2
Refine Pipeline-Parallel-Fastapi example README ( #11155 )
2024-05-28 15:18:21 +08:00
Yishuo Wang
d307622797
fix first token sdp with batch ( #11153 )
2024-05-28 15:03:06 +08:00
Yina Chen
3464440839
fix qwen import error ( #11154 )
2024-05-28 14:50:12 +08:00
Jin Qiao
25b6402315
Add Windows GPU unit test ( #11050 )
...
* Add Windows GPU UT
* temporarily remove ut on arc
* retry
* retry
* retry
* fix
* retry
* retry
* fix
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* fix
* retry
* retry
* retry
* retry
* retry
* retry
* merge into single workflow
* retry inference test
* retry
* retrigger
* try to fix inference test
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* retry
* check lower_bound
* retry
* retry
* try example test
* try fix example test
* retry
* fix
* seperate function into shell script
* remove cygpath
* try remove all cygpath
* retry
* retry
* Revert "try remove all cygpath"
This reverts commit 7ceeff3e48f08429062ecef548c1a3ad3488756f.
* Revert "retry"
This reverts commit 40ea2457843bff6991b8db24316cde5de1d35418.
* Revert "retry"
This reverts commit 817d0db3e5aec3bd449d3deaf4fb01d3ecfdc8a3.
* enable ut
* fix
* retrigger
* retrigger
* update download url
* fix
* fix
* retry
* add comment
* fix
2024-05-28 13:29:47 +08:00
Yina Chen
b6b70d1ba0
Divide core-xe packages ( #11131 )
...
* temp
* add batch
* fix style
* update package name
* fix style
* add workflow
* use temp version to run uts
* trigger performance test
* trigger win igpu perf
* revert workflow & setup
2024-05-28 12:00:18 +08:00
binbin Deng
c9168b85b7
Fix error during merging adapter ( #11145 )
2024-05-27 19:41:42 +08:00
Guancheng Fu
daf7b1cd56
[Docker] Fix image using two cards error ( #11144 )
...
* fix all
* done
2024-05-27 16:20:13 +08:00
Jason Dai
34dab3b4ef
Update readme ( #11141 )
2024-05-27 15:41:02 +08:00
Xiangyu Tian
5c8ccf0ba9
LLM: Add Pipeline-Parallel-FastAPI example ( #10917 )
...
Add multi-stage Pipeline-Parallel-FastAPI example
---------
Co-authored-by: hzjane <a1015616934@qq.com>
2024-05-27 14:46:29 +08:00
Ruonan Wang
d550af957a
fix security issue of eagle ( #11140 )
...
* fix security issue of eagle
* small fix
2024-05-27 10:15:28 +08:00
binbin Deng
367de141f2
Fix mixtral-8x7b with transformers=4.37.0 ( #11132 )
2024-05-27 09:50:54 +08:00
Jean Yu
ab476c7fe2
Eagle Speculative Sampling examples ( #11104 )
...
* Eagle Speculative Sampling examples
* rm multi-gpu and ray content
* updated README to include Arc A770
2024-05-24 11:13:43 -07:00
Guancheng Fu
fabc395d0d
add langchain vllm interface ( #11121 )
...
* done
* fix
* fix
* add vllm
* add langchain vllm exampels
* add docs
* temp
2024-05-24 17:19:27 +08:00
ZehuaCao
63e95698eb
[LLM]Reopen autotp generate_stream ( #11120 )
...
* reopen autotp generate_stream
* fix style error
* update
2024-05-24 17:16:14 +08:00
Yishuo Wang
1dc680341b
fix phi-3-vision import ( #11129 )
2024-05-24 15:57:15 +08:00
Guancheng Fu
7f772c5a4f
Add half precision for fastchat models ( #11130 )
2024-05-24 15:41:14 +08:00
Zhao Changmin
65f4212f89
Fix qwen 14b run into register attention fwd ( #11128 )
...
* fix qwen 14b
2024-05-24 14:45:07 +08:00
Shaojun Liu
373f9e6c79
add ipex-llm-init.bat for Windows ( #11082 )
...
* add ipex-llm-init.bat for Windows
* update setup.py
2024-05-24 14:26:25 +08:00
Shaojun Liu
85491907f3
Update GIF link ( #11119 )
2024-05-24 14:26:18 +08:00
Qiyuan Gong
120a0035ac
Fix type mismatch in eval for Baichuan2 QLora example ( #11117 )
...
* During the evaluation stage, Baichuan2 will raise type mismatch when training with bfloat16. Fix this issue by modifying modeling_baichuan.py. Add doc about how to modify this file.
2024-05-24 14:14:30 +08:00
Qiyuan Gong
21a1a973c1
Remove axolotl and python3-blinker ( #11127 )
...
* Remove axolotl from image to reduce image size.
* Remove python3-blinker to avoid axolotl lib conflict.
2024-05-24 13:54:19 +08:00
Yishuo Wang
1db9d9a63b
optimize internlm2 xcomposer agin ( #11124 )
2024-05-24 13:44:52 +08:00
Yishuo Wang
9372ce87ce
fix internlm xcomposer2 fp16 ( #11123 )
2024-05-24 11:03:31 +08:00
Cengguang Zhang
011b9faa5c
LLM: unify baichuan2-13b alibi mask dtype with model dtype. ( #11107 )
...
* LLM: unify alibi mask dtype.
* fix comments.
2024-05-24 10:27:53 +08:00
Jiao Wang
0a06a6e1d4
Update tests for transformers 4.36 ( #10858 )
...
* update unit test
* update
* update
* update
* update
* update
* fix gpu attention test
* update
* update
* update
* update
* update
* update
* update example test
* replace replit code
* update
* update
* update
* update
* set safe_serialization false
* perf test
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* delete
* update
* update
* update
* update
* update
* update
* revert
* update
2024-05-24 10:26:38 +08:00
Xiangyu Tian
1291165720
LLM: Add quickstart for vLLM cpu ( #11122 )
...
Add quickstart for vLLM cpu.
2024-05-24 10:21:21 +08:00
Wang, Jian4
1443b802cc
Docker:Fix building cpp_docker and remove unimportant dependencies ( #11114 )
...
* test build
* update
2024-05-24 09:49:44 +08:00
Xiangyu Tian
b3f6faa038
LLM: Add CPU vLLM entrypoint ( #11083 )
...
Add CPU vLLM entrypoint and update CPU vLLM serving example.
2024-05-24 09:16:59 +08:00
Shengsheng Huang
7ed270a4d8
update readme docker section, fix quickstart title, remove chs figure ( #11044 )
...
* update readme and fix quickstart title, remove chs figure
* update readme according to comment
* reorganize the docker guide structure
2024-05-24 00:18:20 +08:00
Yishuo Wang
797dbc48b8
fix phi-2 and phi-3 convert ( #11116 )
2024-05-23 17:37:37 +08:00
Yishuo Wang
37b98a531f
support running internlm xcomposer2 on gpu and add sdp optimization ( #11115 )
2024-05-23 17:26:24 +08:00
Zhao Changmin
c5e8b90c8d
Add Qwen register attention implemention ( #11110 )
...
* qwen_register
2024-05-23 17:17:45 +08:00
Yishuo Wang
0e53f20edb
support running internlm-xcomposer2 on cpu ( #11111 )
2024-05-23 16:36:09 +08:00
Shaojun Liu
e0f401d97d
FIX: APT Repository not working (signatures invalid) ( #11112 )
...
* chmod 644 gpg key
* chmod 644 gpg key
2024-05-23 16:15:45 +08:00
Yuwen Hu
d36b41d59e
Add setuptools limitation for ipex-llm[xpu] ( #11102 )
...
* Add setuptool limitation for ipex-llm[xpu]
* llamaindex option update
2024-05-22 18:20:30 +08:00
Yishuo Wang
cd4dff09ee
support phi-3 vision ( #11101 )
2024-05-22 17:43:50 +08:00
Zhao Changmin
15d906a97b
Update linux igpu run script ( #11098 )
...
* update run script
2024-05-22 17:18:07 +08:00
Kai Huang
f63172ef63
Align ppl with llama.cpp ( #11055 )
...
* update script
* remove
* add header
* update readme
2024-05-22 16:43:11 +08:00
Qiyuan Gong
f6c9ffe4dc
Add WANDB_MODE and HF_HUB_OFFLINE to XPU finetune README ( #11097 )
...
* Add WANDB_MODE=offline to avoid multi-GPUs finetune errors.
* Add HF_HUB_OFFLINE=1 to avoid Hugging Face related errors.
2024-05-22 15:20:53 +08:00
Yuwen Hu
1c5ed9b6cf
Fix arc ut ( #11096 )
2024-05-22 14:13:13 +08:00
Guancheng Fu
4fd1df9cf6
Add toc for docker quickstarts ( #11095 )
...
* fix
* fix
2024-05-22 11:23:22 +08:00
Shaojun Liu
584439e498
update homepage url for ipex-llm ( #11094 )
...
* update homepage url
* Update python version to 3.11
* Update long description
2024-05-22 11:10:44 +08:00
Zhao Changmin
bf0f904e66
Update level_zero on MTL linux ( #11085 )
...
* Update level_zero on MTL
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-05-22 11:01:56 +08:00