Guancheng Fu
2c64754eb0
Add vLLM to ipex-llm serving image (#10807)
* add vllm
* done
* doc work
* fix done
* temp
* add docs
* format
* add start-fastchat-service.sh
* fix
2024-04-29 17:25:42 +08:00
Jin Qiao
1f876fd837
Add example for phi-3 (#10881)
* Add example for phi-3
* add in readme and index
* fix
* fix
* fix
* fix indent
* fix
2024-04-29 16:43:55 +08:00
Yuwen Hu
c936ba3b64
Small fix for supporting workflow dispatch in nightly perf (#10908)
2024-04-29 13:25:14 +08:00
Yishuo Wang
d884c62dc4
remove new_layout parameter (#10906)
2024-04-29 10:31:50 +08:00
Guancheng Fu
fbcd7bc737
Fix Loader issue with dtype fp16 (#10907)
2024-04-29 10:16:02 +08:00
Guancheng Fu
c9fac8c26b
Fix sdp logic (#10896)
* fix
* fix
2024-04-28 22:02:14 +08:00
Yina Chen
015d07a58f
Fix lookahead sample error & add update strategy (#10894)
* Fix sample error & add update strategy
* add mtl config
* fix style
* remove print
2024-04-28 17:21:00 +08:00
Yuwen Hu
94b4e96fa6
Small updates for workflow-dispatch triggered nightly perf (#10902)
* Small fix for workflow-dispatch triggered nightly perf
* Small fix
2024-04-28 11:27:20 +08:00
Yuwen Hu
1a8a93d5e0
Further fix nightly perf (#10901)
2024-04-28 10:18:58 +08:00
Yuwen Hu
7c290d3f92
Add workflow dispatch trigger to nightly perf (#10900)
2024-04-28 09:54:30 +08:00
Yuwen Hu
ddfdaec137
Fix nightly perf (#10899)
* Fix nightly perf by adding default value in benchmark for use_fp16_torch_dtype
* further fixes
2024-04-28 09:39:29 +08:00
Jason Dai
ea035f5e15
Update README.md (#10898)
2024-04-26 22:32:45 +08:00
Cengguang Zhang
9752ffe979
LLM: update split qkv native sdp. (#10895)
* LLM: update split qkv native sdp.
* fix typo.
2024-04-26 18:47:35 +08:00
Guancheng Fu
990535b1cf
Add tensor parallel for vLLM (#10879)
* initial
* test initial tp
* initial sup
* fix format
* fix
* fix
2024-04-26 17:10:49 +08:00
Shaojun Liu
d058f2b403
Fix apt install oneapi scripts (#10891)
* Fix apt install oneapi scripts
* add intel-oneapi-mkl-devel
* add apt pkgs
2024-04-26 16:39:37 +08:00
binbin Deng
f51bf018eb
Add benchmark script for pipeline parallel inference (#10873)
2024-04-26 15:28:11 +08:00
Yishuo Wang
46ba962168
use new quantize kv (#10888)
2024-04-26 14:42:17 +08:00
Heyang Sun
751f6d11d8
fix typos in qlora README (#10893)
2024-04-26 14:03:06 +08:00
Xiangyu Tian
3d4950b0f0
LLM: Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example (#10876)
Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example.
2024-04-26 13:24:28 +08:00
Wang, Jian4
3e8ed54270
LLM: Fix bigdl_ipex_int8 warning (#10890)
2024-04-26 11:18:44 +08:00
Jin Qiao
fb3c268d13
Add phi-3 to perf (#10883)
2024-04-25 20:21:56 +08:00
Yina Chen
8811f268ff
Use new fp16 sdp in Qwen and modify the constraint (#10882)
2024-04-25 19:23:37 +08:00
Yuxuan Xia
0213c1c1da
Add phi3 to the nightly test (#10885)
* Add llama3 and phi2 nightly test
* Change llama3-8b to llama3-8b-instruct
* Add phi3 to nightly test
* Add phi3 to nightly test
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-04-25 17:39:12 +08:00
Yuxuan Xia
ca2479be87
Update scripts readme (#10725)
* Update scripts readme
* Update scripts readme
* Update README
* Update readme
* Update readme
* Update windows env check readme
* Adjust env check readme
* Update windows env check
* Update env check readme
* Adjust the env-check README
* Modify the env-check README
2024-04-25 17:24:37 +08:00
Cengguang Zhang
cd369c2715
LLM: add device id to benchmark utils. (#10877)
2024-04-25 14:01:51 +08:00
Yang Wang
1ce8d7bcd9
Support the desc_act feature in GPTQ model (#10851)
* support act_order
* update versions
* fix style
* fix bug
* clean up
2024-04-24 10:17:13 -07:00
Yina Chen
dc27b3bc35
Use sdp when rest token seq_len > 1 in llama & mistral (for lookup & spec) (#10790)
* update sdp condition
* update
* fix
* update & test llama
* mistral
* fix style
* update
* fix style
* remove pvc constraint
* update ds on arc
* fix style
2024-04-24 17:24:01 +08:00
Yuxuan Xia
844e18b1db
Add llama3 and phi2 nightly test (#10874)
* Add llama3 and phi2 nightly test
* Change llama3-8b to llama3-8b-instruct
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-04-24 16:58:56 +08:00
Qiyuan Gong
634726211a
Add video to axolotl quick start (#10870)
* Add video to axolotl quick start.
* Fix wget url.
2024-04-24 16:53:14 +08:00
binbin Deng
c9feffff9a
LLM: support Qwen1.5-MoE-A2.7B-Chat pipeline parallel inference (#10864)
2024-04-24 16:02:27 +08:00
Yishuo Wang
2d210817ff
add phi3 optimization (#10871)
2024-04-24 15:17:40 +08:00
Cengguang Zhang
eb39c61607
LLM: add min new token to perf test. (#10869)
2024-04-24 14:32:02 +08:00
Yuwen Hu
fb2a160af3
Add phi-2 to 2048-256 test for fixes (#10867)
2024-04-24 10:00:25 +08:00
binbin Deng
fabf54e052
LLM: make pipeline parallel inference example more common (#10786)
2024-04-24 09:28:52 +08:00
hxsz1997
328b1a1de9
Fix the not stop issue of llama3 examples (#10860)
* fix not stop issue in GPU/HF-Transformers-AutoModels
* fix not stop issue in GPU/PyTorch-Models/Model/llama3
* fix not stop issue in CPU/HF-Transformers-AutoModels/Model/llama3
* fix not stop issue in CPU/PyTorch-Models/Model/llama3
* update the output in readme
* update format
* add reference
* update prompt format
* update output format in readme
* update example output in readme
2024-04-23 19:10:09 +08:00
Yuwen Hu
5c9eb5d0f5
Support llama-index install option for upstreaming purposes (#10866)
* Support llama-index install option for upstreaming purposes
* Small fix
* Small fix
2024-04-23 19:08:29 +08:00
Yuwen Hu
21bb8bd164
Add phi-2 to igpu performance test (#10865)
2024-04-23 18:13:14 +08:00
ZehuaCao
36eb8b2e96
Add llama3 speculative example (#10856)
* Initial llama3 speculative example
* update README
* update README
* update README
2024-04-23 17:03:54 +08:00
Zhicun
a017bf2981
add quick start for dify (#10813)
* add quick start
* modify
* modify
* add
* add
* resize
* add mp4
* add video
* add video
* video
* add
2024-04-23 16:32:22 +08:00
Cengguang Zhang
763413b7e1
LLM: support llama split tensor for long context in transformers>=4.36. (#10844)
* LLM: support llama split tensor for long context in transformers>=4.36.
* fix dtype.
* fix style.
* fix style.
* fix style.
* fix style.
* fix dtype.
* fix style.
2024-04-23 16:13:25 +08:00
Qiyuan Gong
bce99a5b00
Minor fix for quick start (#10857)
* Fix typo and space in quick start.
2024-04-23 15:22:01 +08:00
Qiyuan Gong
5eee1976ac
Add Axolotl v0.4.0 quickstart (#10840)
* Add Axolotl v0.4.0 quickstart
2024-04-23 14:57:34 +08:00
ZehuaCao
92ea54b512
Fix speculative decoding bug (#10855)
2024-04-23 14:28:31 +08:00
yb-peng
c9dee6cd0e
Update 8192.txt (#10824)
* Update 8192.txt
* Update 8192.txt with original text
2024-04-23 14:02:09 +08:00
Wang, Jian4
18c032652d
LLM: Add mixtral speculative CPU example (#10830)
* init mixtral sp example
* use different prompt_format
* update output
* update
2024-04-23 10:05:51 +08:00
Qiyuan Gong
5494aa55f6
Downgrade datasets in axolotl example (#10849)
* Downgrade datasets to 2.15.0 to address axolotl prepare issue https://github.com/OpenAccess-AI-Collective/axolotl/issues/1544
Thanks to @kwaa for providing the solution in https://github.com/intel-analytics/ipex-llm/issues/10821#issuecomment-2068861571
2024-04-23 09:41:58 +08:00
Ruonan Wang
2ec45c49d3
fix ollama quickstart (#10846)
2024-04-22 22:04:49 +08:00
Yishuo Wang
fe5a082b84
add phi-2 optimization (#10843)
2024-04-22 18:56:47 +08:00
Guancheng Fu
47bd5f504c
[vLLM]Remove vllm-v1, refactor v2 (#10842)
* remove vllm-v1
* fix format
2024-04-22 17:51:32 +08:00
Wang, Jian4
23c6a52fb0
LLM: Fix ipex torchscript=True error (#10832)
* remove
* update
* remove torchscript
2024-04-22 15:53:09 +08:00