hxsz1997
245c7348bc
Add codegemma example ( #10884 )
...
* add codegemma example in GPU/HF-Transformers-AutoModels/
* add README of codegemma example in GPU/HF-Transformers-AutoModels/
* add codegemma example in GPU/PyTorch-Models/
* add readme of codegemma example in GPU/PyTorch-Models/
* add codegemma example in CPU/HF-Transformers-AutoModels/
* add readme of codegemma example in CPU/HF-Transformers-AutoModels/
* add codegemma example in CPU/PyTorch-Models/
* add readme of codegemma example in CPU/PyTorch-Models/
* fix typos
* fix filename typo
* add codegemma in tables
* add comments of lm_head
* remove comments of use_cache
2024-05-07 13:35:42 +08:00
Shaojun Liu
08ad40b251
improve ipex-llm-init for Linux ( #10928 )
...
* refine ipex-llm-init
* install libtcmalloc.so for Max
* update based on comment
* remove unneeded code
2024-05-07 12:55:14 +08:00
Wang, Jian4
33b8f524c2
Add cpp docker manually_test ( #10946 )
...
* add cpp docker
* update
2024-05-07 11:23:28 +08:00
Wang, Jian4
191b184341
LLM: Optimize cohere model ( #10878 )
...
* use mlp and rms
* optimize kv_cache
* add fuse qkv
* add flash attention and fp16 sdp
* error fp8 sdp
* fix optimized
* fix style
* update
* add for pp
2024-05-07 10:19:50 +08:00
Xiangyu Tian
13a44cdacb
LLM: Refine Deepspeed-AutoTP-FastAPI example ( #10916 )
2024-05-07 09:37:31 +08:00
Wang, Jian4
1de878bee1
LLM: Fix speculative llama3 long input error ( #10934 )
2024-05-07 09:25:20 +08:00
Guancheng Fu
49ab5a2b0e
Add embeddings ( #10931 )
2024-05-07 09:07:02 +08:00
Shengsheng Huang
d649236321
make images clickable ( #10939 )
2024-05-06 20:24:15 +08:00
Shengsheng Huang
64938c2ca7
Dify quickstart revision ( #10938 )
...
* revise dify quickstart guide
* update quick links and a small typo
2024-05-06 19:59:17 +08:00
Ruonan Wang
3f438495e4
update llama.cpp and ollama quickstart ( #10929 )
2024-05-06 15:01:06 +08:00
Qiyuan Gong
41ffe1526c
Modify CPU finetune docker for bz2 error ( #10919 )
...
* Avoid bz2 error
* change to cpu torch
2024-05-06 10:41:50 +08:00
Wang, Jian4
0e0bd309e2
LLM: Enable Speculative on Fastchat ( #10909 )
...
* init
* enable streamer
* update
* update
* remove deprecated
* update
* update
* add gpu example
2024-05-06 10:06:20 +08:00
Zhicun
8379f02a74
Add Dify quickstart ( #10903 )
...
* add quick start
* modify
* modify
* add
* add
* resize
* add mp4
* add video
* add video
* video
* add
* modify
* add
* modify
2024-05-06 10:01:34 +08:00
Cengguang Zhang
0edef1f94c
LLM: add min_new_tokens to all in one benchmark. ( #10911 )
2024-05-06 09:32:59 +08:00
Shengsheng Huang
c78a8e3677
update quickstart ( #10923 )
2024-04-30 18:19:31 +08:00
Shengsheng Huang
282d676561
update continue quickstart ( #10922 )
2024-04-30 17:51:21 +08:00
Cengguang Zhang
75dbf240ec
LLM: update split tensor conditions. ( #10872 )
...
* LLM: update split tensor condition.
* add cond for split tensor.
* update priority of env.
* fix style.
* update env name.
2024-04-30 17:07:21 +08:00
Yuwen Hu
71f51ce589
Initial Update for Continue Quickstart with Ollama backend ( #10918 )
...
* Initial continue quickstart with ollama backend updates
* Small fix
* Small fix
2024-04-30 15:10:30 +08:00
Guancheng Fu
2c64754eb0
Add vLLM to ipex-llm serving image ( #10807 )
...
* add vllm
* done
* doc work
* fix done
* temp
* add docs
* format
* add start-fastchat-service.sh
* fix
2024-04-29 17:25:42 +08:00
Jin Qiao
1f876fd837
Add example for phi-3 ( #10881 )
...
* Add example for phi-3
* add in readme and index
* fix
* fix
* fix
* fix indent
* fix
2024-04-29 16:43:55 +08:00
Yuwen Hu
c936ba3b64
Small fix for supporting workflow dispatch in nightly perf ( #10908 )
2024-04-29 13:25:14 +08:00
Yishuo Wang
d884c62dc4
remove new_layout parameter ( #10906 )
2024-04-29 10:31:50 +08:00
Guancheng Fu
fbcd7bc737
Fix Loader issue with dtype fp16 ( #10907 )
2024-04-29 10:16:02 +08:00
Guancheng Fu
c9fac8c26b
Fix sdp logic ( #10896 )
...
* fix
* fix
2024-04-28 22:02:14 +08:00
Yina Chen
015d07a58f
Fix lookahead sample error & add update strategy ( #10894 )
...
* Fix sample error & add update strategy
* add mtl config
* fix style
* remove print
2024-04-28 17:21:00 +08:00
Yuwen Hu
94b4e96fa6
Small updates for workflow-dispatch triggered nightly perf ( #10902 )
...
* Small fix for workflow-dispatch triggered nightly perf
* Small fix
2024-04-28 11:27:20 +08:00
Yuwen Hu
1a8a93d5e0
Further fix nightly perf ( #10901 )
2024-04-28 10:18:58 +08:00
Yuwen Hu
7c290d3f92
Add workflow dispatch trigger to nightly perf ( #10900 )
2024-04-28 09:54:30 +08:00
Yuwen Hu
ddfdaec137
Fix nightly perf ( #10899 )
...
* Fix nightly perf by adding default value in benchmark for use_fp16_torch_dtype
* further fixes
2024-04-28 09:39:29 +08:00
Jason Dai
ea035f5e15
Update README.md ( #10898 )
2024-04-26 22:32:45 +08:00
Cengguang Zhang
9752ffe979
LLM: update split qkv native sdp. ( #10895 )
...
* LLM: update split qkv native sdp.
* fix typo.
2024-04-26 18:47:35 +08:00
Guancheng Fu
990535b1cf
Add tensor parallel for vLLM ( #10879 )
...
* initial
* test initial tp
* initial sup
* fix format
* fix
* fix
2024-04-26 17:10:49 +08:00
Shaojun Liu
d058f2b403
Fix apt install oneapi scripts ( #10891 )
...
* Fix apt install oneapi scripts
* add intel-oneapi-mkl-devel
* add apt pkgs
2024-04-26 16:39:37 +08:00
binbin Deng
f51bf018eb
Add benchmark script for pipeline parallel inference ( #10873 )
2024-04-26 15:28:11 +08:00
Yishuo Wang
46ba962168
use new quantize kv ( #10888 )
2024-04-26 14:42:17 +08:00
Heyang Sun
751f6d11d8
fix typos in qlora README ( #10893 )
2024-04-26 14:03:06 +08:00
Xiangyu Tian
3d4950b0f0
LLM: Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example ( #10876 )
...
Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example.
2024-04-26 13:24:28 +08:00
Wang, Jian4
3e8ed54270
LLM: Fix bigdl_ipex_int8 warning ( #10890 )
2024-04-26 11:18:44 +08:00
Jin Qiao
fb3c268d13
Add phi-3 to perf ( #10883 )
2024-04-25 20:21:56 +08:00
Yina Chen
8811f268ff
Use new fp16 sdp in Qwen and modify the constraint ( #10882 )
2024-04-25 19:23:37 +08:00
Yuxuan Xia
0213c1c1da
Add phi3 to the nightly test ( #10885 )
...
* Add llama3 and phi2 nightly test
* Change llama3-8b to llama3-8b-instruct
* Add phi3 to nightly test
* Add phi3 to nightly test
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-04-25 17:39:12 +08:00
Yuxuan Xia
ca2479be87
Update scripts readme ( #10725 )
...
* Update scripts readme
* Update scripts readme
* Update README
* Update readme
* Update readme
* Update windows env check readme
* Adjust env check readme
* Update windows env check
* Update env check readme
* Adjust the env-check README
* Modify the env-check README
2024-04-25 17:24:37 +08:00
Cengguang Zhang
cd369c2715
LLM: add device id to benchmark utils. ( #10877 )
2024-04-25 14:01:51 +08:00
Yang Wang
1ce8d7bcd9
Support the desc_act feature in GPTQ model ( #10851 )
...
* support act_order
* update versions
* fix style
* fix bug
* clean up
2024-04-24 10:17:13 -07:00
Yina Chen
dc27b3bc35
Use sdp when rest token seq_len > 1 in llama & mistral (for lookup & spec) ( #10790 )
...
* update sdp condition
* update
* fix
* update & test llama
* mistral
* fix style
* update
* fix style
* remove pvc constraint
* update ds on arc
* fix style
2024-04-24 17:24:01 +08:00
Yuxuan Xia
844e18b1db
Add llama3 and phi2 nightly test ( #10874 )
...
* Add llama3 and phi2 nightly test
* Change llama3-8b to llama3-8b-instruct
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-04-24 16:58:56 +08:00
Qiyuan Gong
634726211a
Add video to axolotl quick start ( #10870 )
...
* Add video to axolotl quick start.
* Fix wget url.
2024-04-24 16:53:14 +08:00
binbin Deng
c9feffff9a
LLM: support Qwen1.5-MoE-A2.7B-Chat pipeline parallel inference ( #10864 )
2024-04-24 16:02:27 +08:00
Yishuo Wang
2d210817ff
add phi3 optimization ( #10871 )
2024-04-24 15:17:40 +08:00
Cengguang Zhang
eb39c61607
LLM: add min new token to perf test. ( #10869 )
2024-04-24 14:32:02 +08:00