-
56cb992497
LLM: Modify CPU Installation Command for most examples (#11049)
ZehuaCao
2024-05-17 15:52:20 +0800
-
f1156e6b20
support gguf_q4k_m / gguf_q4k_s (#10887)
Ruonan Wang
2024-05-17 06:30:09 +0000
-
981d668be6
refactor baichuan2-7b (#11062)
Yishuo Wang
2024-05-17 13:01:34 +0800
-
84239d0bd3
Update docker image tags in Docker Quickstart (#11061)
Shaojun Liu
2024-05-17 11:06:11 +0800
-
b3027e2d60
Update for cpu install option in performance tests (#11060)
Yuwen Hu
2024-05-17 10:33:43 +0800
-
d963e95363
LLM: Modify CPU Installation Command for documentation (#11042)
Xiangyu Tian
2024-05-17 10:14:00 +0800
-
fff067d240
Make install ut for cpu exactly the same as what we want for users (#11051)
Yuwen Hu
2024-05-17 10:11:01 +0800
-
3a72e5df8c
disable mlp fusion of fp6 on mtl (#11059)
Ruonan Wang
2024-05-17 10:10:16 +0800
-
192ae35012
Add support for llama2 quantize_kv with transformers 4.38.0 (#11054)
SONG Ge
2024-05-16 22:23:39 +0800
-
16b2a418be
hotfix native_sdp ut (#11046)
SONG Ge
2024-05-16 17:15:37 +0800
-
6be70283b7
fix chatglm run error (#11045)
Xin Qiu
2024-05-16 15:39:18 +0800
-
8cae897643
use new rope in phi3 (#11047)
Yishuo Wang
2024-05-16 15:12:35 +0800
-
00d4410746
Update cpp docker quickstart (#11040)
Wang, Jian4
2024-05-16 14:55:13 +0800
-
c62e828281
Create release-ipex-llm.yaml (#11039)
Shaojun Liu
2024-05-16 11:10:10 +0800
-
4638682140
Fix xpu finetune image path in action (#11037)
Qiyuan Gong
2024-05-16 10:48:02 +0800
-
9a96af4232
Remove oneAPI pip install command in related examples (#11030)
Jin Qiao
2024-05-16 10:46:29 +0800
-
612a365479
LLM: Install CPU version torch with extras [all] (#10868)
Xiangyu Tian
2024-05-16 10:39:55 +0800
-
59df750326
Use new sdp again (#11025)
Yishuo Wang
2024-05-16 09:33:34 +0800
-
7e29928865
refactor serving docker image (#11028)
Guancheng Fu
2024-05-16 09:30:36 +0800
-
9942a4ba69
[WIP] Support llama2 with transformers==4.38.0 (#11024)
SONG Ge
2024-05-15 18:07:00 +0800
-
686f6038a8
Support fp6 save & load (#11034)
Yina Chen
2024-05-15 17:52:02 +0800
-
ac384e0f45
add fp6 mlp fusion (#11032)
Ruonan Wang
2024-05-15 17:42:50 +0800
-
2084ebe4ee
Enable fastchat benchmark latency (#11017)
Wang, Jian4
2024-05-15 14:52:09 +0800
-
93d40ab127
Update lookahead strategy (#11021)
hxsz1997
2024-05-15 14:48:05 +0800
-
1d73fc8106
update cpp quickstart (#11031)
Ruonan Wang
2024-05-15 14:33:36 +0800
-
d9f71f1f53
Update benchmark util for example usage (#11027)
Wang, Jian4
2024-05-15 14:16:35 +0800
-
86cec80b51
LLM: Add llm inference_cpp_xpu_docker (#10933)
Wang, Jian4
2024-05-15 11:10:22 +0800
-
4053a6ef94
Update environment variable setting in AutoTP with arc (#11018)
binbin Deng
2024-05-15 10:23:58 +0800
-
fad1dbaf60
use sdp fp8 causal kernel (#11023)
Yishuo Wang
2024-05-15 10:22:35 +0800
-
c34f85e7d0
[Doc] Simplify installation on Windows for Intel GPU (#11004)
Yuwen Hu
2024-05-15 09:55:41 +0800
-
1e00bd7bbe
Re-org XPU finetune images (#10971)
Qiyuan Gong
2024-05-15 09:42:43 +0800
-
ee325e9cc9
fix phi3 (#11022)
Yishuo Wang
2024-05-15 09:32:12 +0800
-
7d3791c819
[LLM] Add llama3 alpaca qlora example (#11011)
Ziteng Zhang
2024-05-15 09:17:32 +0800
-
0a732bebe7
Add phi3 cached RotaryEmbedding (#11013)
Zhao Changmin
2024-05-15 08:16:43 +0800
-
0b7e78b592
revise the benchmark part in python inference docker (#11020)
Shengsheng Huang
2024-05-14 18:43:41 +0800
-
586a151f9c
update the README and reorganize the docker guides structure. (#11016)
Shengsheng Huang
2024-05-14 17:56:11 +0800
-
893197434d
Add fp6 support on gpu (#11008)
Yina Chen
2024-05-14 16:31:44 +0800
-
b03c859278
Add phi3RMS (#10988)
Zhao Changmin
2024-05-14 15:16:27 +0800
-
170e3d65e0
use new sdp and fp32 sdp (#11007)
Yishuo Wang
2024-05-14 14:29:18 +0800
-
8010af700f
Update igpu performance test to use pypi installed oneAPI (#11010)
Yuwen Hu
2024-05-14 14:05:33 +0800
-
c957ea3831
Add axolotl main support and axolotl Llama-3-8B QLoRA example (#10984)
Qiyuan Gong
2024-05-14 13:43:59 +0800
-
fb656fbf74
Add requirements for oneAPI pypi packages for windows Intel GPU users (#11009)
Yuwen Hu
2024-05-14 13:40:54 +0800
-
7f8c5b410b
Quickstart: Run PyTorch Inference on Intel GPU using Docker (on Linux or WSL) (#10970)
Shaojun Liu
2024-05-14 12:58:31 +0800
-
a465111cf4
Update README.md (#11003)
Guancheng Fu
2024-05-13 16:44:48 +0800
-
74997a3ed1
Adding load_low_bit interface for ipex_llm_worker (#11000)
Guancheng Fu
2024-05-13 15:30:19 +0800
-
1b3c7a6928
remove phi3 empty cache (#10997)
Yishuo Wang
2024-05-13 14:09:55 +0800
-
99255fe36e
fix ppl (#10996)
ZehuaCao
2024-05-13 13:57:19 +0800
-
04d5a900e1
update troubleshooting of llama.cpp (#10990)
Ruonan Wang
2024-05-13 11:18:38 +0800
-
f8dd2e52ad
Fix Langchain upstream ut (#10985)
Kai Huang
2024-05-11 14:40:37 +0800
-
9f6358e4c2
Deprecate support for pytorch 2.0 on Linux for ipex-llm >= 2.1.0b20240511 (#10986)
Yuwen Hu
2024-05-11 12:33:35 +0800
-
5e0872073e
add version for llama.cpp and ollama (#10982)
Ruonan Wang
2024-05-11 09:20:31 +0800
-
ad96f32ce0
optimize phi3 1st token performance (#10981)
Yishuo Wang
2024-05-10 17:33:46 +0800
-
cfed76b2ed
LLM: add long-context support for Qwen1.5-7B/Baichuan2-7B/Mistral-7B. (#10937)
Cengguang Zhang
2024-05-10 16:40:15 +0800
-
f9615f12d1
Add driver related packages version check in env script (#10977)
binbin Deng
2024-05-10 15:02:58 +0800
-
a6342cc068
Empty cache after phi first attention to support 4k input (#10972)
Kai Huang
2024-05-09 19:50:04 +0800
-
e753125880
use fp16_sdp when head_dim=96 (#10976)
Yishuo Wang
2024-05-09 17:02:59 +0800
-
b7f7d05a7e
update llama.cpp usage of llama3 (#10975)
Ruonan Wang
2024-05-09 16:44:12 +0800
-
697ca79eca
use quantize kv and sdp in phi3-mini (#10973)
Yishuo Wang
2024-05-09 15:16:18 +0800
-
e3159c45e4
update private gpt quickstart and a small fix for dify (#10969)
Shengsheng Huang
2024-05-09 13:57:45 +0800
-
459b764406
Remove manually_build_for_test push outside (#10968)
Wang, Jian4
2024-05-09 10:40:34 +0800
-
11df5f9773
revise private GPT quickstart and a few fixes for other quickstarts (#10967)
Shengsheng Huang
2024-05-08 21:18:20 +0800
-
37820e1d86
Add privateGPT quickstart (#10932)
Keyan (Kyrie) Zhang
2024-05-08 05:48:00 -0700
-
f4c615b1ee
Add cohere example (#10954)
Wang, Jian4
2024-05-08 17:19:59 +0800
-
7e7d969dcb
An experimental fix for workflow abuse, step 1: fix a typo (#10965)
Zephyr1101
2024-05-08 17:12:50 +0800
-
3209d6b057
Fix speculative llama3 no stop error (#10963)
Wang, Jian4
2024-05-08 17:09:47 +0800
-
02870dc385
LLM: Refine README of AutoTP-FastAPI example (#10960)
Xiangyu Tian
2024-05-08 16:55:23 +0800
-
2ebec0395c
optimize phi-3-mini-128 (#10959)
Yishuo Wang
2024-05-08 16:33:17 +0800
-
dfa3147278
update (#10944)
Xin Qiu
2024-05-08 14:28:05 +0800
-
5973d6c753
make gemma's output better (#10943)
Xin Qiu
2024-05-08 14:27:51 +0800
-
15ee3fd542
Update igpu perf internlm (#10958)
Jin Qiao
2024-05-08 14:16:43 +0800
-
0d6e12036f
Disable fast_init_ in load_low_bit (#10945)
Zhao Changmin
2024-05-08 10:46:19 +0800
-
164e6957af
Refine axolotl quickstart (#10957)
Qiyuan Gong
2024-05-08 09:34:02 +0800
-
c801c37bc6
optimize phi3 again: use quantize kv if possible (#10953)
Yishuo Wang
2024-05-07 17:26:19 +0800
-
aa2fa9fde1
optimize phi3 again: use sdp if possible (#10951)
Yishuo Wang
2024-05-07 15:53:08 +0800
-
c11170b96f
Upgrade Peft to 0.10.0 in finetune examples and docker (#10930)
Qiyuan Gong
2024-05-07 15:12:26 +0800
-
d7ca5d935b
Upgrade Peft version to 0.10.0 for LLM finetune (#10886)
Qiyuan Gong
2024-05-07 15:09:14 +0800
-
0efe26c3b6
Change order of chatglm2-6b and chatglm3-6b in iGPU perf test for more stable performance (#10948)
Yuwen Hu
2024-05-07 13:48:39 +0800
-
245c7348bc
Add codegemma example (#10884)
hxsz1997
2024-05-07 13:35:42 +0800
-
08ad40b251
improve ipex-llm-init for Linux (#10928)
Shaojun Liu
2024-05-07 12:55:14 +0800
-
33b8f524c2
Add cpp docker manually_test (#10946)
Wang, Jian4
2024-05-07 11:23:28 +0800
-
191b184341
LLM: Optimize cohere model (#10878)
Wang, Jian4
2024-05-07 10:19:50 +0800
-
13a44cdacb
LLM: Refine Deepspeed-AutoTP-FastAPI example (#10916)
Xiangyu Tian
2024-05-07 09:37:31 +0800
-
1de878bee1
LLM: Fix speculative llama3 long input error (#10934)
Wang, Jian4
2024-05-07 09:25:20 +0800
-
49ab5a2b0e
Add embeddings (#10931)
Guancheng Fu
2024-05-07 09:07:02 +0800
-
d649236321
make images clickable (#10939)
Shengsheng Huang
2024-05-06 20:24:15 +0800
-
64938c2ca7
Dify quickstart revision (#10938)
Shengsheng Huang
2024-05-06 19:59:17 +0800
-
3f438495e4
update llama.cpp and ollama quickstart (#10929)
Ruonan Wang
2024-05-06 15:01:06 +0800
-
41ffe1526c
Modify CPU finetune docker for bz2 error (#10919)
Qiyuan Gong
2024-05-06 10:41:50 +0800
-
0e0bd309e2
LLM: Enable speculative decoding on FastChat (#10909)
Wang, Jian4
2024-05-06 10:06:20 +0800
-
8379f02a74
Add Dify quickstart (#10903)
Zhicun
2024-05-06 10:01:34 +0800
-
0edef1f94c
LLM: add min_new_tokens to all-in-one benchmark. (#10911)
Cengguang Zhang
2024-05-06 09:32:59 +0800
-
c78a8e3677
update quickstart (#10923)
Shengsheng Huang
2024-04-30 18:19:31 +0800
-
282d676561
update continue quickstart (#10922)
Shengsheng Huang
2024-04-30 17:51:21 +0800
-
75dbf240ec
LLM: update split tensor conditions. (#10872)
Cengguang Zhang
2024-04-30 17:07:21 +0800
-
71f51ce589
Initial Update for Continue Quickstart with Ollama backend (#10918)
Yuwen Hu
2024-04-30 15:10:30 +0800
-
2c64754eb0
Add vLLM to ipex-llm serving image (#10807)
Guancheng Fu
2024-04-29 17:25:42 +0800
-
1f876fd837
Add example for phi-3 (#10881)
Jin Qiao
2024-04-29 16:43:55 +0800
-
c936ba3b64
Small fix for supporting workflow dispatch in nightly perf (#10908)
Yuwen Hu
2024-04-29 13:25:14 +0800
-
d884c62dc4
remove new_layout parameter (#10906)
Yishuo Wang
2024-04-29 10:31:50 +0800
-
fbcd7bc737
Fix Loader issue with dtype fp16 (#10907)
Guancheng Fu
2024-04-29 10:16:02 +0800