Guancheng Fu
74997a3ed1
Adding load_low_bit interface for ipex_llm_worker ( #11000 )
* initial implementation, need tests
* fix
* fix baichuan issue
* fix typo
2024-05-13 15:30:19 +08:00
Yishuo Wang
1b3c7a6928
remove phi3 empty cache ( #10997 )
2024-05-13 14:09:55 +08:00
ZehuaCao
99255fe36e
fix ppl ( #10996 )
2024-05-13 13:57:19 +08:00
Ruonan Wang
04d5a900e1
update troubleshooting of llama.cpp ( #10990 )
* update troubleshooting
* small update
2024-05-13 11:18:38 +08:00
Kai Huang
f8dd2e52ad
Fix Langchain upstream ut ( #10985 )
* Fix Langchain upstream ut
* Small fix
* Install bigdl-llm
* Update run-langchain-upstream-tests.sh
* Update run-langchain-upstream-tests.sh
* Update llm_unit_tests.yml
* Update run-langchain-upstream-tests.sh
* Update llm_unit_tests.yml
* Update run-langchain-upstream-tests.sh
* fix git checkout
* fix
---------
Co-authored-by: Zhangky11 <2321096202@qq.com>
Co-authored-by: Keyan (Kyrie) Zhang <79576162+Zhangky11@users.noreply.github.com>
2024-05-11 14:40:37 +08:00
Yuwen Hu
9f6358e4c2
Deprecate support for pytorch 2.0 on Linux for ipex-llm >= 2.1.0b20240511 ( #10986 )
* Remove xpu_2.0 option in setup.py
* Disable xpu_2.0 test in UT and nightly
* Update docs for deprecated pytorch 2.0
* Small doc update
2024-05-11 12:33:35 +08:00
Ruonan Wang
5e0872073e
add version for llama.cpp and ollama ( #10982 )
* add version for cpp
* meet review
2024-05-11 09:20:31 +08:00
Yishuo Wang
ad96f32ce0
optimize phi3 1st token performance ( #10981 )
2024-05-10 17:33:46 +08:00
Cengguang Zhang
cfed76b2ed
LLM: add long-context support for Qwen1.5-7B/Baichuan2-7B/Mistral-7B. ( #10937 )
* LLM: add split tensor support for baichuan2-7b and qwen1.5-7b.
* fix style.
* fix style.
* fix style.
* add support for mistral and fix condition threshold.
* fix style.
* fix comments.
2024-05-10 16:40:15 +08:00
binbin Deng
f9615f12d1
Add driver related packages version check in env script ( #10977 )
2024-05-10 15:02:58 +08:00
Kai Huang
a6342cc068
Empty cache after phi first attention to support 4k input ( #10972 )
* empty cache
* fix style
2024-05-09 19:50:04 +08:00
Yishuo Wang
e753125880
use fp16_sdp when head_dim=96 ( #10976 )
2024-05-09 17:02:59 +08:00
Ruonan Wang
b7f7d05a7e
update llama.cpp usage of llama3 ( #10975 )
* update llama.cpp usage of llama3
* fix
2024-05-09 16:44:12 +08:00
Yishuo Wang
697ca79eca
use quantize kv and sdp in phi3-mini ( #10973 )
2024-05-09 15:16:18 +08:00
Shengsheng Huang
e3159c45e4
update private gpt quickstart and a small fix for dify ( #10969 )
2024-05-09 13:57:45 +08:00
Wang, Jian4
459b764406
Remove manually_build_for_test push outside ( #10968 )
2024-05-09 10:40:34 +08:00
Shengsheng Huang
11df5f9773
revise private GPT quickstart and a few fixes for other quickstart ( #10967 )
2024-05-08 21:18:20 +08:00
Keyan (Kyrie) Zhang
37820e1d86
Add privateGPT quickstart ( #10932 )
* Add privateGPT quickstart
* Update privateGPT_quickstart.md
* Update _toc.yml
* Update _toc.yml
---------
Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>
2024-05-08 20:48:00 +08:00
Wang, Jian4
f4c615b1ee
Add cohere example ( #10954 )
* add link first
* add_cpu_example
* add GPU example
2024-05-08 17:19:59 +08:00
Zephyr1101
7e7d969dcb
a experimental for workflow abuse step1 fix a typo ( #10965 )
* Update llm_unit_tests.yml
* Update README.md
* Update llm_unit_tests.yml
* Update llm_unit_tests.yml
2024-05-08 17:12:50 +08:00
Wang, Jian4
3209d6b057
Fix speculative llama3 no stop error ( #10963 )
* fix normal
* add eos_tokens_id on sp and add list if
* update
* no none
2024-05-08 17:09:47 +08:00
Xiangyu Tian
02870dc385
LLM: Refine README of AutoTP-FastAPI example ( #10960 )
2024-05-08 16:55:23 +08:00
Yishuo Wang
2ebec0395c
optimize phi-3-mini-128 ( #10959 )
2024-05-08 16:33:17 +08:00
Xin Qiu
dfa3147278
update ( #10944 )
2024-05-08 14:28:05 +08:00
Xin Qiu
5973d6c753
make gemma's output better ( #10943 )
2024-05-08 14:27:51 +08:00
Jin Qiao
15ee3fd542
Update igpu perf internlm ( #10958 )
2024-05-08 14:16:43 +08:00
Zhao Changmin
0d6e12036f
Disable fast_init_ in load_low_bit ( #10945 )
* fast_init_ disable
2024-05-08 10:46:19 +08:00
Qiyuan Gong
164e6957af
Refine axolotl quickstart ( #10957 )
* Add default accelerate config for axolotl quickstart.
* Fix requirement link.
* Upgrade peft to 0.10.0 in requirement.
2024-05-08 09:34:02 +08:00
Yishuo Wang
c801c37bc6
optimize phi3 again: use quantize kv if possible ( #10953 )
2024-05-07 17:26:19 +08:00
Yishuo Wang
aa2fa9fde1
optimize phi3 again: use sdp if possible ( #10951 )
2024-05-07 15:53:08 +08:00
Qiyuan Gong
c11170b96f
Upgrade Peft to 0.10.0 in finetune examples and docker ( #10930 )
* Upgrade Peft to 0.10.0 in finetune examples.
* Upgrade Peft to 0.10.0 in docker.
2024-05-07 15:12:26 +08:00
Qiyuan Gong
d7ca5d935b
Upgrade Peft version to 0.10.0 for LLM finetune ( #10886 )
* Upgrade Peft version to 0.10.0
* Upgrade Peft version in ARC unit test and HF-Peft example.
2024-05-07 15:09:14 +08:00
Yuwen Hu
0efe26c3b6
Change order of chatglm2-6b and chatglm3-6b in iGPU perf test for more stable performance ( #10948 )
2024-05-07 13:48:39 +08:00
hxsz1997
245c7348bc
Add codegemma example ( #10884 )
* add codegemma example in GPU/HF-Transformers-AutoModels/
* add README of codegemma example in GPU/HF-Transformers-AutoModels/
* add codegemma example in GPU/PyTorch-Models/
* add readme of codegemma example in GPU/PyTorch-Models/
* add codegemma example in CPU/HF-Transformers-AutoModels/
* add readme of codegemma example in CPU/HF-Transformers-AutoModels/
* add codegemma example in CPU/PyTorch-Models/
* add readme of codegemma example in CPU/PyTorch-Models/
* fix typos
* fix filename typo
* add codegemma in tables
* add comments of lm_head
* remove comments of use_cache
2024-05-07 13:35:42 +08:00
Shaojun Liu
08ad40b251
improve ipex-llm-init for Linux ( #10928 )
* refine ipex-llm-init
* install libtcmalloc.so for Max
* update based on comment
* remove unneeded code
2024-05-07 12:55:14 +08:00
Wang, Jian4
33b8f524c2
Add cpp docker manually_test ( #10946 )
* add cpp docker
* update
2024-05-07 11:23:28 +08:00
Wang, Jian4
191b184341
LLM: Optimize cohere model ( #10878 )
* use mlp and rms
* optimize kv_cache
* add fuse qkv
* add flash attention and fp16 sdp
* error fp8 sdp
* fix optimized
* fix style
* update
* add for pp
2024-05-07 10:19:50 +08:00
Xiangyu Tian
13a44cdacb
LLM: Refine Deepspeed-AutoTP-FastAPI example ( #10916 )
2024-05-07 09:37:31 +08:00
Wang, Jian4
1de878bee1
LLM: Fix speculative llama3 long input error ( #10934 )
2024-05-07 09:25:20 +08:00
Guancheng Fu
49ab5a2b0e
Add embeddings ( #10931 )
2024-05-07 09:07:02 +08:00
Shengsheng Huang
d649236321
make images clickable ( #10939 )
2024-05-06 20:24:15 +08:00
Shengsheng Huang
64938c2ca7
Dify quickstart revision ( #10938 )
* revise dify quickstart guide
* update quick links and a small typo
2024-05-06 19:59:17 +08:00
Ruonan Wang
3f438495e4
update llama.cpp and ollama quickstart ( #10929 )
2024-05-06 15:01:06 +08:00
Qiyuan Gong
41ffe1526c
Modify CPU finetune docker for bz2 error ( #10919 )
* Avoid bz2 error
* change to cpu torch
2024-05-06 10:41:50 +08:00
Wang, Jian4
0e0bd309e2
LLM: Enable Speculative on Fastchat ( #10909 )
* init
* enable streamer
* update
* update
* remove deprecated
* update
* update
* add gpu example
2024-05-06 10:06:20 +08:00
Zhicun
8379f02a74
Add Dify quickstart ( #10903 )
* add quick start
* modify
* modify
* add
* add
* resize
* add mp4
* add vedio
* add video
* video
* add
* modify
* add
* modify
2024-05-06 10:01:34 +08:00
Cengguang Zhang
0edef1f94c
LLM: add min_new_tokens to all in one benchmark. ( #10911 )
2024-05-06 09:32:59 +08:00
Shengsheng Huang
c78a8e3677
update quickstart ( #10923 )
2024-04-30 18:19:31 +08:00
Shengsheng Huang
282d676561
update continue quickstart ( #10922 )
2024-04-30 17:51:21 +08:00
Cengguang Zhang
75dbf240ec
LLM: update split tensor conditions. ( #10872 )
* LLM: update split tensor condition.
* add cond for split tensor.
* update priority of env.
* fix style.
* update env name.
2024-04-30 17:07:21 +08:00