Commit graph

2793 commits

Author SHA1 Message Date
Yina Chen
8811f268ff
Use new fp16 sdp in Qwen and modify the constraint (#10882) 2024-04-25 19:23:37 +08:00
Yuxuan Xia
0213c1c1da
Add phi3 to the nightly test (#10885)
* Add llama3 and phi2 nightly test

* Change llama3-8b to llama3-8b-instruct

* Add phi3 to nightly test

* Add phi3 to nightly test

---------

Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-04-25 17:39:12 +08:00
Yuxuan Xia
ca2479be87
Update scripts readme (#10725)
* Update scripts readme

* Update scripts readme

* Update README

* Update readme

* Update readme

* Update windows env check readme

* Adjust env check readme

* Update windows env check

* Update env check readme

* Adjust the env-check README

* Modify the env-check README
2024-04-25 17:24:37 +08:00
Cengguang Zhang
cd369c2715
LLM: add device id to benchmark utils. (#10877) 2024-04-25 14:01:51 +08:00
Yang Wang
1ce8d7bcd9
Support the desc_act feature in GPTQ model (#10851)
* support act_order

* update versions

* fix style

* fix bug

* clean up
2024-04-24 10:17:13 -07:00
Yina Chen
dc27b3bc35
Use sdp when rest token seq_len > 1 in llama & mistral (for lookup & spec) (#10790)
* update sdp condition

* update

* fix

* update & test llama

* mistral

* fix style

* update

* fix style

* remove pvc constraint

* update ds on arc

* fix style
2024-04-24 17:24:01 +08:00
Yuxuan Xia
844e18b1db
Add llama3 and phi2 nightly test (#10874)
* Add llama3 and phi2 nightly test

* Change llama3-8b to llama3-8b-instruct

---------

Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-04-24 16:58:56 +08:00
Qiyuan Gong
634726211a
Add video to axolotl quick start (#10870)
* Add video to axolotl quick start.
* Fix wget url.
2024-04-24 16:53:14 +08:00
binbin Deng
c9feffff9a
LLM: support Qwen1.5-MoE-A2.7B-Chat pipeline parallel inference (#10864) 2024-04-24 16:02:27 +08:00
Yishuo Wang
2d210817ff
add phi3 optimization (#10871) 2024-04-24 15:17:40 +08:00
Cengguang Zhang
eb39c61607
LLM: add min new token to perf test. (#10869) 2024-04-24 14:32:02 +08:00
Yuwen Hu
fb2a160af3
Add phi-2 to 2048-256 test for fixes (#10867) 2024-04-24 10:00:25 +08:00
binbin Deng
fabf54e052
LLM: make pipeline parallel inference example more common (#10786) 2024-04-24 09:28:52 +08:00
hxsz1997
328b1a1de9
Fix the not-stopping issue of llama3 examples (#10860)
* fix not stop issue in GPU/HF-Transformers-AutoModels

* fix not stop issue in GPU/PyTorch-Models/Model/llama3

* fix not stop issue in CPU/HF-Transformers-AutoModels/Model/llama3

* fix not stop issue in CPU/PyTorch-Models/Model/llama3

* update the output in readme

* update format

* add reference

* update prompt format

* update output format in readme

* update example output in readme
2024-04-23 19:10:09 +08:00
Yuwen Hu
5c9eb5d0f5
Support llama-index install option for upstreaming purposes (#10866)
* Support llama-index install option for upstreaming purposes

* Small fix

* Small fix
2024-04-23 19:08:29 +08:00
Yuwen Hu
21bb8bd164
Add phi-2 to igpu performance test (#10865) 2024-04-23 18:13:14 +08:00
ZehuaCao
36eb8b2e96
Add llama3 speculative example (#10856)
* Initial llama3 speculative example

* update README

* update README

* update README
2024-04-23 17:03:54 +08:00
Zhicun
a017bf2981
add quick start for dify (#10813)
* add quick start

* modify

* modify

* add

* add

* resize

* add mp4

* add video

* add video

* video

* add
2024-04-23 16:32:22 +08:00
Cengguang Zhang
763413b7e1
LLM: support llama split tensor for long context in transformers>=4.36. (#10844)
* LLM: support llama split tensor for long context in transformers>=4.36.

* fix dtype.

* fix style.

* fix style.

* fix style.

* fix style.

* fix dtype.

* fix style.
2024-04-23 16:13:25 +08:00
Qiyuan Gong
bce99a5b00
Minor fix for quick start (#10857)
* Fix typo and space in quick start.
2024-04-23 15:22:01 +08:00
Qiyuan Gong
5eee1976ac
Add Axolotl v0.4.0 quickstart (#10840)
* Add Axolotl v0.4.0 quickstart
2024-04-23 14:57:34 +08:00
ZehuaCao
92ea54b512
Fix speculative decoding bug (#10855) 2024-04-23 14:28:31 +08:00
yb-peng
c9dee6cd0e
Update 8192.txt (#10824)
* Update 8192.txt

* Update 8192.txt with original text
2024-04-23 14:02:09 +08:00
Wang, Jian4
18c032652d
LLM: Add mixtral speculative CPU example (#10830)
* init mixtral sp example

* use different prompt_format

* update output

* update
2024-04-23 10:05:51 +08:00
Qiyuan Gong
5494aa55f6
Downgrade datasets in axolotl example (#10849)
* Downgrade datasets to 2.15.0 to address axolotl prepare issue https://github.com/OpenAccess-AI-Collective/axolotl/issues/1544

Tks to @kwaa for providing the solution in https://github.com/intel-analytics/ipex-llm/issues/10821#issuecomment-2068861571
2024-04-23 09:41:58 +08:00
Ruonan Wang
2ec45c49d3
fix ollama quickstart (#10846) 2024-04-22 22:04:49 +08:00
Yishuo Wang
fe5a082b84
add phi-2 optimization (#10843) 2024-04-22 18:56:47 +08:00
Guancheng Fu
47bd5f504c
[vLLM] Remove vllm-v1, refactor v2 (#10842)
* remove vllm-v1

* fix format
2024-04-22 17:51:32 +08:00
Wang, Jian4
23c6a52fb0
LLM: Fix ipex torchscript=True error (#10832)
* remove

* update

* remove torchscript
2024-04-22 15:53:09 +08:00
Heyang Sun
fc33aa3721
fix missing import (#10839) 2024-04-22 14:34:52 +08:00
Guancheng Fu
3b82834aaf
Update README.md (#10838) 2024-04-22 14:18:51 +08:00
Yina Chen
3daad242b8
Fix No module named 'transformers.cache_utils' with transformers < 4.36 (#10835)
* update sdp condition

* update

* fix

* fix 431 error

* revert sdp & style fix

* fix

* meet comments
2024-04-22 14:05:50 +08:00
Ruonan Wang
c6e868f7ad
update oneapi usage in cpp quickstart (#10836)
* update oneapi usage

* update

* small fix
2024-04-22 11:48:05 +08:00
Guancheng Fu
ae3b577537
Update README.md (#10833) 2024-04-22 11:07:10 +08:00
Wang, Jian4
5f95054f97
LLM: Add qwen moe example libs md (#10828) 2024-04-22 10:03:19 +08:00
Ruonan Wang
1edb19c1dd
small fix of cpp quickstart (#10829) 2024-04-22 09:44:08 +08:00
Guancheng Fu
61c67af386
Fix vLLM-v2 install instructions (#10822) 2024-04-22 09:02:48 +08:00
Jason Dai
3cd21d5105
Update readme (#10817) 2024-04-19 22:16:17 +08:00
SONG Ge
197f8dece9
Add open-webui windows document (#10775)
* add windows document

* update

* fix document

* build fix

* update some description

* reorg document structure

* update doc

* re-update to better view

* add reminder for running model on gpus

* update

* remove useless part
2024-04-19 18:06:40 +08:00
Ruonan Wang
a8df429985
QuickStart: Run Llama 3 on Intel GPU using llama.cpp and ollama with IPEX-LLM (#10809)
* initial commit

* update llama.cpp

* add demo video at first

* fix ollama link in readme

* meet review

* update

* small fix
2024-04-19 17:44:59 +08:00
Guancheng Fu
caf75beef8
Disable sdpa (#10814) 2024-04-19 17:33:18 +08:00
Yishuo Wang
57edf2033c
fix lookahead with transformers >= 4.36 (#10808) 2024-04-19 16:24:56 +08:00
Yuwen Hu
34ff07b689
Add CPU related info to langchain-chatchat quickstart (#10812) 2024-04-19 15:59:51 +08:00
Ovo233
1a885020ee
Updated importing of top_k_top_p_filtering for transformers>=4.39.0 (#10794)
* In transformers>=4.39.0, the top_k_top_p_filtering function has been deprecated and moved to the Hugging Face package trl. Thus, for versions >= 4.39.0, import this function from trl.
2024-04-19 15:34:39 +08:00
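For reference, a minimal sketch of the version-gated import described in the commit above; the exact module path inside trl and the version-check helper are assumptions here, not the change merged in #10794:

```python
# Illustrative sketch only: pick the import source for top_k_top_p_filtering
# based on the installed transformers version.
from packaging import version
import transformers

if version.parse(transformers.__version__) >= version.parse("4.39.0"):
    # transformers>=4.39.0 dropped the helper; it lives in trl
    # (trl.core path assumed here).
    from trl.core import top_k_top_p_filtering
else:
    # Older transformers still export the helper at the top level.
    from transformers import top_k_top_p_filtering
```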
Yuwen Hu
07e8b045a9
Add Meta-llama-3-8B-Instruct and Yi-6B-Chat to igpu nightly perf (#10810) 2024-04-19 15:09:58 +08:00
SONG Ge
fbd1743b5e
Ollama quickstart update (#10806)
* add ollama doc for OLLAMA_NUM_GPU

* remove useless params

* revert unexpected changes back

* move env setting to server part

* update
2024-04-19 15:00:25 +08:00
Yishuo Wang
08458b4f74
remove rms norm copy (#10793) 2024-04-19 13:57:48 +08:00
Yuwen Hu
c7235e34a8
Small update to ut (#10804) 2024-04-19 10:59:00 +08:00
Jason Dai
995c01367d
Update readme (#10802) 2024-04-19 06:52:57 +08:00
Yang Wang
8153c3008e
Initial llama3 example (#10799)
* Add initial hf huggingface GPU example

* Small fix

* Add llama3 gpu pytorch model example

* Add llama 3 hf transformers CPU example

* Add llama 3 pytorch model CPU example

* Fixes

* Small fix

* Small fixes

* Small fix

* Small fix

* Add links

* update repo id

* change prompt tuning url

* remove system header if there is no system prompt

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
Co-authored-by: Yuwen Hu <54161268+Oscilloscope98@users.noreply.github.com>
2024-04-18 11:01:33 -07:00