Commit graph

3017 commits

Author SHA1 Message Date
Shaojun Liu
5aa3e427a9
Fix docker images (#11362)
* Fix docker images

* add-apt-repository requires gnupg, gpg-agent, software-properties-common

* update

* avoid importing ipex again
2024-06-20 15:44:55 +08:00
Yuwen Hu
d9dd1b70bd
Remove example page in mddocs (#11373) 2024-06-20 14:23:43 +08:00
Wenjing Margaret Mao
c0e86c523a
Add qwen-moe batch1 to nightly perf (#11369)
* add moe

* reduce 437 models

* rename

* fix syntax

* add moe check result

* add 430 + 437

* all modes

* 4-37-4 exclude

* revert & comment

---------

Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-06-20 14:17:41 +08:00
Yuwen Hu
769728c1eb
Add initial md docs (#11371) 2024-06-20 13:47:49 +08:00
Shengsheng Huang
9601fae5d5
fix system note (#11368) 2024-06-20 11:09:53 +08:00
Yishuo Wang
a5e7d93242
Add initial save/load low bit support for NPU (now only fp16 is supported) (#11359) 2024-06-20 10:49:39 +08:00
Shengsheng Huang
ed4c439497
small fix (#11366) 2024-06-20 10:38:20 +08:00
RyuKosei
05a8d051f6
Fix run.py run_ipex_fp16_gpu (#11361)
* fix a bug on run.py

* Update run.py

fixed the format problem

---------

Co-authored-by: sgwhat <ge.song@intel.com>
2024-06-20 10:29:32 +08:00
Wenjing Margaret Mao
b2f62a8561
Add batch 4 perf test (#11355)
* copy files to this branch

* add tasks

* comment one model

* change the model to test the 4.36

* only test batch-4

* typo

* typo

* typo

* typo

* typo

* typo

* add 4.37-batch4

* change the file name

* revert yaml file

* no print

* add batch4 task

* revert

---------

Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-06-20 09:48:52 +08:00
Shengsheng Huang
a721c1ae43
minor fix of ragflow_quickstart.md (#11364) 2024-06-19 22:30:33 +08:00
Shengsheng Huang
13727635e8
revise ragflow quickstart (#11363)
* revise ragflow quickstart

* update titles and split the quickstart into sections

* update
2024-06-19 22:24:31 +08:00
Zijie Li
5283df0078
LLM: Add RAGFlow with Ollama Example QuickStart (#11338)
* Create ragflow.md

* Update ragflow.md

* Update ragflow_quickstart

* Update ragflow_quickstart.md

* Upload RAGFlow quickstart without images

* Update ragflow_quickstart.md

* Update ragflow_quickstart.md

* Update ragflow_quickstart.md

* Update ragflow_quickstart.md

* fix typos in readme

* Fix typos in quickstart readme
2024-06-19 20:00:50 +08:00
Zijie Li
ae452688c2
Add NPU HF example (#11358) 2024-06-19 18:07:28 +08:00
Qiyuan Gong
1eb884a249
IPEX Duplicate importer V2 (#11310)
* Add gguf support.
* Avoid errors when importing ipex-llm multiple times.
* Add check to avoid duplicate replace and revert.
* Add calling from check to avoid raising exceptions in the submodule.
* Add BIGDL_CHECK_DUPLICATE_IMPORT to control the duplicate checker. Default is true.
2024-06-19 16:29:19 +08:00
Jason Dai
271d82a4fc
Update readme (#11357) 2024-06-19 10:05:42 +08:00
Yishuo Wang
ae7b662ed2
add fp16 NPU Linear support and fix intel_npu_acceleration_library version 1.0 support (#11352) 2024-06-19 09:14:59 +08:00
Guoqiong Song
c44b1942ed
fix mistral for transformers>=4.39 (#11191)
* fix mistral for transformers>=4.39
2024-06-18 13:39:35 -07:00
Heyang Sun
67a1e05876
Remove zero3 context manager from LoRA (#11346) 2024-06-18 17:24:43 +08:00
Xiangyu Tian
f6cd628cd8
Fix script usage in vLLM CPU Quickstart (#11353) 2024-06-18 16:50:48 +08:00
Xiangyu Tian
ef9f740801
Docs: Fix CPU Serving Docker README (#11351)
Fix CPU Serving Docker README
2024-06-18 16:27:51 +08:00
Guancheng Fu
c9b4cadd81
fix vLLM/docker issues (#11348)
* fix

* fix

* fix
2024-06-18 16:23:53 +08:00
Yishuo Wang
83082e5cc7
add initial support for intel npu acceleration library (#11347) 2024-06-18 16:07:16 +08:00
Shaojun Liu
694912698e
Upgrade scikit-learn to 1.5.0 to fix dependabot issue (#11349) 2024-06-18 15:47:25 +08:00
hxsz1997
44f22cba70
add config and default value (#11344)
* add config and default value

* add config in yaml

* remove lookahead and max_matching_ngram_size in config

* remove streaming and use_fp16_torch_dtype in test yaml

* update task in readme

* update commit of task
2024-06-18 15:28:57 +08:00
Shengsheng Huang
1f39bb84c7
update readthedocs perf data (#11345) 2024-06-18 13:23:47 +08:00
Heyang Sun
00f322d8ee
Finetune ChatGLM with Deepspeed Zero3 LoRA (#11314)
* Finetune ChatGLM with Deepspeed Zero3 LoRA

* add deepspeed zero3 config

* rename config

* remove offload_param

* add save_checkpoint parameter

* Update lora_deepspeed_zero3_finetune_chatglm3_6b_arc_2_card.sh

* refine
2024-06-18 12:31:26 +08:00
Yina Chen
5dad33e5af
Support fp8_e4m3 scale search (#11339)
* fp8e4m3 switch off

* fix style
2024-06-18 11:47:43 +08:00
binbin Deng
e50c890e1f
Support finishing PP inference once eos_token_id is found (#11336) 2024-06-18 09:55:40 +08:00
Qiyuan Gong
de4bb97b4f
Remove accelerate 0.23.0 install command in readme and docker (#11333)
* ipex-llm's accelerate has been upgraded to 0.23.0. Remove accelerate 0.23.0 install command in README and docker.
2024-06-17 17:52:12 +08:00
SONG Ge
ef4b6519fb
Add phi-3 model support for pipeline parallel inference (#11334)
* add phi-3 model support

* add phi3 example
2024-06-17 17:44:24 +08:00
hxsz1997
99b309928b
Add lookahead in test_api: transformer_int4_fp16_gpu (#11337)
* add lookahead in test_api:transformer_int4_fp16_gpu

* change the short prompt of summarize

* change short prompt to cnn_64

* change short prompt of summarize
2024-06-17 17:41:41 +08:00
Jason Dai
bc4bafffc7
Update README.md (#11335) 2024-06-17 16:24:23 +08:00
Qiyuan Gong
5d7c9bf901
Upgrade accelerate to 0.23.0 (#11331)
* Upgrade accelerate to 0.23.0
2024-06-17 15:03:11 +08:00
Xin Qiu
183e0c6cf5
glm-4v-9b support (#11327)
* chatglm4v support

* fix style check

* update glm4v
2024-06-17 13:52:37 +08:00
Wenjing Margaret Mao
bca5cbd96c
Modify arc nightly perf to fp16 (#11275)
* change api

* move to pr mode and remove the build

* add batch4 yaml and remove the bigcode

* remove batch4

* revert the starcode

* remove the exclude

* revert

---------

Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-06-17 13:47:22 +08:00
Yuwen Hu
a2a5890b48
Make manually-triggered perf test able to choose which test to run (#11324) 2024-06-17 10:23:13 +08:00
Yuwen Hu
1978f63f6b
Fix igpu performance guide regarding html generation (#11328) 2024-06-17 10:21:30 +08:00
binbin Deng
6ea1e71af0
Update PP inference benchmark script (#11323) 2024-06-17 09:59:36 +08:00
SONG Ge
be00380f1a
Fix pipeline parallel inference past_key_value error in Baichuan (#11318)
* fix past_key_value error

* add baichuan2 example

* fix style

* update doc

* add script link in doc

* fix import error

* update
2024-06-17 09:29:32 +08:00
Yina Chen
0af0102e61
Add quantization scale search switch (#11326)
* add scale_search switch

* remove llama3 instruct

* remove print
2024-06-14 18:46:52 +08:00
Ruonan Wang
8a3247ac71
support batch forward for q4_k, q6_k (#11325) 2024-06-14 18:25:50 +08:00
Yishuo Wang
e8dd8e97ef
fix chatglm lookahead on ARC (#11320) 2024-06-14 16:26:11 +08:00
Shaojun Liu
f5ef94046e
exclude dolly-v2-12b for arc perf test (#11315)
* test arc perf

* test

* test

* exclude dolly-v2-12b:2048

* revert changes
2024-06-14 15:35:56 +08:00
Shaojun Liu
77809be946
Install packages for ipex-llm-serving-cpu docker image (#11321)
* apt-get install patch

* Update Dockerfile

* Update Dockerfile

* revert
2024-06-14 15:26:01 +08:00
Xiangyu Tian
4359ab3172
LLM: Add /generate_stream endpoint for Pipeline-Parallel-FastAPI example (#11187)
Add /generate_stream and OpenAI-formatted endpoint for Pipeline-Parallel-FastAPI example
2024-06-14 15:15:32 +08:00
Yuwen Hu
9e4d87a696
Langchain-chatchat QuickStart small link fix (#11317) 2024-06-14 14:02:17 +08:00
Jin Qiao
0e7a31a09c
ChatGLM Examples Restructure regarding Installation Steps (#11285)
* merge install step in glm examples

* fix section

* fix section

* fix tiktoken
2024-06-14 12:37:05 +08:00
Yishuo Wang
91965b5d05
add glm_sdpa back to fix chatglm-6b (#11313) 2024-06-14 10:31:43 +08:00
Yishuo Wang
7f65836cb9
fix chatglm2/3-32k/128k fp16 (#11311) 2024-06-14 09:58:07 +08:00
Xin Qiu
1b0c4c8cb8
use new rotary two in chatglm4 (#11312)
* use new rotary two in chatglm4

* remove
2024-06-13 19:02:18 +08:00