Shaojun Liu
5aa3e427a9
Fix docker images ( #11362 )
...
* Fix docker images
* add-apt-repository requires gnupg, gpg-agent, software-properties-common
* update
* avoid importing ipex again
2024-06-20 15:44:55 +08:00
Yuwen Hu
d9dd1b70bd
Remove example page in mddocs ( #11373 )
2024-06-20 14:23:43 +08:00
Wenjing Margaret Mao
c0e86c523a
Add qwen-moe batch1 to nightly perf ( #11369 )
...
* add moe
* reduce 437 models
* rename
* fix syntax
* add moe check result
* add 430 + 437
* all modes
* 4-37-4 exclude
* revert & comment
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-06-20 14:17:41 +08:00
Yuwen Hu
769728c1eb
Add initial md docs ( #11371 )
2024-06-20 13:47:49 +08:00
Shengsheng Huang
9601fae5d5
fix system note ( #11368 )
2024-06-20 11:09:53 +08:00
Yishuo Wang
a5e7d93242
Add initial save/load low bit support for NPU (now only fp16 is supported) ( #11359 )
2024-06-20 10:49:39 +08:00
Shengsheng Huang
ed4c439497
small fix ( #11366 )
2024-06-20 10:38:20 +08:00
RyuKosei
05a8d051f6
Fix run.py run_ipex_fp16_gpu ( #11361 )
...
* fix a bug on run.py
* Update run.py
fixed the format problem
---------
Co-authored-by: sgwhat <ge.song@intel.com>
2024-06-20 10:29:32 +08:00
Wenjing Margaret Mao
b2f62a8561
Add batch 4 perf test ( #11355 )
...
* copy files to this branch
* add tasks
* comment one model
* change the model to test the 4.36
* only test batch-4
* typo
* typo
* typo
* typo
* typo
* typo
* add 4.37-batch4
* change the file name
* revert yaml file
* no print
* add batch4 task
* revert
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-06-20 09:48:52 +08:00
Shengsheng Huang
a721c1ae43
minor fix of ragflow_quickstart.md ( #11364 )
2024-06-19 22:30:33 +08:00
Shengsheng Huang
13727635e8
revise ragflow quickstart ( #11363 )
...
* revise ragflow quickstart
* update titles and split the quickstart into sections
* update
2024-06-19 22:24:31 +08:00
Zijie Li
5283df0078
LLM: Add RAGFlow with Ollama Example QuickStart ( #11338 )
...
* Create ragflow.md
* Update ragflow.md
* Update ragflow_quickstart
* Update ragflow_quickstart.md
* Upload RAGFlow quickstart without images
* Update ragflow_quickstart.md
* Update ragflow_quickstart.md
* Update ragflow_quickstart.md
* Update ragflow_quickstart.md
* fix typos in readme
* Fix typos in quickstart readme
2024-06-19 20:00:50 +08:00
Zijie Li
ae452688c2
Add NPU HF example ( #11358 )
2024-06-19 18:07:28 +08:00
Qiyuan Gong
1eb884a249
IPEX Duplicate importer V2 ( #11310 )
...
* Add gguf support.
* Avoid error when importing ipex-llm multiple times.
* Add check to avoid duplicate replace and revert.
* Add calling from check to avoid raising exceptions in the submodule.
* Add BIGDL_CHECK_DUPLICATE_IMPORT to control the duplicate checker. Default is true.
2024-06-19 16:29:19 +08:00
Jason Dai
271d82a4fc
Update readme ( #11357 )
2024-06-19 10:05:42 +08:00
Yishuo Wang
ae7b662ed2
add fp16 NPU Linear support and fix intel_npu_acceleration_library version 1.0 support ( #11352 )
2024-06-19 09:14:59 +08:00
Guoqiong Song
c44b1942ed
fix mistral for transformers>=4.39 ( #11191 )
...
* fix mistral for transformers>=4.39
2024-06-18 13:39:35 -07:00
Heyang Sun
67a1e05876
Remove zero3 context manager from LoRA ( #11346 )
2024-06-18 17:24:43 +08:00
Xiangyu Tian
f6cd628cd8
Fix script usage in vLLM CPU Quickstart ( #11353 )
2024-06-18 16:50:48 +08:00
Xiangyu Tian
ef9f740801
Docs: Fix CPU Serving Docker README ( #11351 )
...
Fix CPU Serving Docker README
2024-06-18 16:27:51 +08:00
Guancheng Fu
c9b4cadd81
fix vLLM/docker issues ( #11348 )
...
* fix
* fix
* fix
2024-06-18 16:23:53 +08:00
Yishuo Wang
83082e5cc7
add initial support for intel npu acceleration library ( #11347 )
2024-06-18 16:07:16 +08:00
Shaojun Liu
694912698e
Upgrade scikit-learn to 1.5.0 to fix dependabot issue ( #11349 )
2024-06-18 15:47:25 +08:00
hxsz1997
44f22cba70
add config and default value ( #11344 )
...
* add config and default value
* add config in yaml
* remove lookahead and max_matching_ngram_size in config
* remove streaming and use_fp16_torch_dtype in test yaml
* update task in readme
* update commit of task
2024-06-18 15:28:57 +08:00
Shengsheng Huang
1f39bb84c7
update readthedocs perf data ( #11345 )
2024-06-18 13:23:47 +08:00
Heyang Sun
00f322d8ee
Finetune ChatGLM with Deepspeed Zero3 LoRA ( #11314 )
...
* Finetune ChatGLM with Deepspeed Zero3 LoRA
* add deepspeed zero3 config
* rename config
* remove offload_param
* add save_checkpoint parameter
* Update lora_deepspeed_zero3_finetune_chatglm3_6b_arc_2_card.sh
* refine
2024-06-18 12:31:26 +08:00
Yina Chen
5dad33e5af
Support fp8_e4m3 scale search ( #11339 )
...
* fp8e4m3 switch off
* fix style
2024-06-18 11:47:43 +08:00
binbin Deng
e50c890e1f
Support finishing PP inference once eos_token_id is found ( #11336 )
2024-06-18 09:55:40 +08:00
Qiyuan Gong
de4bb97b4f
Remove accelerate 0.23.0 install command in readme and docker ( #11333 )
...
* ipex-llm's accelerate has been upgraded to 0.23.0. Remove accelerate 0.23.0 install command in README and docker.
2024-06-17 17:52:12 +08:00
SONG Ge
ef4b6519fb
Add phi-3 model support for pipeline parallel inference ( #11334 )
...
* add phi-3 model support
* add phi3 example
2024-06-17 17:44:24 +08:00
hxsz1997
99b309928b
Add lookahead in test_api: transformer_int4_fp16_gpu ( #11337 )
...
* add lookahead in test_api:transformer_int4_fp16_gpu
* change the short prompt of summarize
* change short prompt to cnn_64
* change short prompt of summarize
2024-06-17 17:41:41 +08:00
Jason Dai
bc4bafffc7
Update README.md ( #11335 )
2024-06-17 16:24:23 +08:00
Qiyuan Gong
5d7c9bf901
Upgrade accelerate to 0.23.0 ( #11331 )
...
* Upgrade accelerate to 0.23.0
2024-06-17 15:03:11 +08:00
Xin Qiu
183e0c6cf5
glm-4v-9b support ( #11327 )
...
* chatglm4v support
* fix style check
* update glm4v
2024-06-17 13:52:37 +08:00
Wenjing Margaret Mao
bca5cbd96c
Modify arc nightly perf to fp16 ( #11275 )
...
* change api
* move to pr mode and remove the build
* add batch4 yaml and remove the bigcode
* remove batch4
* revert the starcode
* remove the exclude
* revert
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-06-17 13:47:22 +08:00
Yuwen Hu
a2a5890b48
Make manually-triggered perf test able to choose which test to run ( #11324 )
2024-06-17 10:23:13 +08:00
Yuwen Hu
1978f63f6b
Fix igpu performance guide regarding html generation ( #11328 )
2024-06-17 10:21:30 +08:00
binbin Deng
6ea1e71af0
Update PP inference benchmark script ( #11323 )
2024-06-17 09:59:36 +08:00
SONG Ge
be00380f1a
Fix pipeline parallel inference past_key_value error in Baichuan ( #11318 )
...
* fix past_key_value error
* add baichuan2 example
* fix style
* update doc
* add script link in doc
* fix import error
* update
2024-06-17 09:29:32 +08:00
Yina Chen
0af0102e61
Add quantization scale search switch ( #11326 )
...
* add scale_search switch
* remove llama3 instruct
* remove print
2024-06-14 18:46:52 +08:00
Ruonan Wang
8a3247ac71
support batch forward for q4_k, q6_k ( #11325 )
2024-06-14 18:25:50 +08:00
Yishuo Wang
e8dd8e97ef
fix chatglm lookahead on ARC ( #11320 )
2024-06-14 16:26:11 +08:00
Shaojun Liu
f5ef94046e
exclude dolly-v2-12b for arc perf test ( #11315 )
...
* test arc perf
* test
* test
* exclude dolly-v2-12b:2048
* revert changes
2024-06-14 15:35:56 +08:00
Shaojun Liu
77809be946
Install packages for ipex-llm-serving-cpu docker image ( #11321 )
...
* apt-get install patch
* Update Dockerfile
* Update Dockerfile
* revert
2024-06-14 15:26:01 +08:00
Xiangyu Tian
4359ab3172
LLM: Add /generate_stream endpoint for Pipeline-Parallel-FastAPI example ( #11187 )
...
Add /generate_stream and OpenAI-formatted endpoint for Pipeline-Parallel-FastAPI example
2024-06-14 15:15:32 +08:00
Yuwen Hu
9e4d87a696
Langchain-chatchat QuickStart small link fix ( #11317 )
2024-06-14 14:02:17 +08:00
Jin Qiao
0e7a31a09c
ChatGLM Examples Restructure regarding Installation Steps ( #11285 )
...
* merge install step in glm examples
* fix section
* fix section
* fix tiktoken
2024-06-14 12:37:05 +08:00
Yishuo Wang
91965b5d05
add glm_sdpa back to fix chatglm-6b ( #11313 )
2024-06-14 10:31:43 +08:00
Yishuo Wang
7f65836cb9
fix chatglm2/3-32k/128k fp16 ( #11311 )
2024-06-14 09:58:07 +08:00
Xin Qiu
1b0c4c8cb8
use new rotary two in chatglm4 ( #11312 )
...
* use new rotary two in chatglm4
* remove
2024-06-13 19:02:18 +08:00