hxsz1997
44f22cba70
add config and default value ( #11344 )
* add config and default value
* add config in yaml
* remove lookahead and max_matching_ngram_size in config
* remove streaming and use_fp16_torch_dtype in test yaml
* update task in readme
* update commit of task
2024-06-18 15:28:57 +08:00
Shengsheng Huang
1f39bb84c7
update readthedocs perf data ( #11345 )
2024-06-18 13:23:47 +08:00
Heyang Sun
00f322d8ee
Finetune ChatGLM with Deepspeed Zero3 LoRA ( #11314 )
* Finetune ChatGLM with Deepspeed Zero3 LoRA
* add deepspeed zero3 config
* rename config
* remove offload_param
* add save_checkpoint parameter
* Update lora_deepspeed_zero3_finetune_chatglm3_6b_arc_2_card.sh
* refine
2024-06-18 12:31:26 +08:00
Yina Chen
5dad33e5af
Support fp8_e4m3 scale search ( #11339 )
* fp8e4m3 switch off
* fix style
2024-06-18 11:47:43 +08:00
binbin Deng
e50c890e1f
Support finishing PP inference once eos_token_id is found ( #11336 )
2024-06-18 09:55:40 +08:00
Qiyuan Gong
de4bb97b4f
Remove accelerate 0.23.0 install command in readme and docker ( #11333 )
* ipex-llm's accelerate dependency has been upgraded to 0.23.0. Remove the accelerate 0.23.0 install command in README and docker.
2024-06-17 17:52:12 +08:00
SONG Ge
ef4b6519fb
Add phi-3 model support for pipeline parallel inference ( #11334 )
* add phi-3 model support
* add phi3 example
2024-06-17 17:44:24 +08:00
hxsz1997
99b309928b
Add lookahead in test_api: transformer_int4_fp16_gpu ( #11337 )
* add lookahead in test_api:transformer_int4_fp16_gpu
* change the short prompt of summarize
* change short prompt to cnn_64
* change short prompt of summarize
2024-06-17 17:41:41 +08:00
Jason Dai
bc4bafffc7
Update README.md ( #11335 )
2024-06-17 16:24:23 +08:00
Qiyuan Gong
5d7c9bf901
Upgrade accelerate to 0.23.0 ( #11331 )
* Upgrade accelerate to 0.23.0
2024-06-17 15:03:11 +08:00
Xin Qiu
183e0c6cf5
glm-4v-9b support ( #11327 )
* chatglm4v support
* fix style check
* update glm4v
2024-06-17 13:52:37 +08:00
Wenjing Margaret Mao
bca5cbd96c
Modify arc nightly perf to fp16 ( #11275 )
* change api
* move to pr mode and remove the build
* add batch4 yaml and remove the bigcode
* remove batch4
* revert the starcoder
* remove the exclude
* revert
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-06-17 13:47:22 +08:00
Yuwen Hu
a2a5890b48
Make manually-triggered perf test able to choose which test to run ( #11324 )
2024-06-17 10:23:13 +08:00
Yuwen Hu
1978f63f6b
Fix igpu performance guide regarding html generation ( #11328 )
2024-06-17 10:21:30 +08:00
binbin Deng
6ea1e71af0
Update PP inference benchmark script ( #11323 )
2024-06-17 09:59:36 +08:00
SONG Ge
be00380f1a
Fix pipeline parallel inference past_key_value error in Baichuan ( #11318 )
* fix past_key_value error
* add baichuan2 example
* fix style
* update doc
* add script link in doc
* fix import error
* update
2024-06-17 09:29:32 +08:00
Yina Chen
0af0102e61
Add quantization scale search switch ( #11326 )
* add scale_search switch
* remove llama3 instruct
* remove print
2024-06-14 18:46:52 +08:00
Ruonan Wang
8a3247ac71
support batch forward for q4_k, q6_k ( #11325 )
2024-06-14 18:25:50 +08:00
Yishuo Wang
e8dd8e97ef
fix chatglm lookahead on ARC ( #11320 )
2024-06-14 16:26:11 +08:00
Shaojun Liu
f5ef94046e
exclude dolly-v2-12b for arc perf test ( #11315 )
* test arc perf
* test
* test
* exclude dolly-v2-12b:2048
* revert changes
2024-06-14 15:35:56 +08:00
Shaojun Liu
77809be946
Install packages for ipex-llm-serving-cpu docker image ( #11321 )
* apt-get install patch
* Update Dockerfile
* Update Dockerfile
* revert
2024-06-14 15:26:01 +08:00
Xiangyu Tian
4359ab3172
LLM: Add /generate_stream endpoint for Pipeline-Parallel-FastAPI example ( #11187 )
Add /generate_stream and OpenAI-formatted endpoint for Pipeline-Parallel-FastAPI example
2024-06-14 15:15:32 +08:00
Yuwen Hu
9e4d87a696
Langchain-chatchat QuickStart small link fix ( #11317 )
2024-06-14 14:02:17 +08:00
Jin Qiao
0e7a31a09c
ChatGLM Examples Restructure regarding Installation Steps ( #11285 )
* merge install step in glm examples
* fix section
* fix section
* fix tiktoken
2024-06-14 12:37:05 +08:00
Yishuo Wang
91965b5d05
add glm_sdpa back to fix chatglm-6b ( #11313 )
2024-06-14 10:31:43 +08:00
Yishuo Wang
7f65836cb9
fix chatglm2/3-32k/128k fp16 ( #11311 )
2024-06-14 09:58:07 +08:00
Xin Qiu
1b0c4c8cb8
use new rotary two in chatglm4 ( #11312 )
* use new rotary two in chatglm4
* remove
2024-06-13 19:02:18 +08:00
Xin Qiu
f1410d6823
refactor chatglm4 ( #11301 )
* glm4
* remove useless code
* style
* add rope_ratio
* update
* fix fp16
* fix style
2024-06-13 18:06:04 +08:00
Yishuo Wang
5e25766855
fix and optimize chatglm2-32k and chatglm3-128k ( #11306 )
2024-06-13 17:37:58 +08:00
binbin Deng
60cb1dac7c
Support PP for qwen1.5 ( #11300 )
2024-06-13 17:35:24 +08:00
binbin Deng
f97cce2642
Fix import error of ds autotp ( #11307 )
2024-06-13 16:22:52 +08:00
Jin Qiao
3682c6a979
add glm4 and qwen2 to igpu perf ( #11304 )
2024-06-13 16:16:35 +08:00
Yishuo Wang
a24666b8f3
fix chatglm3-6b-32k ( #11303 )
2024-06-13 16:01:34 +08:00
Shaojun Liu
9760ffc256
Fix SDLe CT222 Vulnerabilities ( #11237 )
* fix ct222 vuln
* update
* fix
* update ENTRYPOINT
* revert ENTRYPOINT
* Fix CT222 Vulns
* fix
* revert changes
* fix
* revert
* add sudo permission to ipex-llm user
* do not use ipex-llm user
2024-06-13 15:31:22 +08:00
Yuwen Hu
bfab294f08
Update langchain-chatchat QuickStart to include Core Ultra iGPU Linux Guide ( #11302 )
2024-06-13 15:09:55 +08:00
Shaojun Liu
84f04087fb
Add intelanalytics/ipex-llm:sources image for OSPDT ( #11296 )
* Add intelanalytics/ipex-llm:sources image
* apt-get source
2024-06-13 14:29:14 +08:00
Yishuo Wang
01fe0fc1a2
refactor chatglm2/3 ( #11290 )
2024-06-13 12:22:58 +08:00
Shengsheng Huang
ea372cc472
update demos section ( #11298 )
* update demos section
* update format
2024-06-13 11:58:19 +08:00
Guancheng Fu
57a023aadc
Fix vllm tp ( #11297 )
2024-06-13 10:47:48 +08:00
Ruonan Wang
986af21896
fix perf test ( #11295 )
2024-06-13 10:35:48 +08:00
binbin Deng
220151e2a1
Refactor pipeline parallel multi-stage implementation ( #11286 )
2024-06-13 10:00:23 +08:00
Ruonan Wang
14b1e6b699
Fix gguf_q4k ( #11293 )
* update embedding parameter
* update benchmark
2024-06-12 20:43:08 +08:00
Yuwen Hu
8edcdeb0e7
Fix bug that torch.ops.torch_ipex.matmul_bias_out cannot work on Linux MTL for short input ( #11292 )
2024-06-12 19:12:57 +08:00
Wenjing Margaret Mao
b61f6e3ab1
Add update_parent_folder for nightly_perf_test ( #11287 )
* add update_parent_folder and change the workflow file
* add update_parent_folder and change the workflow file
* move to pr mode and comment the test
* use one model per config
* revert
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-06-12 17:58:13 +08:00
Guancheng Fu
2e75bbccf9
Add more control arguments for benchmark_vllm_throughput ( #11291 )
2024-06-12 17:43:06 +08:00
Xin Qiu
592f7aa61e
Refine glm1-4 sdp ( #11276 )
* chatglm
* update
* update
* change chatglm
* update sdpa
* update
* fix style
* fix
* fix glm
* update glm2-32k
* update glm2-32k
* fix cpu
* update
* change lower_bound
2024-06-12 17:11:56 +08:00
Yuwen Hu
cffb932f05
Expose timeout for streamer for fastchat worker ( #11288 )
* Expose timeout for streamer for fastchat worker
* Change to read from env variables
2024-06-12 17:02:40 +08:00
Shengsheng Huang
d99423b75a
Readme demo ( #11283 )
2024-06-12 17:01:53 +08:00
ivy-lv11
e7a4e2296f
Add Stable Diffusion examples on GPU and CPU ( #11166 )
* add sdxl and lcm-lora
* readme
* modify
* add cpu
* add license
* modify
* add file
2024-06-12 16:33:25 +08:00
Jin Qiao
f224e98297
Add GLM-4 CPU example ( #11223 )
* Add GLM-4 example
* add tiktoken dependency
* fix
* fix
2024-06-12 15:30:51 +08:00