ipex-llm

Author	SHA1	Message	Date
Shaojun Liu	5aa3e427a9	Fix docker images (#11362) * Fix docker images * add-apt-repository requires gnupg, gpg-agent, software-properties-common * update * avoid importing ipex again	2024-06-20 15:44:55 +08:00
Yuwen Hu	d9dd1b70bd	Remove example page in mddocs (#11373)	2024-06-20 14:23:43 +08:00
Wenjing Margaret Mao	c0e86c523a	Add qwen-moe batch1 to nightly perf (#11369) * add moe * reduce 437 models * rename * fix syntax * add moe check result * add 430 + 437 * all modes * 4-37-4 exclud * revert & comment --------- Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>	2024-06-20 14:17:41 +08:00
Yuwen Hu	769728c1eb	Add initial md docs (#11371)	2024-06-20 13:47:49 +08:00
Shengsheng Huang	9601fae5d5	fix system note (#11368)	2024-06-20 11:09:53 +08:00
Yishuo Wang	a5e7d93242	Add initial save/load low bit support for NPU(now only fp16 is supported) (#11359)	2024-06-20 10:49:39 +08:00
Shengsheng Huang	ed4c439497	small fix (#11366)	2024-06-20 10:38:20 +08:00
RyuKosei	05a8d051f6	Fix run.py run_ipex_fp16_gpu (#11361) * fix a bug on run.py * Update run.py fixed the format problem --------- Co-authored-by: sgwhat <ge.song@intel.com>	2024-06-20 10:29:32 +08:00
Wenjing Margaret Mao	b2f62a8561	Add batch 4 perf test (#11355) * copy files to this branch * add tasks * comment one model * change the model to test the 4.36 * only test batch-4 * typo * typo * typo * typo * typo * typo * add 4.37-batch4 * change the file name * revet yaml file * no print * add batch4 task * revert --------- Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>	2024-06-20 09:48:52 +08:00
Shengsheng Huang	a721c1ae43	minor fix of ragflow_quickstart.md (#11364)	2024-06-19 22:30:33 +08:00
Shengsheng Huang	13727635e8	revise ragflow quickstart (#11363) * revise ragflow quickstart * update titles and split the quickstart into sections * update	2024-06-19 22:24:31 +08:00
Zijie Li	5283df0078	LLM: Add RAGFlow with Ollama Example QuickStart (#11338) * Create ragflow.md * Update ragflow.md * Update ragflow_quickstart * Update ragflow_quickstart.md * Upload RAGFlow quickstart without images * Update ragflow_quickstart.md * Update ragflow_quickstart.md * Update ragflow_quickstart.md * Update ragflow_quickstart.md * fix typos in readme * Fix typos in quickstart readme	2024-06-19 20:00:50 +08:00
Zijie Li	ae452688c2	Add NPU HF example (#11358)	2024-06-19 18:07:28 +08:00
Qiyuan Gong	1eb884a249	IPEX Duplicate importer V2 (#11310) * Add gguf support. * Avoid error when import ipex-llm for multiple times. * Add check to avoid duplicate replace and revert. * Add calling from check to avoid raising exceptions in the submodule. * Add BIGDL_CHECK_DUPLICATE_IMPORT for controlling duplicate checker. Default is true.	2024-06-19 16:29:19 +08:00
Jason Dai	271d82a4fc	Update readme (#11357)	2024-06-19 10:05:42 +08:00
Yishuo Wang	ae7b662ed2	add fp16 NPU Linear support and fix intel_npu_acceleration_library version 1.0 support (#11352)	2024-06-19 09:14:59 +08:00
Guoqiong Song	c44b1942ed	fix mistral for transformers>=4.39 (#11191) * fix mistral for transformers>=4.39	2024-06-18 13:39:35 -07:00
Heyang Sun	67a1e05876	Remove zero3 context manager from LoRA (#11346)	2024-06-18 17:24:43 +08:00
Xiangyu Tian	f6cd628cd8	Fix script usage in vLLM CPU Quickstart (#11353)	2024-06-18 16:50:48 +08:00
Xiangyu Tian	ef9f740801	Docs: Fix CPU Serving Docker README (#11351) Fix CPU Serving Docker README	2024-06-18 16:27:51 +08:00
Guancheng Fu	c9b4cadd81	fix vLLM/docker issues (#11348) * fix * fix * ffix	2024-06-18 16:23:53 +08:00
Yishuo Wang	83082e5cc7	add initial support for intel npu acceleration library (#11347)	2024-06-18 16:07:16 +08:00
Shaojun Liu	694912698e	Upgrade scikit-learn to 1.5.0 to fix dependabot issue (#11349)	2024-06-18 15:47:25 +08:00
hxsz1997	44f22cba70	add config and default value (#11344) * add config and default value * add config in taml * remove lookahead and max_matching_ngram_size in config * remove streaming and use_fp16_torch_dtype in test yaml * update task in readme * update commit of task	2024-06-18 15:28:57 +08:00
Shengsheng Huang	1f39bb84c7	update readthedocs perf data (#11345)	2024-06-18 13:23:47 +08:00
Heyang Sun	00f322d8ee	Finetune ChatGLM with Deepspeed Zero3 LoRA (#11314) * Fintune ChatGLM with Deepspeed Zero3 LoRA * add deepspeed zero3 config * rename config * remove offload_param * add save_checkpoint parameter * Update lora_deepspeed_zero3_finetune_chatglm3_6b_arc_2_card.sh * refine	2024-06-18 12:31:26 +08:00
Yina Chen	5dad33e5af	Support fp8_e4m3 scale search (#11339) * fp8e4m3 switch off * fix style	2024-06-18 11:47:43 +08:00
binbin Deng	e50c890e1f	Support finishing PP inference once `eos_token_id` is found (#11336)	2024-06-18 09:55:40 +08:00
Qiyuan Gong	de4bb97b4f	Remove accelerate 0.23.0 install command in readme and docker (#11333) *ipex-llm's accelerate has been upgraded to 0.23.0. Remove accelerate 0.23.0 install command in README and docker.	2024-06-17 17:52:12 +08:00
SONG Ge	ef4b6519fb	Add phi-3 model support for pipeline parallel inference (#11334) * add phi-3 model support * add phi3 example	2024-06-17 17:44:24 +08:00
hxsz1997	99b309928b	Add lookahead in test_api: transformer_int4_fp16_gpu (#11337) * add lookahead in test_api:transformer_int4_fp16_gpu * change the short prompt of summarize * change short prompt to cnn_64 * change short prompt of summarize	2024-06-17 17:41:41 +08:00
Jason Dai	bc4bafffc7	Update README.md (#11335)	2024-06-17 16:24:23 +08:00
Qiyuan Gong	5d7c9bf901	Upgrade accelerate to 0.23.0 (#11331) * Upgrade accelerate to 0.23.0	2024-06-17 15:03:11 +08:00
Xin Qiu	183e0c6cf5	glm-4v-9b support (#11327) * chatglm4v support * fix style check * update glm4v	2024-06-17 13:52:37 +08:00
Wenjing Margaret Mao	bca5cbd96c	Modify arc nightly perf to fp16 (#11275) * change api * move to pr mode and remove the build * add batch4 yaml and remove the bigcode * remove batch4 * revert the starcode * remove the exclude * revert --------- Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>	2024-06-17 13:47:22 +08:00
Yuwen Hu	a2a5890b48	Make manually-triggered perf test able to choose which test to run (#11324)	2024-06-17 10:23:13 +08:00
Yuwen Hu	1978f63f6b	Fix igpu performance guide regarding html generation (#11328)	2024-06-17 10:21:30 +08:00
binbin Deng	6ea1e71af0	Update PP inference benchmark script (#11323)	2024-06-17 09:59:36 +08:00
SONG Ge	be00380f1a	Fix pipeline parallel inference past_key_value error in Baichuan (#11318) * fix past_key_value error * add baichuan2 example * fix style * update doc * add script link in doc * fix import error * update	2024-06-17 09:29:32 +08:00
Yina Chen	0af0102e61	Add quantization scale search switch (#11326) * add scale_search switch * remove llama3 instruct * remove print	2024-06-14 18:46:52 +08:00
Ruonan Wang	8a3247ac71	support batch forward for q4_k, q6_k (#11325)	2024-06-14 18:25:50 +08:00
Yishuo Wang	e8dd8e97ef	fix chatglm lookahead on ARC (#11320)	2024-06-14 16:26:11 +08:00
Shaojun Liu	f5ef94046e	exclude dolly-v2-12b for arc perf test (#11315) * test arc perf * test * test * exclude dolly-v2-12b:2048 * revert changes	2024-06-14 15:35:56 +08:00
Shaojun Liu	77809be946	Install packages for ipex-llm-serving-cpu docker image (#11321) * apt-get install patch * Update Dockerfile * Update Dockerfile * revert	2024-06-14 15:26:01 +08:00
Xiangyu Tian	4359ab3172	LLM: Add /generate_stream endpoint for Pipeline-Parallel-FastAPI example (#11187) Add /generate_stream and OpenAI-formatted endpoint for Pipeline-Parallel-FastAPI example	2024-06-14 15:15:32 +08:00
Yuwen Hu	9e4d87a696	Langchain-chatchat QuickStart small link fix (#11317)	2024-06-14 14:02:17 +08:00
Jin Qiao	0e7a31a09c	ChatGLM Examples Restructure regarding Installation Steps (#11285) * merge install step in glm examples * fix section * fix section * fix tiktoken	2024-06-14 12:37:05 +08:00
Yishuo Wang	91965b5d05	add glm_sdpa back to fix chatglm-6b (#11313)	2024-06-14 10:31:43 +08:00
Yishuo Wang	7f65836cb9	fix chatglm2/3-32k/128k fp16 (#11311)	2024-06-14 09:58:07 +08:00
Xin Qiu	1b0c4c8cb8	use new rotary two in chatglm4 (#11312) * use new rotary two in chatglm4 * rempve	2024-06-13 19:02:18 +08:00


3017 commits