ipex-llm

Author	SHA1	Message	Date
dingbaorong	89069d6173	Add gpu gguf example (#9603 ) * add gpu gguf example * some fixes * address kai's comments * address json's comments	2023-12-06 15:17:54 +08:00
Yuwen Hu	0e8f4020e5	Add traceback error output for win igpu test api in benchmark (#9607 )	2023-12-06 14:35:16 +08:00
Ziteng Zhang	aeb77b2ab1	Add minimum Qwen model version (#9606 )	2023-12-06 11:49:14 +08:00
Yuwen Hu	c998f5f2ba	[LLM] iGPU long context tests (#9598 ) * Temp enable PR * Enable tests for 256-64 * Try again 128-64 * Empty cache after each iteration for igpu benchmark scripts * Try tests for 512 * change order for 512 * Skip chatglm3 and llama2 for now * Separate tests for 512-64 * Small fix * Further fixes * Change back to nightly again	2023-12-06 10:19:20 +08:00
Heyang Sun	4e70e33934	[LLM] code and document for distributed qlora (#9585 ) * [LLM] code and document for distributed qlora * doc * refine for gradient checkpoint * refine * Update alpaca_qlora_finetuning_cpu.py * Update alpaca_qlora_finetuning_cpu.py * Update alpaca_qlora_finetuning_cpu.py * add link in doc	2023-12-06 09:23:17 +08:00
Zheng, Yi	d154b38bf9	Add llama2 gpu low memory example (#9514 ) * Add low memory example * Minor fixes * Update readme.md	2023-12-05 17:29:48 +08:00
Jason Dai	06febb5fa7	Update readme for FP8/FP4 inference examples (#9601 )	2023-12-05 15:59:03 +08:00
dingbaorong	a66fbedd7e	add gpu more data types example (#9592 ) * add gpu more data types example * add int8	2023-12-05 15:45:38 +08:00
Ziteng Zhang	65934c9f4f	[LLM] Fix Qwen causal_mask and attention_mask size mismatching (#9600 ) * Fix #9582 , caused by Qwen modified modeling_qwen.py `7f62181c94`	2023-12-05 15:15:54 +08:00
Jinyi Wan	b721138132	Add cpu and gpu examples for BlueLM (#9589 ) * Add cpu int4 example for BlueLM * add example optimize_model cpu for bluelm * add example gpu int4 blueLM * add example optimize_model GPU for bluelm * Fixing naming issues and BigDL package version. * Fixing naming issues... * Add BlueLM in README.md "Verified Models"	2023-12-05 13:59:02 +08:00
Guancheng Fu	8b00653039	fix doc (#9599 )	2023-12-05 13:49:31 +08:00
Qiyuan Gong	f211f136b6	Configurable TORCH_LINEAR_THRESHOLD from env (#9588 ) * Add TORCH_LINEAR_THRESHOLD from env (BIGDL_LLM_LINEAR_THRESHOLD) * Change default to 512	2023-12-05 13:19:47 +08:00
Yuwen Hu	1012507a40	[LLM] Fix performance tests (#9596 ) * Fix missing key for cpu_embedding * Remove 512 as it stuck for now * Small fix	2023-12-05 10:59:28 +08:00
Chen, Zhentao	8c8a27ded7	Add harness summary job (#9457 ) * format yml * add make_table_results * add summary job * add a job to print single result * upload full directory	2023-12-05 10:04:10 +08:00
Yuwen Hu	3f4ad97929	[LLM] Add performance tests for windows iGPU (#9584 ) * Add support for win gpu benchmark with peak gpu memory monitoring * Add win igpu tests * Small fix * Forward outputs * Small fix * Test and small fixes * Small fix * Small fix and test * Small fixes * Add tests for 512-64 and change back to nightly tests * Small fix	2023-12-04 20:50:02 +08:00
Chen, Zhentao	29d5bb8df4	Harness workflow dispatch (#9591 ) * add set-matrix job * add workflow_dispatch * fix context * fix manual run * rename step * add quotes * add runner option * not required labels * add runner label to output * use double quote	2023-12-04 15:53:29 +08:00
Chen, Zhentao	9557aa9c21	Fix harness nightly (#9586 ) * update golden * loose the restriction of diff * only compare results when scheduled	2023-12-04 11:45:00 +08:00
Xiangyu Tian	5c03651309	[LLM] vLLM: Add Preempt for scheduler (#9568 ) Implement Preempt_by_recompute method for vllm.	2023-12-03 20:16:25 +08:00
Kai Huang	f7e596d85a	Update doc (#9580 ) * update aiohttp in docs * update doc	2023-12-01 15:40:37 +08:00
Chen, Zhentao	5de92090b3	try to fix deps installation of bigdl (#9578 )	2023-12-01 15:25:47 +08:00
Chen, Zhentao	cb228c70ea	Add harness nightly (#9552 ) * modify output_path as a directory * schedule nightly at 21 on Friday * add tasks and models for nightly * add accuracy regression * comment out if to test * mixed fp4 * for test * add missing delimiter * remove comma * fixed golden results * add mixed 4 golden result * add more options * add mistral results * get golden result of stable lm * move nightly scripts and results to test folder * add license * add fp8 stable lm golden * run on all available devices * trigger only when ready for review * fix new line * update golden * add mistral	2023-12-01 14:16:35 +08:00
Chen, Zhentao	4d7d5d4c59	Add 3 leaderboard tasks (#9566 ) * update leaderboard map * download model and dataset without overwritten * fix task drop * run on all available devices	2023-12-01 14:01:14 +08:00
Heyang Sun	74fd7077a2	[LLM] Multi-process and distributed QLoRA on CPU platform (#9491 ) * [LLM] Multi-process and distributed QLoRA on CPU platform * Update README.md * Update README.md * Update README.md * Update README.md * enable llm-init and bind to socket * refine * Update Dockerfile * add all files of qlora cpu example to /bigdl * fix * fix k8s * Update bigdl-qlora-finetuing-entrypoint.sh * Update bigdl-qlora-finetuing-entrypoint.sh * Update bigdl-qlora-finetuning-job.yaml * fix train sync and performance issues * add node affinity * disable user to tune cpu per pod * Update bigdl-qlora-finetuning-job.yaml	2023-12-01 13:47:19 +08:00
Wang, Jian4	ed0dc57c6e	LLM: Add cpu qlora support other models guide (#9567 ) * use bf16 flag * add using baichuan model * update merge * remove * update	2023-12-01 11:18:04 +08:00
Jason Dai	bda404fc8f	Update readme (#9575 )	2023-11-30 22:45:52 +08:00
Xin Qiu	69c49d21f5	use fused rms norm (#9572 ) * use fused rms norm * meet code review	2023-11-30 21:47:41 +08:00
Lilac09	b785376f5c	Add vllm-example to docker inference image (#9570 ) * add vllm-serving to cpu image * add vllm-serving to cpu image * add vllm-serving	2023-11-30 17:04:53 +08:00
Yishuo Wang	66f5b45f57	[LLM] add a llama2 gguf example (#9553 )	2023-11-30 16:37:17 +08:00
Yishuo Wang	7f6465518a	support loading llama tokenizer from gguf model (#9565 )	2023-11-30 14:56:12 +08:00
Lilac09	2554ba0913	Add usage of vllm (#9564 ) * add usage of vllm * add usage of vllm * add usage of vllm * add usage of vllm * add usage of vllm * add usage of vllm	2023-11-30 14:19:23 +08:00
Wang, Jian4	a0a80d232e	LLM: Add qlora cpu distributed readme (#9561 ) * init readme * add distributed guide * update	2023-11-30 13:42:30 +08:00
Chen, Zhentao	c8e0c2ed48	Fixed dumped logs in harness (#9549 ) * install transformers==4.34.0 * modify output_path as a directory * add device and task to output dir parents	2023-11-30 12:47:56 +08:00
Qiyuan Gong	d85a430a8c	Using bigdl-llm-init instead of bigdl-nano-init (#9558 ) * Replace `bigdl-nano-init` with `bigdl-llm-init`. * Install `bigdl-llm` instead of `bigdl-nano`. * Remove nano in README.	2023-11-30 10:10:29 +08:00
Yuwen Hu	34503efa6a	Fix cpu pinned embedding (#9556 )	2023-11-29 18:27:56 +08:00
Lilac09	557bb6bbdb	add judgement for running serve (#9555 )	2023-11-29 16:57:00 +08:00
binbin Deng	4ff2ca9d0d	LLM: fix loss error on Arc (#9550 )	2023-11-29 15:16:18 +08:00
Yishuo Wang	65121c7997	support loading q4_1/q5_0/q5_1/q8_0 gguf model (#9546 )	2023-11-29 14:40:37 +08:00
Wang, Jian4	b824754256	LLM: Update for cpu qlora mpirun (#9548 )	2023-11-29 10:56:17 +08:00
Yuwen Hu	5f5ca38b74	[LLM Doc] Fix api doc rendering error (#9542 ) * Fix api rendering error * Fix python style	2023-11-29 09:17:09 +08:00
Yishuo Wang	a86c6e0b56	[LLM] support loading gguf model (#9544 )	2023-11-28 15:51:15 +08:00
Xin Qiu	32b37f3af7	Update gpu install.md (#9541 ) * Update install_gpu.md * Update install_gpu.md	2023-11-28 11:15:03 +08:00
Xiangyu Tian	916c338772	fix bugs in vllm length check (#9543 )	2023-11-28 11:09:54 +08:00
WeiguangHan	5098bc3544	LLM: enable previous models (#9505 ) * enable previous models * test mistral model * for test * run models separately * test all models * for test * revert the llm_performance_test.yaml	2023-11-28 10:21:07 +08:00
Zhao Changmin	e7e0cd3b5e	CPU Pinned embedding Layer (#9538 ) * CPU Pinned embedding	2023-11-28 09:46:31 +08:00
Guancheng Fu	963a5c8d79	Add vLLM-XPU version's README/examples (#9536 ) * test * test * fix last kv cache * add xpu readme * remove numactl for xpu example * fix link error * update max_num_batched_tokens logic * add explanation * add xpu environment version requirement * refine gpu memory * fix * fix style	2023-11-28 09:44:03 +08:00
Guancheng Fu	b6c3520748	Remove xformers from vLLM-CPU (#9535 )	2023-11-27 11:21:25 +08:00
binbin Deng	2b9c7d2a59	LLM: quick fix alpaca qlora finetuning script (#9534 )	2023-11-27 11:04:27 +08:00
Yuwen Hu	11fa3de290	Add setup support of win gpu for bigdl-llm (#9512 )	2023-11-24 17:49:21 +08:00
Chen, Zhentao	45820cf3b9	add optimize model option (#9530 )	2023-11-24 17:10:49 +08:00
binbin Deng	6bec0faea5	LLM: support Mistral AWQ models (#9520 )	2023-11-24 16:20:22 +08:00

1824 commits