ipex-llm

Author	SHA1	Message	Date
Yining Wang	a04a027b4c	Edit gpu doc (#9583 ) * harness: run llama2-7b * harness: run llama2-7b * harness: run llama2-7b * harness: run llama2-7b * edit-gpu-doc * fix some format problem * fix spelling problems * fix evaluation yml * delete redundant space * fix some problems * address comments * change link	2023-12-11 14:59:07 +08:00
ZehuaCao	45721f3473	verfiy llava (#9649 )	2023-12-11 14:26:05 +08:00
Heyang Sun	9f02f96160	[LLM] support for Yi AWQ model (#9648 )	2023-12-11 14:07:34 +08:00
Xin Qiu	82255f9726	Enable fused layernorm (#9614 ) * bloom layernorm * fix * layernorm * fix * fix * fix * style fix * fix * replace nn.LayerNorm	2023-12-11 09:26:13 +08:00
Jason Dai	84a19705a6	Update readme (#9617 )	2023-12-09 19:23:14 +08:00
Yuwen Hu	894d0aaf5e	[LLM] iGPU win perf test reorg based on in-out pairs (#9645 ) * trigger pr temparorily * Saparate benchmark run for win igpu based in in-out pairs * Rename fix * Test workflow * Small fix * Skip generation of html for now * Change back to nightly triggered	2023-12-08 20:46:40 +08:00
Chen, Zhentao	972cdb9992	gsm8k OOM workaround (#9597 ) * update bigdl_llm.py * update the installation of harness * fix partial function * import ipex * force seq len in decrease order * put func outside class * move comments * default 'trust_remote_code' as True * Update llm-harness-evaluation.yml	2023-12-08 18:47:25 +08:00
WeiguangHan	1ff4bc43a6	degrade pandas version (#9643 )	2023-12-08 17:44:51 +08:00
Yina Chen	70f5e7bf0d	Support peft LoraConfig (#9636 ) * support peft loraconfig * use testcase to test * fix style * meet comments	2023-12-08 16:13:03 +08:00
Xin Qiu	0b6f29a7fc	add fused rms norm for Yi and Qwen (#9640 )	2023-12-08 16:04:38 +08:00
Xin Qiu	5636b0ba80	set new linear status (#9639 )	2023-12-08 11:02:49 +08:00
binbin Deng	499100daf1	LLM: Add solution to fix `oneccl` related error (#9630 )	2023-12-08 10:51:55 +08:00
ZehuaCao	d204125e88	[LLM] Use to build a more slim docker for k8s (#9608 ) * Create Dockerfile.k8s * Update Dockerfile More slim standalone image * Update Dockerfile * Update Dockerfile.k8s * Update bigdl-qlora-finetuing-entrypoint.sh * Update qlora_finetuning_cpu.py * Update alpaca_qlora_finetuning_cpu.py Refer to this [pr](https://github.com/intel-analytics/BigDL/pull/9551/files#diff-2025188afa54672d21236e6955c7c7f7686bec9239532e41c7983858cc9aaa89), update the LoraConfig * update * update * update * update * update * update * update * update transformer version * update Dockerfile * update Docker image name * fix error	2023-12-08 10:25:36 +08:00
ZehuaCao	6eca8a8bb5	update transformer version (#9631 )	2023-12-08 09:36:00 +08:00
WeiguangHan	e9299adb3b	LLM: Highlight some values in the html (#9635 ) * highlight some values in the html * revert the llm_performance_tests.yml	2023-12-07 19:02:41 +08:00
Yuwen Hu	6f34978b94	[LLM] Add more performance tests for win iGPU (more in-out pairs, RWKV model) (#9626 ) * Add supports for loading rwkv models using from_pretrained api * Temporarily enable pr tests * Add RWKV in tests and more in-out pairs * Add rwkv for 512 tests * Make iterations smaller * Change back to nightly trigger	2023-12-07 18:55:16 +08:00
Ruonan Wang	d9b0c01de3	LLM: fix unlora module in qlora finetune (#9621 ) * fix unlora module * split train and inference	2023-12-07 16:32:02 +08:00
Heyang Sun	3811cf43c9	[LLM] update AWQ documents (#9623 ) * [LLM] update AWQ and verified models' documents * refine * refine links * refine	2023-12-07 16:02:20 +08:00
Yishuo Wang	7319f2c227	use fused mlp in baichuan2 (#9620 )	2023-12-07 15:50:57 +08:00
Xiangyu Tian	deee65785c	[LLM] vLLM: Delete last_kv_cache before prefilling (#9619 ) Remove last_kv_cache before prefilling to reduce peak memory usage.	2023-12-07 11:32:33 +08:00
Yuwen Hu	48b85593b3	Update all-in-one benchmark readme (#9618 )	2023-12-07 10:32:09 +08:00
Xiangyu Tian	0327169b50	[LLM] vLLM: fix memory leak in prepare_kv_cache (#9616 ) Revert modification in prepare_kv_cache to fix memory leak.	2023-12-07 10:08:18 +08:00
Xin Qiu	13d47955a8	use fused rms norm in chatglm2 and baichuan (#9613 ) * use fused rms norm in chatglm2 and baichuan * style fix	2023-12-07 09:21:41 +08:00
Jason Dai	51b668f229	Update GGUF readme (#9611 )	2023-12-06 18:21:54 +08:00
dingbaorong	a7bc89b3a1	remove q4_1 in gguf example (#9610 ) * remove q4_1 * fixes	2023-12-06 16:00:05 +08:00
Yina Chen	404e101ded	QALora example (#9551 ) * Support qa-lora * init * update * update * update * update * update * update merge * update * fix style & update scripts * update * address comments * fix typo * fix typo --------- Co-authored-by: Yang Wang <yang3.wang@intel.com>	2023-12-06 15:36:21 +08:00
Guancheng Fu	6978b2c316	[VLLM] Change padding patterns for vLLM & clean code (#9609 ) * optimize * fix minor error * optimizations * fix style	2023-12-06 15:27:26 +08:00
dingbaorong	89069d6173	Add gpu gguf example (#9603 ) * add gpu gguf example * some fixes * address kai's comments * address json's comments	2023-12-06 15:17:54 +08:00
Yuwen Hu	0e8f4020e5	Add traceback error output for win igpu test api in benchmark (#9607 )	2023-12-06 14:35:16 +08:00
Ziteng Zhang	aeb77b2ab1	Add minimum Qwen model version (#9606 )	2023-12-06 11:49:14 +08:00
Yuwen Hu	c998f5f2ba	[LLM] iGPU long context tests (#9598 ) * Temp enable PR * Enable tests for 256-64 * Try again 128-64 * Empty cache after each iteration for igpu benchmark scripts * Try tests for 512 * change order for 512 * Skip chatglm3 and llama2 for now * Separate tests for 512-64 * Small fix * Further fixes * Change back to nightly again	2023-12-06 10:19:20 +08:00
Heyang Sun	4e70e33934	[LLM] code and document for distributed qlora (#9585 ) * [LLM] code and document for distributed qlora * doc * refine for gradient checkpoint * refine * Update alpaca_qlora_finetuning_cpu.py * Update alpaca_qlora_finetuning_cpu.py * Update alpaca_qlora_finetuning_cpu.py * add link in doc	2023-12-06 09:23:17 +08:00
Zheng, Yi	d154b38bf9	Add llama2 gpu low memory example (#9514 ) * Add low memory example * Minor fixes * Update readme.md	2023-12-05 17:29:48 +08:00
Jason Dai	06febb5fa7	Update readme for FP8/FP4 inference examples (#9601 )	2023-12-05 15:59:03 +08:00
dingbaorong	a66fbedd7e	add gpu more data types example (#9592 ) * add gpu more data types example * add int8	2023-12-05 15:45:38 +08:00
Ziteng Zhang	65934c9f4f	[LLM] Fix Qwen causal_mask and attention_mask size mismatching (#9600 ) * Fix #9582 , caused by Qwen modified modeling_qwen.py `7f62181c94 (d2h-049182)`	2023-12-05 15:15:54 +08:00
Jinyi Wan	b721138132	Add cpu and gpu examples for BlueLM (#9589 ) * Add cpu int4 example for BlueLM * addexample optimize_model cpu for bluelm * add example gpu int4 blueLM * add example optimiza_model GPU for bluelm * Fixing naming issues and BigDL package version. * Fixing naming issues... * Add BlueLM in README.md "Verified Models"	2023-12-05 13:59:02 +08:00
Guancheng Fu	8b00653039	fix doc (#9599 )	2023-12-05 13:49:31 +08:00
Qiyuan Gong	f211f136b6	Configurable TORCH_LINEAR_THRESHOLD from env (#9588 ) * Add TORCH_LINEAR_THRESHOLD from env (BIGDL_LLM_LINEAR_THRESHOLD) * Change default to 512	2023-12-05 13:19:47 +08:00
Yuwen Hu	1012507a40	[LLM] Fix performance tests (#9596 ) * Fix missing key for cpu_embedding * Remove 512 as it stuck for now * Small fix	2023-12-05 10:59:28 +08:00
Chen, Zhentao	8c8a27ded7	Add harness summary job (#9457 ) * format yml * add make_table_results * add summary job * add a job to print single result * upload full directory	2023-12-05 10:04:10 +08:00
Yuwen Hu	3f4ad97929	[LLM] Add performance tests for windows iGPU (#9584 ) * Add support for win gpu benchmark with peak gpu memory monitoring * Add win igpu tests * Small fix * Forward outputs * Small fix * Test and small fixes * Small fix * Small fix and test * Small fixes * Add tests for 512-64 and change back to nightly tests * Small fix	2023-12-04 20:50:02 +08:00
Chen, Zhentao	29d5bb8df4	Harness workflow dispatch (#9591 ) * add set-matrix job * add workflow_dispatch * fix context * fix manual run * rename step * add quotes * add runner option * not required labels * add runner label to output * use double quote	2023-12-04 15:53:29 +08:00
Chen, Zhentao	9557aa9c21	Fix harness nightly (#9586 ) * update golden * loose the restriction of diff * only compare results when scheduled	2023-12-04 11:45:00 +08:00
Xiangyu Tian	5c03651309	[LLM] vLLM: Add Preempt for scheduler (#9568 ) Implement Preempt_by_recompute method for vllm.	2023-12-03 20:16:25 +08:00
Kai Huang	f7e596d85a	Update doc (#9580 ) * update aiohttp in docs * update doc	2023-12-01 15:40:37 +08:00
Chen, Zhentao	5de92090b3	try to fix deps installation of bigdl (#9578 )	2023-12-01 15:25:47 +08:00
Chen, Zhentao	cb228c70ea	Add harness nightly (#9552 ) * modify output_path as a directory * schedule nightly at 21 on Friday * add tasks and models for nightly * add accuracy regression * comment out if to test * mixed fp4 * for test * add missing delimiter * remove comma * fixed golden results * add mixed 4 golden result * add more options * add mistral results * get golden result of stable lm * move nightly scripts and results to test folder * add license * add fp8 stable lm golden * run on all available devices * trigger only when ready for review * fix new line * update golden * add mistral	2023-12-01 14:16:35 +08:00
Chen, Zhentao	4d7d5d4c59	Add 3 leaderboard tasks (#9566 ) * update leaderboard map * download model and dataset without overwritten * fix task drop * run on all available devices	2023-12-01 14:01:14 +08:00
Heyang Sun	74fd7077a2	[LLM] Multi-process and distributed QLoRA on CPU platform (#9491 ) * [LLM] Multi-process and distributed QLoRA on CPU platform * Update README.md * Update README.md * Update README.md * Update README.md * enable llm-init and bind to socket * refine * Update Dockerfile * add all files of qlora cpu example to /bigdl * fix * fix k8s * Update bigdl-qlora-finetuing-entrypoint.sh * Update bigdl-qlora-finetuing-entrypoint.sh * Update bigdl-qlora-finetuning-job.yaml * fix train sync and performance issues * add node affinity * disable user to tune cpu per pod * Update bigdl-qlora-finetuning-job.yaml	2023-12-01 13:47:19 +08:00

1 2 3 4 5 ...

1801 commits