a54cd767b1  Wang, Jian4  2024-01-03 14:49:02 +08:00
    LLM: Add gguf falcon (#9801)
    * init falcon
    * update convert.py
    * update style

668c2095b1  Yuwen Hu  2024-01-03 10:30:05 +08:00
    Remove unnecessary warning when installing llm (#9815)

f5752ead36  dingbaorong  2024-01-02 16:36:05 +08:00
    Add whisper test (#9808)
    * add whisper benchmark code
    * add librispeech_asr.py
    * add bigdl license

6584539c91  binbin Deng  2024-01-02 14:32:50 +08:00
    LLM: fix installation of codellama (#9813)

4d01069302  Kai Huang  2023-12-29 12:54:13 +08:00
    Temp remove baichuan2-13b 1k from arc perf test (#9810)

a2e668a61d  dingbaorong  2023-12-28 16:55:34 +08:00
    fix arc ut test (#9736)

f0f9d45eac  Qiyuan Gong  2023-12-28 15:23:58 +08:00
    [LLM] IPEX import support bigdl-core-xe-21 (#9769)
    Add support for bigdl-core-xe-21.

a8baf68865  dingbaorong  2023-12-28 14:58:51 +08:00
    fix csv_to_html (#9802)

5857a38321  Guancheng Fu  2023-12-28 14:41:47 +08:00
    [vLLM] Add option to adjust KV_CACHE_ALLOC_BLOCK_LENGTH (#9782)
    * add option kv_cache_block
    * change var name

99bddd3ab4  Ruonan Wang  2023-12-28 13:30:13 +08:00
    LLM: better FP16 support for Intel GPUs (#9791)
    * initial support
    * fix
    * fix style
    * fix
    * limi esimd usage condition
    * refactor code
    * fix style
    * small fix
    * meet code review
    * small fix

7d9f6c6efc  Yishuo Wang  2023-12-28 09:23:44 +08:00
    fix cpuinfo error (#9793)

7ed9538b9f  Wang, Jian4  2023-12-28 09:22:39 +08:00
    LLM: support gguf mpt (#9773)
    * add gguf mpt
    * update

d299f108d0  Cengguang Zhang  2023-12-28 09:11:59 +08:00
    update falcon attention forward. (#9796)

a5e5c3daec  Shaojun Liu  2023-12-28 08:55:43 +08:00
    set warm_up: 3 num_trials: 50 for cpu stress test (#9799)

f6bb4ab313  dingbaorong  2023-12-27 21:02:41 +08:00
    Arc stress test (#9795)
    * add arc stress test
    * triger ci
    * triger CI
    * triger ci
    * disable ci

40eaf76ae3  Kai Huang  2023-12-27 19:38:53 +08:00
    Add baichuan2-13b to Arc perf (#9794)
    * add baichuan2-13b
    * fix indent
    * revert

6c75c689ea  Shaojun Liu  2023-12-27 15:40:53 +08:00
    bigdl-llm stress test for stable version (#9781)
    * 1k-512 2k-512 baseline
    * add cpu stress test
    * update yaml name
    * update
    * update
    * clean up
    * test
    * update
    * update
    * update
    * test
    * update

5cfb4c4f5b  dingbaorong  2023-12-27 11:01:56 +08:00
    Arc stable version performance regression test (#9785)
    * add arc stable version regression test
    * empty gpu mem between different models
    * triger ci
    * comment spr test
    * triger ci
    * address kai's comments and disable ci
    * merge fp8 and int4
    * disable ci

40edb7b5d7  binbin Deng  2023-12-27 09:11:37 +08:00
    LLM: fix get environment variables setting (#9787)

689889482c  Kai Huang  2023-12-26 19:51:25 +08:00
    Reduce max_cache_pos to reduce Baichuan2-13B memory (#9694)
    * optimize baichuan2 memory
    * fix
    * style
    * fp16 mask
    * disable fp16
    * fix style
    * empty cache
    * revert empty cache

361781bcd0  Jason Dai  2023-12-26 19:46:11 +08:00
    Update readme (#9788)

c38e18f2ff  Yuwen Hu  2023-12-26 19:15:57 +08:00
    [LLM] Migrate iGPU perf tests to new machine (#9784)
    * Move 1024 test just after 32-32 test; and enable all model for 1024-128
    * Make sure python output encoding in utf-8 so that redirect to txt can always be success
    * Upload results to ftp
    * Small fix

c05d7e1532  WeiguangHan  2023-12-26 18:55:56 +08:00
    LLM: add star_corder_15.5b model (#9772)
    * LLM: add star_corder_15.5b model
    * revert llm_performance_tests.yml

44b4a0c9c5  Ziteng Zhang  2023-12-26 16:57:55 +08:00
    [LLM] Correct prompt format of Yi, Llama2 and Qwen in generate.py (#9786)
    * correct prompt format of Yi
    * correct prompt format of llama2 in cpu generate.py
    * correct prompt format of Qwen in GPU example

0ea842231e  Xiangyu Tian  2023-12-26 16:03:57 +08:00
    [LLM] vLLM: Add api_server entrypoint (#9783)
    Add vllm.entrypoints.api_server for benchmark_serving.py in vllm.

64d05e581c  dingbaorong  2023-12-26 15:38:28 +08:00
    add peak gpu mem stats in transformer_int4_gpu (#9766)
    * add peak gpu mem stats in transformer_int4_gpu
    * address weiguang's comments

87b4100054  Ziteng Zhang  2023-12-26 10:03:39 +08:00
    [LLM] Support Yi model in chat.py (#9778)
    * Suppot Yi model
    * code style& add reference link

11d883301b  Ruonan Wang  2023-12-26 09:41:27 +08:00
    LLM: fix wrong batch output caused by flash attention (#9780)
    * fix
    * meet code review
    * move batch size check to the beginning
    * move qlen check inside function
    * meet code review

66e286a73d  Heyang Sun  2023-12-25 16:08:09 +08:00
    Support for Mixtral AWQ (#9775)
    * Support for Mixtral AWQ
    * Update README.md
    * Update README.md
    * Update awq_config.py
    * Update README.md
    * Update README.md

1917bbe626  Ruonan Wang  2023-12-25 14:49:30 +08:00
    LLM: fix BF16Linear related training & inference issue (#9755)
    * fix bf16 related issue
    * fix
    * update based on comment & add arc lora script
    * update readme
    * update based on comment
    * update based on comment
    * update
    * force to bf16
    * fix style
    * move check input dtype into function
    * update convert
    * meet code review
    * meet code review
    * update merged model to support new training_mode api
    * fix typo

30dab36f76  Xiangyu Tian  2023-12-25 14:17:06 +08:00
    [LLM] vLLM: Fix kv cache init (#9771)
    Fix kv cache init

449b387125  Yina Chen  2023-12-25 14:04:28 +08:00
    Support relora in bigdl-llm (#9687)
    * init
    * fix style
    * update
    * support resume & update readme
    * update
    * update
    * remove important
    * add training mode
    * meet comments

b6222404b8  Shaojun Liu  2023-12-25 13:47:11 +08:00
    bigdl-llm stable version: let the perf test fail if the difference between perf and baseline is greater than 5% (#9750)
    * test
    * test
    * test
    * update
    * revert

986f65cea9  Ziteng Zhang  2023-12-25 11:31:14 +08:00
    [LLM] Add trust_remote_code for local renamed model in bigdl_llm_model.py (#9762)

be13b162fe  Yishuo Wang  2023-12-25 10:54:01 +08:00
    add codeshell example (#9743)

daf536fb2d  Guancheng Fu  2023-12-25 10:29:31 +08:00
    vLLM: Apply attention optimizations for selective batching (#9758)
    * fuse_rope for prefil
    * apply kv_cache optimizations
    * apply fast_decoding_path
    * Re-enable kv_cache optimizations for prefill
    * reduce KV_CACHE_ALLOC_BLOCK for selective_batching

ed8ed76d4f  binbin Deng  2023-12-25 09:41:14 +08:00
    LLM: update deepspeed autotp usage (#9733)

02436c6cce  Yuwen Hu  2023-12-22 18:18:23 +08:00
    [LLM] Enable more long context in-out pairs for iGPU perf tests (#9765)
    * Add test for 1024-128 and enable more tests for 512-64
    * Fix date in results csv name to the time when the performance is triggered
    * Small fix
    * Small fix
    * further fixes

7fd7c37e1b  Chen, Zhentao  2023-12-22 16:59:48 +08:00
    Enable fp8e5 harness (#9761)
    * fix precision format like fp8e5
    * match fp8_e5m2

4c487313f2  Qiyuan Gong  2023-12-22 16:38:24 +08:00
    Revert "[LLM] IPEX auto importer turn on by default for XPU (#9730)" (#9759)
    This reverts commit 0284801fbd.

0284801fbd  Qiyuan Gong  2023-12-22 16:20:32 +08:00
    [LLM] IPEX auto importer turn on by default for XPU (#9730)
    * Set BIGDL_IMPORT_IPEX default to true, i.e., auto import IPEX for XPU.
    * Remove import intel_extension_for_pytorch as ipex from GPU example.
    * Add support for bigdl-core-xe-21.

86a69e289c  Chen, Zhentao  2023-12-22 15:09:22 +08:00
    fix harness runner label of manual trigger (#9754)
    * fix runner
    * update golden

fdf93c9267  Guancheng Fu  2023-12-22 13:45:46 +08:00
    Implement selective batching for vLLM (#9659)
    * add control to load hf model
    * finish initial version of selective_batching
    * temp
    * finish
    * Remove print statement
    * fix error
    * Apply yang's optimization
    * a version that works
    * We need to check kv_cache passed in, this could be an error. TODO: add fast decoding path
    * format
    * temp solution: not batching prefill requests
    * a version that works for prefill batching
    * format
    * a solid version: works normally
    * a temp version
    * Solid version: remove redundant functions
    * fix format
    * format
    * solid: add option to enable selective_batching
    * remove logic for using transformer models
    * format
    * format
    * solid: enable argument VLLM_ENABLE_SELECTIVE_BATCHING
    * format
    * finish
    * format

2f36769208  Ruonan Wang  2023-12-22 11:05:39 +08:00
    LLM: bigdl-llm lora support & lora example (#9740)
    * lora support and single card example
    * support multi-card, refactor code
    * fix model id and style
    * remove torch patch, add two new class for bf16, update example
    * fix style
    * change to training_mode
    * small fix
    * add more info in help
    * fixstyle, update readme
    * fix ut
    * fix ut
    * Handling compatibility issues with default LoraConfig

ba0b939579  SONG Ge  2023-12-22 09:59:27 +08:00
    [LLM] Support transformers-v4.36.0 on mistral model (#9744)
    * add support transformers-v4.36.0 on mistral model
    * python/llm/src/bigdl/llm/transformers/models/mistral.py
    * make the redundant implementation as utils
    * fix code style
    * fix
    * fix style
    * update with utils enough_kv_room

e36111e713  Xin Qiu  2023-12-22 09:26:35 +08:00
    mixstral fused qkv and rope (#9724)
    * mixstral fused qkv and rope
    * fix and clean
    * fix style
    * update
    * update
    * fix
    * update
    * fix

e4f6e43675  Jiao Wang  2023-12-21 14:41:51 -08:00
    safetenor to false (#9728)

bb52239e0a  Shaojun Liu  2023-12-21 22:55:33 +08:00
    bigdl-llm stable version release & test (#9732)
    * stable version test
    * trigger spr test
    * update
    * trigger
    * test
    * test
    * test
    * test
    * test
    * refine
    * release linux first

d4d2ccdd9d  WeiguangHan  2023-12-21 18:52:52 +08:00
    LLM: remove startcorder-15.5b (#9748)

474c099559  WeiguangHan  2023-12-21 17:56:43 +08:00
    LLM: using separate threads to do inference (#9727)
    * using separate threads to do inference
    * resolve some comments
    * resolve some comments
    * revert llm_performance_tests.yml file