ipex-llm

Author	SHA1	Message	Date
Ruonan Wang	1917bbe626	LLM: fix `BF16Linear` related training & inference issue (#9755) * fix bf16 related issue * fix * update based on comment & add arc lora script * update readme * update based on comment * update based on comment * update * force to bf16 * fix style * move check input dtype into function * update convert * meet code review * meet code review * update merged model to support new training_mode api * fix typo	2023-12-25 14:49:30 +08:00
Xiangyu Tian	30dab36f76	[LLM] vLLM: Fix kv cache init (#9771) Fix kv cache init	2023-12-25 14:17:06 +08:00
Yina Chen	449b387125	Support relora in bigdl-llm (#9687) * init * fix style * update * support resume & update readme * update * update * remove important * add training mode * meet comments	2023-12-25 14:04:28 +08:00
Shaojun Liu	b6222404b8	bigdl-llm stable version: let the perf test fail if the difference between perf and baseline is greater than 5% (#9750) * test * test * test * update * revert	2023-12-25 13:47:11 +08:00
Ziteng Zhang	986f65cea9	[LLM] Add trust_remote_code for local renamed model in bigdl_llm_model.py (#9762)	2023-12-25 11:31:14 +08:00
Yishuo Wang	be13b162fe	add codeshell example (#9743)	2023-12-25 10:54:01 +08:00
Guancheng Fu	daf536fb2d	vLLM: Apply attention optimizations for selective batching (#9758) * fuse_rope for prefil * apply kv_cache optimizations * apply fast_decoding_path * Re-enable kv_cache optimizations for prefill * reduce KV_CACHE_ALLOC_BLOCK for selective_batching	2023-12-25 10:29:31 +08:00
binbin Deng	ed8ed76d4f	LLM: update deepspeed autotp usage (#9733)	2023-12-25 09:41:14 +08:00
Chen, Zhentao	4a98bfa5ae	fix harness manual run env typo (#9763)	2023-12-22 18:42:35 +08:00
Yuwen Hu	02436c6cce	[LLM] Enable more long context in-out pairs for iGPU perf tests (#9765) * Add test for 1024-128 and enable more tests for 512-64 * Fix date in results csv name to the time when the performance is triggered * Small fix * Small fix * further fixes	2023-12-22 18:18:23 +08:00
Chen, Zhentao	7fd7c37e1b	Enable fp8e5 harness (#9761) * fix precision format like fp8e5 * match fp8_e5m2	2023-12-22 16:59:48 +08:00
Qiyuan Gong	4c487313f2	Revert "[LLM] IPEX auto importer turn on by default for XPU (#9730)" (#9759) This reverts commit `0284801fbd`.	2023-12-22 16:38:24 +08:00
Qiyuan Gong	0284801fbd	[LLM] IPEX auto importer turn on by default for XPU (#9730) * Set BIGDL_IMPORT_IPEX default to true, i.e., auto import IPEX for XPU. * Remove import intel_extension_for_pytorch as ipex from GPU example. * Add support for bigdl-core-xe-21.	2023-12-22 16:20:32 +08:00
Xin Qiu	95c03765cb	Update install_gpu.md, remove kernel 5.19 requirement. (#9757) * Update install_gpu.md * Update install_gpu.md * Update install_gpu.md * Update install_gpu.md * Update install_gpu.md * Update install_gpu.md * Update install_gpu.md * Update install_gpu.md * Update install_gpu.md * Small fixes --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2023-12-22 16:17:56 +08:00
Yuwen Hu	1c8c104bb8	[LLM] Small fixes for igpu win perf tests (#9756)	2023-12-22 15:51:03 +08:00
Chen, Zhentao	86a69e289c	fix harness runner label of manual trigger (#9754) * fix runner * update golden	2023-12-22 15:09:22 +08:00
WeiguangHan	2d1bf20309	LLM: small fix llm_performance_tests.html (#9753) * LLM: small fix llm_performance_tests.html * reslove some comments * revert the llm_performance_test.yaml	2023-12-22 13:55:01 +08:00
Guancheng Fu	fdf93c9267	Implement selective batching for vLLM (#9659) * add control to load hf model * finish initial version of selective_batching * temp * finish * Remove print statement * fix error * Apply yang's optimization * a version that works * We need to check kv_cache passed in, this could be an error. TODO: add fast decoding path * format * temp solution: not batching prefill requests * a version that works for prefill batching * format * a solid version: works normally * a temp version * Solid version: remove redundant functions * fix format * format * solid: add option to enable selective_batching * remove logic for using transformer models * format * format * solid: enable argument VLLM_ENABLE_SELECTIVE_BATCHING * format * finish * format	2023-12-22 13:45:46 +08:00
Ruonan Wang	2f36769208	LLM: bigdl-llm lora support & lora example (#9740) * lora support and single card example * support multi-card, refactor code * fix model id and style * remove torch patch, add two new class for bf16, update example * fix style * change to training_mode * small fix * add more info in help * fixstyle, update readme * fix ut * fix ut * Handling compatibility issues with default LoraConfig	2023-12-22 11:05:39 +08:00
SONG Ge	ba0b939579	[LLM] Support transformers-v4.36.0 on mistral model (#9744) * add support transformers-v4.36.0 on mistral model * python/llm/src/bigdl/llm/transformers/models/mistral.py * make the redundant implementation as utils * fix code style * fix * fix style * update with utils enough_kv_room	2023-12-22 09:59:27 +08:00
Xin Qiu	e36111e713	mixstral fused qkv and rope (#9724) * mixstral fused qkv and rope * fix and clean * fix style * update * update * fix * update * fix	2023-12-22 09:26:35 +08:00
Jiao Wang	e4f6e43675	safetenor to false (#9728)	2023-12-21 14:41:51 -08:00
Shaojun Liu	bb52239e0a	bigdl-llm stable version release & test (#9732) * stable version test * trigger spr test * update * trigger * test * test * test * test * test * refine * release linux first	2023-12-21 22:55:33 +08:00
WeiguangHan	d4d2ccdd9d	LLM: remove startcorder-15.5b (#9748)	2023-12-21 18:52:52 +08:00
WeiguangHan	474c099559	LLM: using separate threads to do inference (#9727) * using separate threads to do inference * resolve some comments * resolve some comments * revert llm_performance_tests.yml file	2023-12-21 17:56:43 +08:00
Yishuo Wang	426660b88e	simplify qwen attention (#9747)	2023-12-21 17:53:29 +08:00
Wang, Jian4	984697afe2	LLM: Add bloom gguf support (#9734) * init * update bloom add merges * update * update readme * update for llama error * update	2023-12-21 14:06:25 +08:00
Heyang Sun	df775cf316	fix python style (#9742) * fix python style * fix * fix	2023-12-21 11:25:05 +08:00
Chen, Zhentao	b06a3146c8	Fix 70b oom (#9738) * add default value to bigdl llm * fix model oom	2023-12-21 10:40:52 +08:00
Xin Qiu	6c3e698bf1	mistral decoding_fast_path and fused mlp (#9714) * mistral decoding_fast_path and fused mlp * meet code review	2023-12-21 10:11:37 +08:00
Heyang Sun	d157f623b6	Load Mixtral gguf in a block-wise way (#9725) * Load Mixtral gguf in a block-wise way * refine	2023-12-21 10:03:23 +08:00
WeiguangHan	34bb804189	LLM: check csv and its corresponding yaml file (#9702) * LLM: check csv and its corresponding yaml file * run PR arc perf test * modify the name of some variables * execute the check results script in right place * use cp to replace mv command * resolve some comments * resolve more comments * revert the llm_performance_test.yaml file	2023-12-21 09:54:33 +08:00
Zhao Changmin	4bda975a3e	LLM: Align lowbit model config (#9735) * align lowbit model config	2023-12-21 09:48:58 +08:00
Wang, Jian4	e1e921f425	LLM: gguf other model using dtype (#9729)	2023-12-21 09:33:40 +08:00
Yishuo Wang	13ea6330bd	optimize qwen rope (#9737)	2023-12-20 17:34:34 +08:00
Ziteng Zhang	4c032a433e	[LLM] Add glibc checker (#9624) * Add glibc checker * Add env BIGDL_GLIBC_CHECK to control glibc checker. The default is false, i.e., don't check.	2023-12-20 16:52:43 +08:00
Yina Chen	cd652a1710	Support fp8 e5m2 on arc (#9711) * init * fix style * update * fix style * update	2023-12-20 16:26:17 +08:00
Yishuo Wang	e54c428d30	add bf16/fp16 fuse mlp support (#9726)	2023-12-20 10:40:45 +08:00
Heyang Sun	612651cb5d	fix typo (#9723)	2023-12-20 09:41:59 +08:00
WeiguangHan	3aa8b66bc3	LLM: remove starcoder-15.5b model temporarily (#9720)	2023-12-19 20:14:46 +08:00
Yishuo Wang	522cf5ed82	[LLM] Improve chatglm2/3 rest token performance with long context (#9716)	2023-12-19 17:29:38 +08:00
Yishuo Wang	f2e6abb563	fix mlp batch size check (#9718)	2023-12-19 14:22:22 +08:00
Heyang Sun	1fa7793fc0	Load Mixtral GGUF Model (#9690) * Load Mixtral GGUF Model * refactor * fix empty tensor when to cpu * update gpu and cpu readmes * add dtype when set tensor into module	2023-12-19 13:54:38 +08:00
Qiyuan Gong	d0a3095b97	[LLM] IPEX auto importer (#9706) * IPEX auto importer and get_ipex_version. * Add BIGDL_IMPORT_IPEX to control auto import, default is false.	2023-12-19 13:39:38 +08:00
Yang Wang	f4fb58d99c	fusing qkv project and rope (#9612) * Try fusing qkv project and rope * add fused mlp * fuse append cache * fix style and clean up code * clean up	2023-12-18 16:45:00 -08:00
Kai Huang	4c112ee70c	Rename qwen in model name for arc perf test (#9712)	2023-12-18 20:34:31 +08:00
Cengguang Zhang	4d22add4af	LLM: fix qwen efficiency issue in perf-test.	2023-12-18 18:32:54 +08:00
Ruonan Wang	8ed89557e5	LLM: add mlp optimization of mixtral (#9709)	2023-12-18 16:59:52 +08:00
Chen, Zhentao	b3647507c0	Fix harness workflow (#9704) * error when larger than 0.001 * fix env setup * fix typo * fix typo	2023-12-18 15:42:10 +08:00
binbin Deng	12df70953e	LLM: add resume_from_checkpoint related section (#9705)	2023-12-18 12:27:02 +08:00

1895 commits