ipex-llm

Author	SHA1	Message	Date
Ruonan Wang	7b1d9ad7c0	LLM: limit esimd sdp usage for k_len < 8 (#9959 ) * update * fix	2024-01-23 09:28:23 +08:00
Ruonan Wang	3e601f9a5d	LLM: Support speculative decoding in bigdl-llm (#9951 ) * first commit * fix error, add llama example * hidden print * update api usage * change to api v3 * update * meet code review * meet code review, fix style * add reference, fix style * fix style * fix first token time	2024-01-22 19:14:56 +08:00
Jinyi Wan	6341c498b3	Fix the links of BlueLM and SOLAR (#9954 )	2024-01-22 15:58:10 +08:00
Cheen Hau, 俊豪	947b1e27b7	Add readme for Whisper Test (#9944 ) * Fix local data path * Remove non-essential files * Add readme * Minor fixes to script * Bugfix, refactor * Add references to original source. Bugfixes. * Reviewer comments * Properly print and explain output * Move files to dev/benchmark * Fixes	2024-01-22 15:11:33 +08:00
Xin Qiu	6fb3f40f7e	fix error for benchmark_util.py running on cpu (#9949 )	2024-01-22 10:14:40 +08:00
Heyang Sun	fb91c97fe8	support for Baichuan/Baichuan2 13B Chat running speculative decoding (#9921 ) * support for Baichuan/Baichuan2 13B Chat running speculative decoding * fix style	2024-01-22 09:11:44 +08:00
Xin Qiu	97f0cd8975	optimize Decilm 7b (#9922 ) * optimize deci * update * decilm attention forward	2024-01-19 17:31:13 +08:00
Wang, Jian4	bcaeb05272	Update optimize qwen (#9943 ) * update for n tokens input * fix dtype * update	2024-01-19 16:54:59 +08:00
binbin Deng	db8e90796a	LLM: add avg token latency information and benchmark guide of autotp (#9940 )	2024-01-19 15:09:57 +08:00
Ruonan Wang	bf37b3a670	LLM: optimize CPU speculative decoding of chatglm3 (#9928 ) * update * fix style * meet code review	2024-01-19 14:10:22 +08:00
Shaojun Liu	967714bac8	gguf memory optimization for mixtral (#9939 )	2024-01-19 11:13:15 +08:00
Xin Qiu	610b5226be	move reserved memory to benchmark_utils.py (#9907 ) * move reserved memory to benchmark_utils.py * meet code review	2024-01-19 09:44:30 +08:00
Lilac09	7032a2ad73	Optimize gguf load memory for mistral (#9923 ) * optimize gguf load for mistral * fix output of gguf mistral * reset	2024-01-19 09:14:39 +08:00
Shaojun Liu	9a46f019d7	gguf memory optimization for baichuan (#9937 )	2024-01-19 09:11:02 +08:00
Guancheng Fu	2e1448f08e	[Serving] Add vllm_worker to fastchat serving framework (#9934 ) * add worker * finish * finish * add license * add more comments	2024-01-18 21:33:36 +08:00
Chen, Zhentao	a8c866c32b	add ppl benchmark (#9914 ) * add ppl benchmark * add license * add readme * add dataset argument * add dataset usage * fixed low bit args * correct result * fix terminal display * fix ppl update * enable fp16 fp32 bf16 * format the desc * fix model_kwargs * add more readme	2024-01-18 17:54:28 +08:00
WeiguangHan	100e0a87e5	LLM: add compressed chatglm3 model (#9892 ) * LLM: add compressed chatglm3 model * small fix * revert github action	2024-01-18 17:48:15 +08:00
Yuwen Hu	9e2ac5291b	Add rwkv v4 back for igpu perf test 32-512 (#9938 )	2024-01-18 17:15:28 +08:00
Yishuo Wang	7bbb98abb6	Disable fused layer norm when using XMX to fix mpt UT (#9933 )	2024-01-18 16:22:12 +08:00
Wang, Jian4	1fc9dfa265	LLM: Update for Qwen n tokens inputs (#9931 ) * update for n tokens inputs * update style * update	2024-01-18 15:56:29 +08:00
Heyang Sun	5184f400f9	Fix Mixtral GGUF Wrong Output Issue (#9930 ) * Fix Mixtral GGUF Wrong Output Issue * fix style * fix style	2024-01-18 14:11:27 +08:00
Yishuo Wang	453df868c9	add rwkv v5 attention kernel (#9927 )	2024-01-18 10:16:29 +08:00
Ruonan Wang	054952f82f	LLM: Fix rope of chatglm3 to support speculative decoding on CPU (#9926 )	2024-01-18 09:28:10 +08:00
Ziteng Zhang	18cd1f1432	[LLM]Solve the problem of calling bmm operator in BF16Linear (#9924 ) * Solve the problem of calling bmm operator in BF16Linear	2024-01-17 18:08:35 +08:00
Cheen Hau, 俊豪	e403e4a8b7	Add APT install instructions for oneAPI (#9875 ) * Add APT installer * Include reviewer suggestions * Add note - ensure matching version of oneAPI and pytorch/ipex * Fix 'command line installer' * Fix formatting. Address review comments * Append ':' to '..by running the following commands' * Fix formatting * achieve -> archive	2024-01-17 17:30:30 +08:00
Yina Chen	98b86f83d4	Support fast rope for training (#9745 ) * init * init * fix style * add test and fix * address comment * update * merge upstream main	2024-01-17 15:51:38 +08:00
Yuwen Hu	0c498a7b64	Add llama2-13b to igpu perf test (#9920 )	2024-01-17 14:58:45 +08:00
Ruonan Wang	b059a32fff	LLM: add benchmark api for bigdl-llm fp16 on GPU (#9919 ) * add bmk for bigdl fp16 * fix	2024-01-17 14:24:35 +08:00
Ruonan Wang	427f75000b	LLM: fix sdp of chatglm3 (#9917 ) * fix * fix * fix	2024-01-17 13:37:28 +08:00
Yuwen Hu	68d78fb57e	[LLM] Small improvement to iGPU perf test (#9915 ) - Avoid deleting csv if there is something wrong with concatenating csv	2024-01-17 11:21:58 +08:00
Shaojun Liu	32c56ffc71	pip install deps (#9916 )	2024-01-17 11:03:57 +08:00
Yishuo Wang	94767da7cf	optimize rwkv v4 first token performance (#9912 )	2024-01-17 09:27:41 +08:00
Cengguang Zhang	511cbcf773	LLM: add Ceval benchmark test. (#9872 ) * init ceval benchmark test. * upload dataset. * add other tests. * add qwen evaluator. * fix qwen evaluator style. * fix qwen evaluator style. * update qwen evaluator. * add llama evaluator. * update eval * fix typo. * fix * fix typo. * fix llama evaluator. * fix bug. * fix style. * delete dataset. * fix style. * fix style. * add README.md and fix typo. * fix comments. * remove run scripts	2024-01-16 19:14:26 +08:00
Shaojun Liu	b909c5c9c2	GGUF load memory optimization (#9913 ) * block-wise * convert linear for module * revert * Fix PEP8 checks Error	2024-01-16 18:54:39 +08:00
Yuwen Hu	8643b62521	[LLM] Support longer context in iGPU perf tests (2048-256) (#9910 )	2024-01-16 17:48:37 +08:00
Xin Qiu	dee32f7d15	copy fused rms norm's result to avoid <unk> (#9909 )	2024-01-16 16:54:08 +08:00
ZehuaCao	05ea0ecd70	add pv for llm-serving k8s deployment (#9906 )	2024-01-16 11:32:54 +08:00
Ruonan Wang	8d7326ae03	LLM: fix chatglm3 sdp to support speculative decoding (#9900 ) * fix chatglm3 * fix * update * meet code review * fix	2024-01-16 11:29:13 +08:00
Guancheng Fu	9f34da7cdb	Update PVC XMX condition (#9901 ) * update pvc xmx condition * update condition * update condition	2024-01-15 15:42:15 +08:00
Yishuo Wang	6637860ddf	change xmx condition (#9896 )	2024-01-12 19:51:48 +08:00
WeiguangHan	0e69bfe6b0	LLM: fix the performance drop of starcoder (#9889 ) * LLM: fix the performance drop of starcoder * small fix * small fix	2024-01-12 09:14:15 +08:00
Ruonan Wang	d9cf55bce9	LLM: fix MLP check of mixtral (#9891 )	2024-01-11 18:01:59 +08:00
Ziteng Zhang	4f4ce73f31	[LLM] Add transformer_autocast_bf16 into all-in-one (#9890 ) * Add transformer_autocast_bf16 into all-in-one	2024-01-11 17:51:07 +08:00
Ziteng Zhang	4af88a67b9	support chatglm3 with bf16 (#9888 ) * support chatglm3 with bigdl-bf16	2024-01-11 16:45:21 +08:00
Yuwen Hu	0aef35a965	[LLM] Improve LLM doc regarding windows gpu related info (#9880 ) * Improve runtime configuration for windows * Add python 310/311 support for wheel downloading * Add troubleshooting for windows gpu * Remove manual import of ipex due to auto importer * Add info regarding cpu_embedding=True on iGPU * More info for Windows users * Small updates to API docs * Python style fix * Remove tip for loading from saved optimize_model for now * Updated based on comments * Update win info for multi-intel gpus selection * Small fix * Small fix	2024-01-11 14:37:16 +08:00
Jinyi Wan	07485eff5a	Add SOLAR-10.7B to README (#9869 )	2024-01-11 14:28:41 +08:00
Kai Huang	5e766e8105	Fix Mixtral typo (#9882 )	2024-01-10 19:51:24 +08:00
Kai Huang	b53a5cb6c9	Fix Mixtral typo (#9881 ) * fix typo * fix doc page	2024-01-10 19:40:52 +08:00
WeiguangHan	33fd1f9c76	LLM: fix input length logic for run_transformer_int4_gpu (#9864 ) * LLM: fix input length logic for run_transformer_int4_gpu * small fix * small fix * small fix	2024-01-10 18:20:14 +08:00
Ruonan Wang	53531ae4ee	LLM: support qkv fusion for fp8e5 (#9878 ) * update * add mistral * meet code review	2024-01-10 17:50:00 +08:00

2017 commits