WeiguangHan | be5836bee1 | 2024-01-23 17:04:13 +08:00
LLM: fix outlier value (#9945)
* fix outlier value
* small fix

Yishuo Wang | 2c8a9aaf0d | 2024-01-23 16:34:05 +08:00
fix qwen causal mask when quantize_kv_cache=True (#9968)

Yina Chen | 5aa4b32c1b | 2024-01-23 15:59:43 +08:00
LLM: Add qwen spec gpu example (#9965)
* add qwen spec gpu example
* update readme
Co-authored-by: rnwang04 <ruonan1.wang@intel.com>

Yina Chen | 36c665667d | 2024-01-23 15:57:28 +08:00
Add logits processor & qwen eos stop in speculative decoding (#9963)
* add logits processor & qwen eos
* fix style
* fix
* fix
* fix style
* fix style
* support transformers 4.31
* fix style
* fix style
Co-authored-by: rnwang04 <ruonan1.wang@intel.com>

Ruonan Wang | 60b35db1f1 | 2024-01-23 15:54:12 +08:00
LLM: add chatglm3 speculative decoding example (#9966)
* add chatglm3 example
* update
* fix

Xin Qiu | da4687c917 | 2024-01-23 15:53:32 +08:00
fix fp16 (#9970)

Lilac09 | 052962dfa5 | 2024-01-23 14:17:05 +08:00
Using original fastchat and add bigdl worker in docker image (#9967)
* add vllm worker
* add options in entrypoint

Chen, Zhentao | 301425e377 | 2024-01-23 13:20:37 +08:00
harness tests on pvc multiple xpus (#9908)
* add run_multi_llb.py
* update readme
* add job hint

Ruonan Wang | 27b19106f3 | 2024-01-23 12:54:19 +08:00
LLM: add readme for speculative decoding gpu examples (#9961)
* add readme
* add readme
* meet code review

Chen, Zhentao | 39219b7e9a | 2024-01-23 11:00:49 +08:00
add default device meta when lcmu enabled (#9941)

Xin Qiu | dacf680294 | 2024-01-23 10:37:56 +08:00
add fused rotary pos emb for qwen (#9956)
* add fused rotary pos emb for qwen
* update

Ruonan Wang | 7b1d9ad7c0 | 2024-01-23 09:28:23 +08:00
LLM: limit esimd sdp usage for k_len < 8 (#9959)
* update
* fix

Ruonan Wang | 3e601f9a5d | 2024-01-22 19:14:56 +08:00
LLM: Support speculative decoding in bigdl-llm (#9951)
* first commit
* fix error, add llama example
* hidden print
* update api usage
* change to api v3
* update
* meet code review
* meet code review, fix style
* add reference, fix style
* fix style
* fix first token time

Jinyi Wan | 6341c498b3 | 2024-01-22 15:58:10 +08:00
Fix the links of BlueLM and SOLAR (#9954)

Cheen Hau, 俊豪 | 947b1e27b7 | 2024-01-22 15:11:33 +08:00
Add readme for Whisper Test (#9944)
* Fix local data path
* Remove non-essential files
* Add readme
* Minor fixes to script
* Bugfix, refactor
* Add references to original source. Bugfixes.
* Reviewer comments
* Properly print and explain output
* Move files to dev/benchmark
* Fixes

Xin Qiu | 6fb3f40f7e | 2024-01-22 10:14:40 +08:00
fix error for benchmark_util.py running on cpu (#9949)

Heyang Sun | fb91c97fe8 | 2024-01-22 09:11:44 +08:00
support for Baichuan/Baichuan2 13B Chat running speculative decoding (#9921)
* support for Baichuan/Baichuan2 13B Chat running speculative decoding
* fix stype

Xin Qiu | 97f0cd8975 | 2024-01-19 17:31:13 +08:00
optimize Decilm 7b (#9922)
* optimize deci
* update
* decilm attension forward

Wang, Jian4 | bcaeb05272 | 2024-01-19 16:54:59 +08:00
Update optimize qwen (#9943)
* update for n tokens input
* fix dtype
* update

binbin Deng | db8e90796a | 2024-01-19 15:09:57 +08:00
LLM: add avg token latency information and benchmark guide of autotp (#9940)

Ruonan Wang | bf37b3a670 | 2024-01-19 14:10:22 +08:00
LLM: optimize CPU speculative decoding of chatglm3 (#9928)
* update
* fix style
* meet code review

Shaojun Liu | 967714bac8 | 2024-01-19 11:13:15 +08:00
gguf memory optimization for mixtral (#9939)

Xin Qiu | 610b5226be | 2024-01-19 09:44:30 +08:00
move reserved memory to benchmark_utils.py (#9907)
* move reserved memory to benchmark_utils.py
* meet code review

Lilac09 | 7032a2ad73 | 2024-01-19 09:14:39 +08:00
Optimize gguf load memory for mistral (#9923)
* optimize gguf load for mistral
* fix output of gguf mistral
* reset

Shaojun Liu | 9a46f019d7 | 2024-01-19 09:11:02 +08:00
gguf memory optimization for baichuan (#9937)

Guancheng Fu | 2e1448f08e | 2024-01-18 21:33:36 +08:00
[Serving] Add vllm_worker to fastchat serving framework (#9934)
* add worker
* finish
* finish
* add license
* add more comments

Chen, Zhentao | a8c866c32b | 2024-01-18 17:54:28 +08:00
add ppl benchmark (#9914)
* add ppl benchmark
* add license
* add readme
* add dataset argument
* add dataset usage
* fixed low bit args
* correct result
* fix terminal display
* fix ppl update
* enable fp16 fp32 bf16
* format the desc
* fix model_kwargs
* add more readme

WeiguangHan | 100e0a87e5 | 2024-01-18 17:48:15 +08:00
LLM: add compressed chatglm3 model (#9892)
* LLM: add compressed chatglm3 model
* small fix
* revert github action

Yuwen Hu | 9e2ac5291b | 2024-01-18 17:15:28 +08:00
Add rwkv v4 back for igpu perf test 32-512 (#9938)

Yishuo Wang | 7bbb98abb6 | 2024-01-18 16:22:12 +08:00
Disable fused layer norm when using XMX to fix mpt UT (#9933)

Wang, Jian4 | 1fc9dfa265 | 2024-01-18 15:56:29 +08:00
LLM: Update for Qwen n tokens inputs (#9931)
* update for n tokens inputs
* update style
* update

Heyang Sun | 5184f400f9 | 2024-01-18 14:11:27 +08:00
Fix Mixtral GGUF Wrong Output Issue (#9930)
* Fix Mixtral GGUF Wrong Output Issue
* fix style
* fix style

Yishuo Wang | 453df868c9 | 2024-01-18 10:16:29 +08:00
add rwkv v5 attention kernel (#9927)

Ruonan Wang | 054952f82f | 2024-01-18 09:28:10 +08:00
LLM: Fix rope of chatglm3 to support speculative decoding on CPU (#9926)

Ziteng Zhang | 18cd1f1432 | 2024-01-17 18:08:35 +08:00
[LLM]Solve the problem of calling bmm operator in BF16Linear (#9924)
* Solve the problem of calling bmm operator in BF16Linear

Cheen Hau, 俊豪 | e403e4a8b7 | 2024-01-17 17:30:30 +08:00
Add APT install instructions for oneAPI (#9875)
* Add APT installer
* Include reviewer suggestions
* Add note - ensure matching version of oneAPI and pytorch/ipex
* Fix 'command line installer'
* Fix formatting. Address review comments
* Append ':' to '..by running the following commands'
* Fix formatting
* achieve -> archive

Yina Chen | 98b86f83d4 | 2024-01-17 15:51:38 +08:00
Support fast rope for training (#9745)
* init
* init
* fix style
* add test and fix
* address comment
* update
* merge upstream main

Yuwen Hu | 0c498a7b64 | 2024-01-17 14:58:45 +08:00
Add llama2-13b to igpu perf test (#9920)

Ruonan Wang | b059a32fff | 2024-01-17 14:24:35 +08:00
LLM: add benchmark api for bigdl-llm fp16 on GPU (#9919)
* add bmk for bigdl fp16
* fix

Ruonan Wang | 427f75000b | 2024-01-17 13:37:28 +08:00
LLM: fix sdp of chatglm3 (#9917)
* fix
* fix
* fix

Yuwen Hu | 68d78fb57e | 2024-01-17 11:21:58 +08:00
[LLM] Small improvement to iGPU perf test (#9915)
- Avoid delete csv if there is something wrong with concating csv

Shaojun Liu | 32c56ffc71 | 2024-01-17 11:03:57 +08:00
pip install deps (#9916)

Yishuo Wang | 94767da7cf | 2024-01-17 09:27:41 +08:00
optimize rwkv v4 first token performance (#9912)

Cengguang Zhang | 511cbcf773 | 2024-01-16 19:14:26 +08:00
LLM: add Ceval benchmark test. (#9872)
* init ceval benchmark test.
* upload dataset.
* add other tests.
* add qwen evaluator.
* fix qwen evaluator style.
* fix qwen evaluator style.
* update qwen evaluator.
* add llama evaluator.
* update eval
* fix typo.
* fix
* fix typo.
* fix llama evaluator.
* fix bug.
* fix style.
* delete dataset.
* fix style.
* fix style.
* add README.md and fix typo.
* fix comments.
* remove run scripts

Shaojun Liu | b909c5c9c2 | 2024-01-16 18:54:39 +08:00
GGUF load memory optimization (#9913)
* block-wise
* convert linear for module
* revert
* Fix PEP8 checks Error

Yuwen Hu | 8643b62521 | 2024-01-16 17:48:37 +08:00
[LLM] Support longer context in iGPU perf tests (2048-256) (#9910)

Xin Qiu | dee32f7d15 | 2024-01-16 16:54:08 +08:00
copy fused rms norm's reuslt to avoid <unk> (#9909)

ZehuaCao | 05ea0ecd70 | 2024-01-16 11:32:54 +08:00
add pv for llm-serving k8s deployment (#9906)

Ruonan Wang | 8d7326ae03 | 2024-01-16 11:29:13 +08:00
LLM: fix chatglm3 sdp to support speculative decoding (#9900)
* fix chatglm3
* fix
* update
* meet code review
* fix

Guancheng Fu | 9f34da7cdb | 2024-01-15 15:42:15 +08:00
Update PVC XMX condition (#9901)
* update pvc xmx condition
* update condition
* update conditon