ipex-llm

Author	SHA1	Message	Date
Yishuo Wang	170e3d65e0	use new sdp and fp32 sdp (#11007 )	2024-05-14 14:29:18 +08:00
Cengguang Zhang	cfed76b2ed	LLM: add long-context support for Qwen1.5-7B/Baichuan2-7B/Mistral-7B. (#10937 ) * LLM: add split tensor support for baichuan2-7b and qwen1.5-7b. * fix style. * fix style. * fix style. * add support for mistral and fix condition threshold. * fix style. * fix comments.	2024-05-10 16:40:15 +08:00
Yishuo Wang	d884c62dc4	remove new_layout parameter (#10906 )	2024-04-29 10:31:50 +08:00
Yina Chen	dc27b3bc35	Use sdp when rest token seq_len > 1 in llama & mistral (for lookup & spec) (#10790 ) * update sdp condition * update * fix * update & test llama * mistral * fix style * update * fix style * remove pvc constrain * update ds on arc * fix style	2024-04-24 17:24:01 +08:00
Xin Qiu	e764f9b1b1	Disable fast fused rope on UHD (#10780 ) * use decoding fast path * update * update * cleanup	2024-04-18 10:03:53 +08:00
Cengguang Zhang	3e2662c87e	LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771 )	2024-04-16 09:32:30 +08:00
Yishuo Wang	8086554d33	use new fp16 sdp in llama and mistral (#10734 )	2024-04-12 10:49:02 +08:00
Keyan (Kyrie) Zhang	585c174e92	Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables (#10707 ) * Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables. * Fix style	2024-04-10 10:48:46 +08:00
Yina Chen	c7422712fc	mistral 4.36 use fp16 sdp (#10704 )	2024-04-09 13:50:33 +08:00
Yang Wang	5a1f446d3c	support fp8 in xetla (#10555 ) * support fp8 in xetla * change name * adjust model file * support convert back to cpu * factor * fix bug * fix style	2024-04-08 13:22:09 -07:00
Jiao Wang	69bdbf5806	Fix vllm print error message issue (#10664 ) * update chatglm readme * Add condition to invalidInputError * update * update * style	2024-04-05 15:08:13 -07:00
binbin Deng	0a3e4e788f	LLM: fix mistral hidden_size setting for deepspeed autotp (#10527 )	2024-03-26 10:55:44 +08:00
Xin Qiu	1dd40b429c	enable fp4 fused mlp and qkv (#10531 ) * enable fp4 fused mlp and qkv * update qwen * update qwen2	2024-03-26 08:34:00 +08:00
Wang, Jian4	9df70d95eb	Refactor bigdl.llm to ipex_llm (#24 ) * Rename bigdl/llm to ipex_llm * rm python/llm/src/bigdl * from bigdl.llm to from ipex_llm	2024-03-22 15:41:21 +08:00

14 commits