ipex-llm

Author	SHA1	Message	Date
Yina Chen	670ad887fc	Qwen support compress kv (#11680) * Qwen support compress kv * fix style * fix	2024-07-30 11:16:42 +08:00
hxsz1997	9b36877897	disable default quantize_kv of GQA on MTL (#11679) * disable default quantizekv of gqa in mtl * fix stype * fix stype * fix stype * fix stype * fix stype * fix stype	2024-07-30 09:38:46 +08:00
Yina Chen	fc7f8feb83	Support compress kv (#11642) * mistral snapkv * update * mtl update * update * update * update * add comments * style fix * fix style * support llama * llama use compress kv * support mistral 4.40 * fix style * support diff transformers versions * move snapkv util to kv * fix style * meet comments & small fix * revert all in one * fix indent --------- Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>	2024-07-26 16:02:00 +08:00
Cengguang Zhang	70ab1a6f1a	LLM: unify memory optimization env variables. (#11549) * LLM: unify memory optimization env variables. * fix comments.	2024-07-11 11:01:28 +08:00
Guoqiong Song	c44b1942ed	fix mistral for transformers>=4.39 (#11191) * fix mistral for transformers>=4.39	2024-06-18 13:39:35 -07:00
Yina Chen	b6b70d1ba0	Divide core-xe packages (#11131) * temp * add batch * fix style * update package name * fix style * add workflow * use temp version to run uts * trigger performance test * trigger win igpu perf * revert workflow & setup	2024-05-28 12:00:18 +08:00
Yishuo Wang	170e3d65e0	use new sdp and fp32 sdp (#11007)	2024-05-14 14:29:18 +08:00
Cengguang Zhang	cfed76b2ed	LLM: add long-context support for Qwen1.5-7B/Baichuan2-7B/Mistral-7B. (#10937) * LLM: add split tensor support for baichuan2-7b and qwen1.5-7b. * fix style. * fix style. * fix style. * add support for mistral and fix condition threshold. * fix style. * fix comments.	2024-05-10 16:40:15 +08:00
Yishuo Wang	d884c62dc4	remove new_layout parameter (#10906)	2024-04-29 10:31:50 +08:00
Yina Chen	dc27b3bc35	Use sdp when rest token seq_len > 1 in llama & mistral (for lookup & spec) (#10790) * update sdp condition * update * fix * update & test llama * mistral * fix style * update * fix style * remove pvc constrain * update ds on arc * fix style	2024-04-24 17:24:01 +08:00
Xin Qiu	e764f9b1b1	Disable fast fused rope on UHD (#10780) * use decoding fast path * update * update * cleanup	2024-04-18 10:03:53 +08:00
Cengguang Zhang	3e2662c87e	LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771)	2024-04-16 09:32:30 +08:00
Yishuo Wang	8086554d33	use new fp16 sdp in llama and mistral (#10734)	2024-04-12 10:49:02 +08:00
Keyan (Kyrie) Zhang	585c174e92	Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables (#10707) * Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables. * Fix style	2024-04-10 10:48:46 +08:00
Yina Chen	c7422712fc	mistral 4.36 use fp16 sdp (#10704)	2024-04-09 13:50:33 +08:00
Yang Wang	5a1f446d3c	support fp8 in xetla (#10555) * support fp8 in xetla * change name * adjust model file * support convert back to cpu * factor * fix bug * fix style	2024-04-08 13:22:09 -07:00
Jiao Wang	69bdbf5806	Fix vllm print error message issue (#10664) * update chatglm readme * Add condition to invalidInputError * update * update * style	2024-04-05 15:08:13 -07:00
binbin Deng	0a3e4e788f	LLM: fix mistral hidden_size setting for deepspeed autotp (#10527)	2024-03-26 10:55:44 +08:00
Xin Qiu	1dd40b429c	enable fp4 fused mlp and qkv (#10531) * enable fp4 fused mlp and qkv * update qwen * update qwen2	2024-03-26 08:34:00 +08:00
Wang, Jian4	9df70d95eb	Refactor bigdl.llm to ipex_llm (#24) * Rename bigdl/llm to ipex_llm * rm python/llm/src/bigdl * from bigdl.llm to from ipex_llm	2024-03-22 15:41:21 +08:00

20 commits