Jin Qiao
cc8b3be11c
Add GPU and CPU example for stablelm-zephyr-3b ( #10643 )
* Add example for StableLM
* fix
* add to readme
2024-04-03 16:28:31 +08:00
Heyang Sun
6000241b10
Add Deepspeed Example of FLEX Mistral ( #10640 )
2024-04-03 16:04:17 +08:00
Shaojun Liu
d18dbfb097
update spr perf test ( #10644 )
2024-04-03 15:53:55 +08:00
Heyang Sun
4f6df37805
fix wrong cpu core num seen by docker ( #10645 )
2024-04-03 15:52:25 +08:00
Yishuo Wang
702e686901
optimize starcoder normal kv cache ( #10642 )
2024-04-03 15:27:02 +08:00
Xin Qiu
3a9ab8f1ae
fix stablelm logits diff ( #10636 )
* fix logits diff
* Small fixes
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-04-03 15:08:12 +08:00
Ovo233
97c626d76f
add continue quickstart ( #10610 )
Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>
2024-04-03 14:50:11 +08:00
Zhicun
b827f534d5
Add tokenizer_id in Langchain ( #10588 )
* fix low-bit
* fix
* fix style
---------
Co-authored-by: arda <arda@arda-arc12.sh.intel.com>
2024-04-03 14:25:35 +08:00
Zhicun
f6fef09933
fix prompt format for llama-2 in langchain ( #10637 )
2024-04-03 14:17:34 +08:00
Jiao Wang
330d4b4f4b
update readme ( #10631 )
2024-04-02 23:08:02 -07:00
Kai Huang
c875b3c858
Add seq len check for llama softmax upcast to fp32 ( #10629 )
2024-04-03 12:05:13 +08:00
Shaojun Liu
1aef3bc0ab
verify and refine ipex-llm-finetune-qlora-xpu docker document ( #10638 )
* verify and refine finetune-xpu document
* update export_merged_model.py link
* update link
2024-04-03 11:33:13 +08:00
Shaojun Liu
0779ca3db0
Bump ossf/scorecard-action to v2.3.1 ( #10639 )
* Bump ossf/scorecard-action to v2.3.1
* revert
2024-04-03 11:14:18 +08:00
Jiao Wang
4431134ec5
update readme ( #10632 )
2024-04-02 19:54:30 -07:00
Shaojun Liu
dfcf08c58a
update ossf/scorecard-action to fix TUF invalid key bug ( #10635 )
2024-04-03 09:55:32 +08:00
Jiao Wang
23e33a0ca1
Fix qwen-vl style ( #10633 )
* update
* update
2024-04-02 18:41:38 -07:00
binbin Deng
2bbd8a1548
LLM: fix llama2 FP16 & bs>1 & autotp on PVC and ARC ( #10611 )
2024-04-03 09:28:04 +08:00
Jiao Wang
654dc5ba57
Fix Qwen-VL example problem ( #10582 )
* update
* update
* update
* update
2024-04-02 12:17:30 -07:00
Heyang Sun
b8b923ed04
move chown step to after the add script step in qlora Dockerfile
2024-04-02 23:04:51 +08:00
Jason Dai
e184c480d2
Update WebUI Quickstart ( #10630 )
2024-04-02 21:49:19 +08:00
Yuwen Hu
fd384ddfb8
Optimize StableLM ( #10619 )
* Initial commit for stablelm optimizations
* Small style fix
* add dependency
* Add mlp optimizations
* Small fix
* add attention forward
* Remove quantize kv for now as head_dim=80
* Add merged qkv
* fix license
* Python style fix
---------
Co-authored-by: qiuxin2012 <qiuxin2012cs@gmail.com>
2024-04-02 18:58:38 +08:00
binbin Deng
27be448920
LLM: add cpu_embedding and peak memory record for deepspeed autotp script ( #10621 )
2024-04-02 17:32:50 +08:00
Yishuo Wang
ba8cc6bd68
optimize starcoder2-3b ( #10625 )
2024-04-02 17:16:29 +08:00
Shaojun Liu
a10f5a1b8d
add python style check ( #10620 )
* add python style check
* fix style checks
* update runner
* add ipex-llm-finetune-qlora-cpu-k8s to manually_build workflow
* update tag to 2.1.0-SNAPSHOT
2024-04-02 16:17:56 +08:00
Cengguang Zhang
58b57177e3
LLM: support bigdl quantize kv cache env and add warning. ( #10623 )
* LLM: support bigdl quantize kv cache env and add warning.
* fix style.
* fix comments.
2024-04-02 15:41:08 +08:00
Shaojun Liu
20a5e72da0
refine and verify ipex-llm-serving-xpu docker document ( #10615 )
* refine serving on cpu/xpu
* minor fix
* replace localhost with 0.0.0.0 so that the service can be accessed through an IP address
2024-04-02 11:45:45 +08:00
Yuwen Hu
89d780f2e9
Small fix to install guide ( #10618 )
2024-04-02 11:10:55 +08:00
Kai Huang
0a95c556a1
Fix starcoder first token perf ( #10612 )
* add bias check
* update
2024-04-02 09:21:38 +08:00
Cengguang Zhang
e567956121
LLM: add memory optimization for llama. ( #10592 )
* add initial memory optimization.
* fix logic.
* fix logic.
* remove env var check in mlp split.
2024-04-02 09:07:50 +08:00
Keyan (Kyrie) Zhang
01f491757a
Modify the link in Langchain-upstream ut ( #10608 )
* Modify the link in Langchain-upstream ut
* fix langchain-upstream ut
2024-04-01 17:03:40 +08:00
Ruonan Wang
bfc1caa5e5
LLM: support iq1s for llama2-70b-hf ( #10596 )
2024-04-01 13:13:13 +08:00
Ruonan Wang
d6af4877dd
LLM: remove ipex.optimize for gpt-j ( #10606 )
* remove ipex.optimize
* fix
* fix
2024-04-01 12:21:49 +08:00
Shaojun Liu
59058bb206
replace 2.5.0-SNAPSHOT with 2.1.0-SNAPSHOT for llm docker images ( #10603 )
2024-04-01 09:58:51 +08:00
Yishuo Wang
437a349dd6
fix rwkv with pip installer ( #10591 )
2024-03-29 17:56:45 +08:00
WeiguangHan
9a83f21b86
LLM: check user env ( #10580 )
* LLM: check user env
* small fix
* small fix
* small fix
2024-03-29 17:19:34 +08:00
Shaojun Liu
c4b533f0e1
nightly build docker images ( #10585 )
* nightly build docker images
2024-03-29 16:12:28 +08:00
Shaojun Liu
b06de94a50
verify xpu-inference image and refine document ( #10593 )
2024-03-29 16:11:12 +08:00
Yuxuan Xia
856f1ace2b
Add linux 6.5 kernel installation ( #10573 )
* Add linux 6.5 kernel installation
* Fix linux quick start typo
2024-03-29 16:02:19 +08:00
Keyan (Kyrie) Zhang
848fa04dd6
Fix typo in Baichuan2 example ( #10589 )
2024-03-29 13:31:47 +08:00
Shaojun Liu
52f1b541cf
refine and verify ipex-inference-cpu docker document ( #10565 )
* restructure the index
* refine and verify cpu-inference document
* update
2024-03-29 10:16:10 +08:00
Ruonan Wang
0136fad1d4
LLM: support iq1_s ( #10564 )
* init version
* update utils
* remove unused code
2024-03-29 09:43:55 +08:00
Qiyuan Gong
f4537798c1
Enable kv cache quantization by default for flex when 1 < batch <= 8 ( #10584 )
* Enable kv cache quantization by default for flex when 1 < batch <= 8.
* Change upper bound from <8 to <=8.
2024-03-29 09:43:42 +08:00
Cengguang Zhang
b44f7adbad
LLM: Disable esimd sdp for PVC GPU when batch size>1 ( #10579 )
* llm: disable esimd sdp for pvc bz>1.
* fix logic.
* fix: avoid calling get device name twice.
2024-03-28 22:55:48 +08:00
Yuwen Hu
e6c5a6a5e6
Small style fix in Install Guide ( #10581 )
* Remove strange bold style
* Small fix
2024-03-28 18:36:17 +08:00
Yuwen Hu
15b8964403
Win install change oneapi to pip installer ( #10577 )
* Update windows related guide to use pip installer for oneAPI
* Small style fix
* Add oneAPI version
* Update based on comments
* Small fix
2024-03-28 18:22:46 +08:00
Xin Qiu
5963239b46
Fix qwen's position_ids not being enough ( #10572 )
* fix position_ids
* fix position_ids
2024-03-28 17:05:49 +08:00
ZehuaCao
52a2135d83
Replace ipex with ipex-llm ( #10554 )
* fix ipex with ipex_llm
* fix ipex with ipex_llm
* update
* update
* update
* update
* update
* update
* update
* update
2024-03-28 13:54:40 +08:00
Keyan (Kyrie) Zhang
0a2e820c9f
Modify install_linux_gpu.md ( #10576 )
2024-03-28 13:20:42 +08:00
Cheen Hau, 俊豪
1c5eb14128
Update pip install to use --extra-index-url for ipex package ( #10557 )
* Change to 'pip install .. --extra-index-url' for readthedocs
* Change to 'pip install .. --extra-index-url' for examples
* Change to 'pip install .. --extra-index-url' for remaining files
* Fix URL for ipex
* Add links for ipex US and CN servers
* Update ipex cpu url
* remove readme
* Update for github actions
* Update for dockerfiles
2024-03-28 09:56:23 +08:00
binbin Deng
92dfed77be
LLM: fix abnormal output of fp16 deepspeed autotp ( #10558 )
2024-03-28 09:35:48 +08:00