Commit graph

2659 commits

Author SHA1 Message Date
Wang, Jian4
5f95054f97
LLM: Add qwen moe example libs md (#10828) 2024-04-22 10:03:19 +08:00
Ruonan Wang
1edb19c1dd
small fix of cpp quickstart (#10829) 2024-04-22 09:44:08 +08:00
Guancheng Fu
61c67af386
Fix vLLM-v2 install instructions (#10822) 2024-04-22 09:02:48 +08:00
Jason Dai
3cd21d5105
Update readme (#10817) 2024-04-19 22:16:17 +08:00
SONG Ge
197f8dece9
Add open-webui windows document (#10775)
* add windows document

* update

* fix document

* build fix

* update some description

* reorg document structure

* update doc

* re-update to better view

* add reminder for running model on gpus

* update

* remove useless part
2024-04-19 18:06:40 +08:00
Ruonan Wang
a8df429985
QuickStart: Run Llama 3 on Intel GPU using llama.cpp and ollama with IPEX-LLM (#10809)
* initial commit

* update llama.cpp

* add demo video at first

* fix ollama link in readme

* meet review

* update

* small fix
2024-04-19 17:44:59 +08:00
Guancheng Fu
caf75beef8
Disable sdpa (#10814) 2024-04-19 17:33:18 +08:00
Yishuo Wang
57edf2033c
fix lookahead with transformers >= 4.36 (#10808) 2024-04-19 16:24:56 +08:00
Yuwen Hu
34ff07b689
Add CPU related info to langchain-chatchat quickstart (#10812) 2024-04-19 15:59:51 +08:00
Ovo233
1a885020ee
Updated importing of top_k_top_p_filtering for transformers>=4.39.0 (#10794)
* In transformers>=4.39.0, the top_k_top_p_filtering function has been deprecated and moved to the Hugging Face package trl. Thus, for versions >= 4.39.0, import this function from trl.
2024-04-19 15:34:39 +08:00
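The deprecation described above calls for a version-gated import. A minimal sketch of the gating logic (pure Python; the package names transformers and trl and the 4.39.0 cutoff come from the commit message, the helper name is illustrative):

```python
def pick_import_source(transformers_version: str) -> str:
    """Return which package exports top_k_top_p_filtering for a given
    transformers version: trl for >= 4.39.0, transformers before that."""
    parts = tuple(int(p) for p in transformers_version.split(".")[:3])
    return "trl" if parts >= (4, 39, 0) else "transformers"
```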
Yuwen Hu
07e8b045a9
Add Meta-llama-3-8B-Instruct and Yi-6B-Chat to igpu nightly perf (#10810) 2024-04-19 15:09:58 +08:00
SONG Ge
fbd1743b5e
Ollama quickstart update (#10806)
* add ollama doc for OLLAMA_NUM_GPU

* remove useless params

* revert unexpected changes back

* move env setting to server part

* update
2024-04-19 15:00:25 +08:00
Yishuo Wang
08458b4f74
remove rms norm copy (#10793) 2024-04-19 13:57:48 +08:00
Yuwen Hu
c7235e34a8
Small update to ut (#10804) 2024-04-19 10:59:00 +08:00
Jason Dai
995c01367d
Update readme (#10802) 2024-04-19 06:52:57 +08:00
Yang Wang
8153c3008e
Initial llama3 example (#10799)
* Add initial hf huggingface GPU example

* Small fix

* Add llama3 gpu pytorch model example

* Add llama 3 hf transformers CPU example

* Add llama 3 pytorch model CPU example

* Fixes

* Small fix

* Small fixes

* Small fix

* Small fix

* Add links

* update repo id

* change prompt tuning url

* remove system header if there is no system prompt

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
Co-authored-by: Yuwen Hu <54161268+Oscilloscope98@users.noreply.github.com>
2024-04-18 11:01:33 -07:00
Ruonan Wang
754b0ffecf
Fix pvc llama (#10798)
* fix

* update
2024-04-18 10:44:57 -07:00
Ruonan Wang
439c834ed3
LLM: add mixed precision for lm_head (#10795)
* add mixed_quantization

* meet code review

* update

* fix style

* meet review
2024-04-18 19:11:31 +08:00
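The mixed-precision commit above follows a common pattern: quantize most weights with a low-bit type while keeping the accuracy-sensitive lm_head at a higher precision. A hedged sketch (the qtype strings and function name below are illustrative placeholders, not ipex-llm's actual constants):

```python
def assign_qtypes(module_names, base_qtype="sym_int4", head_qtype="sym_int8"):
    # Map each module to a quantization type: the low-bit base qtype for
    # most weights, a higher-precision one for lm_head (names illustrative).
    return {
        name: head_qtype if name.endswith("lm_head") else base_qtype
        for name in module_names
    }
```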
Yina Chen
8796401b08
Support q4k in ipex-llm (#10796)
* support q4k

* update
2024-04-18 18:55:28 +08:00
Zhicun
88463cbf47
fix transformer version (#10788)
* fix transformer version

* uninstall sentence transformer

* uninstall

* uninstall
2024-04-18 17:37:21 +08:00
Ruonan Wang
0e8aac19e3
add q6k precision in ipex-llm (#10792)
* add q6k

* add initial 16k

* update

* fix style
2024-04-18 16:52:09 +08:00
Qiyuan Gong
e90e31719f
axolotl lora example (#10789)
* Add axolotl lora example
* Modify readme
* Add comments in yml
2024-04-18 16:38:32 +08:00
Wang, Jian4
14ca42a048
LLM: Fix moe indexes error on cpu (#10791) 2024-04-18 15:56:52 +08:00
Guancheng Fu
cbe7b5753f
Add vLLM[xpu] related code (#10779)
* Add ipex-llm side change

* add runable offline_inference

* refactor to call vllm2

* Verified async server

* add new v2 example

* add README

* fix

* change dir

* refactor readme.md

* add experimental

* fix
2024-04-18 15:29:20 +08:00
Kai Huang
053ec30737
Transformers ppl evaluation on wikitext (#10784)
* transformers code

* cache
2024-04-18 15:27:18 +08:00
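Perplexity evaluation of the kind this commit adds reduces, at its core, to exponentiating the mean token-level negative log-likelihood; a minimal sketch:

```python
import math

def perplexity(token_nlls):
    # PPL = exp(mean negative log-likelihood per token).
    return math.exp(sum(token_nlls) / len(token_nlls))
```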
Wang, Jian4
209c3501e6
LLM: Optimize qwen1.5 moe model (#10706)
* update moe block

* fix style

* enable optimized MLP

* enable kv_cache

* enable fuse rope

* enable fused qkv

* enable flash_attention

* error sdp quantize

* use old api

* use fuse

* use xetla

* fix python style

* update moe_blocks num

* fix output error

* add cpu sdpa

* update

* update

* update
2024-04-18 14:54:05 +08:00
Ziteng Zhang
ff040c8f01
LISA Finetuning Example (#10743)
* enabling xetla only supports qtype=SYM_INT4 or FP8E5

* LISA Finetuning Example on gpu

* update readme

* add licence

* Explain parameters of lisa & Move backend codes to src dir

* fix style

* fix style

* update readme

* support chatglm

* fix style

* fix style

* update readme

* fix
2024-04-18 13:48:10 +08:00
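LISA finetuning, as in the example above, periodically unfreezes only a small random subset of layers while the rest stay frozen. A minimal sketch of that sampling step (layer count, subset size, and function name are illustrative, not the example's actual parameters):

```python
import random

def sample_active_layers(num_layers, num_active=2, seed=None):
    # Pick the layer indices to unfreeze for the next optimization interval;
    # everything else stays frozen, which keeps optimizer memory low.
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_layers), num_active))
```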
Heyang Sun
581ebf6104
GaLore Finetuning Example (#10722)
* GaLore Finetuning Example

* Update README.md

* Update README.md

* change data to HuggingFaceH4/helpful_instructions

* Update README.md

* Update README.md

* shrink train size and delete cache before starting training to save memory

* Update README.md

* Update galore_finetuning.py

* change model to llama2 3b

* Update README.md
2024-04-18 13:47:41 +08:00
Yang Wang
952e517db9
use config rope_theta (#10787)
* use config rope_theta

* fix style
2024-04-17 20:39:11 -07:00
Guancheng Fu
31ea2f9a9f
Fix wrong output for Llama models on CPU (#10742) 2024-04-18 11:07:27 +08:00
Xin Qiu
e764f9b1b1
Disable fast fused rope on UHD (#10780)
* use decoding fast path

* update

* update

* cleanup
2024-04-18 10:03:53 +08:00
Yina Chen
ea5b373a97
Add lookahead GPU example (#10785)
* Add lookahead example

* fix style & attn mask

* fix typo

* address comments
2024-04-17 17:41:55 +08:00
Wang, Jian4
a20271ffe4
LLM: Fix yi-6b fp16 error on pvc (#10781)
* update for yi fp16

* update

* update
2024-04-17 16:49:59 +08:00
ZehuaCao
0646e2c062
Fix short prompt for IPEX_CPU speculative decoding causing no_attr error (#10783) 2024-04-17 16:19:57 +08:00
Cengguang Zhang
7ec82c6042
LLM: add README.md for Long-Context examples. (#10765)
* LLM: add readme to long-context examples.

* add precision.

* update wording.

* add GPU type.

* add Long-Context example to GPU examples.

* fix comments.

* update max input length.

* update max length.

* add output length.

* fix wording.
2024-04-17 15:34:59 +08:00
Yina Chen
766fe45222
Fix spec error caused by lookup pr (#10777)
* Fix spec error

* remove

* fix style
2024-04-17 11:27:35 +08:00
Qiyuan Gong
9e5069437f
Fix gradio version in axolotl example (#10776)
* Change to gradio>=4.19.2
2024-04-17 10:23:43 +08:00
Qiyuan Gong
f2e923b3ca
Axolotl v0.4.0 support (#10773)
* Add Axolotl 0.4.0, remove legacy 0.3.0 support.
* replace is_torch_bf16_gpu_available
* Add HF_HUB_OFFLINE=1
* Move transformers out of requirement
* Refine readme and qlora.yml
2024-04-17 09:49:11 +08:00
Heyang Sun
26cae0a39c
Update FLEX in Deepspeed README (#10774)
* Update FLEX in Deepspeed README

* Update README.md
2024-04-17 09:28:24 +08:00
Wenjing Margaret Mao
c41730e024
edit 'ppl_result does not exist' issue, delete useless code (#10767)
* edit ppl_result not exist issue, delete useless code

* delete nonzero_min function

---------

Co-authored-by: jenniew <jenniewang123@gmail.com>
2024-04-16 18:11:56 +08:00
Yina Chen
899d392e2f
Support prompt lookup in ipex-llm (#10768)
* lookup init

* add lookup

* fix style

* remove redundant code

* change param name

* fix style
2024-04-16 16:52:38 +08:00
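Prompt lookup decoding, added above, proposes draft tokens by matching the trailing tokens of the sequence against an earlier occurrence in the prompt and reusing what followed that match. A minimal sketch of the candidate search (parameter and function names are illustrative, not ipex-llm's API):

```python
def prompt_lookup_candidates(input_ids, ngram_size=3, num_pred=5):
    # Find the most recent earlier occurrence of the trailing n-gram and
    # propose the tokens that followed it as draft continuations.
    tail = input_ids[-ngram_size:]
    for start in range(len(input_ids) - ngram_size - 1, -1, -1):
        if input_ids[start:start + ngram_size] == tail:
            follow = input_ids[start + ngram_size:start + ngram_size + num_pred]
            if follow:
                return follow
    return []  # no earlier match: fall back to normal decoding
```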
Qiyuan Gong
d30b22a81b
Refine axolotl 0.3.0 documents and links (#10764)
* Refine axolotl 0.3 based on comments
* Rename requirements to requirement-xpu
* Add comments for paged_adamw_32bit
* change lora_r from 8 to 16
2024-04-16 14:47:45 +08:00
ZehuaCao
599a88db53
Add deepspeed-autoTP-Fastapi serving (#10748)
* add deepspeed-autoTP-Fastapi serving

* add readme

* add license

* update

* update

* fix
2024-04-16 14:03:23 +08:00
ZehuaCao
a7c12020b4
Add fastchat quickstart (#10688)
* add fastchat quickstart

* update

* update

* update
2024-04-16 14:02:38 +08:00
Ruonan Wang
ea5e46c8cb
Small update of quickstart (#10772) 2024-04-16 10:46:58 +08:00
binbin Deng
0a62933d36
LLM: fix qwen AutoTP (#10766) 2024-04-16 09:56:17 +08:00
Cengguang Zhang
3e2662c87e
LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) 2024-04-16 09:32:30 +08:00
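The type fix above is the classic os.environ pitfall: environment values are always strings and need an explicit cast before arithmetic use. A hedged sketch (the default of 256 and the helper name are illustrative assumptions, not the project's actual values):

```python
import os

def get_kv_cache_alloc_block_length(default=256):
    # os.environ values are str; cast explicitly so callers get an int
    # instead of a string. The default here is illustrative only.
    return int(os.environ.get("KV_CACHE_ALLOC_BLOCK_LENGTH", default))
```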
Shaojun Liu
7297036c03
upgrade python (#10769) 2024-04-16 09:28:10 +08:00
Yuwen Hu
1abd77507e
Small update for GPU configuration related doc (#10770)
* Small doc fix for dGPU type name

* Further fixes

* Further fix

* Small fix
2024-04-15 18:43:29 +08:00
Jin Qiao
73a67804a4
GPU configuration update for examples (windows pip installer, etc.) (#10762)
* renew chatglm3-6b gpu example readme

fix

fix

fix

* fix for comments

* fix

* fix

* fix

* fix

* fix

* apply on HF-Transformers-AutoModels

* apply on PyTorch-Models

* fix

* fix
2024-04-15 17:42:52 +08:00