Commit graph

1549 commits

Author · SHA1 · Message · Date
Yishuo Wang
1b3c7a6928
remove phi3 empty cache (#10997) 2024-05-13 14:09:55 +08:00
ZehuaCao
99255fe36e
fix ppl (#10996) 2024-05-13 13:57:19 +08:00
Kai Huang
f8dd2e52ad
Fix Langchain upstream ut (#10985)
* Fix Langchain upstream ut

* Small fix

* Install bigdl-llm

* Update run-langchain-upstream-tests.sh

* Update run-langchain-upstream-tests.sh

* Update llm_unit_tests.yml

* Update run-langchain-upstream-tests.sh

* Update llm_unit_tests.yml

* Update run-langchain-upstream-tests.sh

* fix git checkout

* fix

---------

Co-authored-by: Zhangky11 <2321096202@qq.com>
Co-authored-by: Keyan (Kyrie) Zhang <79576162+Zhangky11@users.noreply.github.com>
2024-05-11 14:40:37 +08:00
Yuwen Hu
9f6358e4c2
Deprecate support for pytorch 2.0 on Linux for ipex-llm >= 2.1.0b20240511 (#10986)
* Remove xpu_2.0 option in setup.py

* Disable xpu_2.0 test in UT and nightly

* Update docs for deprecated pytorch 2.0

* Small doc update
2024-05-11 12:33:35 +08:00
Yishuo Wang
ad96f32ce0
optimize phi3 1st token performance (#10981) 2024-05-10 17:33:46 +08:00
Cengguang Zhang
cfed76b2ed
LLM: add long-context support for Qwen1.5-7B/Baichuan2-7B/Mistral-7B. (#10937)
* LLM: add split tensor support for baichuan2-7b and qwen1.5-7b.

* fix style.

* fix style.

* fix style.

* add support for mistral and fix condition threshold.

* fix style.

* fix comments.
2024-05-10 16:40:15 +08:00
binbin Deng
f9615f12d1
Add driver related packages version check in env script (#10977) 2024-05-10 15:02:58 +08:00
Kai Huang
a6342cc068
Empty cache after phi first attention to support 4k input (#10972)
* empty cache

* fix style
2024-05-09 19:50:04 +08:00
Yishuo Wang
e753125880
use fp16_sdp when head_dim=96 (#10976) 2024-05-09 17:02:59 +08:00
Yishuo Wang
697ca79eca
use quantize kv and sdp in phi3-mini (#10973) 2024-05-09 15:16:18 +08:00
Wang, Jian4
f4c615b1ee
Add cohere example (#10954)
* add link first

* add_cpu_example

* add GPU example
2024-05-08 17:19:59 +08:00
Wang, Jian4
3209d6b057
Fix speculative llama3 no stop error (#10963)
* fix normal

* add eos_tokens_id on sp and add list if

* update

* no none
2024-05-08 17:09:47 +08:00
Xiangyu Tian
02870dc385
LLM: Refine README of AutoTP-FastAPI example (#10960) 2024-05-08 16:55:23 +08:00
Yishuo Wang
2ebec0395c
optimize phi-3-mini-128 (#10959) 2024-05-08 16:33:17 +08:00
Xin Qiu
dfa3147278
update (#10944) 2024-05-08 14:28:05 +08:00
Xin Qiu
5973d6c753
make gemma's output better (#10943) 2024-05-08 14:27:51 +08:00
Jin Qiao
15ee3fd542
Update igpu perf internlm (#10958) 2024-05-08 14:16:43 +08:00
Zhao Changmin
0d6e12036f
Disable fast_init_ in load_low_bit (#10945)
* fast_init_ disable
2024-05-08 10:46:19 +08:00
Qiyuan Gong
164e6957af
Refine axolotl quickstart (#10957)
* Add default accelerate config for axolotl quickstart.
* Fix requirement link.
* Upgrade peft to 0.10.0 in requirement.
2024-05-08 09:34:02 +08:00
Yishuo Wang
c801c37bc6
optimize phi3 again: use quantize kv if possible (#10953) 2024-05-07 17:26:19 +08:00
Yishuo Wang
aa2fa9fde1
optimize phi3 again: use sdp if possible (#10951) 2024-05-07 15:53:08 +08:00
Qiyuan Gong
c11170b96f
Upgrade Peft to 0.10.0 in finetune examples and docker (#10930)
* Upgrade Peft to 0.10.0 in finetune examples.
* Upgrade Peft to 0.10.0 in docker.
2024-05-07 15:12:26 +08:00
Qiyuan Gong
d7ca5d935b
Upgrade Peft version to 0.10.0 for LLM finetune (#10886)
* Upgrade Peft version to 0.10.0
* Upgrade Peft version in ARC unit test and HF-Peft example.
2024-05-07 15:09:14 +08:00
Yuwen Hu
0efe26c3b6
Change order of chatglm2-6b and chatglm3-6b in iGPU perf test for more stable performance (#10948) 2024-05-07 13:48:39 +08:00
hxsz1997
245c7348bc
Add codegemma example (#10884)
* add codegemma example in GPU/HF-Transformers-AutoModels/

* add README of codegemma example in GPU/HF-Transformers-AutoModels/

* add codegemma example in GPU/PyTorch-Models/

* add readme of codegemma example in GPU/PyTorch-Models/

* add codegemma example in CPU/HF-Transformers-AutoModels/

* add readme of codegemma example in CPU/HF-Transformers-AutoModels/

* add codegemma example in CPU/PyTorch-Models/

* add readme of codegemma example in CPU/PyTorch-Models/

* fix typos

* fix filename typo

* add codegemma in tables

* add comments of lm_head

* remove comments of use_cache
2024-05-07 13:35:42 +08:00
Shaojun Liu
08ad40b251
improve ipex-llm-init for Linux (#10928)
* refine ipex-llm-init

* install libtcmalloc.so for Max

* update based on comment

* remove unneeded code
2024-05-07 12:55:14 +08:00
Wang, Jian4
191b184341
LLM: Optimize cohere model (#10878)
* use mlp and rms

* optimize kv_cache

* add fuse qkv

* add flash attention and fp16 sdp

* error fp8 sdp

* fix optimized

* fix style

* update

* add for pp
2024-05-07 10:19:50 +08:00
Xiangyu Tian
13a44cdacb
LLM: Refine Deepspeed-AutoTP-FastAPI example (#10916) 2024-05-07 09:37:31 +08:00
Wang, Jian4
1de878bee1
LLM: Fix speculative llama3 long input error (#10934) 2024-05-07 09:25:20 +08:00
Guancheng Fu
49ab5a2b0e
Add embeddings (#10931) 2024-05-07 09:07:02 +08:00
Wang, Jian4
0e0bd309e2
LLM: Enable Speculative on Fastchat (#10909)
* init

* enable streamer

* update

* update

* remove deprecated

* update

* update

* add gpu example
2024-05-06 10:06:20 +08:00
Cengguang Zhang
0edef1f94c
LLM: add min_new_tokens to all in one benchmark. (#10911) 2024-05-06 09:32:59 +08:00
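
For context, `min_new_tokens` is a standard Hugging Face generation argument; a minimal sketch of how a benchmark might pin output length with it (model id is a placeholder, not the benchmark's actual code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model id
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
# Pin the output length so per-token latency numbers are comparable.
output = model.generate(**inputs, min_new_tokens=32, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
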
Cengguang Zhang
75dbf240ec
LLM: update split tensor conditions. (#10872)
* LLM: update split tensor condition.

* add cond for split tensor.

* update priority of env.

* fix style.

* update env name.
2024-04-30 17:07:21 +08:00
Guancheng Fu
2c64754eb0
Add vLLM to ipex-llm serving image (#10807)
* add vllm

* done

* doc work

* fix done

* temp

* add docs

* format

* add start-fastchat-service.sh

* fix
2024-04-29 17:25:42 +08:00
Jin Qiao
1f876fd837
Add example for phi-3 (#10881)
* Add example for phi-3

* add in readme and index

* fix

* fix

* fix

* fix indent

* fix
2024-04-29 16:43:55 +08:00
Yishuo Wang
d884c62dc4
remove new_layout parameter (#10906) 2024-04-29 10:31:50 +08:00
Guancheng Fu
fbcd7bc737
Fix Loader issue with dtype fp16 (#10907) 2024-04-29 10:16:02 +08:00
Guancheng Fu
c9fac8c26b
Fix sdp logic (#10896)
* fix

* fix
2024-04-28 22:02:14 +08:00
Yina Chen
015d07a58f
Fix lookahead sample error & add update strategy (#10894)
* Fix sample error & add update strategy

* add mtl config

* fix style

* remove print
2024-04-28 17:21:00 +08:00
Yuwen Hu
1a8a93d5e0
Further fix nightly perf (#10901) 2024-04-28 10:18:58 +08:00
Yuwen Hu
ddfdaec137
Fix nightly perf (#10899)
* Fix nightly perf by adding default value in benchmark for use_fp16_torch_dtype

* further fixes
2024-04-28 09:39:29 +08:00
Cengguang Zhang
9752ffe979
LLM: update split qkv native sdp. (#10895)
* LLM: update split qkv native sdp.

* fix typo.
2024-04-26 18:47:35 +08:00
Guancheng Fu
990535b1cf
Add tensor parallel for vLLM (#10879)
* initial

* test initial tp

* initial sup

* fix format

* fix

* fix
2024-04-26 17:10:49 +08:00
binbin Deng
f51bf018eb
Add benchmark script for pipeline parallel inference (#10873) 2024-04-26 15:28:11 +08:00
Yishuo Wang
46ba962168
use new quantize kv (#10888) 2024-04-26 14:42:17 +08:00
Xiangyu Tian
3d4950b0f0
LLM: Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example (#10876)
Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example.
2024-04-26 13:24:28 +08:00
Wang, Jian4
3e8ed54270
LLM: Fix bigdl_ipex_int8 warning (#10890) 2024-04-26 11:18:44 +08:00
Jin Qiao
fb3c268d13
Add phi-3 to perf (#10883) 2024-04-25 20:21:56 +08:00
Yina Chen
8811f268ff
Use new fp16 sdp in Qwen and modify the constraint (#10882) 2024-04-25 19:23:37 +08:00
Yuxuan Xia
0213c1c1da
Add phi3 to the nightly test (#10885)
* Add llama3 and phi2 nightly test

* Change llama3-8b to llama3-8b-instruct

* Add phi3 to nightly test

* Add phi3 to nightly test

---------

Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-04-25 17:39:12 +08:00
Yuxuan Xia
ca2479be87
Update scripts readme (#10725)
* Update scripts readme

* Update scripts readme

* Update README

* Update readme

* Update readme

* Update windows env check readme

* Adjust env check readme

* Update windows env check

* Update env check readme

* Adjust the env-check README

* Modify the env-check README
2024-04-25 17:24:37 +08:00
Cengguang Zhang
cd369c2715
LLM: add device id to benchmark utils. (#10877) 2024-04-25 14:01:51 +08:00
Yang Wang
1ce8d7bcd9
Support the desc_act feature in GPTQ model (#10851)
* support act_order

* update versions

* fix style

* fix bug

* clean up
2024-04-24 10:17:13 -07:00
Yina Chen
dc27b3bc35
Use sdp when rest token seq_len > 1 in llama & mistral (for lookup & spec) (#10790)
* update sdp condition

* update

* fix

* update & test llama

* mistral

* fix style

* update

* fix style

* remove pvc constrain

* update ds on arc

* fix style
2024-04-24 17:24:01 +08:00
Yuxuan Xia
844e18b1db
Add llama3 and phi2 nightly test (#10874)
* Add llama3 and phi2 nightly test

* Change llama3-8b to llama3-8b-instruct

---------

Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-04-24 16:58:56 +08:00
binbin Deng
c9feffff9a
LLM: support Qwen1.5-MoE-A2.7B-Chat pipeline parallel inference (#10864) 2024-04-24 16:02:27 +08:00
Yishuo Wang
2d210817ff
add phi3 optimization (#10871) 2024-04-24 15:17:40 +08:00
Cengguang Zhang
eb39c61607
LLM: add min new token to perf test. (#10869) 2024-04-24 14:32:02 +08:00
Yuwen Hu
fb2a160af3
Add phi-2 to 2048-256 test for fixes (#10867) 2024-04-24 10:00:25 +08:00
binbin Deng
fabf54e052
LLM: make pipeline parallel inference example more common (#10786) 2024-04-24 09:28:52 +08:00
hxsz1997
328b1a1de9
Fix the issue of llama3 examples not stopping (#10860)
* fix not stop issue in GPU/HF-Transformers-AutoModels

* fix not stop issue in GPU/PyTorch-Models/Model/llama3

* fix not stop issue in CPU/HF-Transformers-AutoModels/Model/llama3

* fix not stop issue in CPU/PyTorch-Models/Model/llama3

* update the output in readme

* update format

* add reference

* update prompt format

* update output format in readme

* update example output in readme
2024-04-23 19:10:09 +08:00
Yuwen Hu
5c9eb5d0f5
Support llama-index install option for upstreaming purposes (#10866)
* Support llama-index install option for upstreaming purposes

* Small fix

* Small fix
2024-04-23 19:08:29 +08:00
Yuwen Hu
21bb8bd164
Add phi-2 to igpu performance test (#10865) 2024-04-23 18:13:14 +08:00
ZehuaCao
36eb8b2e96
Add llama3 speculative example (#10856)
* Initial llama3 speculative example

* update README

* update README

* update README
2024-04-23 17:03:54 +08:00
Cengguang Zhang
763413b7e1
LLM: support llama split tensor for long context in transformers>=4.36. (#10844)
* LLM: support llama split tensor for long context in transformers>=4.36.

* fix dtype.

* fix style.

* fix style.

* fix style.

* fix style.

* fix dtype.

* fix style.
2024-04-23 16:13:25 +08:00
ZehuaCao
92ea54b512
Fix speculative decoding bug (#10855) 2024-04-23 14:28:31 +08:00
yb-peng
c9dee6cd0e
Update 8192.txt (#10824)
* Update 8192.txt

* Update 8192.txt with original text
2024-04-23 14:02:09 +08:00
Wang, Jian4
18c032652d
LLM: Add mixtral speculative CPU example (#10830)
* init mixtral sp example

* use different prompt_format

* update output

* update
2024-04-23 10:05:51 +08:00
Qiyuan Gong
5494aa55f6
Downgrade datasets in axolotl example (#10849)
* Downgrade datasets to 2.15.0 to address axolotl prepare issue https://github.com/OpenAccess-AI-Collective/axolotl/issues/1544

Thanks to @kwaa for providing the solution in https://github.com/intel-analytics/ipex-llm/issues/10821#issuecomment-2068861571
2024-04-23 09:41:58 +08:00
Yishuo Wang
fe5a082b84
add phi-2 optimization (#10843) 2024-04-22 18:56:47 +08:00
Guancheng Fu
47bd5f504c
[vLLM]Remove vllm-v1, refactor v2 (#10842)
* remove vllm-v1

* fix format
2024-04-22 17:51:32 +08:00
Wang, Jian4
23c6a52fb0
LLM: Fix ipex torchscript=True error (#10832)
* remove

* update

* remove torchscript
2024-04-22 15:53:09 +08:00
Heyang Sun
fc33aa3721
fix missing import (#10839) 2024-04-22 14:34:52 +08:00
Yina Chen
3daad242b8
Fix No module named 'transformers.cache_utils' with transformers < 4.36 (#10835)
* update sdp condition

* update

* fix

* fix 431 error

* revert sdp & style fix

* fix

* meet comments
2024-04-22 14:05:50 +08:00
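
The fix above deals with `transformers.cache_utils` existing only in transformers >= 4.36; a minimal sketch of the kind of version guard involved (illustrative, not the commit's actual code):

```python
from packaging import version
import transformers

if version.parse(transformers.__version__) >= version.parse("4.36.0"):
    from transformers.cache_utils import Cache  # module added in 4.36
else:
    Cache = None  # older transformers: fall back to tuple-based kv caches
```
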
Guancheng Fu
ae3b577537
Update README.md (#10833) 2024-04-22 11:07:10 +08:00
Wang, Jian4
5f95054f97
LLM: Add qwen moe example libs md (#10828) 2024-04-22 10:03:19 +08:00
Guancheng Fu
61c67af386
Fix vLLM-v2 install instructions(#10822) 2024-04-22 09:02:48 +08:00
Guancheng Fu
caf75beef8
Disable sdpa (#10814) 2024-04-19 17:33:18 +08:00
Yishuo Wang
57edf2033c
fix lookahead with transformers >= 4.36 (#10808) 2024-04-19 16:24:56 +08:00
Ovo233
1a885020ee
Updated importing of top_k_top_p_filtering for transformers>=4.39.0 (#10794)
* In transformers>=4.39.0, the top_k_top_p_filtering function has been deprecated and moved to the Hugging Face package trl. Thus, for versions >= 4.39.0, import this function from trl.
2024-04-19 15:34:39 +08:00
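
A hedged sketch of the compatibility import the commit above describes (the exact trl import path is assumed):

```python
try:
    # transformers < 4.39.0 still ships the function directly
    from transformers import top_k_top_p_filtering
except ImportError:
    # transformers >= 4.39.0: the function moved to the trl package
    from trl.core import top_k_top_p_filtering
```
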
Yuwen Hu
07e8b045a9
Add Meta-llama-3-8B-Instruct and Yi-6B-Chat to igpu nightly perf (#10810) 2024-04-19 15:09:58 +08:00
Yishuo Wang
08458b4f74
remove rms norm copy (#10793) 2024-04-19 13:57:48 +08:00
Yang Wang
8153c3008e
Initial llama3 example (#10799)
* Add initial hf huggingface GPU example

* Small fix

* Add llama3 gpu pytorch model example

* Add llama 3 hf transformers CPU example

* Add llama 3 pytorch model CPU example

* Fixes

* Small fix

* Small fixes

* Small fix

* Small fix

* Add links

* update repo id

* change prompt tuning url

* remove system header if there is no system prompt

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
Co-authored-by: Yuwen Hu <54161268+Oscilloscope98@users.noreply.github.com>
2024-04-18 11:01:33 -07:00
Ruonan Wang
754b0ffecf Fix pvc llama (#10798)
* fix

* update
2024-04-18 10:44:57 -07:00
Ruonan Wang
439c834ed3
LLM: add mixed precision for lm_head (#10795)
* add mixed_quantization

* meet code review

* update

* fix style

* meet review
2024-04-18 19:11:31 +08:00
Yina Chen
8796401b08
Support q4k in ipex-llm (#10796)
* support q4k

* update
2024-04-18 18:55:28 +08:00
Ruonan Wang
0e8aac19e3
add q6k precision in ipex-llm (#10792)
* add q6k

* add initial 16k

* update

* fix style
2024-04-18 16:52:09 +08:00
Qiyuan Gong
e90e31719f
axolotl lora example (#10789)
* Add axolotl lora example
* Modify readme
* Add comments in yml
2024-04-18 16:38:32 +08:00
Wang, Jian4
14ca42a048
LLM: Fix moe indexes error on cpu (#10791) 2024-04-18 15:56:52 +08:00
Guancheng Fu
cbe7b5753f
Add vLLM[xpu] related code (#10779)
* Add ipex-llm side change

* add runable offline_inference

* refactor to call vllm2

* Verified async server

* add new v2 example

* add README

* fix

* change dir

* refactor readme.md

* add experimental

* fix
2024-04-18 15:29:20 +08:00
Kai Huang
053ec30737
Transformers ppl evaluation on wikitext (#10784)
* tranformers code

* cache
2024-04-18 15:27:18 +08:00
Wang, Jian4
209c3501e6
LLM: Optimize qwen1.5 moe model (#10706)
* update moe block

* fix style

* enable optimize MLP

* enable kv_cache

* enable fuse rope

* enable fused qkv

* enable flash_attention

* error sdp quantize

* use old api

* use fuse

* use xetla

* fix python style

* update moe_blocks num

* fix output error

* add cpu sdpa

* update

* update

* update
2024-04-18 14:54:05 +08:00
Ziteng Zhang
ff040c8f01
LISA Finetuning Example (#10743)
* enabling xetla only supports qtype=SYM_INT4 or FP8E5

* LISA Finetuning Example on gpu

* update readme

* add licence

* Explain parameters of lisa & Move backend codes to src dir

* fix style

* fix style

* update readme

* support chatglm

* fix style

* fix style

* update readme

* fix
2024-04-18 13:48:10 +08:00
Heyang Sun
581ebf6104
GaLore Finetuning Example (#10722)
* GaLore Finetuning Example

* Update README.md

* Update README.md

* change data to HuggingFaceH4/helpful_instructions

* Update README.md

* Update README.md

* shrink train size and delete cache before starting training to save memory

* Update README.md

* Update galore_finetuning.py

* change model to llama2 3b

* Update README.md
2024-04-18 13:47:41 +08:00
Yang Wang
952e517db9
use config rope_theta (#10787)
* use config rope_theta

* fix style
2024-04-17 20:39:11 -07:00
Guancheng Fu
31ea2f9a9f
Fix wrong output for Llama models on CPU (#10742) 2024-04-18 11:07:27 +08:00
Xin Qiu
e764f9b1b1
Disable fast fused rope on UHD (#10780)
* use decoding fast path

* update

* update

* cleanup
2024-04-18 10:03:53 +08:00
Yina Chen
ea5b373a97
Add lookahead GPU example (#10785)
* Add lookahead example

* fix style & attn mask

* fix typo

* address comments
2024-04-17 17:41:55 +08:00
Wang, Jian4
a20271ffe4
LLM: Fix yi-6b fp16 error on pvc (#10781)
* update for yi fp16

* update

* update
2024-04-17 16:49:59 +08:00
ZehuaCao
0646e2c062
Fix short prompt for IPEX_CPU speculative decoding causing no_attr error (#10783) 2024-04-17 16:19:57 +08:00
Cengguang Zhang
7ec82c6042
LLM: add README.md for Long-Context examples. (#10765)
* LLM: add readme to long-context examples.

* add precision.

* update wording.

* add GPU type.

* add Long-Context example to GPU examples.

* fix comments.

* update max input length.

* update max length.

* add output length.

* fix wording.
2024-04-17 15:34:59 +08:00
Yina Chen
766fe45222
Fix spec error caused by lookup pr (#10777)
* Fix spec error

* remove

* fix style
2024-04-17 11:27:35 +08:00
Qiyuan Gong
9e5069437f
Fix gradio version in axolotl example (#10776)
* Change to gradio>=4.19.2
2024-04-17 10:23:43 +08:00
Qiyuan Gong
f2e923b3ca
Axolotl v0.4.0 support (#10773)
* Add Axolotl 0.4.0, remove legacy 0.3.0 support.
* replace is_torch_bf16_gpu_available
* Add HF_HUB_OFFLINE=1
* Move transformers out of requirement
* Refine readme and qlora.yml
2024-04-17 09:49:11 +08:00
Heyang Sun
26cae0a39c
Update FLEX in Deepspeed README (#10774)
* Update FLEX in Deepspeed README

* Update README.md
2024-04-17 09:28:24 +08:00
Wenjing Margaret Mao
c41730e024
edit 'ppl_result does not exist' issue, delete useless code (#10767)
* edit ppl_result not exist issue, delete useless code

* delete nonzero_min function

---------

Co-authored-by: jenniew <jenniewang123@gmail.com>
2024-04-16 18:11:56 +08:00
Yina Chen
899d392e2f
Support prompt lookup in ipex-llm (#10768)
* lookup init

* add lookup

* fix style

* remove redundant code

* change param name

* fix style
2024-04-16 16:52:38 +08:00
Qiyuan Gong
d30b22a81b
Refine axolotl 0.3.0 documents and links (#10764)
* Refine axolotl 0.3 based on comments
* Rename requirements to requirement-xpu
* Add comments for paged_adamw_32bit
* change lora_r from 8 to 16
2024-04-16 14:47:45 +08:00
ZehuaCao
599a88db53
Add deepspeed-autoTP-Fastapi serving (#10748)
* add deepspeed-autoTP-Fastapi serving

* add readme

* add license

* update

* update

* fix
2024-04-16 14:03:23 +08:00
binbin Deng
0a62933d36
LLM: fix qwen AutoTP (#10766) 2024-04-16 09:56:17 +08:00
Cengguang Zhang
3e2662c87e
LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) 2024-04-16 09:32:30 +08:00
Jin Qiao
73a67804a4
GPU configuration update for examples (windows pip installer, etc.) (#10762)
* renew chatglm3-6b gpu example readme

fix

fix

fix

* fix for comments

* fix

* fix

* fix

* fix

* fix

* apply on HF-Transformers-AutoModels

* apply on PyTorch-Models

* fix

* fix
2024-04-15 17:42:52 +08:00
yb-peng
b5209d3ec1
Update example/GPU/PyTorch-Models/Model/llava/README.md (#10757)
* Update example/GPU/PyTorch-Models/Model/llava/README.md

* Update README.md

fix path in windows installation
2024-04-15 13:01:37 +08:00
binbin Deng
3d561b60ac
LLM: add enable_xetla parameter for optimize_model API (#10753) 2024-04-15 12:18:25 +08:00
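
A hedged usage sketch for the new parameter, with semantics assumed from the commit title (model id is a placeholder):

```python
from transformers import AutoModelForCausalLM
from ipex_llm import optimize_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model id
model = optimize_model(model, enable_xetla=True)  # opt in to XeTLA kernels
```
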
Jiao Wang
a9a6b6b7af
Fix baichuan-13b issue on portable zip under transformers 4.36 (#10746)
* fix baichuan-13b issue

* update

* update
2024-04-12 16:27:01 -07:00
Jiao Wang
9e668a5bf0
fix internlm-chat-7b-8k repo name in examples (#10747) 2024-04-12 10:15:48 -07:00
binbin Deng
c3fc8f4b90
LLM: add bs limitation for llama softmax upcast to fp32 (#10752) 2024-04-12 15:40:25 +08:00
hxsz1997
0d518aab8d
Merge pull request #10697 from MargarettMao/ceval
combine english and chinese, remove nan
2024-04-12 14:37:47 +08:00
jenniew
dd0d2df5af Change fp16.csv mistral-7b-v0.1 into Mistral-7B-v0.1 2024-04-12 14:28:46 +08:00
jenniew
7309f1ddf9 Modify Typos 2024-04-12 14:23:13 +08:00
jenniew
cb594e1fc5 Modify Typos 2024-04-12 14:22:09 +08:00
jenniew
382c18e600 Modify Typos 2024-04-12 14:15:48 +08:00
jenniew
1a360823ce Modify Typos 2024-04-12 14:13:21 +08:00
jenniew
cdbb1de972 Mark Color Modification 2024-04-12 14:00:50 +08:00
jenniew
9bbfcaf736 Mark Color Modification 2024-04-12 13:30:16 +08:00
jenniew
bb34c6e325 Mark Color Modification 2024-04-12 13:26:36 +08:00
Yishuo Wang
8086554d33
use new fp16 sdp in llama and mistral (#10734) 2024-04-12 10:49:02 +08:00
Yang Wang
019293e1b9
Fuse MOE indexes computation (#10716)
* try moe

* use c++ cpu to compute indexes

* fix style
2024-04-11 10:12:55 -07:00
jenniew
b151a9b672 edit csv_to_html to combine en & zh 2024-04-11 17:35:36 +08:00
binbin Deng
70ed9397f9
LLM: fix AttributeError of FP16Linear (#10740) 2024-04-11 17:03:56 +08:00
Keyan (Kyrie) Zhang
1256a2cc4e
Add chatglm3 long input example (#10739)
* Add long context input example for chatglm3

* Small fix

* Small fix

* Small fix
2024-04-11 16:33:43 +08:00
hxsz1997
fd473ddb1b
Merge pull request #10730 from MargarettMao/MargarettMao-parent_folder
Edit ppl update_HTML_parent_folder
2024-04-11 15:45:24 +08:00
Qiyuan Gong
2d64630757
Remove transformers version in axolotl example (#10736)
* Remove transformers version in axolotl requirements.txt
2024-04-11 14:02:31 +08:00
yb-peng
2685c41318
Modify all-in-one benchmark (#10726)
* Update 8192 prompt in all-in-one

* Add cpu_embedding param for linux api

* Update run.py

* Update README.md
2024-04-11 13:38:50 +08:00
Xiangyu Tian
301504aa8d
Fix transformers version warning (#10732) 2024-04-11 13:12:49 +08:00
Wenjing Margaret Mao
9bec233e4d
Delete python/llm/test/benchmark/perplexity/update_html_in_parent_folder.py
Delete due to repetition
2024-04-11 07:21:12 +08:00
Cengguang Zhang
4b024b7aac
LLM: optimize chatglm2 8k input. (#10723)
* LLM: optimize chatglm2 8k input.

* rename.
2024-04-10 16:59:06 +08:00
Yuxuan Xia
cd22cb8257
Update Env check Script (#10709)
* Update env check bash file

* Update env-check
2024-04-10 15:06:00 +08:00
Shaojun Liu
29bf28bd6f
Upgrade python to 3.11 in Docker Image (#10718)
* install python 3.11 for cpu-inference docker image

* update xpu-inference dockerfile

* update cpu-serving image

* update qlora image

* update lora image

* update document
2024-04-10 14:41:27 +08:00
Qiyuan Gong
b727767f00
Add axolotl v0.3.0 with ipex-llm on Intel GPU (#10717)
* Add axolotl v0.3.0 support on Intel GPU.
* Add finetune example on llama-2-7B with Alpaca dataset.
2024-04-10 14:38:29 +08:00
Wang, Jian4
c9e6d42ad1
LLM: Fix chatglm3-6b-32k error (#10719)
* fix chatglm3-6b-32k

* update style
2024-04-10 11:24:06 +08:00
Keyan (Kyrie) Zhang
585c174e92
Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables (#10707)
* Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables.

* Fix style
2024-04-10 10:48:46 +08:00
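
A minimal sketch of reading such a setting from the environment with a fallback (the default of 256 here is assumed, not taken from the repo):

```python
import os

# Honor the env var if set; otherwise fall back to an assumed default.
KV_CACHE_ALLOC_BLOCK_LENGTH = int(os.environ.get("KV_CACHE_ALLOC_BLOCK_LENGTH", "256"))
```
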
Jiao Wang
d1eaea509f
update chatglm readme (#10659) 2024-04-09 14:24:46 -07:00
Jiao Wang
878a97077b
Fix llava example to support transformers 4.36 (#10614)
* fix llava example

* update
2024-04-09 13:47:07 -07:00
Jiao Wang
1e817926ba
Fix low memory generation example issue in transformers 4.36 (#10702)
* update cache in low memory generate

* update
2024-04-09 09:56:52 -07:00
Yuwen Hu
97db2492c8
Update setup.py for bigdl-core-xe-esimd-21 on Windows (#10705)
* Support bigdl-core-xe-esimd-21 for windows in setup.py

* Update setup-llm-env accordingly
2024-04-09 18:21:21 +08:00
Zhicun
b4147a97bb
Fix dtype mismatch error (#10609)
* fix llama

* fix

* fix code style

* add torch type in model.py

---------

Co-authored-by: arda <arda@arda-arc19.sh.intel.com>
2024-04-09 17:50:33 +08:00
Shaojun Liu
f37a1f2a81
Upgrade to python 3.11 (#10711)
* create conda env with python 3.11

* recommend to use Python 3.11

* update
2024-04-09 17:41:17 +08:00
Yishuo Wang
8f45e22072
fix llama2 (#10710) 2024-04-09 17:28:37 +08:00
Yishuo Wang
e438f941f2
disable rwkv5 fp16 (#10699) 2024-04-09 16:42:11 +08:00
Cengguang Zhang
6a32216269
LLM: add llama2 8k input example. (#10696)
* LLM: add llama2-32K example.

* refactor name.

* fix comments.

* add IPEX_LLM_LOW_MEM notes and update sample output.
2024-04-09 16:02:37 +08:00
Wenjing Margaret Mao
289cc99cd6
Update README.md (#10700)
Edit "summarize the results"
2024-04-09 16:01:12 +08:00
Wenjing Margaret Mao
d3116de0db
Update README.md (#10701)
edit "summarize the results"
2024-04-09 15:50:25 +08:00
Chen, Zhentao
d59e0cce5c
Migrate harness to ipexllm (#10703)
* migrate to ipexlm

* fix workflow

* fix run_multi

* fix precision map

* rename ipexlm to ipexllm

* rename bigdl to ipex  in comments
2024-04-09 15:48:53 +08:00
Keyan (Kyrie) Zhang
1e27e08322
Modify example from fp32 to fp16 (#10528)
* Modify example from fp32 to fp16

* Remove Falcon from fp16 example for now

* Remove MPT from fp16 example
2024-04-09 15:45:49 +08:00
binbin Deng
44922bb5c2
LLM: support baichuan2-13b using AutoTP (#10691) 2024-04-09 14:06:01 +08:00
Yina Chen
c7422712fc
mistral 4.36 use fp16 sdp (#10704) 2024-04-09 13:50:33 +08:00
Ovo233
dcb2038aad
Enable optimization for sentence_transformers (#10679)
* enable optimization for sentence_transformers

* fix python style check failure
2024-04-09 12:33:46 +08:00
Yang Wang
5a1f446d3c
support fp8 in xetla (#10555)
* support fp8 in xetla

* change name

* adjust model file

* support convert back to cpu

* factor

* fix bug

* fix style
2024-04-08 13:22:09 -07:00
jenniew
591bae092c combine english and chinese, remove nan 2024-04-08 19:37:51 +08:00
Cengguang Zhang
7c43ac0164
LLM: optimize llama native sdp for split qkv tensor (#10693)
* LLM: optimize llama native sdp for split qkv tensor.

* fix block real size.

* fix comment.

* fix style.

* refactor.
2024-04-08 17:48:11 +08:00
Xin Qiu
1274cba79b
stablelm fp8 kv cache (#10672)
* stablelm fp8 kvcache

* update

* fix

* change to fp8 matmul

* fix style

* fix

* fix

* meet code review

* add comment
2024-04-08 15:16:46 +08:00
Yishuo Wang
65127622aa
fix UT threshold (#10689) 2024-04-08 14:58:20 +08:00
Cengguang Zhang
c0cd238e40
LLM: support llama2 8k input with w4a16. (#10677)
* LLM: support llama2 8k input with w4a16.

* fix comment and style.

* fix style.

* fix comments and split tensor to quantized attention forward.

* fix style.

* refactor name.

* fix style.

* fix style.

* fix style.

* refactor checker name.

* refactor native sdp split qkv tensor name.

* fix style.

* fix comment rename variables.

* fix co-exist of intermedia results.
2024-04-08 11:43:15 +08:00
Zhicun
321bc69307
Fix llamaindex ut (#10673)
* fix llamaindex ut

* add GPU ut
2024-04-08 09:47:51 +08:00
yb-peng
2d88bb9b4b
add test api transformer_int4_fp16_gpu (#10627)
* add test api transformer_int4_fp16_gpu

* update config.yaml and README.md in all-in-one

* modify run.py in all-in-one

* re-order test-api

* re-order test-api in config

* modify README.md in all-in-one

* modify README.md in all-in-one

* modify config.yaml

---------

Co-authored-by: pengyb2001 <arda@arda-arc21.sh.intel.com>
Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>
2024-04-07 15:47:17 +08:00
Wang, Jian4
47cabe8fcc
LLM: Fix missing return_last_logit when running bigdl_ipex chatglm3 (#10678)
* fix no return_last_logits

* update only for chatglm
2024-04-07 15:27:58 +08:00
Wang, Jian4
9ad4b29697
LLM: CPU benchmark using tcmalloc (#10675) 2024-04-07 14:17:01 +08:00
binbin Deng
d9a1153b4e
LLM: upgrade deepspeed in AutoTP on GPU (#10647) 2024-04-07 14:05:19 +08:00
Jin Qiao
56dfcb2ade
Migrate portable zip to ipex-llm (#10617)
* change portable zip prompt to ipex-llm

* fix chat with ui

* add no proxy
2024-04-07 13:58:58 +08:00
Zhicun
9d8ba64c0d
Llamaindex: add tokenizer_id and support chat (#10590)
* add tokenizer_id

* fix

* modify

* add from_model_id and from_model_id_low_bit

* fix typo and add comment

* fix python code style

---------

Co-authored-by: pengyb2001 <284261055@qq.com>
2024-04-07 13:51:34 +08:00
Jin Qiao
10ee786920
Replace with IPEX-LLM in example comments (#10671)
* Replace with IPEX-LLM in example comments

* More replacement

* revert some changes
2024-04-07 13:29:51 +08:00
Xiangyu Tian
08018a18df
Remove not-imported MistralConfig (#10670) 2024-04-07 10:32:05 +08:00
Cengguang Zhang
1a9b8204a4
LLM: support int4 fp16 chatglm2-6b 8k input. (#10648) 2024-04-07 09:39:21 +08:00
Jiao Wang
69bdbf5806
Fix vllm print error message issue (#10664)
* update chatglm readme

* Add condition to invalidInputError

* update

* update

* style
2024-04-05 15:08:13 -07:00
Jason Dai
29d97e4678
Update readme (#10665) 2024-04-05 18:01:57 +08:00
Xin Qiu
4c3e493b2d
fix stablelm2 1.6b (#10656)
* fix stablelm2 1.6b

* meet code review
2024-04-03 22:15:32 +08:00
Jin Qiao
cc8b3be11c
Add GPU and CPU example for stablelm-zephyr-3b (#10643)
* Add example for StableLM

* fix

* add to readme
2024-04-03 16:28:31 +08:00
Heyang Sun
6000241b10
Add Deepspeed Example of FLEX Mistral (#10640) 2024-04-03 16:04:17 +08:00
Shaojun Liu
d18dbfb097
update spr perf test (#10644) 2024-04-03 15:53:55 +08:00
Yishuo Wang
702e686901
optimize starcoder normal kv cache (#10642) 2024-04-03 15:27:02 +08:00
Xin Qiu
3a9ab8f1ae
fix stablelm logits diff (#10636)
* fix logits diff

* Small fixes

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-04-03 15:08:12 +08:00
Zhicun
b827f534d5
Add tokenizer_id in Langchain (#10588)
* fix low-bit

* fix

* fix style

---------

Co-authored-by: arda <arda@arda-arc12.sh.intel.com>
2024-04-03 14:25:35 +08:00
Zhicun
f6fef09933
fix prompt format for llama-2 in langchain (#10637) 2024-04-03 14:17:34 +08:00
Jiao Wang
330d4b4f4b
update readme (#10631) 2024-04-02 23:08:02 -07:00
Kai Huang
c875b3c858
Add seq len check for llama softmax upcast to fp32 (#10629) 2024-04-03 12:05:13 +08:00
Jiao Wang
4431134ec5
update readme (#10632) 2024-04-02 19:54:30 -07:00
Jiao Wang
23e33a0ca1
Fix qwen-vl style (#10633)
* update

* update
2024-04-02 18:41:38 -07:00
binbin Deng
2bbd8a1548
LLM: fix llama2 FP16 & bs>1 & autotp on PVC and ARC (#10611) 2024-04-03 09:28:04 +08:00
Jiao Wang
654dc5ba57
Fix Qwen-VL example problem (#10582)
* update

* update

* update

* update
2024-04-02 12:17:30 -07:00
Yuwen Hu
fd384ddfb8
Optimize StableLM (#10619)
* Initial commit for stablelm optimizations

* Small style fix

* add dependency

* Add mlp optimizations

* Small fix

* add attention forward

* Remove quantize kv for now as head_dim=80

* Add merged qkv

* fix lisence

* Python style fix

---------

Co-authored-by: qiuxin2012 <qiuxin2012cs@gmail.com>
2024-04-02 18:58:38 +08:00
binbin Deng
27be448920
LLM: add cpu_embedding and peak memory record for deepspeed autotp script (#10621) 2024-04-02 17:32:50 +08:00
Yishuo Wang
ba8cc6bd68
optimize starcoder2-3b (#10625) 2024-04-02 17:16:29 +08:00
Shaojun Liu
a10f5a1b8d
add python style check (#10620)
* add python style check

* fix style checks

* update runner

* add ipex-llm-finetune-qlora-cpu-k8s to manually_build workflow

* update tag to 2.1.0-SNAPSHOT
2024-04-02 16:17:56 +08:00
Cengguang Zhang
58b57177e3
LLM: support bigdl quantize kv cache env and add warning. (#10623)
* LLM: support bigdl quantize kv cache env and add warning.

* fix style.

* fix comments.
2024-04-02 15:41:08 +08:00
Kai Huang
0a95c556a1
Fix starcoder first token perf (#10612)
* add bias check

* update
2024-04-02 09:21:38 +08:00
Cengguang Zhang
e567956121
LLM: add memory optimization for llama. (#10592)
* add initial memory optimization.

* fix logic.

* fix logic,

* remove env var check in mlp split.
2024-04-02 09:07:50 +08:00
Keyan (Kyrie) Zhang
01f491757a
Modify the link in Langchain-upstream ut (#10608)
* Modify the link in Langchain-upstream ut

* fix langchain-upstream ut
2024-04-01 17:03:40 +08:00
Ruonan Wang
bfc1caa5e5
LLM: support iq1s for llama2-70b-hf (#10596) 2024-04-01 13:13:13 +08:00
Ruonan Wang
d6af4877dd
LLM: remove ipex.optimize for gpt-j (#10606)
* remove ipex.optimize

* fix

* fix
2024-04-01 12:21:49 +08:00
Yishuo Wang
437a349dd6
fix rwkv with pip installer (#10591) 2024-03-29 17:56:45 +08:00
WeiguangHan
9a83f21b86
LLM: check user env (#10580)
* LLM: check user env

* small fix

* small fix

* small fix
2024-03-29 17:19:34 +08:00
Keyan (Kyrie) Zhang
848fa04dd6
Fix typo in Baichuan2 example (#10589) 2024-03-29 13:31:47 +08:00
Ruonan Wang
0136fad1d4
LLM: support iq1_s (#10564)
* init version

* update utils

* remove unsed code
2024-03-29 09:43:55 +08:00
Qiyuan Gong
f4537798c1
Enable kv cache quantization by default for flex when 1 < batch <= 8 (#10584)
* Enable kv cache quantization by default for flex when 1 < batch <= 8.
* Change up bound from <8 to <=8.
2024-03-29 09:43:42 +08:00
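
An illustrative sketch of the gating rule in the commit title (the helper and device-name check are hypothetical, not the repo's actual code):

```python
def quantize_kv_by_default(device_name: str, batch_size: int) -> bool:
    # Default-on only for Flex GPUs when 1 < batch <= 8.
    return "flex" in device_name.lower() and 1 < batch_size <= 8
```
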
Cengguang Zhang
b44f7adbad
LLM: Disable esimd sdp for PVC GPU when batch size>1 (#10579)
* llm: disable esimd sdp for pvc bz>1.

* fix logic.

* fix: avoid call get device name twice.
2024-03-28 22:55:48 +08:00
Xin Qiu
5963239b46
Fix qwen's position_ids not being enough (#10572)
* fix position_ids

* fix position_ids
2024-03-28 17:05:49 +08:00
ZehuaCao
52a2135d83
Replace ipex with ipex-llm (#10554)
* fix ipex with ipex_llm

* fix ipex with ipex_llm

* update

* update

* update

* update

* update

* update

* update

* update
2024-03-28 13:54:40 +08:00
Cheen Hau, 俊豪
1c5eb14128
Update pip install to use --extra-index-url for ipex package (#10557)
* Change to 'pip install .. --extra-index-url' for readthedocs

* Change to 'pip install .. --extra-index-url' for examples

* Change to 'pip install .. --extra-index-url' for remaining files

* Fix URL for ipex

* Add links for ipex US and CN servers

* Update ipex cpu url

* remove readme

* Update for github actions

* Update for dockerfiles
2024-03-28 09:56:23 +08:00
binbin Deng
92dfed77be
LLM: fix abnormal output of fp16 deepspeed autotp (#10558) 2024-03-28 09:35:48 +08:00
Jason Dai
c450c85489
Delete llm/readme.md (#10569) 2024-03-27 20:06:40 +08:00
Xiangyu Tian
51d34ca68e
Fix wrong import in speculative (#10562) 2024-03-27 18:21:07 +08:00
Cheen Hau, 俊豪
f239bc329b
Specify oneAPI minor version in documentation (#10561) 2024-03-27 17:58:57 +08:00
WeiguangHan
fbeb10c796
LLM: Set different env based on different Linux kernels (#10566) 2024-03-27 17:56:33 +08:00
hxsz1997
d86477f14d
Remove native_int4 in LangChain examples (#10510)
* rebase the modify to ipex-llm

* modify the typo
2024-03-27 17:48:16 +08:00
Guancheng Fu
04baac5a2e
Fix fastchat top_k (#10560)
* fix -1 top_k

* fix

* done
2024-03-27 16:01:58 +08:00
binbin Deng
fc8c7904f0
LLM: fix torch_dtype setting of apply fp16 optimization through optimize_model (#10556) 2024-03-27 14:18:45 +08:00
Ruonan Wang
ea4bc450c4
LLM: add esimd sdp for pvc (#10543)
* add esimd sdp for pvc

* update

* fix

* fix batch
2024-03-26 19:04:40 +08:00
Jin Qiao
b78289a595
Remove ipex-llm dependency in readme (#10544) 2024-03-26 18:25:14 +08:00
Xiangyu Tian
11550d3f25
LLM: Add length check for IPEX-CPU speculative decoding (#10529)
Add length check for IPEX-CPU speculative decoding.
2024-03-26 17:47:10 +08:00
Guancheng Fu
a3b007f3b1
[Serving] Fix fastchat breaks (#10548)
* fix fastchat

* fix doc
2024-03-26 17:03:52 +08:00
Yishuo Wang
69a28d6b4c
fix chatglm (#10540) 2024-03-26 16:01:00 +08:00
Shaojun Liu
c563b41491
add nightly_build workflow (#10533)
* add nightly_build workflow

* add create-job-status-badge action

* update

* update

* update

* update setup.py

* release

* revert
2024-03-26 12:47:38 +08:00
binbin Deng
0a3e4e788f
LLM: fix mistral hidden_size setting for deepspeed autotp (#10527) 2024-03-26 10:55:44 +08:00
Xin Qiu
1dd40b429c
enable fp4 fused mlp and qkv (#10531)
* enable fp4 fused mlp and qkv

* update qwen

* update qwen2
2024-03-26 08:34:00 +08:00
Wang, Jian4
16b2ef49c6
Update_document by heyang (#30) 2024-03-25 10:06:02 +08:00
Wang, Jian4
a1048ca7f6
Update setup.py and add new actions and add compatible mode (#25)
* update setup.py

* add new action

* add compatible mode
2024-03-22 15:44:59 +08:00
Wang, Jian4
9df70d95eb
Refactor bigdl.llm to ipex_llm (#24)
* Rename bigdl/llm to ipex_llm

* rm python/llm/src/bigdl

* from bigdl.llm to from ipex_llm
2024-03-22 15:41:21 +08:00
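
What the rename means for user code, as a before/after sketch (model id and load option are placeholders):

```python
# Before the rename (bigdl-llm):
#   from bigdl.llm.transformers import AutoModelForCausalLM
# After the rename (ipex-llm):
from ipex_llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2", load_in_4bit=True)
```
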
Jin Qiao
cc5806f4bc LLM: add save/load example for hf-transformers (#10432) 2024-03-22 13:57:47 +08:00
Wang, Jian4
34d0a9328c LLM: Speed-up mixtral in pipeline parallel inference (#10472)
* speed-up mixtral

* fix style
2024-03-22 11:06:28 +08:00
Cengguang Zhang
b9d4280892 LLM: fix baichuan7b quantize kv abnormal output. (#10504)
* fix abnormal output.

* fix style.

* fix style.
2024-03-22 10:00:08 +08:00
Yishuo Wang
f0f317b6cf fix a typo in yuan (#10503) 2024-03-22 09:40:04 +08:00
Guancheng Fu
3a3756b51d Add FastChat bigdl_worker (#10493)
* done

* fix format

* add licence

* done

* fix doc

* refactor folder

* add license
2024-03-21 18:35:05 +08:00
Xin Qiu
dba7ddaab3 add sdp fp8 for qwen llama436 baichuan mistral baichuan2 (#10485)
* add sdp fp8

* fix style

* fix qwen

* fix baichuan 13

* revert baichuan 13b and baichuan2-13b

* fix style

* update
2024-03-21 17:23:05 +08:00
Kai Huang
30f111cd32 lm_head empty_cache for more models (#10490)
* modify constraint

* fix style
2024-03-21 17:11:43 +08:00
Yuwen Hu
1579ee4421 [LLM] Add nightly igpu perf test for INT4+FP16 1024-128 (#10496) 2024-03-21 16:07:06 +08:00
binbin Deng
2958ca49c0 LLM: add patching function for llm finetuning (#10247) 2024-03-21 16:01:01 +08:00
Zhicun
5b97fdb87b update deepseek example readme (#10420)
* update readme

* update

* update readme
2024-03-21 15:21:48 +08:00
hxsz1997
a5f35757a4 Migrate langchain rag cpu example to gpu (#10450)
* add langchain rag on gpu

* add rag example in readme

* add trust_remote_code in TransformersEmbeddings.from_model_id

* add trust_remote_code in TransformersEmbeddings.from_model_id in cpu
2024-03-21 15:20:46 +08:00
binbin Deng
85ef3f1d99 LLM: add empty cache in deepspeed autotp benchmark script (#10488) 2024-03-21 10:51:23 +08:00
Xiangyu Tian
5a5fd5af5b LLM: Add speculative benchmark on CPU/XPU (#10464)
Add speculative benchmark on CPU/XPU.
2024-03-21 09:51:06 +08:00
Ruonan Wang
28c315a5b9 LLM: fix deepspeed error of finetuning on xpu (#10484) 2024-03-21 09:46:25 +08:00
Kai Huang
021d77fd22 Remove softmax upcast fp32 in llama (#10481)
* update

* fix style
2024-03-20 18:17:34 +08:00
Yishuo Wang
cfdf8ad496 Fix modules_not_to_convert argument (#10483) 2024-03-20 17:47:03 +08:00
Xiangyu Tian
cbe24cc7e6 LLM: Enable BigDL IPEX Int8 (#10480)
Enable BigDL IPEX Int8
2024-03-20 15:59:54 +08:00
ZehuaCao
1d062e24db Update serving doc (#10475)
* update serving doc

* add tob

* update

* update

* update

* update vllm worker
2024-03-20 14:44:43 +08:00
Cengguang Zhang
4581e4f17f LLM: fix whisper model missing config. (#10473)
* fix whisper model missing config.

* fix style.

* fix style.

* style.
2024-03-20 14:22:37 +08:00
Jin Qiao
e41d556436 LLM: change fp16 benchmark to model.half (#10477)
* LLM: change fp16 benchmark to model.half

* fix
2024-03-20 13:38:39 +08:00
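
A hedged sketch of what "fp16 via model.half" means: load in default precision, then cast the weights with torch's `.half()` (model id is a placeholder):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model id
model = model.half()  # cast all floating-point weights to fp16
```
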
Yishuo Wang
749bedaf1e fix rwkv v5 fp16 (#10474) 2024-03-20 13:15:08 +08:00
Yuwen Hu
72bcc27da9 [LLM] Add TransformersBgeEmbeddings class in bigdl.llm.langchain.embeddings (#10459)
* Add TransformersBgeEmbeddings class in bigdl.llm.langchain.embeddings

* Small fixes
2024-03-19 18:04:35 +08:00
Cengguang Zhang
463a86cd5d LLM: fix qwen-vl interpolation gpu abnormal results. (#10457)
* fix qwen-vl interpolation gpu abnormal results.

* fix style.

* update qwen-vl gpu example.

* fix comment and update example.

* fix style.
2024-03-19 16:59:39 +08:00
Jin Qiao
e9055c32f9 LLM: fix fp16 mem record in benchmark (#10461)
* LLM: fix fp16 mem record in benchmark

* change style
2024-03-19 16:17:23 +08:00
Jiao Wang
f3fefdc9ce fix pad_token_id issue (#10425) 2024-03-18 23:30:28 -07:00
Yuxuan Xia
74e7490fda Fix Baichuan2 prompt format (#10334)
* Fix Baichuan2 prompt format

* Fix Baichuan2 README

* Change baichuan2 prompt info

* Change baichuan2 prompt info
2024-03-19 12:48:07 +08:00
Jin Qiao
0451103a43 LLM: add int4+fp16 benchmark script for windows benchmarking (#10449)
* LLM: add fp16 for benchmark script

* remove transformer_int4_fp16_loadlowbit_gpu_win
2024-03-19 11:11:25 +08:00
Xin Qiu
bbd749dceb qwen2 fp8 cache (#10446)
* qwen2 fp8 cache

* fix style check
2024-03-19 08:32:39 +08:00
Yang Wang
9e763b049c Support running pipeline parallel inference by vertically partitioning model to different devices (#10392)
* support pipeline parallel inference

* fix logging

* remove benchmark file

* fix

* need to warmup twice

* support qwen and qwen2

* fix lint

* remove genxir

* refine
2024-03-18 13:04:45 -07:00
Ruonan Wang
66b4bb5c5d LLM: update setup to provide cpp for windows (#10448) 2024-03-18 18:20:55 +08:00
Xiangyu Tian
dbdeaddd6a LLM: Fix log condition for BIGDL_OPT_IPEX (#10441)
remove log for BIGDL_OPT_IPEX
2024-03-18 16:03:51 +08:00
Wang, Jian4
1de13ea578 LLM: remove CPU english_quotes dataset and update docker example (#10399)
* update dataset

* update readme

* update docker cpu

* update xpu docker
2024-03-18 10:45:14 +08:00
Xin Qiu
399843faf0 Baichuan 7b fp16 sdp and qwen2 pvc sdp (#10435)
* add baichuan sdp

* update

* baichuan2

* fix

* fix style

* revert 13b

* revert
2024-03-18 10:15:34 +08:00
Jiao Wang
5ab52ef5b5 update (#10424) 2024-03-15 09:24:26 -07:00
Yishuo Wang
bd64488b2a add mask support for llama/chatglm fp8 sdp (#10433)
* add mask support for fp8 sdp

* fix chatglm2 dtype

* update
2024-03-15 17:36:52 +08:00
Keyan (Kyrie) Zhang
444b11af22 Add LangChain upstream ut test for ipynb (#10387)
* Add LangChain upstream ut test for ipynb

* Integrate unit test for LangChain upstream ut and ipynb into one file

* Modify file name

* Remove LangChain version update in unit test

* Move Langchain upstream ut job to arc

* Modify path in .yml file

* Modify path in llm_unit_tests.yml

* Avoid create directory repeatedly
2024-03-15 16:31:01 +08:00
Jin Qiao
ca372f6dab LLM: add save/load example for ModelScope (#10397)
* LLM: add sl example for modelscope

* fix according to comments

* move file
2024-03-15 15:17:50 +08:00
Xin Qiu
24473e331a Qwen2 fp16 sdp (#10427)
* qwen2 sdp and refine

* update

* update

* fix style

* remove use_flash_attention
2024-03-15 13:12:03 +08:00
Kai Huang
1315150e64 Add baichuan2-13b 1k to arc nightly perf (#10406) 2024-03-15 10:29:11 +08:00
Ruonan Wang
b036205be2 LLM: add fp8 sdp for chatglm2/3 (#10411)
* add fp8 sdp for chatglm2

* fix style
2024-03-15 09:38:18 +08:00
Wang, Jian4
fe8976a00f LLM: Support gguf models using low_bit and fix missing json (#10408)
* support other models using low_bit

* update readme

* update to add *.json
2024-03-15 09:34:18 +08:00
Xin Qiu
cda38f85a9 Qwen fp16 sdp (#10401)
* qwen sdp

* fix

* update

* update

* update sdp

* update

* fix style check

* add to origin type
2024-03-15 08:51:50 +08:00
dingbaorong
1c0f7ed3fa add xpu support (#10419) 2024-03-14 17:13:48 +08:00
Heyang Sun
7d29765092 refactor qwen2 forward to enable XPU (#10409)
* refactor qwen2 forward to enable XPU

* Update qwen2.py
2024-03-14 11:03:05 +08:00
Yuxuan Xia
f36224aac4 Fix ceval run.sh (#10410) 2024-03-14 10:57:25 +08:00
ZehuaCao
f66329e35d Fix multiple get_enable_ipex function error (#10400)
* fix multiple get_enable_ipex function error

* remove get_enable_ipex_low_bit function
2024-03-14 10:14:13 +08:00
Kai Huang
76e30d8ec8 Empty cache for lm_head (#10317)
* empty cache

* add comments
2024-03-13 20:31:53 +08:00
Ruonan Wang
2be8bbd236 LLM: add cpp option in setup.py (#10403)
* add llama_cpp option

* meet code review
2024-03-13 20:12:59 +08:00
Ovo233
0dbce53464 LLM: Add decoder/layernorm unit tests (#10211)
* add decoder/layernorm unit tests

* update tests

* delete decoder tests

* address comments

* remove none type check

* restore nonetype checks

* delete nonetype checks; add decoder tests for Llama

* add gc

* deal with tuple output
2024-03-13 19:41:47 +08:00
Yishuo Wang
06a851afa9 support new baichuan model (#10404) 2024-03-13 17:45:50 +08:00
Yuxuan Xia
a90e9b6ec2 Fix C-Eval Workflow (#10359)
* Fix Baichuan2 prompt format

* Fix ceval workflow errors

* Fix ceval workflow error

* Fix ceval error

* Fix ceval error

* Test ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Fix ceval

* Add ceval dependency test

* Fix ceval

* Fix ceval

* Test full ceval

* Test full ceval

* Fix ceval

* Fix ceval
2024-03-13 17:23:17 +08:00
Yishuo Wang
b268baafd6 use fp8 sdp in llama (#10396) 2024-03-13 16:45:38 +08:00
Xiangyu Tian
60043a3ae8 LLM: Support Baichuan2-13b in BigDL-vLLM (#10398)
Support Baichuan2-13b in BigDL-vLLM.
2024-03-13 16:21:06 +08:00
Xiangyu Tian
e10de2c42d [Fix] LLM: Fix condition check error for speculative decoding on CPU (#10402)
Fix condition check error for speculative decoding on CPU
2024-03-13 16:05:06 +08:00
Keyan (Kyrie) Zhang
f158b49835 [LLM] Recover arc ut test for Falcon (#10385) 2024-03-13 13:31:35 +08:00
Heyang Sun
d72c0fad0d Qwen2 SDPA forward on CPU (#10395)
* Fix Qwen1.5 CPU forward

* Update convert.py

* Update qwen2.py
2024-03-13 13:10:03 +08:00
Yishuo Wang
ca58a69b97 fix arc rms norm UT (#10394) 2024-03-13 13:09:15 +08:00
Wang, Jian4
0193f29411 LLM: Enable gguf float16 and Yuan2 model (#10372)
* enable float16

* add yuan files

* enable yuan

* enable set low_bit on yuan2

* update

* update license

* update generate

* update readme

* update python style

* update
2024-03-13 10:19:18 +08:00
Yina Chen
f5d65203c0 First token lm_head optimization (#10318)
* add lm head linear

* update

* address comments and fix style

* address comment
2024-03-13 10:11:32 +08:00
Keyan (Kyrie) Zhang
7cf01e6ec8 Add LangChain upstream ut test (#10349)
* Add LangChain upstream ut test

* Add LangChain upstream ut test

* Specify version numbers in yml script

* Correct langchain-community version
2024-03-13 09:52:45 +08:00
Xin Qiu
28c4a8cf5c Qwen fused qkv (#10368)
* fused qkv + rope for qwen

* quantized kv cache

* fix

* update qwen

* fixed quantized qkv

* fix

* meet code review

* update split

* convert.py

* extend when no enough kv

* fix
2024-03-12 17:39:00 +08:00
Yishuo Wang
741c2bf1df use new rms norm (#10384) 2024-03-12 17:29:51 +08:00
Xiangyu Tian
0ded0b4b13 LLM: Enable BigDL IPEX optimization for int4 (#10319)
Enable BigDL IPEX optimization for int4
2024-03-12 17:08:50 +08:00
binbin Deng
5d7e044dbc LLM: add low bit option in deepspeed autotp example (#10382) 2024-03-12 17:07:09 +08:00
binbin Deng
df3bcc0e65 LLM: remove english_quotes dataset (#10370) 2024-03-12 16:57:40 +08:00
Zhao Changmin
df2b84f7de Enable kv cache on arc batch (#10308) 2024-03-12 16:46:04 +08:00
Lilac09
5809a3f5fe Add run-hbm.sh & add user guide for spr and hbm (#10357)
* add run-hbm.sh

* add spr and hbm guide

* only support quad mode

* only support quad mode

* update special cases

* update special cases
2024-03-12 16:15:27 +08:00
binbin Deng
5d996a5caf LLM: add benchmark script for deepspeed autotp on gpu (#10380) 2024-03-12 15:19:57 +08:00
Keyan (Kyrie) Zhang
f9c144dc4c Fix final logits ut failure (#10377)
* Fix final logits ut failure

* Fix final logits ut failure

* Remove Falcon from completion test for now

* Remove Falcon from unit test for now
2024-03-12 14:34:01 +08:00
Guancheng Fu
cc4148636d [FastChat-integration] Add initial implementation for loader (#10323)
* add initial implementation for loader

* add test method for model_loader

* data

* Refine
2024-03-12 10:54:59 +08:00
WeiguangHan
17bdb1a60b LLM: add whisper models into nightly test (#10193)
* LLM: add whisper models into nightly test

* small fix

* small fix

* add more whisper models

* test all cases

* test specific cases

* collect the csv

* store the result

* to html

* small fix

* small test

* test all cases

* modify whisper_csv_to_html
2024-03-11 20:00:47 +08:00
binbin Deng
dbcfc5c2fa LLM: fix error of 'AI-ModelScope/phi-2' hosted by ModelScope hub (#10364) 2024-03-11 16:19:17 +08:00