ipex-llm

Author	SHA1	Message	Date
Shaojun Liu	694912698e	Upgrade scikit-learn to 1.5.0 to fix dependabot issue (#11349 )	2024-06-18 15:47:25 +08:00
Heyang Sun	00f322d8ee	Finetune ChatGLM with Deepspeed Zero3 LoRA (#11314 ) * Finetune ChatGLM with Deepspeed Zero3 LoRA * add deepspeed zero3 config * rename config * remove offload_param * add save_checkpoint parameter * Update lora_deepspeed_zero3_finetune_chatglm3_6b_arc_2_card.sh * refine	2024-06-18 12:31:26 +08:00
binbin Deng	e50c890e1f	Support finishing PP inference once `eos_token_id` is found (#11336 )	2024-06-18 09:55:40 +08:00
Qiyuan Gong	de4bb97b4f	Remove accelerate 0.23.0 install command in readme and docker (#11333 ) * ipex-llm's accelerate has been upgraded to 0.23.0. Remove accelerate 0.23.0 install command in README and docker.	2024-06-17 17:52:12 +08:00
SONG Ge	ef4b6519fb	Add phi-3 model support for pipeline parallel inference (#11334 ) * add phi-3 model support * add phi3 example	2024-06-17 17:44:24 +08:00
SONG Ge	be00380f1a	Fix pipeline parallel inference past_key_value error in Baichuan (#11318 ) * fix past_key_value error * add baichuan2 example * fix style * update doc * add script link in doc * fix import error * update	2024-06-17 09:29:32 +08:00
Xiangyu Tian	4359ab3172	LLM: Add /generate_stream endpoint for Pipeline-Parallel-FastAPI example (#11187 ) Add /generate_stream and OpenAI-formatted endpoint for Pipeline-Parallel-FastAPI example	2024-06-14 15:15:32 +08:00
Jin Qiao	0e7a31a09c	ChatGLM Examples Restructure regarding Installation Steps (#11285 ) * merge install step in glm examples * fix section * fix section * fix tiktoken	2024-06-14 12:37:05 +08:00
binbin Deng	60cb1dac7c	Support PP for qwen1.5 (#11300 )	2024-06-13 17:35:24 +08:00
binbin Deng	f97cce2642	Fix import error of ds autotp (#11307 )	2024-06-13 16:22:52 +08:00
binbin Deng	220151e2a1	Refactor pipeline parallel multi-stage implementation (#11286 )	2024-06-13 10:00:23 +08:00
ivy-lv11	e7a4e2296f	Add Stable Diffusion examples on GPU and CPU (#11166 ) * add sdxl and lcm-lora * readme * modify * add cpu * add license * modify * add file	2024-06-12 16:33:25 +08:00
Zijie Li	40fc8704c4	Add GPU example for GLM-4 (#11267 ) * Add GPU example for GLM-4 * Update streamchat.py * Fix pretrained arguments Fix pretrained arguments in generate and streamchat.py * Update Readme Update install tiktoken required for GLM-4 * Update comments in generate.py	2024-06-12 14:29:50 +08:00
Wang, Jian4	6f2684e5c9	Update pp llama.py to save memory (#11233 )	2024-06-07 13:18:16 +08:00
Zijie Li	7b753dc8ca	Update sample output for HF Qwen2 GPU and CPU (#11257 )	2024-06-07 11:36:22 +08:00
Yuwen Hu	8c36b5bdde	Add qwen2 example (#11252 ) * Add GPU example for Qwen2 * Update comments in README * Update README for Qwen2 GPU example * Add CPU example for Qwen2 Sample Output under README pending * Update generate.py and README for CPU Qwen2 * Update GPU example for Qwen2 * Small update * Small fix * Add Qwen2 table * Update README for Qwen2 CPU and GPU Update sample output under README --------- Co-authored-by: Zijie Li <michael20001122@gmail.com>	2024-06-07 10:29:33 +08:00
Shaojun Liu	85df5e7699	fix nightly perf test (#11251 )	2024-06-07 09:33:14 +08:00
Guoqiong Song	09c6780d0c	phi-2 transformers 4.37 (#11161 ) * phi-2 transformers 4.37	2024-06-05 13:36:41 -07:00
Zijie Li	bfa1367149	Add CPU and GPU example for MiniCPM (#11202 ) * Change installation address Change former address: "https://docs.conda.io/en/latest/miniconda.html#" to new address: "https://conda-forge.org/download/" for 63 occurrences under python\llm\example * Change Prompt Change "Anaconda Prompt" to "Miniforge Prompt" for 1 occurrence * Create and update model minicpm * Update model minicpm Update model minicpm under GPU/PyTorch-Models * Update readme and generate.py change "prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False)" and delete "pip install transformers==4.37.0 " * Update comments for minicpm GPU Update comments for generate.py at minicpm GPU * Add CPU example for MiniCPM * Update minicpm README for CPU * Update README for MiniCPM and Llama3 * Update Readme for Llama3 CPU Pytorch * Update and fix comments for MiniCPM	2024-06-05 18:09:53 +08:00
Yuwen Hu	af96579c76	Update installation guide for pipeline parallel inference (#11224 ) * Update installation guide for pipeline parallel inference * Small fix * further fix * Small fix * Small fix * Update based on comments * Small fix * Small fix * Small fix	2024-06-05 17:54:29 +08:00
Qiyuan Gong	ce3f08b25a	Fix IPEX auto importer (#11192 ) * Fix ipex auto importer with Python builtins. * Raise errors if the user imports ipex manually before importing ipex_llm. Do nothing if they import ipex after importing ipex_llm. * Remove import ipex in examples.	2024-06-04 16:57:18 +08:00
Zijie Li	a644e9409b	Miniconda/Anaconda -> Miniforge update in examples (#11194 ) * Change installation address Change former address: "https://docs.conda.io/en/latest/miniconda.html#" to new address: "https://conda-forge.org/download/" for 63 occurrences under python\llm\example * Change Prompt Change "Anaconda Prompt" to "Miniforge Prompt" for 1 occurrence	2024-06-04 10:14:02 +08:00
Qiyuan Gong	15a6205790	Fix LoRA tokenizer for Llama and chatglm (#11186 ) * Set pad_token to eos_token if it's None. Otherwise, use model config.	2024-06-03 15:35:38 +08:00
Wang, Jian4	c0f1be6aea	Fix pp logic (#11175 ) * only send no none batch and rank1-n sending first * always send first	2024-05-30 16:40:59 +08:00
Jin Qiao	dcbf4d3d0a	Add phi-3-vision example (#11156 ) * Add phi-3-vision example (HF-Automodels) * fix * fix * fix * Add phi-3-vision CPU example (HF-Automodels) * add in readme * fix * fix * fix * fix * use fp8 for gpu example * remove eval	2024-05-30 10:02:47 +08:00
Jiao Wang	93146b9433	Reconstruct Speculative Decoding example directory (#11136 ) * update * update * update	2024-05-29 13:15:27 -07:00
Xiangyu Tian	2299698b45	Refine Pipeline Parallel FastAPI example (#11168 )	2024-05-29 17:16:50 +08:00
Wang, Jian4	8e25de1126	LLM: Add codegeex2 example (#11143 ) * add codegeex example * update * update cpu * add GPU * add gpu * update readme	2024-05-29 10:00:26 +08:00
ZehuaCao	751e1a4e29	Fix concurrent issue in autoTP streaming. (#11150 ) * add benchmark test * update	2024-05-29 08:22:38 +08:00
SONG Ge	33852bd23e	Refactor pipeline parallel device config (#11149 ) * refactor pipeline parallel device config * meet comments * update example * add warnings and update code doc	2024-05-28 16:52:46 +08:00
Xiangyu Tian	b44cf405e2	Refine Pipeline-Parallel-Fastapi example README (#11155 )	2024-05-28 15:18:21 +08:00
Xiangyu Tian	5c8ccf0ba9	LLM: Add Pipeline-Parallel-FastAPI example (#10917 ) Add multi-stage Pipeline-Parallel-FastAPI example --------- Co-authored-by: hzjane <a1015616934@qq.com>	2024-05-27 14:46:29 +08:00
Ruonan Wang	d550af957a	fix security issue of eagle (#11140 ) * fix security issue of eagle * small fix	2024-05-27 10:15:28 +08:00
Jean Yu	ab476c7fe2	Eagle Speculative Sampling examples (#11104 ) * Eagle Speculative Sampling examples * rm multi-gpu and ray content * updated README to include Arc A770	2024-05-24 11:13:43 -07:00
Guancheng Fu	fabc395d0d	add langchain vllm interface (#11121 ) * done * fix * fix * add vllm * add langchain vllm examples * add docs * temp	2024-05-24 17:19:27 +08:00
ZehuaCao	63e95698eb	[LLM]Reopen autotp generate_stream (#11120 ) * reopen autotp generate_stream * fix style error * update	2024-05-24 17:16:14 +08:00
Qiyuan Gong	120a0035ac	Fix type mismatch in eval for Baichuan2 QLora example (#11117 ) * During the evaluation stage, Baichuan2 will raise type mismatch when training with bfloat16. Fix this issue by modifying modeling_baichuan.py. Add doc about how to modify this file.	2024-05-24 14:14:30 +08:00
Xiangyu Tian	b3f6faa038	LLM: Add CPU vLLM entrypoint (#11083 ) Add CPU vLLM entrypoint and update CPU vLLM serving example.	2024-05-24 09:16:59 +08:00
Qiyuan Gong	f6c9ffe4dc	Add WANDB_MODE and HF_HUB_OFFLINE to XPU finetune README (#11097 ) * Add WANDB_MODE=offline to avoid multi-GPUs finetune errors. * Add HF_HUB_OFFLINE=1 to avoid Hugging Face related errors.	2024-05-22 15:20:53 +08:00
Qiyuan Gong	492ed3fd41	Add verified models to GPU finetune README (#11088 ) * Add verified models to GPU finetune README	2024-05-21 15:49:15 +08:00
Qiyuan Gong	1210491748	ChatGLM3, Baichuan2 and Qwen1.5 QLoRA example (#11078 ) * Add chatglm3, qwen15-7b and baichuan-7b QLoRA alpaca example * Remove unnecessary tokenization setting.	2024-05-21 15:29:43 +08:00
binbin Deng	7170dd9192	Update guide for running qwen with AutoTP (#11065 )	2024-05-20 10:53:17 +08:00
ZehuaCao	56cb992497	LLM: Modify CPU Installation Command for most examples (#11049 ) * init * refine * refine * refine * modify hf-agent example * modify all CPU model example * remove readthedoc modify * replace powershell with cmd * fix repo * fix repo * update * remove comment on windows code block * update * update * update * update --------- Co-authored-by: xiangyuT <xiangyu.tian@intel.com>	2024-05-17 15:52:20 +08:00
Jin Qiao	9a96af4232	Remove oneAPI pip install command in related examples (#11030 ) * Remove pip install command in windows installation guide * fix chatglm3 installation guide * Fix gemma cpu example * Apply on other examples * fix	2024-05-16 10:46:29 +08:00
Wang, Jian4	d9f71f1f53	Update benchmark util for example using (#11027 ) * mv benchmark_util.py to utils/ * remove * update	2024-05-15 14:16:35 +08:00
binbin Deng	4053a6ef94	Update environment variable setting in AutoTP with arc (#11018 )	2024-05-15 10:23:58 +08:00
Ziteng Zhang	7d3791c819	[LLM] Add llama3 alpaca qlora example (#11011 ) * Add llama3 finetune example based on alpaca qlora example	2024-05-15 09:17:32 +08:00
Qiyuan Gong	c957ea3831	Add axolotl main support and axolotl Llama-3-8B QLoRA example (#10984 ) * Support axolotl main (796a085). * Add axolotl Llama-3-8B QLoRA example. * Change `sequence_len` to 256 for alpaca, and revert `lora_r` value. * Add example to quick_start.	2024-05-14 13:43:59 +08:00
Wang, Jian4	f4c615b1ee	Add cohere example (#10954 ) * add link first * add_cpu_example * add GPU example	2024-05-08 17:19:59 +08:00
Xiangyu Tian	02870dc385	LLM: Refine README of AutoTP-FastAPI example (#10960 )	2024-05-08 16:55:23 +08:00
Xin Qiu	5973d6c753	make gemma's output better (#10943 )	2024-05-08 14:27:51 +08:00
Qiyuan Gong	164e6957af	Refine axolotl quickstart (#10957 ) * Add default accelerate config for axolotl quickstart. * Fix requirement link. * Upgrade peft to 0.10.0 in requirement.	2024-05-08 09:34:02 +08:00
Qiyuan Gong	c11170b96f	Upgrade Peft to 0.10.0 in finetune examples and docker (#10930 ) * Upgrade Peft to 0.10.0 in finetune examples. * Upgrade Peft to 0.10.0 in docker.	2024-05-07 15:12:26 +08:00
Qiyuan Gong	d7ca5d935b	Upgrade Peft version to 0.10.0 for LLM finetune (#10886 ) * Upgrade Peft version to 0.10.0 * Upgrade Peft version in ARC unit test and HF-Peft example.	2024-05-07 15:09:14 +08:00
hxsz1997	245c7348bc	Add codegemma example (#10884 ) * add codegemma example in GPU/HF-Transformers-AutoModels/ * add README of codegemma example in GPU/HF-Transformers-AutoModels/ * add codegemma example in GPU/PyTorch-Models/ * add readme of codegemma example in GPU/PyTorch-Models/ * add codegemma example in CPU/HF-Transformers-AutoModels/ * add readme of codegemma example in CPU/HF-Transformers-AutoModels/ * add codegemma example in CPU/PyTorch-Models/ * add readme of codegemma example in CPU/PyTorch-Models/ * fix typos * fix filename typo * add codegemma in tables * add comments of lm_head * remove comments of use_cache	2024-05-07 13:35:42 +08:00
Xiangyu Tian	13a44cdacb	LLM: Refine Deepspeed-AutoTP-FastAPI example (#10916 )	2024-05-07 09:37:31 +08:00
Guancheng Fu	2c64754eb0	Add vLLM to ipex-llm serving image (#10807 ) * add vllm * done * doc work * fix done * temp * add docs * format * add start-fastchat-service.sh * fix	2024-04-29 17:25:42 +08:00
Jin Qiao	1f876fd837	Add example for phi-3 (#10881 ) * Add example for phi-3 * add in readme and index * fix * fix * fix * fix indent * fix	2024-04-29 16:43:55 +08:00
Xiangyu Tian	3d4950b0f0	LLM: Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example (#10876 ) Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example.	2024-04-26 13:24:28 +08:00
Yang Wang	1ce8d7bcd9	Support the `desc_act` feature in GPTQ model (#10851 ) * support act_order * update versions * fix style * fix bug * clean up	2024-04-24 10:17:13 -07:00
binbin Deng	fabf54e052	LLM: make pipeline parallel inference example more common (#10786 )	2024-04-24 09:28:52 +08:00
hxsz1997	328b1a1de9	Fix the not stop issue of llama3 examples (#10860 ) * fix not stop issue in GPU/HF-Transformers-AutoModels * fix not stop issue in GPU/PyTorch-Models/Model/llama3 * fix not stop issue in CPU/HF-Transformers-AutoModels/Model/llama3 * fix not stop issue in CPU/PyTorch-Models/Model/llama3 * update the output in readme * update format * add reference * update prompt format * update output format in readme * update example output in readme	2024-04-23 19:10:09 +08:00
Qiyuan Gong	5494aa55f6	Downgrade datasets in axolotl example (#10849 ) * Downgrade datasets to 2.15.0 to address axolotl prepare issue https://github.com/OpenAccess-AI-Collective/axolotl/issues/1544 Tks to @kwaa for providing the solution in https://github.com/intel-analytics/ipex-llm/issues/10821#issuecomment-2068861571	2024-04-23 09:41:58 +08:00
Guancheng Fu	47bd5f504c	[vLLM]Remove vllm-v1, refactor v2 (#10842 ) * remove vllm-v1 * fix format	2024-04-22 17:51:32 +08:00
Heyang Sun	fc33aa3721	fix missing import (#10839 )	2024-04-22 14:34:52 +08:00
Guancheng Fu	ae3b577537	Update README.md (#10833 )	2024-04-22 11:07:10 +08:00
Wang, Jian4	5f95054f97	LLM: Add qwen moe example libs md (#10828 )	2024-04-22 10:03:19 +08:00
Guancheng Fu	61c67af386	Fix vLLM-v2 install instructions(#10822 )	2024-04-22 09:02:48 +08:00
Yang Wang	8153c3008e	Initial llama3 example (#10799 ) * Add initial hf huggingface GPU example * Small fix * Add llama3 gpu pytorch model example * Add llama 3 hf transformers CPU example * Add llama 3 pytorch model CPU example * Fixes * Small fix * Small fixes * Small fix * Small fix * Add links * update repo id * change prompt tuning url * remove system header if there is no system prompt --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com> Co-authored-by: Yuwen Hu <54161268+Oscilloscope98@users.noreply.github.com>	2024-04-18 11:01:33 -07:00
Qiyuan Gong	e90e31719f	axolotl lora example (#10789 ) * Add axolotl lora example * Modify readme * Add comments in yml	2024-04-18 16:38:32 +08:00
Guancheng Fu	cbe7b5753f	Add vLLM[xpu] related code (#10779 ) * Add ipex-llm side change * add runable offline_inference * refactor to call vllm2 * Verified async server * add new v2 example * add README * fix * change dir * refactor readme.md * add experimental * fix	2024-04-18 15:29:20 +08:00
Ziteng Zhang	ff040c8f01	LISA Finetuning Example (#10743 ) * enabling xetla only supports qtype=SYM_INT4 or FP8E5 * LISA Finetuning Example on gpu * update readme * add licence * Explain parameters of lisa & Move backend codes to src dir * fix style * fix style * update readme * support chatglm * fix style * fix style * update readme * fix	2024-04-18 13:48:10 +08:00
Heyang Sun	581ebf6104	GaLore Finetuning Example (#10722 ) * GaLore Finetuning Example * Update README.md * Update README.md * change data to HuggingFaceH4/helpful_instructions * Update README.md * Update README.md * shrink train size and delete cache before starting training to save memory * Update README.md * Update galore_finetuning.py * change model to llama2 3b * Update README.md	2024-04-18 13:47:41 +08:00
Yina Chen	ea5b373a97	Add lookahead GPU example (#10785 ) * Add lookahead example * fix style & attn mask * fix typo * address comments	2024-04-17 17:41:55 +08:00
Cengguang Zhang	7ec82c6042	LLM: add README.md for Long-Context examples. (#10765 ) * LLM: add readme to long-context examples. * add precision. * update wording. * add GPU type. * add Long-Context example to GPU examples. * fix comments. * update max input length. * update max length. * add output length. * fix wording.	2024-04-17 15:34:59 +08:00
Qiyuan Gong	9e5069437f	Fix gradio version in axolotl example (#10776 ) * Change to gradio>=4.19.2	2024-04-17 10:23:43 +08:00
Qiyuan Gong	f2e923b3ca	Axolotl v0.4.0 support (#10773 ) * Add Axolotl 0.4.0, remove legacy 0.3.0 support. * replace is_torch_bf16_gpu_available * Add HF_HUB_OFFLINE=1 * Move transformers out of requirement * Refine readme and qlora.yml	2024-04-17 09:49:11 +08:00
Heyang Sun	26cae0a39c	Update FLEX in Deepspeed README (#10774 ) * Update FLEX in Deepspeed README * Update README.md	2024-04-17 09:28:24 +08:00
Qiyuan Gong	d30b22a81b	Refine axolotl 0.3.0 documents and links (#10764 ) * Refine axolotl 0.3 based on comments * Rename requirements to requirement-xpu * Add comments for paged_adamw_32bit * change lora_r from 8 to 16	2024-04-16 14:47:45 +08:00
ZehuaCao	599a88db53	Add deepspeed-autoTP-Fastapi serving (#10748 ) * add deepspeed-autoTP-Fastapi serving * add readme * add license * update * update * fix	2024-04-16 14:03:23 +08:00
Jin Qiao	73a67804a4	GPU configuration update for examples (windows pip installer, etc.) (#10762 ) * renew chatglm3-6b gpu example readme fix fix fix * fix for comments * fix * fix * fix * fix * fix * apply on HF-Transformers-AutoModels * apply on PyTorch-Models * fix * fix	2024-04-15 17:42:52 +08:00
yb-peng	b5209d3ec1	Update example/GPU/PyTorch-Models/Model/llava/README.md (#10757 ) * Update example/GPU/PyTorch-Models/Model/llava/README.md * Update README.md fix path in windows installation	2024-04-15 13:01:37 +08:00
Jiao Wang	9e668a5bf0	fix_internlm-chat-7b-8k repo name in examples (#10747 )	2024-04-12 10:15:48 -07:00
Keyan (Kyrie) Zhang	1256a2cc4e	Add chatglm3 long input example (#10739 ) * Add long context input example for chatglm3 * Small fix * Small fix * Small fix	2024-04-11 16:33:43 +08:00
Qiyuan Gong	2d64630757	Remove transformers version in axolotl example (#10736 ) * Remove transformers version in axolotl requirements.txt	2024-04-11 14:02:31 +08:00
Shaojun Liu	29bf28bd6f	Upgrade python to 3.11 in Docker Image (#10718 ) * install python 3.11 for cpu-inference docker image * update xpu-inference dockerfile * update cpu-serving image * update qlora image * update lora image * update document	2024-04-10 14:41:27 +08:00
Qiyuan Gong	b727767f00	Add axolotl v0.3.0 with ipex-llm on Intel GPU (#10717 ) * Add axolotl v0.3.0 support on Intel GPU. * Add finetune example on llama-2-7B with Alpaca dataset.	2024-04-10 14:38:29 +08:00
Jiao Wang	878a97077b	Fix llava example to support transformers 4.36 (#10614 ) * fix llava example * update	2024-04-09 13:47:07 -07:00
Jiao Wang	1e817926ba	Fix low memory generation example issue in transformers 4.36 (#10702 ) * update cache in low memory generate * update	2024-04-09 09:56:52 -07:00
Shaojun Liu	f37a1f2a81	Upgrade to python 3.11 (#10711 ) * create conda env with python 3.11 * recommend to use Python 3.11 * update	2024-04-09 17:41:17 +08:00
Cengguang Zhang	6a32216269	LLM: add llama2 8k input example. (#10696 ) * LLM: add llama2-32K example. * refactor name. * fix comments. * add IPEX_LLM_LOW_MEM notes and update sample output.	2024-04-09 16:02:37 +08:00
Keyan (Kyrie) Zhang	1e27e08322	Modify example from fp32 to fp16 (#10528 ) * Modify example from fp32 to fp16 * Remove Falcon from fp16 example for now * Remove MPT from fp16 example	2024-04-09 15:45:49 +08:00
binbin Deng	d9a1153b4e	LLM: upgrade deepspeed in AutoTP on GPU (#10647 )	2024-04-07 14:05:19 +08:00
Zhicun	9d8ba64c0d	Llamaindex: add tokenizer_id and support chat (#10590 ) * add tokenizer_id * fix * modify * add from_model_id and from_mode_id_low_bit * fix typo and add comment * fix python code style --------- Co-authored-by: pengyb2001 <284261055@qq.com>	2024-04-07 13:51:34 +08:00
Jin Qiao	10ee786920	Replace with IPEX-LLM in example comments (#10671 ) * Replace with IPEX-LLM in example comments * More replacement * revert some changes	2024-04-07 13:29:51 +08:00
Jiao Wang	69bdbf5806	Fix vllm print error message issue (#10664 ) * update chatglm readme * Add condition to invalidInputError * update * update * style	2024-04-05 15:08:13 -07:00
Jason Dai	29d97e4678	Update readme (#10665 )	2024-04-05 18:01:57 +08:00
Jin Qiao	cc8b3be11c	Add GPU and CPU example for stablelm-zephyr-3b (#10643 ) * Add example for StableLM * fix * add to readme	2024-04-03 16:28:31 +08:00
Heyang Sun	6000241b10	Add Deepspeed Example of FLEX Mistral (#10640 )	2024-04-03 16:04:17 +08:00
Zhicun	b827f534d5	Add tokenizer_id in Langchain (#10588 ) * fix low-bit * fix * fix style --------- Co-authored-by: arda <arda@arda-arc12.sh.intel.com>	2024-04-03 14:25:35 +08:00
Zhicun	f6fef09933	fix prompt format for llama-2 in langchain (#10637 )	2024-04-03 14:17:34 +08:00
Jiao Wang	330d4b4f4b	update readme (#10631 )	2024-04-02 23:08:02 -07:00
Jiao Wang	654dc5ba57	Fix Qwen-VL example problem (#10582 ) * update * update * update * update	2024-04-02 12:17:30 -07:00
Ruonan Wang	d6af4877dd	LLM: remove ipex.optimize for gpt-j (#10606 ) * remove ipex.optimize * fix * fix	2024-04-01 12:21:49 +08:00
Keyan (Kyrie) Zhang	848fa04dd6	Fix typo in Baichuan2 example (#10589 )	2024-03-29 13:31:47 +08:00
ZehuaCao	52a2135d83	Replace ipex with ipex-llm (#10554 ) * fix ipex with ipex_llm * fix ipex with ipex_llm * update * update * update * update * update * update * update * update	2024-03-28 13:54:40 +08:00
Cheen Hau, 俊豪	1c5eb14128	Update pip install to use --extra-index-url for ipex package (#10557 ) * Change to 'pip install .. --extra-index-url' for readthedocs * Change to 'pip install .. --extra-index-url' for examples * Change to 'pip install .. --extra-index-url' for remaining files * Fix URL for ipex * Add links for ipex US and CN servers * Update ipex cpu url * remove readme * Update for github actions * Update for dockerfiles	2024-03-28 09:56:23 +08:00
Cheen Hau, 俊豪	f239bc329b	Specify oneAPI minor version in documentation (#10561 )	2024-03-27 17:58:57 +08:00
hxsz1997	d86477f14d	Remove native_int4 in LangChain examples (#10510 ) * rebase the modify to ipex-llm * modify the typo	2024-03-27 17:48:16 +08:00
Wang, Jian4	16b2ef49c6	Update_document by heyang (#30 )	2024-03-25 10:06:02 +08:00
Wang, Jian4	9df70d95eb	Refactor bigdl.llm to ipex_llm (#24 ) * Rename bigdl/llm to ipex_llm * rm python/llm/src/bigdl * from bigdl.llm to from ipex_llm	2024-03-22 15:41:21 +08:00
Jin Qiao	cc5806f4bc	LLM: add save/load example for hf-transformers (#10432 )	2024-03-22 13:57:47 +08:00
binbin Deng	2958ca49c0	LLM: add patching function for llm finetuning (#10247 )	2024-03-21 16:01:01 +08:00
hxsz1997	a5f35757a4	Migrate langchain rag cpu example to gpu (#10450 ) * add langchain rag on gpu * add rag example in readme * add trust_remote_code in TransformersEmbeddings.from_model_id * add trust_remote_code in TransformersEmbeddings.from_model_id in cpu	2024-03-21 15:20:46 +08:00
Ruonan Wang	28c315a5b9	LLM: fix deepspeed error of finetuning on xpu (#10484 )	2024-03-21 09:46:25 +08:00
Cengguang Zhang	463a86cd5d	LLM: fix qwen-vl interpolation gpu abnormal results. (#10457 ) * fix qwen-vl interpolation gpu abnormal results. * fix style. * update qwen-vl gpu example. * fix comment and update example. * fix style.	2024-03-19 16:59:39 +08:00
Jiao Wang	f3fefdc9ce	fix pad_token_id issue (#10425 )	2024-03-18 23:30:28 -07:00
Yuxuan Xia	74e7490fda	Fix Baichuan2 prompt format (#10334 ) * Fix Baichuan2 prompt format * Fix Baichuan2 README * Change baichuan2 prompt info * Change baichuan2 prompt info	2024-03-19 12:48:07 +08:00
Yang Wang	9e763b049c	Support running pipeline parallel inference by vertically partitioning model to different devices (#10392 ) * support pipeline parallel inference * fix logging * remove benchmark file * fic * need to warmup twice * support qwen and qwen2 * fix lint * remove genxir * refine	2024-03-18 13:04:45 -07:00
Jiao Wang	5ab52ef5b5	update (#10424 )	2024-03-15 09:24:26 -07:00
Jin Qiao	ca372f6dab	LLM: add save/load example for ModelScope (#10397 ) * LLM: add sl example for modelscope * fix according to comments * move file	2024-03-15 15:17:50 +08:00
Wang, Jian4	fe8976a00f	LLM: Support gguf models use low_bit and fix no json(#10408 ) * support others model use low_bit * update readme * update to add *.json	2024-03-15 09:34:18 +08:00
binbin Deng	5d7e044dbc	LLM: add low bit option in deepspeed autotp example (#10382 )	2024-03-12 17:07:09 +08:00
binbin Deng	df3bcc0e65	LLM: remove english_quotes dataset (#10370 )	2024-03-12 16:57:40 +08:00
binbin Deng	fe27a6971c	LLM: update modelscope version (#10367 )	2024-03-11 16:18:27 +08:00
Zhicun	9026c08633	Fix llamaindex AutoTokenizer bug (#10345 ) * fix tokenizer * fix AutoTokenizer bug * modify code style	2024-03-08 16:24:50 +08:00
hxsz1997	af11c53473	Add the installation step of postgresql and pgvector on windows in LlamaIndex GPU support (#10328 ) * add the installation of postgresql and pgvector of windows * fix some format	2024-03-05 18:31:19 +08:00
dingbaorong	1e6f0c6f1a	Add llamaindex gpu example (#10314 ) * add llamaindex example * fix core dump * refine readme * add trouble shooting * refine readme --------- Co-authored-by: Ariadne <wyn2000330@126.com>	2024-03-05 13:36:00 +08:00
dingbaorong	fc7f10cd12	add langchain gpu example (#10277 ) * first draft * fix * add readme for transformer_int4_gpu * fix doc * check device_map * add arc ut test * fix ut test * fix langchain ut * Refine README * fix gpu mem too high * fix ut test --------- Co-authored-by: Ariadne <wyn2000330@126.com>	2024-03-05 13:33:57 +08:00
Xin Qiu	58208a5883	Update FAQ document. (#10300 ) * Update install_gpu.md * Update resolve_error.md * Update README.md * Update resolve_error.md * Update README.md * Update resolve_error.md	2024-03-04 08:35:11 +08:00
Xin Qiu	509e206de0	update doc about gemma random and unreadable output. (#10297 ) * Update install_gpu.md * Update README.md * Update README.md	2024-03-01 15:41:16 +08:00
Guancheng Fu	2d930bdca8	Add vLLM bf16 support (#10278 ) * add argument load_in_low_bit * add docs * modify gpu doc * done --------- Co-authored-by: ivy-lv11 <lvzc@lamda.nju.edu.cn>	2024-02-29 16:33:42 +08:00
Ruonan Wang	a9fd20b6ba	LLM: Update qkv fusion for GGUF-IQ2 (#10271 ) * first commit * update mistral * fix transformers==4.36.0 * fix * disable qk for mixtral now * fix style	2024-02-29 12:49:53 +08:00
Shengsheng Huang	db0d129226	Revert "Add rwkv example (#9432 )" (#10264 ) This reverts commit `6930422b42`.	2024-02-28 11:48:31 +08:00
Yining Wang	6930422b42	Add rwkv example (#9432 ) * codeshell fix wrong urls * restart runner * add RWKV CPU & GPU example (rwkv-4-world-7b) * restart runner * update submodule * fix runner * runner-test --------- Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>	2024-02-28 11:41:00 +08:00
Keyan (Kyrie) Zhang	59861f73e5	Add Deepseek-6.7B (#9991 ) * Add new example Deepseek * Add new example Deepseek * Add new example Deepseek * Add new example Deepseek * Add new example Deepseek * modify deepseek * modify deepseek * Add verified model in README * Turn cpu_embedding=True in Deepseek example --------- Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>	2024-02-28 11:36:39 +08:00
Yuxuan Xia	2524273198	Update AutoGen README (#10255 ) * Update AutoGen README * Fix AutoGen README typos * Update AutoGen README * Update AutoGen README	2024-02-28 11:34:45 +08:00
Zheng, Yi	2347f611cf	Add cpu and gpu examples of Mamba (#9797 ) * Add mamba cpu example * Add mamba gpu example * Use a smaller model as the example * minor fixes --------- Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>	2024-02-28 11:33:29 +08:00
Guoqiong Song	f4a2e32106	Stream llm example for both GPU and CPU (#9390 )	2024-02-27 15:54:47 -08:00
Keyan (Kyrie) Zhang	843fe546b0	Add CPU and GPU examples for DeciLM-7B (#9867 ) * Add cpu and gpu examples for DeciLM-7B * Add cpu and gpu examples for DeciLM-7B * Add DeciLM-7B to README table * modify deciLM * modify deciLM * modify deciLM * Add verified model in README * Add cpu_embedding=True	2024-02-27 13:15:49 +08:00
Xin Qiu	8ef5482da2	update Gemma readme (#10229 ) * Update README.md * Update README.md * Update README.md * Update README.md	2024-02-23 16:57:08 +08:00
Xin Qiu	aabfc06977	add gemma example (#10224 ) * add gemma gpu example * Update README.md * add cpu example * Update README.md * Update README.md * Update generate.py * Update generate.py	2024-02-23 15:20:57 +08:00
yb-peng	a2c1675546	Add CPU and GPU examples for Yuan2-2B-hf (#9946 ) * Add a new CPU example of Yuan2-2B-hf * Add a new CPU generate.py of Yuan2-2B-hf example * Add a new GPU example of Yuan2-2B-hf * Add Yuan2 to README table * In CPU example:1.Use English as default prompt; 2.Provide modified files in yuan2-2B-instruct * In GPU example:1.Use English as default prompt;2.Provide modified files * GPU example:update README * update Yuan2-2B-hf in README table * Add CPU example for Yuan2-2B in Pytorch-Models * Add GPU example for Yuan2-2B in Pytorch-Models * Add license in generate.py; Modify README * In GPU Add license in generate.py; Modify README * In CPU yuan2 modify README * In GPU yuan2 modify README * In CPU yuan2 modify README * In GPU example, updated the readme for Windows GPU supports * In GPU torch example, updated the readme for Windows GPU supports * GPU hf example README modified * GPU example README modified	2024-02-23 14:09:30 +08:00
yb-peng	f1f4094a09	Add CPU and GPU examples of phi-2 (#10014 ) * Add CPU and GPU examples of phi-2 * In GPU hf example, updated the readme for Windows GPU supports * In GPU torch example, updated the readme for Windows GPU supports * update the table in BigDL/README.md * update the table in BigDL/python/llm/README.md	2024-02-23 14:05:53 +08:00
Guoqiong Song	63681af97e	falcon for transformers 4.36 (#9960 ) * falcon for transformers 4.36	2024-02-22 17:04:40 -08:00
Jason Dai	84d5f40936	Update README.md (#10213 )	2024-02-22 17:22:59 +08:00
Ruonan Wang	5e1fee5e05	LLM: add GGUF-IQ2 examples (#10207 ) * add iq2 examples * small fix * meet code review * fix * meet review * small fix	2024-02-22 14:18:45 +08:00
binbin Deng	9975b029c5	LLM: add qlora finetuning example using `trl.SFTTrainer` (#10183 )	2024-02-21 16:40:04 +08:00
Zhicun	c7e839e66c	Add Qwen1.5-7B-Chat (#10113 ) * add Qwen1.5-7B-Chat * modify Qwen1.5 example * update README * update prompt format * update folder name and example README * add Chinese prompt sample output * update link in README * correct the link * update transformer version	2024-02-21 13:29:29 +08:00
binbin Deng	11fe5a87ec	LLM: add Modelscope model example (#10126 )	2024-02-08 11:18:07 +08:00


366 commits