ipex-llm

Author	SHA1	Message	Date
binbin Deng	987017ef47	Update pipeline parallel serving for more model support (#11428 )	2024-06-27 18:21:01 +08:00
Yishuo Wang	cf0f5c4322	change npu document (#11446 )	2024-06-27 13:59:59 +08:00
binbin Deng	508c364a79	Add precision option in PP inference examples (#11440 )	2024-06-27 09:24:27 +08:00
Shaojun Liu	ab9f7f3ac5	FIX: Qwen1.5-GPTQ-Int4 inference error (#11432 ) * merge_qkv if quant_method is 'gptq' * fix python style checks * refactor * update GPU example	2024-06-26 15:36:22 +08:00
Jiao Wang	40fa23560e	Fix LLAVA example on CPU (#11271 ) * update * update * update * update	2024-06-25 20:04:59 -07:00
binbin Deng	e473b8d946	Add more qwen1.5 and qwen2 support for pipeline parallel inference (#11423 )	2024-06-25 15:49:32 +08:00
Yishuo Wang	3b23de684a	update npu examples (#11422 )	2024-06-25 13:32:53 +08:00
Xiangyu Tian	8ddae22cfb	LLM: Refactor Pipeline-Parallel-FastAPI example (#11319 ) Initially Refactor for Pipeline-Parallel-FastAPI example	2024-06-25 13:30:36 +08:00
SONG Ge	34c15d3a10	update pp document (#11421 )	2024-06-25 10:17:20 +08:00
Heyang Sun	c985912ee3	Add Deepspeed LoRA dependencies in document (#11410 )	2024-06-24 15:29:59 +08:00
SONG Ge	0c67639539	Add more examples for pipeline parallel inference (#11372 ) * add more model examples for pipeline parallel inference * add mixtral and vicuna models * add yi model and past_kv support for chatglm family * add docs * doc update * add license * update	2024-06-21 17:55:16 +08:00
ivy-lv11	21fc781fce	Add GLM-4V example (#11343 ) * add example * modify * modify * add line * add * add link and replace with phi-3-vision template * fix generate options * fix * fix --------- Co-authored-by: jinbridge <2635480475@qq.com>	2024-06-21 12:54:31 +08:00
binbin Deng	4ba82191f2	Support PP inference for chatglm3 (#11375 )	2024-06-21 09:59:01 +08:00
Zijie Li	ae452688c2	Add NPU HF example (#11358 )	2024-06-19 18:07:28 +08:00
Heyang Sun	67a1e05876	Remove zero3 context manager from LoRA (#11346 )	2024-06-18 17:24:43 +08:00
Shaojun Liu	694912698e	Upgrade scikit-learn to 1.5.0 to fix dependabot issue (#11349 )	2024-06-18 15:47:25 +08:00
Heyang Sun	00f322d8ee	Finetune ChatGLM with Deepspeed Zero3 LoRA (#11314 ) * Finetune ChatGLM with Deepspeed Zero3 LoRA * add deepspeed zero3 config * rename config * remove offload_param * add save_checkpoint parameter * Update lora_deepspeed_zero3_finetune_chatglm3_6b_arc_2_card.sh * refine	2024-06-18 12:31:26 +08:00
binbin Deng	e50c890e1f	Support finishing PP inference once `eos_token_id` is found (#11336 )	2024-06-18 09:55:40 +08:00
Qiyuan Gong	de4bb97b4f	Remove accelerate 0.23.0 install command in readme and docker (#11333 ) * ipex-llm's accelerate has been upgraded to 0.23.0. Remove accelerate 0.23.0 install command in README and docker.	2024-06-17 17:52:12 +08:00
SONG Ge	ef4b6519fb	Add phi-3 model support for pipeline parallel inference (#11334 ) * add phi-3 model support * add phi3 example	2024-06-17 17:44:24 +08:00
SONG Ge	be00380f1a	Fix pipeline parallel inference past_key_value error in Baichuan (#11318 ) * fix past_key_value error * add baichuan2 example * fix style * update doc * add script link in doc * fix import error * update	2024-06-17 09:29:32 +08:00
Xiangyu Tian	4359ab3172	LLM: Add /generate_stream endpoint for Pipeline-Parallel-FastAPI example (#11187 ) Add /generate_stream and OpenAI-formatted endpoint for Pipeline-Parallel-FastAPI example	2024-06-14 15:15:32 +08:00
Jin Qiao	0e7a31a09c	ChatGLM Examples Restructure regarding Installation Steps (#11285 ) * merge install step in glm examples * fix section * fix section * fix tiktoken	2024-06-14 12:37:05 +08:00
binbin Deng	60cb1dac7c	Support PP for qwen1.5 (#11300 )	2024-06-13 17:35:24 +08:00
binbin Deng	f97cce2642	Fix import error of ds autotp (#11307 )	2024-06-13 16:22:52 +08:00
binbin Deng	220151e2a1	Refactor pipeline parallel multi-stage implementation (#11286 )	2024-06-13 10:00:23 +08:00
ivy-lv11	e7a4e2296f	Add Stable Diffusion examples on GPU and CPU (#11166 ) * add sdxl and lcm-lora * readme * modify * add cpu * add license * modify * add file	2024-06-12 16:33:25 +08:00
Jin Qiao	f224e98297	Add GLM-4 CPU example (#11223 ) * Add GLM-4 example * add tiktoken dependency * fix * fix	2024-06-12 15:30:51 +08:00
Zijie Li	40fc8704c4	Add GPU example for GLM-4 (#11267 ) * Add GPU example for GLM-4 * Update streamchat.py * Fix pretrained arguments Fix pretrained arguments in generate and streamchat.py * Update Readme Update install tiktoken required for GLM-4 * Update comments in generate.py	2024-06-12 14:29:50 +08:00
Wang, Jian4	6f2684e5c9	Update pp llama.py to save memory (#11233 )	2024-06-07 13:18:16 +08:00
Zijie Li	7b753dc8ca	Update sample output for HF Qwen2 GPU and CPU (#11257 )	2024-06-07 11:36:22 +08:00
Yuwen Hu	8c36b5bdde	Add qwen2 example (#11252 ) * Add GPU example for Qwen2 * Update comments in README * Update README for Qwen2 GPU example * Add CPU example for Qwen2 Sample Output under README pending * Update generate.py and README for CPU Qwen2 * Update GPU example for Qwen2 * Small update * Small fix * Add Qwen2 table * Update README for Qwen2 CPU and GPU Update sample output under README --------- Co-authored-by: Zijie Li <michael20001122@gmail.com>	2024-06-07 10:29:33 +08:00
Shaojun Liu	85df5e7699	fix nightly perf test (#11251 )	2024-06-07 09:33:14 +08:00
Guoqiong Song	09c6780d0c	phi-2 transformers 4.37 (#11161 ) * phi-2 transformers 4.37	2024-06-05 13:36:41 -07:00
Zijie Li	bfa1367149	Add CPU and GPU example for MiniCPM (#11202 ) * Change installation address Change former address: "https://docs.conda.io/en/latest/miniconda.html#" to new address: "https://conda-forge.org/download/" for 63 occurrences under python\llm\example * Change Prompt Change "Anaconda Prompt" to "Miniforge Prompt" for 1 occurrence * Create and update model minicpm * Update model minicpm Update model minicpm under GPU/PyTorch-Models * Update readme and generate.py change "prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False)" and delete "pip install transformers==4.37.0 " * Update comments for minicpm GPU Update comments for generate.py at minicpm GPU * Add CPU example for MiniCPM * Update minicpm README for CPU * Update README for MiniCPM and Llama3 * Update Readme for Llama3 CPU Pytorch * Update and fix comments for MiniCPM	2024-06-05 18:09:53 +08:00
Yuwen Hu	af96579c76	Update installation guide for pipeline parallel inference (#11224 ) * Update installation guide for pipeline parallel inference * Small fix * further fix * Small fix * Small fix * Update based on comments * Small fix * Small fix * Small fix	2024-06-05 17:54:29 +08:00
Xiangyu Tian	ac3d53ff5d	LLM: Fix vLLM CPU version error (#11206 ) Fix vLLM CPU version error	2024-06-04 19:10:23 +08:00
Qiyuan Gong	ce3f08b25a	Fix IPEX auto importer (#11192 ) * Fix ipex auto importer with Python builtins. * Raise errors if the user imports ipex manually before importing ipex_llm. Do nothing if they import ipex after importing ipex_llm. * Remove import ipex in examples.	2024-06-04 16:57:18 +08:00
Xiangyu Tian	f02f097002	Fix vLLM version in CPU/vLLM-Serving example README (#11201 )	2024-06-04 15:56:55 +08:00
Zijie Li	a644e9409b	Miniconda/Anaconda -> Miniforge update in examples (#11194 ) * Change installation address Change former address: "https://docs.conda.io/en/latest/miniconda.html#" to new address: "https://conda-forge.org/download/" for 63 occurrences under python\llm\example * Change Prompt Change "Anaconda Prompt" to "Miniforge Prompt" for 1 occurrence	2024-06-04 10:14:02 +08:00
Qiyuan Gong	15a6205790	Fix LoRA tokenizer for Llama and chatglm (#11186 ) * Set pad_token to eos_token if it's None. Otherwise, use model config.	2024-06-03 15:35:38 +08:00
Shaojun Liu	401013a630	Remove chatglm_C Module to Eliminate LGPL Dependency (#11178 ) * remove chatglm_C.*.pyd to solve ngsolve weak copyright vunl fix style check error * remove chatglm native int4 from langchain	2024-05-31 17:03:11 +08:00
Wang, Jian4	c0f1be6aea	Fix pp logic (#11175 ) * only send no none batch and rank1-n sending first * always send first	2024-05-30 16:40:59 +08:00
Jin Qiao	dcbf4d3d0a	Add phi-3-vision example (#11156 ) * Add phi-3-vision example (HF-Automodels) * fix * fix * fix * Add phi-3-vision CPU example (HF-Automodels) * add in readme * fix * fix * fix * fix * use fp8 for gpu example * remove eval	2024-05-30 10:02:47 +08:00
Jiao Wang	93146b9433	Reconstruct Speculative Decoding example directory (#11136 ) * update * update * update	2024-05-29 13:15:27 -07:00
Xiangyu Tian	2299698b45	Refine Pipeline Parallel FastAPI example (#11168 )	2024-05-29 17:16:50 +08:00
Wang, Jian4	8e25de1126	LLM: Add codegeex2 example (#11143 ) * add codegeex example * update * update cpu * add GPU * add gpu * update readme	2024-05-29 10:00:26 +08:00
ZehuaCao	751e1a4e29	Fix concurrent issue in autoTP streaming. (#11150 ) * add benchmark test * update	2024-05-29 08:22:38 +08:00
SONG Ge	33852bd23e	Refactor pipeline parallel device config (#11149 ) * refactor pipeline parallel device config * meet comments * update example * add warnings and update code doc	2024-05-28 16:52:46 +08:00
Xiangyu Tian	b44cf405e2	Refine Pipeline-Parallel-Fastapi example README (#11155 )	2024-05-28 15:18:21 +08:00

468 commits