ipex-llm

Author	SHA1	Message	Date
binbin Deng	cd077881f1	Disable lm head (#11972 )	2024-08-30 11:05:18 +08:00
Yuwen Hu	2e49e1f8e9	Further fix for MiniCPM-V-2_6 example (#11965 )	2024-08-29 19:14:13 +08:00
Jason Dai	431affd0a0	Update README.md (#11964 )	2024-08-29 18:56:35 +08:00
binbin Deng	14b2c8dc32	Update qwen2-7b example script (#11961 )	2024-08-29 18:25:17 +08:00
Yuwen Hu	7abe17d6f7	Update MiniCPM-V-2_6 Example (#11958 ) * Update example scripts regarding warmup, stream generate, modules to not convert, etc. * Update readme accordingly * Fix based on comments * Small fix * Remove n_predict	2024-08-29 18:23:48 +08:00
Yina Chen	5f7ff76ea5	update troubleshooting (#11960 )	2024-08-29 17:44:22 +08:00
Yina Chen	882f4a5ff7	Add lnl npu driver recommend version and enable cpu_lm_head on llama3 (#11952 ) * update lnl npu driver version and enable cpu_lm_head on llama3 * update * fix style * typo * address comments * update * add qwen2-7b	2024-08-29 15:01:18 +08:00
binbin Deng	71f03dcc39	Support qwen2-7b with fused decoderlayer optimization on NPU (#11912 )	2024-08-29 13:34:20 +08:00
SONG Ge	5ca7390082	[NPU] Add minicpm-2b support for npu multi-processing (#11949 ) * add minicpm-2b support * update example for minicpm-2b * add LNL NPU driver requirement in readme	2024-08-28 18:08:49 +08:00
hxsz1997	e23549f63f	Update llamaindex examples (#11940 ) * modify rag.py * update readme of gpu example * update llamaindex cpu example and readme * add llamaindex doc * update note style * import before instancing IpexLLMEmbedding * update index in readme * update links * update link * update related links	2024-08-28 14:03:44 +08:00
Zijie Li	90f692937d	Update npu baichuan2 (#11939 )	2024-08-27 16:56:26 +08:00
Jiao Wang	b4b6ddf73c	NPU Baichuan2 Multi-Process example (#11928 )	2024-08-27 15:25:49 +08:00
SONG Ge	a81a329a5f	[NPU] Add example for NPU multi-processing minicpm-1b model (#11935 ) * add minicpm example	2024-08-27 14:57:46 +08:00
Ch1y0q	730d9ec811	Add Qwen2-audio example (#11835 ) * add draft for qwen2-audio * update example for `Qwen2-Audio` * update * update * add warmup	2024-08-27 13:35:24 +08:00
Yina Chen	e246f1e258	update llama3 npu example (#11933 )	2024-08-27 13:03:18 +08:00
binbin Deng	14dddfc0d6	Update NPU example readme (#11931 )	2024-08-27 12:44:58 +08:00
Zijie Li	6c3eb1e1e8	refactor from_pretrained API for NPU (#11927 )	2024-08-27 09:50:30 +08:00
binbin Deng	dd303776cf	Add troubleshooting about transpose value setting	2024-08-26 16:06:32 +08:00
Zijie Li	794abe2ce8	update npu-readme (#11900 )	2024-08-22 17:49:35 +08:00
Jinhe	18662dca1c	change 5 pytorch/huggingface models to fp16 (#11894 )	2024-08-22 16:12:09 +08:00
Wang, Jian4	5c4ed00593	Add lightweight-serving whisper asr example (#11847 ) * add asr init * update for pp * update style * update readme * update readme	2024-08-22 15:46:28 +08:00
Jinhe	a8e2573421	added tokenization file for codegeex2-6b in pytorch-models(#11875 ) * added tokenization file * tokenization file readme update * optional	2024-08-22 14:37:56 +08:00
binbin Deng	72a7bf624b	Support qwen2-1.5b with fused decoderlayer optimization on NPU (#11888 )	2024-08-22 11:09:12 +08:00
Zijie Li	bdbe995b01	Update README.md (#11889 ) Set datasets version to 2.16.1. Clear out the transformers version requirement.	2024-08-22 09:40:16 +08:00
SONG Ge	8c5c7f32dd	Update doc for running npu generate example with ipex-llm[npu] (#11876 ) * update doc for running npu generate example with ipex-llm[npu] * switch max_prompt_len to 512 to fix compile error on mtl	2024-08-21 13:45:29 +08:00
Jinhe	3ee194d983	Pytorch models transformers version update (#11860 ) * yi sync * delete 4.34 constraint * delete 4.34 constraint * delete 4.31 constraint * delete 4.34 constraint * delete 4.35 constraint * added <=4.33.3 constraint * added <=4.33.3 constraint * switched to chinese prompt	2024-08-20 18:01:42 +08:00
Yuwen Hu	5e8286f72d	Update `ipex-llm` default transformers version to 4.37.0 (#11859 ) * Update default transformers version to 4.37.0 * Add dependency requirements for qwen and qwen-vl * Temp fix transformers version for these not yet verified models * Skip qwen test in UT for now as it requires transformers<4.37.0	2024-08-20 17:37:58 +08:00
SONG Ge	5b83493b1a	Add ipex-llm npu option in setup.py (#11858 ) * add ipex-llm npu release * update example doc * meet latest release changes	2024-08-20 17:29:49 +08:00
Heyang Sun	ee6852c915	Fix typo (#11862 )	2024-08-20 16:38:11 +08:00
SONG Ge	7380823f3f	Update Llama2 multi-processes example (#11852 ) * update llama2 multi-processes examples * update * update readme * update	2024-08-19 19:49:01 +08:00
Yang Wang	99b05ba1dc	separate prefill into a process (#11787 ) * separate prefill into a process * using model.share_memory() * might work * worked * use long prompt * refactor * cleanup * fix bug * clean up * changeable inter and intra process stages * refactor * add max output len * fix npu_model changes that may cause generate down * fix npu_model generate import error * fix generate forward error --------- Co-authored-by: sgwhat <ge.song@intel.com>	2024-08-19 17:53:36 +08:00
Jinhe	da3d7a3a53	delete transformers version requirement (#11845 ) * delete transformers version requirement * delete transformers version requirement	2024-08-19 17:53:02 +08:00
Jinhe	e07a55665c	Codegeex2 tokenization fix (#11831 ) * updated tokenizer file * updated tokenizer file * updated tokenizer file * updated tokenizer file * new folder	2024-08-16 15:48:47 +08:00
Jinhe	adfbb9124a	Reorganize MiniCPM-V-2_6 example & update other MiniCPM-V-2 examples (#11815 ) * model to fp16 & 2_6 reorganize * revisions * revisions * half * deleted transformer version requirements * deleted transformer version requirements --------- Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>	2024-08-16 14:48:56 +08:00
Chu,Youcheng	f463268e36	fix: add run oneAPI instruction for the example of codeshell (#11828 ) * fix: delete ipex extension import in ppl wikitext evaluation * feat: add mixed_precision argument on ppl wikitext evaluation * fix: delete mix_precision command in perplex evaluation for wikitext * fix: remove fp16 mixed-precision argument * fix: Add a space. * fix: add run oneAPI instruction for the example of codeshell * fix: textual adjustments * fix: Textual adjustment --------- Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>	2024-08-16 14:29:06 +08:00
Ch1y0q	447c8ed324	update transformers version for `replit-code-v1-3b`, `internlm2-chat-… (#11811 ) * update transformers version for `replit-code-v1-3b`, `internlm2-chat-7b` and mistral * remove for default transformers version	2024-08-15 16:40:48 +08:00
Jinhe	2fbbb51e71	transformers==4.37, yi & yuan2 & vicuna (#11805 ) * transformers==4.37 * added yi model * added yi model * xxxx * delete prompt template * / and delete	2024-08-15 15:39:24 +08:00
Jinhe	f43da2d455	deletion of specification of transformers version (#11808 )	2024-08-15 15:23:32 +08:00
Jinhe	d8d887edd2	added minicpm-v-2_6 (#11794 )	2024-08-14 16:23:44 +08:00
Yang Wang	51bcac1229	follow up on experimental support of fused decoder layer for llama2 (#11785 ) * clean up and support transpose value cache * refine * fix style * fix style	2024-08-13 18:53:55 -07:00
Heyang Sun	70c828b87c	deepspeed zero3 QLoRA finetuning (#11625 ) * deepspeed zero3 QLoRA finetuning * Update convert.py * Update low_bit_linear.py * Update utils.py * Update qlora_finetune_llama2_13b_arch_2_card.sh * Update low_bit_linear.py * Update alpaca_qlora_finetuning.py * Update low_bit_linear.py * Update utils.py * Update convert.py * Update alpaca_qlora_finetuning.py * Update alpaca_qlora_finetuning.py * Update low_bit_linear.py * Update deepspeed_zero3.json * Update qlora_finetune_llama2_13b_arch_2_card.sh * Update low_bit_linear.py * Update low_bit_linear.py * Update utils.py * fix style * fix style * Update alpaca_qlora_finetuning.py * Update qlora_finetune_llama2_13b_arch_2_card.sh * Update convert.py * Update low_bit_linear.py * Update model.py * Update alpaca_qlora_finetuning.py * Update low_bit_linear.py * Update low_bit_linear.py * Update low_bit_linear.py	2024-08-13 16:15:29 +08:00
binbin Deng	23d3acdc77	Add experimental support of fused decoder layer for llama2 (#11768 )	2024-08-13 14:41:36 +08:00
Jin, Qiao	c28b3389e6	Update npu multimodal example (#11773 )	2024-08-13 14:14:59 +08:00
Ruonan Wang	8db34057b4	optimize lookahead init time (#11769 )	2024-08-12 17:19:12 +08:00
Jin, Qiao	05989ad0f9	Update npu example and all in one benchmark (#11766 )	2024-08-12 16:46:46 +08:00
Ruonan Wang	7e917d6cfb	fix gptq of llama (#11749 ) * fix gptq of llama * small fix	2024-08-09 16:39:25 +08:00
Shaojun Liu	107f7aafd0	enable inference mode for deepspeed tp serving (#11742 )	2024-08-08 14:38:30 +08:00
Zijie Li	9e65cf00b3	Add openai-whisper pytorch gpu (#11736 ) * Add openai-whisper pytorch gpu * Update README.md * Update README.md * fix typo * fix names update readme * Update README.md	2024-08-08 12:32:59 +08:00
Jinhe	d0c89fb715	updated llama.cpp and ollama quickstart (#11732 ) * updated llama.cpp and ollama quickstart.md * added qwen2-1.5B sample output * revision on quickstart updates * revision on quickstart updates * revision on qwen2 readme * added 2 troubleshoots * troubleshoot revision	2024-08-08 11:04:01 +08:00
Ch1y0q	4676af2054	add `gemma2` example (#11724 ) * add `gemma2` * update `transformers` version * update `README.md`	2024-08-06 21:17:50 +08:00
Jin, Qiao	11650b6f81	upgrade glm-4v example transformers version (#11719 )	2024-08-06 14:55:09 +08:00
Jin, Qiao	7f241133da	Add MiniCPM-Llama3-V-2_5 GPU example (#11693 ) * Add MiniCPM-Llama3-V-2_5 GPU example * fix	2024-08-06 10:22:41 +08:00
Jin, Qiao	808d9a7bae	Add MiniCPM-V-2 GPU example (#11699 ) * Add MiniCPM-V-2 GPU example * add example in README.md * add example in README.md	2024-08-06 10:22:33 +08:00
Zijie Li	8fb36b9f4a	add new benchmark_util.py (#11713 ) * add new benchmark_util.py	2024-08-05 16:18:48 +08:00
Wang, Jian4	493cbd9a36	Support lightweight-serving with internlm-xcomposer2-vl-7b multimodal input (#11703 ) * init image_list * enable internlm-xcomposer2 image input * update style * add readme * update model * update readme	2024-08-05 09:36:04 +08:00
Qiyuan Gong	762ad49362	Add RANK_WAIT_TIME into DeepSpeed-AutoTP to avoid CPU memory OOM (#11704 ) * DeepSpeed-AutoTP will start multiple processors to load models and convert them in CPU memory. If model/rank_num is large, this will lead to OOM. Add RANK_WAIT_TIME to reduce memory usage by controlling model reading parallelism.	2024-08-01 18:16:21 +08:00
Zijie Li	5079ed9e06	Add Llama3.1 example (#11689 ) * Add Llama3.1 example Add Llama3.1 example for Linux arc and Windows MTL * Changes made to adjust compatibilities transformers changed to 4.43.1 * Update index.rst * Update README.md * Update index.rst * Update index.rst * Update index.rst	2024-07-31 10:53:30 +08:00
Jin, Qiao	6e3ce28173	Upgrade glm-4 example transformers version (#11659 ) * upgrade glm-4 example transformers version * move pip install in one line	2024-07-31 10:24:50 +08:00
Jin, Qiao	a44ab32153	Switch to conhost when running on NPU (#11687 )	2024-07-30 17:08:06 +08:00
Guoqiong Song	336dfc04b1	fix 1482 (#11661 ) Co-authored-by: rnwang04 <ruonan1.wang@intel.com>	2024-07-26 12:39:09 -07:00
Wang, Jian4	23681fbf5c	Support codegeex4-9b for lightweight-serving (#11648 ) * add options, support prompt and not return end_token * enable openai parameter * set do_sample None and update style	2024-07-26 09:41:03 +08:00
Wang, Jian4	1eed0635f2	Add lightweight serving and support tgi parameter (#11600 ) * init tgi request * update openai api * update for pp * update and add readme * add to docker * add start bash * update * update * update	2024-07-19 13:15:56 +08:00
Guoqiong Song	380717f50d	fix gemma for 4.41 (#11531 ) * fix gemma for 4.41	2024-07-18 15:02:50 -07:00
Guoqiong Song	5a6211fd56	fix minicpm for transformers>=4.39 (#11533 ) * fix minicpm for transformers>=4.39	2024-07-18 15:01:57 -07:00
Guoqiong Song	bfcdc35b04	phi-3 on "transformers>=4.37.0,<=4.42.3" (#11534 )	2024-07-17 17:19:57 -07:00
Guoqiong Song	d64711900a	Fix cohere model on transformers>=4.41 (#11575 ) * fix cohere model for 4-41	2024-07-17 17:18:59 -07:00
Guoqiong Song	5b6eb85b85	phi model readme (#11595 ) Co-authored-by: rnwang04 <ruonan1.wang@intel.com>	2024-07-17 17:18:34 -07:00
Wang, Jian4	9c15abf825	Refactor fastapi-serving and add one card serving(#11581 ) * init fastapi-serving one card * mv api code to source * update worker * update for style-check * add worker * update bash * update * update worker name and add readme * rename update * rename to fastapi	2024-07-17 11:12:43 +08:00
Heyang Sun	365adad59f	Support LoRA ChatGLM with Alpaca Dataset (#11580 ) * Support LoRA ChatGLM with Alpaca Dataset * refine * fix * add 2-card alpaca	2024-07-16 15:40:02 +08:00
Ch1y0q	50cf563a71	Add example: MiniCPM-V (#11570 )	2024-07-15 10:55:48 +08:00
Zhao Changmin	06745e5742	Add npu benchmark all-in-one script (#11571 ) * npu benchmark	2024-07-15 10:42:37 +08:00
Xiangyu Tian	0981b72275	Fix /generate_stream api in Pipeline Parallel FastAPI (#11569 )	2024-07-12 13:19:42 +08:00
Zhao Changmin	b9c66994a5	add npu sdp (#11562 )	2024-07-11 16:57:35 +08:00
binbin Deng	2b8ad8731e	Support pipeline parallel for glm-4v (#11545 )	2024-07-11 16:06:06 +08:00
Xiangyu Tian	7f5111a998	LLM: Refine start script for Pipeline Parallel Serving (#11557 ) Refine start script and readme for Pipeline Parallel Serving	2024-07-11 15:45:27 +08:00
Zhao Changmin	105e124752	optimize phi3-v encoder npu performance and add multimodal example (#11553 ) * phi3-v * readme	2024-07-11 13:59:14 +08:00
Zhao Changmin	3c16c9f725	Optimize baichuan on NPU (#11548 ) * baichuan_npu	2024-07-10 13:18:48 +08:00
Zhao Changmin	76a5802acf	update NPU examples (#11540 ) * update NPU examples	2024-07-09 17:19:42 +08:00
Jason Dai	099486afb7	Update README.md (#11530 )	2024-07-08 20:18:41 +08:00
binbin Deng	66f6ffe4b2	Update GPU HF-Transformers example structure (#11526 )	2024-07-08 17:58:06 +08:00
Xiangyu Tian	7d8bc83415	LLM: Partial Prefilling for Pipeline Parallel Serving (#11457 ) LLM: Partial Prefilling for Pipeline Parallel Serving	2024-07-05 13:10:35 +08:00
binbin Deng	60de428b37	Support pipeline parallel for qwen-vl (#11503 )	2024-07-04 18:03:57 +08:00
Wang, Jian4	61c36ba085	Add pp_serving verified models (#11498 ) * add verified models * update * verify large model * update commend	2024-07-03 14:57:09 +08:00
binbin Deng	9274282ef7	Support pipeline parallel for glm-4-9b-chat (#11463 )	2024-07-03 14:25:28 +08:00
Wang, Jian4	4390e7dc49	Fix codegeex2 transformers version (#11487 )	2024-07-02 15:09:28 +08:00
Heyang Sun	913e750b01	fix non-string deepspeed config path bug (#11476 ) * fix non-string deepspeed config path bug * Update lora_finetune_chatglm.py	2024-07-01 15:53:50 +08:00
Yishuo Wang	319a3b36b2	fix npu llama2 (#11471 )	2024-07-01 10:14:11 +08:00
Heyang Sun	07362ffffc	ChatGLM3-6B LoRA Fine-tuning Demo (#11450 ) * ChatGLM3-6B LoRA Fine-tuning Demo * refine * refine * add 2-card deepspeed * refine format * add mpi4py and deepspeed install	2024-07-01 09:18:39 +08:00
Xiangyu Tian	fd933c92d8	Fix: Correct num_requests in benchmark for Pipeline Parallel Serving (#11462 )	2024-06-28 16:10:51 +08:00
binbin Deng	987017ef47	Update pipeline parallel serving for more model support (#11428 )	2024-06-27 18:21:01 +08:00
Yishuo Wang	cf0f5c4322	change npu document (#11446 )	2024-06-27 13:59:59 +08:00
binbin Deng	508c364a79	Add precision option in PP inference examples (#11440 )	2024-06-27 09:24:27 +08:00
Shaojun Liu	ab9f7f3ac5	FIX: Qwen1.5-GPTQ-Int4 inference error (#11432 ) * merge_qkv if quant_method is 'gptq' * fix python style checks * refactor * update GPU example	2024-06-26 15:36:22 +08:00
Jiao Wang	40fa23560e	Fix LLAVA example on CPU (#11271 ) * update * update * update * update	2024-06-25 20:04:59 -07:00
binbin Deng	e473b8d946	Add more qwen1.5 and qwen2 support for pipeline parallel inference (#11423 )	2024-06-25 15:49:32 +08:00
Yishuo Wang	3b23de684a	update npu examples (#11422 )	2024-06-25 13:32:53 +08:00
Xiangyu Tian	8ddae22cfb	LLM: Refactor Pipeline-Parallel-FastAPI example (#11319 ) Initially Refactor for Pipeline-Parallel-FastAPI example	2024-06-25 13:30:36 +08:00
SONG Ge	34c15d3a10	update pp document (#11421 )	2024-06-25 10:17:20 +08:00
Heyang Sun	c985912ee3	Add Deepspeed LoRA dependencies in document (#11410 )	2024-06-24 15:29:59 +08:00
SONG Ge	0c67639539	Add more examples for pipeline parallel inference (#11372 ) * add more model examples for pipeline parallel inference * add mixtral and vicuna models * add yi model and past_kv support for chatglm family * add docs * doc update * add license * update	2024-06-21 17:55:16 +08:00
ivy-lv11	21fc781fce	Add GLM-4V example (#11343 ) * add example * modify * modify * add line * add * add link and replace with phi-3-vision template * fix generate options * fix * fix --------- Co-authored-by: jinbridge <2635480475@qq.com>	2024-06-21 12:54:31 +08:00
binbin Deng	4ba82191f2	Support PP inference for chatglm3 (#11375 )	2024-06-21 09:59:01 +08:00
Zijie Li	ae452688c2	Add NPU HF example (#11358 )	2024-06-19 18:07:28 +08:00
Heyang Sun	67a1e05876	Remove zero3 context manager from LoRA (#11346 )	2024-06-18 17:24:43 +08:00
Shaojun Liu	694912698e	Upgrade scikit-learn to 1.5.0 to fix dependabot issue (#11349 )	2024-06-18 15:47:25 +08:00
Heyang Sun	00f322d8ee	Finetune ChatGLM with Deepspeed Zero3 LoRA (#11314 ) * Finetune ChatGLM with Deepspeed Zero3 LoRA * add deepspeed zero3 config * rename config * remove offload_param * add save_checkpoint parameter * Update lora_deepspeed_zero3_finetune_chatglm3_6b_arc_2_card.sh * refine	2024-06-18 12:31:26 +08:00
binbin Deng	e50c890e1f	Support finishing PP inference once `eos_token_id` is found (#11336 )	2024-06-18 09:55:40 +08:00
Qiyuan Gong	de4bb97b4f	Remove accelerate 0.23.0 install command in readme and docker (#11333 ) * ipex-llm's accelerate has been upgraded to 0.23.0. Remove accelerate 0.23.0 install command in README and docker.	2024-06-17 17:52:12 +08:00
SONG Ge	ef4b6519fb	Add phi-3 model support for pipeline parallel inference (#11334 ) * add phi-3 model support * add phi3 example	2024-06-17 17:44:24 +08:00
SONG Ge	be00380f1a	Fix pipeline parallel inference past_key_value error in Baichuan (#11318 ) * fix past_key_value error * add baichuan2 example * fix style * update doc * add script link in doc * fix import error * update	2024-06-17 09:29:32 +08:00
Xiangyu Tian	4359ab3172	LLM: Add /generate_stream endpoint for Pipeline-Parallel-FastAPI example (#11187 ) Add /generate_stream and OpenAI-formatted endpoint for Pipeline-Parallel-FastAPI example	2024-06-14 15:15:32 +08:00
Jin Qiao	0e7a31a09c	ChatGLM Examples Restructure regarding Installation Steps (#11285 ) * merge install step in glm examples * fix section * fix section * fix tiktoken	2024-06-14 12:37:05 +08:00
binbin Deng	60cb1dac7c	Support PP for qwen1.5 (#11300 )	2024-06-13 17:35:24 +08:00
binbin Deng	f97cce2642	Fix import error of ds autotp (#11307 )	2024-06-13 16:22:52 +08:00
binbin Deng	220151e2a1	Refactor pipeline parallel multi-stage implementation (#11286 )	2024-06-13 10:00:23 +08:00
ivy-lv11	e7a4e2296f	Add Stable Diffusion examples on GPU and CPU (#11166 ) * add sdxl and lcm-lora * readme * modify * add cpu * add license * modify * add file	2024-06-12 16:33:25 +08:00
Jin Qiao	f224e98297	Add GLM-4 CPU example (#11223 ) * Add GLM-4 example * add tiktoken dependency * fix * fix	2024-06-12 15:30:51 +08:00
Zijie Li	40fc8704c4	Add GPU example for GLM-4 (#11267 ) * Add GPU example for GLM-4 * Update streamchat.py * Fix pretrained arguments Fix pretrained arguments in generate and streamchat.py * Update Readme Update install tiktoken required for GLM-4 * Update comments in generate.py	2024-06-12 14:29:50 +08:00
Wang, Jian4	6f2684e5c9	Update pp llama.py to save memory (#11233 )	2024-06-07 13:18:16 +08:00
Zijie Li	7b753dc8ca	Update sample output for HF Qwen2 GPU and CPU (#11257 )	2024-06-07 11:36:22 +08:00
Yuwen Hu	8c36b5bdde	Add qwen2 example (#11252 ) * Add GPU example for Qwen2 * Update comments in README * Update README for Qwen2 GPU example * Add CPU example for Qwen2 Sample Output under README pending * Update generate.py and README for CPU Qwen2 * Update GPU example for Qwen2 * Small update * Small fix * Add Qwen2 table * Update README for Qwen2 CPU and GPU Update sample output under README --------- Co-authored-by: Zijie Li <michael20001122@gmail.com>	2024-06-07 10:29:33 +08:00
Shaojun Liu	85df5e7699	fix nightly perf test (#11251 )	2024-06-07 09:33:14 +08:00
Guoqiong Song	09c6780d0c	phi-2 transformers 4.37 (#11161 ) * phi-2 transformers 4.37	2024-06-05 13:36:41 -07:00
Zijie Li	bfa1367149	Add CPU and GPU example for MiniCPM (#11202 ) * Change installation address Change former address: "https://docs.conda.io/en/latest/miniconda.html#" to new address: "https://conda-forge.org/download/" for 63 occurrences under python\llm\example * Change Prompt Change "Anaconda Prompt" to "Miniforge Prompt" for 1 occurrence * Create and update model minicpm * Update model minicpm Update model minicpm under GPU/PyTorch-Models * Update readme and generate.py change "prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False)" and delete "pip install transformers==4.37.0 " * Update comments for minicpm GPU Update comments for generate.py at minicpm GPU * Add CPU example for MiniCPM * Update minicpm README for CPU * Update README for MiniCPM and Llama3 * Update Readme for Llama3 CPU Pytorch * Update and fix comments for MiniCPM	2024-06-05 18:09:53 +08:00
Yuwen Hu	af96579c76	Update installation guide for pipeline parallel inference (#11224 ) * Update installation guide for pipeline parallel inference * Small fix * further fix * Small fix * Small fix * Update based on comments * Small fix * Small fix * Small fix	2024-06-05 17:54:29 +08:00
Xiangyu Tian	ac3d53ff5d	LLM: Fix vLLM CPU version error (#11206 ) Fix vLLM CPU version error	2024-06-04 19:10:23 +08:00
Qiyuan Gong	ce3f08b25a	Fix IPEX auto importer (#11192 ) * Fix ipex auto importer with Python builtins. * Raise errors if the user imports ipex manually before importing ipex_llm. Do nothing if they import ipex after importing ipex_llm. * Remove import ipex in examples.	2024-06-04 16:57:18 +08:00
Xiangyu Tian	f02f097002	Fix vLLM version in CPU/vLLM-Serving example README (#11201 )	2024-06-04 15:56:55 +08:00
Zijie Li	a644e9409b	Miniconda/Anaconda -> Miniforge update in examples (#11194 ) * Change installation address Change former address: "https://docs.conda.io/en/latest/miniconda.html#" to new address: "https://conda-forge.org/download/" for 63 occurrences under python\llm\example * Change Prompt Change "Anaconda Prompt" to "Miniforge Prompt" for 1 occurrence	2024-06-04 10:14:02 +08:00
Qiyuan Gong	15a6205790	Fix LoRA tokenizer for Llama and chatglm (#11186 ) * Set pad_token to eos_token if it's None. Otherwise, use model config.	2024-06-03 15:35:38 +08:00
Shaojun Liu	401013a630	Remove chatglm_C Module to Eliminate LGPL Dependency (#11178 ) * remove chatglm_C.*.pyd to solve ngsolve weak copyright vunl fix style check error * remove chatglm native int4 from langchain	2024-05-31 17:03:11 +08:00
Wang, Jian4	c0f1be6aea	Fix pp logic (#11175 ) * only send no none batch and rank1-n sending first * always send first	2024-05-30 16:40:59 +08:00
Jin Qiao	dcbf4d3d0a	Add phi-3-vision example (#11156 ) * Add phi-3-vision example (HF-Automodels) * fix * fix * fix * Add phi-3-vision CPU example (HF-Automodels) * add in readme * fix * fix * fix * fix * use fp8 for gpu example * remove eval	2024-05-30 10:02:47 +08:00
Jiao Wang	93146b9433	Reconstruct Speculative Decoding example directory (#11136 ) * update * update * update	2024-05-29 13:15:27 -07:00
Xiangyu Tian	2299698b45	Refine Pipeline Parallel FastAPI example (#11168 )	2024-05-29 17:16:50 +08:00
Wang, Jian4	8e25de1126	LLM: Add codegeex2 example (#11143 ) * add codegeex example * update * update cpu * add GPU * add gpu * update readme	2024-05-29 10:00:26 +08:00
ZehuaCao	751e1a4e29	Fix concurrent issue in autoTP streaming. (#11150 ) * add benchmark test * update	2024-05-29 08:22:38 +08:00
SONG Ge	33852bd23e	Refactor pipeline parallel device config (#11149 ) * refactor pipeline parallel device config * meet comments * update example * add warnings and update code doc	2024-05-28 16:52:46 +08:00
Xiangyu Tian	b44cf405e2	Refine Pipeline-Parallel-Fastapi example README (#11155 )	2024-05-28 15:18:21 +08:00
Xiangyu Tian	5c8ccf0ba9	LLM: Add Pipeline-Parallel-FastAPI example (#10917 ) Add multi-stage Pipeline-Parallel-FastAPI example --------- Co-authored-by: hzjane <a1015616934@qq.com>	2024-05-27 14:46:29 +08:00
Ruonan Wang	d550af957a	fix security issue of eagle (#11140 ) * fix security issue of eagle * small fix	2024-05-27 10:15:28 +08:00
Jean Yu	ab476c7fe2	Eagle Speculative Sampling examples (#11104 ) * Eagle Speculative Sampling examples * rm multi-gpu and ray content * updated README to include Arc A770	2024-05-24 11:13:43 -07:00
Guancheng Fu	fabc395d0d	add langchain vllm interface (#11121 ) * done * fix * fix * add vllm * add langchain vllm examples * add docs * temp	2024-05-24 17:19:27 +08:00
ZehuaCao	63e95698eb	[LLM]Reopen autotp generate_stream (#11120 ) * reopen autotp generate_stream * fix style error * update	2024-05-24 17:16:14 +08:00
Qiyuan Gong	120a0035ac	Fix type mismatch in eval for Baichuan2 QLora example (#11117 ) * During the evaluation stage, Baichuan2 will raise type mismatch when training with bfloat16. Fix this issue by modifying modeling_baichuan.py. Add doc about how to modify this file.	2024-05-24 14:14:30 +08:00
Xiangyu Tian	b3f6faa038	LLM: Add CPU vLLM entrypoint (#11083 ) Add CPU vLLM entrypoint and update CPU vLLM serving example.	2024-05-24 09:16:59 +08:00
Qiyuan Gong	f6c9ffe4dc	Add WANDB_MODE and HF_HUB_OFFLINE to XPU finetune README (#11097 ) * Add WANDB_MODE=offline to avoid multi-GPUs finetune errors. * Add HF_HUB_OFFLINE=1 to avoid Hugging Face related errors.	2024-05-22 15:20:53 +08:00
Qiyuan Gong	492ed3fd41	Add verified models to GPU finetune README (#11088 ) * Add verified models to GPU finetune README	2024-05-21 15:49:15 +08:00
Qiyuan Gong	1210491748	ChatGLM3, Baichuan2 and Qwen1.5 QLoRA example (#11078 ) * Add chatglm3, qwen15-7b and baichuan-7b QLoRA alpaca example * Remove unnecessary tokenization setting.	2024-05-21 15:29:43 +08:00
ZehuaCao	842d6dfc2d	Further Modify CPU example (#11081 ) * modify CPU example * update	2024-05-21 13:55:47 +08:00
binbin Deng	7170dd9192	Update guide for running qwen with AutoTP (#11065 )	2024-05-20 10:53:17 +08:00
ZehuaCao	56cb992497	LLM: Modify CPU Installation Command for most examples (#11049 ) * init * refine * refine * refine * modify hf-agent example * modify all CPU model example * remove readthedoc modify * replace powershell with cmd * fix repo * fix repo * update * remove comment on windows code block * update * update * update * update --------- Co-authored-by: xiangyuT <xiangyu.tian@intel.com>	2024-05-17 15:52:20 +08:00
Xiangyu Tian	d963e95363	LLM: Modify CPU Installation Command for documentation (#11042 ) * init * refine * refine * refine * refine comments	2024-05-17 10:14:00 +08:00
Jin Qiao	9a96af4232	Remove oneAPI pip install command in related examples (#11030 ) * Remove pip install command in windows installation guide * fix chatglm3 installation guide * Fix gemma cpu example * Apply on other examples * fix	2024-05-16 10:46:29 +08:00
Wang, Jian4	d9f71f1f53	Update benchmark util for example using (#11027 ) * mv benchmark_util.py to utils/ * remove * update	2024-05-15 14:16:35 +08:00
binbin Deng	4053a6ef94	Update environment variable setting in AutoTP with arc (#11018 )	2024-05-15 10:23:58 +08:00
Ziteng Zhang	7d3791c819	[LLM] Add llama3 alpaca qlora example (#11011 ) * Add llama3 finetune example based on alpaca qlora example	2024-05-15 09:17:32 +08:00
Qiyuan Gong	c957ea3831	Add axolotl main support and axolotl Llama-3-8B QLoRA example (#10984 ) * Support axolotl main (796a085). * Add axolotl Llama-3-8B QLoRA example. * Change `sequence_len` to 256 for alpaca, and revert `lora_r` value. * Add example to quick_start.	2024-05-14 13:43:59 +08:00
Wang, Jian4	f4c615b1ee	Add cohere example (#10954 ) * add link first * add_cpu_example * add GPU example	2024-05-08 17:19:59 +08:00
Wang, Jian4	3209d6b057	Fix speculative llama3 no stop error (#10963 ) * fix normal * add eos_tokens_id on sp and add list if * update * no none	2024-05-08 17:09:47 +08:00
Xiangyu Tian	02870dc385	LLM: Refine README of AutoTP-FastAPI example (#10960 )	2024-05-08 16:55:23 +08:00
Xin Qiu	5973d6c753	make gemma's output better (#10943 )	2024-05-08 14:27:51 +08:00
Qiyuan Gong	164e6957af	Refine axolotl quickstart (#10957 ) * Add default accelerate config for axolotl quickstart. * Fix requirement link. * Upgrade peft to 0.10.0 in requirement.	2024-05-08 09:34:02 +08:00
Qiyuan Gong	c11170b96f	Upgrade Peft to 0.10.0 in finetune examples and docker (#10930 ) * Upgrade Peft to 0.10.0 in finetune examples. * Upgrade Peft to 0.10.0 in docker.	2024-05-07 15:12:26 +08:00
Qiyuan Gong	d7ca5d935b	Upgrade Peft version to 0.10.0 for LLM finetune (#10886 ) * Upgrade Peft version to 0.10.0 * Upgrade Peft version in ARC unit test and HF-Peft example.	2024-05-07 15:09:14 +08:00
hxsz1997	245c7348bc	Add codegemma example (#10884 ) * add codegemma example in GPU/HF-Transformers-AutoModels/ * add README of codegemma example in GPU/HF-Transformers-AutoModels/ * add codegemma example in GPU/PyTorch-Models/ * add readme of codegemma example in GPU/PyTorch-Models/ * add codegemma example in CPU/HF-Transformers-AutoModels/ * add readme of codegemma example in CPU/HF-Transformers-AutoModels/ * add codegemma example in CPU/PyTorch-Models/ * add readme of codegemma example in CPU/PyTorch-Models/ * fix typos * fix filename typo * add codegemma in tables * add comments of lm_head * remove comments of use_cache	2024-05-07 13:35:42 +08:00
Xiangyu Tian	13a44cdacb	LLM: Refine Deepspeed-AutoTP-FastAPI example (#10916 )	2024-05-07 09:37:31 +08:00
Wang, Jian4	1de878bee1	LLM: Fix speculative llama3 long input error (#10934 )	2024-05-07 09:25:20 +08:00
Guancheng Fu	2c64754eb0	Add vLLM to ipex-llm serving image (#10807 ) * add vllm * done * doc work * fix done * temp * add docs * format * add start-fastchat-service.sh * fix	2024-04-29 17:25:42 +08:00
Jin Qiao	1f876fd837	Add example for phi-3 (#10881 ) * Add example for phi-3 * add in readme and index * fix * fix * fix * fix indent * fix	2024-04-29 16:43:55 +08:00
Xiangyu Tian	3d4950b0f0	LLM: Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example (#10876 ) Enable batch generate (world_size>1) in Deepspeed-AutoTP-FastAPI example.	2024-04-26 13:24:28 +08:00
Yang Wang	1ce8d7bcd9	Support the `desc_act` feature in GPTQ model (#10851 ) * support act_order * update versions * fix style * fix bug * clean up	2024-04-24 10:17:13 -07:00
binbin Deng	fabf54e052	LLM: make pipeline parallel inference example more common (#10786 )	2024-04-24 09:28:52 +08:00
hxsz1997	328b1a1de9	Fix the not stop issue of llama3 examples (#10860 ) * fix not stop issue in GPU/HF-Transformers-AutoModels * fix not stop issue in GPU/PyTorch-Models/Model/llama3 * fix not stop issue in CPU/HF-Transformers-AutoModels/Model/llama3 * fix not stop issue in CPU/PyTorch-Models/Model/llama3 * update the output in readme * update format * add reference * update prompt format * update output format in readme * update example output in readme	2024-04-23 19:10:09 +08:00
ZehuaCao	36eb8b2e96	Add llama3 speculative example (#10856 ) * Initial llama3 speculative example * update README * update README * update README	2024-04-23 17:03:54 +08:00
ZehuaCao	92ea54b512	Fix speculative decoding bug (#10855 )	2024-04-23 14:28:31 +08:00
Wang, Jian4	18c032652d	LLM: Add mixtral speculative CPU example (#10830 ) * init mixtral sp example * use different prompt_format * update output * update	2024-04-23 10:05:51 +08:00
Qiyuan Gong	5494aa55f6	Downgrade datasets in axolotl example (#10849 ) * Downgrade datasets to 2.15.0 to address axolotl prepare issue https://github.com/OpenAccess-AI-Collective/axolotl/issues/1544 Tks to @kwaa for providing the solution in https://github.com/intel-analytics/ipex-llm/issues/10821#issuecomment-2068861571	2024-04-23 09:41:58 +08:00
Guancheng Fu	47bd5f504c	[vLLM]Remove vllm-v1, refactor v2 (#10842 ) * remove vllm-v1 * fix format	2024-04-22 17:51:32 +08:00
Wang, Jian4	23c6a52fb0	LLM: Fix ipex torchscript=True error (#10832 ) * remove * update * remove torchscript	2024-04-22 15:53:09 +08:00
Heyang Sun	fc33aa3721	fix missing import (#10839 )	2024-04-22 14:34:52 +08:00
Guancheng Fu	ae3b577537	Update README.md (#10833 )	2024-04-22 11:07:10 +08:00
Wang, Jian4	5f95054f97	LLM：Add qwen moe example libs md (#10828 )	2024-04-22 10:03:19 +08:00
Guancheng Fu	61c67af386	Fix vLLM-v2 install instructions(#10822 )	2024-04-22 09:02:48 +08:00
Yang Wang	8153c3008e	Initial llama3 example (#10799 ) * Add initial hf huggingface GPU example * Small fix * Add llama3 gpu pytorch model example * Add llama 3 hf transformers CPU example * Add llama 3 pytorch model CPU example * Fixes * Small fix * Small fixes * Small fix * Small fix * Add links * update repo id * change prompt tuning url * remove system header if there is no system prompt --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com> Co-authored-by: Yuwen Hu <54161268+Oscilloscope98@users.noreply.github.com>	2024-04-18 11:01:33 -07:00
Qiyuan Gong	e90e31719f	axolotl lora example (#10789 ) * Add axolotl lora example * Modify readme * Add comments in yml	2024-04-18 16:38:32 +08:00
Guancheng Fu	cbe7b5753f	Add vLLM[xpu] related code (#10779 ) * Add ipex-llm side change * add runable offline_inference * refactor to call vllm2 * Verified async server * add new v2 example * add README * fix * change dir * refactor readme.md * add experimental * fix	2024-04-18 15:29:20 +08:00
Ziteng Zhang	ff040c8f01	LISA Finetuning Example (#10743 ) * enabling xetla only supports qtype=SYM_INT4 or FP8E5 * LISA Finetuning Example on gpu * update readme * add licence * Explain parameters of lisa & Move backend codes to src dir * fix style * fix style * update readme * support chatglm * fix style * fix style * update readme * fix	2024-04-18 13:48:10 +08:00
Heyang Sun	581ebf6104	GaLore Finetuning Example (#10722 ) * GaLore Finetuning Example * Update README.md * Update README.md * change data to HuggingFaceH4/helpful_instructions * Update README.md * Update README.md * shrink train size and delete cache before starting training to save memory * Update README.md * Update galore_finetuning.py * change model to llama2 3b * Update README.md	2024-04-18 13:47:41 +08:00
Yina Chen	ea5b373a97	Add lookahead GPU example (#10785 ) * Add lookahead example * fix style & attn mask * fix typo * address comments	2024-04-17 17:41:55 +08:00
ZehuaCao	0646e2c062	Fix short prompt for IPEX_CPU speculative decoding cause no_attr error (#10783 )	2024-04-17 16:19:57 +08:00
Cengguang Zhang	7ec82c6042	LLM: add README.md for Long-Context examples. (#10765 ) * LLM: add readme to long-context examples. * add precision. * update wording. * add GPU type. * add Long-Context example to GPU examples. * fix comments. * update max input length. * update max length. * add output length. * fix wording.	2024-04-17 15:34:59 +08:00
Qiyuan Gong	9e5069437f	Fix gradio version in axolotl example (#10776 ) * Change to gradio>=4.19.2	2024-04-17 10:23:43 +08:00
Qiyuan Gong	f2e923b3ca	Axolotl v0.4.0 support (#10773 ) * Add Axolotl 0.4.0, remove legacy 0.3.0 support. * replace is_torch_bf16_gpu_available * Add HF_HUB_OFFLINE=1 * Move transformers out of requirement * Refine readme and qlora.yml	2024-04-17 09:49:11 +08:00
Heyang Sun	26cae0a39c	Update FLEX in Deepspeed README (#10774 ) * Update FLEX in Deepspeed README * Update README.md	2024-04-17 09:28:24 +08:00
Qiyuan Gong	d30b22a81b	Refine axolotl 0.3.0 documents and links (#10764 ) * Refine axolotl 0.3 based on comments * Rename requirements to requirement-xpu * Add comments for paged_adamw_32bit * change lora_r from 8 to 16	2024-04-16 14:47:45 +08:00
ZehuaCao	599a88db53	Add deepsped-autoTP-Fastapi serving (#10748 ) * add deepsped-autoTP-Fastapi serving * add readme * add license * update * update * fix	2024-04-16 14:03:23 +08:00
Jin Qiao	73a67804a4	GPU configuration update for examples (windows pip installer, etc.) (#10762 ) * renew chatglm3-6b gpu example readme fix fix fix * fix for comments * fix * fix * fix * fix * fix * apply on HF-Transformers-AutoModels * apply on PyTorch-Models * fix * fix	2024-04-15 17:42:52 +08:00
yb-peng	b5209d3ec1	Update example/GPU/PyTorch-Models/Model/llava/README.md (#10757 ) * Update example/GPU/PyTorch-Models/Model/llava/README.md * Update README.md fix path in windows installation	2024-04-15 13:01:37 +08:00
Jiao Wang	9e668a5bf0	fix_internlm-chat-7b-8k repo name in examples (#10747 )	2024-04-12 10:15:48 -07:00

707 commits