ipex-llm

Author	SHA1	Message	Date
binbin Deng	9975b029c5	LLM: add qlora finetuning example using `trl.SFTTrainer` (#10183 )	2024-02-21 16:40:04 +08:00
Zhicun	c7e839e66c	Add Qwen1.5-7B-Chat (#10113 ) * add Qwen1.5-7B-Chat * modify Qwen1.5 example * update README * update prompt format * update folder name and example README * add Chinese prompt sample output * update link in README * correct the link * update transformer version	2024-02-21 13:29:29 +08:00
Ziteng Zhang	276ef0e885	Speculative Ziya on CPU (#10160 ) * Speculative Ziya on CPU * Without part of Accelerate with BIGDL_OPT_IPEX	2024-02-21 10:30:39 +08:00
Zhicun	add3899311	Add ziya CPU example (#10114 ) * ziya on CPU * add README for ziya * specify use_cache * add arc CPU * update prompt format * update link * add comments to emphasize use_cache * update pip cmd	2024-02-20 13:59:52 +08:00
Wang, Jian4	d3591383d5	LLM : Add CPU chatglm3 speculative example (#10004 ) * init chatglm * update * update	2024-02-19 13:38:52 +08:00
Heyang Sun	177273c1a4	IPEX Speculative Support for Baichuan2 7B (#10112 ) * IPEX Speculative Support for Baichuan2 7B * fix license problems * refine	2024-02-19 09:12:57 +08:00
binbin Deng	11fe5a87ec	LLM: add Modelscope model example (#10126 )	2024-02-08 11:18:07 +08:00
Jin Qiao	0fcfbfaf6f	LLM: add rwkv5 eagle GPU HF example (#10122 ) * LLM: add rwkv5 eagle example * fix * fix link	2024-02-07 16:58:29 +08:00
binbin Deng	c1ec3d8921	LLM: update FAQ about too many open files (#10119 )	2024-02-07 15:02:24 +08:00
Jin Qiao	d3d2ee1b63	LLM: add speech T5 GPU example (#10090 ) * add speech t5 example * fix * fix	2024-02-07 10:50:02 +08:00
Jin Qiao	2f4c754759	LLM: add bark gpu example (#10091 ) * add bark gpu example * fix * fix license * add bark * add example * fix * another way	2024-02-07 10:47:11 +08:00
SONG Ge	0eccb94d75	remove text-generation-webui from bigdl repo (#10107 )	2024-02-06 17:46:52 +08:00
Yuwen Hu	3a46b57253	[LLM] Add RWKV4 HF GPU Example (#10105 ) * Add GPU HF example for RWKV 4 * Add link to rwkv4 * fix	2024-02-06 16:30:24 +08:00
SONG Ge	4b02ff188b	[WebUI] Add prompt format and stopping words for Qwen (#10066 ) * add prompt format and stopping_words for qwen mdoel * performance optimization * optimize * update * meet comments	2024-02-05 18:23:13 +08:00
Zhicun	7d2be7994f	add phixtral and optimize phi-moe (#10052 )	2024-02-05 11:12:47 +08:00
SONG Ge	9050991e4e	fix gradio check issue temply (#10082 )	2024-02-04 16:46:29 +08:00
binbin Deng	7e49fbc5dd	LLM: make finetuning examples more common for other models (#10078 )	2024-02-04 16:03:52 +08:00
Heyang Sun	90f004b80b	remove benchmarkwrapper form deepspeed example (#10079 )	2024-02-04 15:42:15 +08:00
ivy-lv11	428b7105f6	Add HF and PyTorch example InternLM2 (#10061 )	2024-02-04 10:25:55 +08:00
Yina Chen	77be19bb97	LLM: Support gpt-j in speculative decoding (#10067 ) * gptj * support gptj in speculative decoding * fix * update readme * small fix	2024-02-02 14:54:55 +08:00
SONG Ge	19183ef476	[WebUI] Reset bigdl-llm loader options with default value (#10064 ) * reset bigdl-llm loader options with default value * remove options which maybe complex for naive users	2024-02-01 15:45:39 +08:00
binbin Deng	aae20d728e	LLM: Add initial DPO finetuning example (#10021 )	2024-02-01 14:18:08 +08:00
Heyang Sun	601024f418	Mistral CPU example of speculative decoding (#10024 ) * Mistral CPU example of speculative decoding * update transformres version * update example * Update README.md	2024-02-01 10:52:32 +08:00
WeiguangHan	a9018a0e95	LLM: modify the GPU example for redpajama model (#10044 ) * LLM: modify the GPU example for redpajama model * small fix	2024-01-31 14:32:08 +08:00
Yuxuan Xia	95636cad97	Add AutoGen CPU and XPU Example (#9980 ) * Add AutoGen example * Adjust AutoGen README * Adjust AutoGen README * Change AutoGen README * Change AutoGen README	2024-01-31 11:31:18 +08:00
Heyang Sun	7284edd9b7	Vicuna CPU example of speculative decoding (#10018 ) * Vicuna CPU example of speculative decoding * Update speculative.py * Update README.md * add requirements for ipex * Update README.md * Update speculative.py * Update speculative.py	2024-01-31 11:23:50 +08:00
Wang, Jian4	fb53b994f8	LLM : Add llama ipex optimized (#10046 ) * init ipex * remove padding	2024-01-31 10:38:46 +08:00
Heyang Sun	b1ff28ceb6	LLama2 CPU example of speculative decoding (#9962 ) * LLama2 example of speculative decoding * add docs * Update speculative.py * Update README.md * Update README.md * Update speculative.py * remove autocast	2024-01-31 09:45:20 +08:00
WeiguangHan	0fcad6ce14	LLM: add gpu example for redpajama models (#10040 )	2024-01-30 19:39:28 +08:00
Xiangyu Tian	9978089796	[LLM] Enable BIGDL_OPT_IPEX in speculative baichuan2 13b example (#10028 ) Enable BIGDL_OPT_IPEX in speculative baichuan2 13b example	2024-01-30 17:11:37 +08:00
Heyang Sun	cc3f122f6a	Baichuan2 CPU example of speculative decoding (#10003 ) * Baichuan2 CPU example of speculative decoding * Update generate.py * Update README.md * Update generate.py * Update generate.py * Update generate.py * fix default model * fix wrong chinese coding * Update generate.py * update prompt * update sample outputs * baichuan 7b needs transformers==4.31.0 * rename example file's name	2024-01-29 14:21:09 +08:00
Jin Qiao	440cfe18ed	LLM: GPU Example Updates for Windows (#9992 ) * modify aquila * modify aquila2 * add baichuan * modify baichuan2 * modify blue-lm * modify chatglm3 * modify chinese-llama2 * modiy codellama * modify distil-whisper * modify dolly-v1 * modify dolly-v2 * modify falcon * modify flan-t5 * modify gpt-j * modify internlm * modify llama2 * modify mistral * modify mixtral * modify mpt * modify phi-1_5 * modify qwen * modify qwen-vl * modify replit * modify solar * modify starcoder * modify vicuna * modify voiceassistant * modify whisper * modify yi * modify aquila2 * modify baichuan * modify baichuan2 * modify blue-lm * modify chatglm2 * modify chatglm3 * modify codellama * modify distil-whisper * modify dolly-v1 * modify dolly-v2 * modify flan-t5 * modify llama2 * modify llava * modify mistral * modify mixtral * modify phi-1_5 * modify qwen-vl * modify replit * modify solar * modify starcoder * modify yi * correct the comments * remove cpu_embedding in code for whisper and distil-whisper * remove comment * remove cpu_embedding for voice assistant * revert modify voice assistant * modify for voice assistant * add comment for voice assistant * fix comments * fix comments	2024-01-29 11:25:11 +08:00
SONG Ge	421e7cee80	[LLM] Add Text_Generation_WebUI Support (#9884 ) * initially add text_generation_webui support * add env requirements install * add necessary dependencies * update for starting webui * update shared and noted to place models * update heading of part3 * meet comments * add copyright license * remove extensions * convert tutorial to windows side * add warm-up to optimize performance	2024-01-26 15:12:49 +08:00
binbin Deng	171fb2d185	LLM: reorganize GPU finetuning examples (#9952 )	2024-01-25 19:02:38 +08:00
Wang, Jian4	093e6f8f73	LLM: Add qwen CPU speculative example (#9985 ) * init from gpu * update for cpu * update * update * fix xpu readme * update * update example prompt * update prompt and add 72b * update * update	2024-01-25 17:01:34 +08:00
Yina Chen	99ff6cf048	Update gpu spec decoding baichuan2 example dependency (#9990 ) * add dependency * update * update	2024-01-25 11:05:04 +08:00
Jason Dai	3bc3d0bbcd	Update self-speculative readme (#9986 )	2024-01-24 22:37:32 +08:00
Ruonan Wang	d4f65a6033	LLM: add mistral speculative example (#9976 ) * add mistral example * update	2024-01-24 17:35:15 +08:00
Yina Chen	b176cad75a	LLM: Add baichuan2 gpu spec example (#9973 ) * add baichuan2 gpu spec example * update readme & example * remove print * fix typo * meet comments * revert * update	2024-01-24 16:40:16 +08:00
Jinyi Wan	ec2d9de0ea	Fix README.md for solar (#9957 )	2024-01-24 15:50:54 +08:00
Mingyu Wei	bc9cff51a8	LLM GPU Example Update for Windows Support (#9902 ) * Update README in LLM GPU Examples * Update reference of Intel GPU * add cpu_embedding=True in comment * small fixes * update GPU/README.md and add explanation for cpu_embedding=True * address comments * fix small typos * add backtick for cpu_embedding=True * remove extra backtick in the doc * add period mark * update readme	2024-01-24 13:42:27 +08:00
Yina Chen	5aa4b32c1b	LLM: Add qwen spec gpu example (#9965 ) * add qwen spec gpu example * update readme --------- Co-authored-by: rnwang04 <ruonan1.wang@intel.com>	2024-01-23 15:59:43 +08:00
Ruonan Wang	60b35db1f1	LLM: add chatglm3 speculative decoding example (#9966 ) * add chatglm3 example * update * fix	2024-01-23 15:54:12 +08:00
Ruonan Wang	27b19106f3	LLM: add readme for speculative decoding gpu examples (#9961 ) * add readme * add readme * meet code review	2024-01-23 12:54:19 +08:00
Ruonan Wang	3e601f9a5d	LLM: Support speculative decoding in bigdl-llm (#9951 ) * first commit * fix error, add llama example * hidden print * update api usage * change to api v3 * update * meet code review * meet code review, fix style * add reference, fix style * fix style * fix first token time	2024-01-22 19:14:56 +08:00
binbin Deng	db8e90796a	LLM: add avg token latency information and benchmark guide of autotp (#9940 )	2024-01-19 15:09:57 +08:00
Heyang Sun	5184f400f9	Fix Mixtral GGUF Wrong Output Issue (#9930 ) * Fix Mixtral GGUF Wrong Output Issue * fix style * fix style	2024-01-18 14:11:27 +08:00
Jinyi Wan	07485eff5a	Add SOLAR-10.7B to README (#9869 )	2024-01-11 14:28:41 +08:00
ZehuaCao	e76d984164	[LLM] Support llm-awq vicuna-7b-1.5 on arc (#9874 ) * support llm-awq vicuna-7b-1.5 on arc * support llm-awq vicuna-7b-1.5 on arc	2024-01-10 14:28:39 +08:00
Yuwen Hu	023679459e	[LLM] Small fixes for finetune related examples and UTs (#9870 )	2024-01-09 18:05:03 +08:00
Yuwen Hu	23fc888abe	Update llm gpu xpu default related info to PyTorch 2.1 (#9866 )	2024-01-09 15:38:47 +08:00
ZehuaCao	146076bdb5	Support llm-awq backend (#9856 ) * Support for LLM-AWQ Backend * fix * Update README.md * Add awqconfig * modify init * update * support llm-awq * fix style * fix style * update * fix AwqBackendPackingMethod not found error * fix style * update README * fix style --------- Co-authored-by: Uxito-Ada <414416158@qq.com> Co-authored-by: Heyang Sun <60865256+Uxito-Ada@users.noreply.github.com> Co-authored-by: cyita <yitastudy@gmail.com>	2024-01-09 13:07:32 +08:00
binbin Deng	294fd32787	LLM: update DeepSpeed AutoTP example with GPU memory optimization (#9823 )	2024-01-09 09:22:49 +08:00
Mingyu Wei	ed81baa35e	LLM: Use default typing-extension in LangChain examples (#9857 ) * remove typing extension downgrade in readme; minor fixes of code * fix typos in README * change default question of docqa.py	2024-01-08 16:50:55 +08:00
Jinyi Wan	3147ebe63d	Add cpu and gpu examples for SOLAR-10.7B (#9821 )	2024-01-05 09:50:28 +08:00
Ruonan Wang	8504a2bbca	LLM: update qlora alpaca example to change lora usage (#9835 ) * update example * fix style	2024-01-04 15:22:20 +08:00
Ziteng Zhang	05b681fa85	[LLM] IPEX auto importer set on by default (#9832 ) * Set BIGDL_IMPORT_IPEX default to True * Remove import intel_extension_for_pytorch as ipex from GPU example	2024-01-04 13:33:29 +08:00
Wang, Jian4	4ceefc9b18	LLM: Support bitsandbytes config on qlora finetune (#9715 ) * test support bitsandbytesconfig * update style * update cpu example * update example * update readme * update unit test * use bfloat16 * update logic * use int4 * set defalut bnb_4bit_use_double_quant * update * update example * update model.py * update * support lora example	2024-01-04 11:23:16 +08:00
Wang, Jian4	a54cd767b1	LLM: Add gguf falcon (#9801 ) * init falcon * update convert.py * update style	2024-01-03 14:49:02 +08:00
binbin Deng	6584539c91	LLM: fix installation of codellama (#9813 )	2024-01-02 14:32:50 +08:00
Wang, Jian4	7ed9538b9f	LLM: support gguf mpt (#9773 ) * add gguf mpt * update	2023-12-28 09:22:39 +08:00
binbin Deng	40edb7b5d7	LLM: fix get environment variables setting (#9787 )	2023-12-27 09:11:37 +08:00
Jason Dai	361781bcd0	Update readme (#9788 )	2023-12-26 19:46:11 +08:00
Ziteng Zhang	44b4a0c9c5	[LLM] Correct prompt format of Yi, Llama2 and Qwen in generate.py (#9786 ) * correct prompt format of Yi * correct prompt format of llama2 in cpu generate.py * correct prompt format of Qwen in GPU example	2023-12-26 16:57:55 +08:00
Heyang Sun	66e286a73d	Support for Mixtral AWQ (#9775 ) * Support for Mixtral AWQ * Update README.md * Update README.md * Update awq_config.py * Update README.md * Update README.md	2023-12-25 16:08:09 +08:00
Ruonan Wang	1917bbe626	LLM: fix `BF16Linear` related training & inference issue (#9755 ) * fix bf16 related issue * fix * update based on comment & add arc lora script * update readme * update based on comment * update based on comment * update * force to bf16 * fix style * move check input dtype into function * update convert * meet code review * meet code review * update merged model to support new training_mode api * fix typo	2023-12-25 14:49:30 +08:00
Yina Chen	449b387125	Support relora in bigdl-llm (#9687 ) * init * fix style * update * support resume & update readme * update * update * remove important * add training mode * meet comments	2023-12-25 14:04:28 +08:00
Yishuo Wang	be13b162fe	add codeshell example (#9743 )	2023-12-25 10:54:01 +08:00
binbin Deng	ed8ed76d4f	LLM: update deepspeed autotp usage (#9733 )	2023-12-25 09:41:14 +08:00
Qiyuan Gong	4c487313f2	Revert "[LLM] IPEX auto importer turn on by default for XPU (#9730 )" (#9759 ) This reverts commit `0284801fbd`.	2023-12-22 16:38:24 +08:00
Qiyuan Gong	0284801fbd	[LLM] IPEX auto importer turn on by default for XPU (#9730 ) * Set BIGDL_IMPORT_IPEX default to true, i.e., auto import IPEX for XPU. * Remove import intel_extension_for_pytorch as ipex from GPU example. * Add support for bigdl-core-xe-21.	2023-12-22 16:20:32 +08:00
Ruonan Wang	2f36769208	LLM: bigdl-llm lora support & lora example (#9740 ) * lora support and single card example * support multi-card, refactor code * fix model id and style * remove torch patch, add two new class for bf16, update example * fix style * change to training_mode * small fix * add more info in help * fixstyle, update readme * fix ut * fix ut * Handling compatibility issues with default LoraConfig	2023-12-22 11:05:39 +08:00
Wang, Jian4	984697afe2	LLM: Add bloom gguf support (#9734 ) * init * update bloom add merges * update * update readme * update for llama error * update	2023-12-21 14:06:25 +08:00
Heyang Sun	1fa7793fc0	Load Mixtral GGUF Model (#9690 ) * Load Mixtral GGUF Model * refactor * fix empty tensor when to cpu * update gpu and cpu readmes * add dtype when set tensor into module	2023-12-19 13:54:38 +08:00
binbin Deng	12df70953e	LLM: add resume_from_checkpoint related section (#9705 )	2023-12-18 12:27:02 +08:00
Wang, Jian4	b8437a1c1e	LLM: Add gguf mistral model support (#9691 ) * add mistral support * need to upgrade transformers version * update	2023-12-15 13:37:39 +08:00
Wang, Jian4	496bb2e845	LLM: Support load BaiChuan model family gguf model (#9685 ) * support baichuan model family gguf model * update gguf generate.py * add verify models * add support model_family * update * update style * update type * update readme * update * remove support model_family	2023-12-15 13:34:33 +08:00
Lilac09	3afed99216	fix path issue (#9696 )	2023-12-15 11:21:49 +08:00
Jason Dai	37f509bb95	Update readme (#9692 )	2023-12-14 19:50:21 +08:00
Ziteng Zhang	21c7503a42	[LLM] Correct prompt format of Qwen in generate.py (#9678 ) * Change qwen prompt format to chatml	2023-12-14 14:01:30 +08:00
Qiyuan Gong	223c9622f7	[LLM] Mixtral CPU examples (#9673 ) * Mixtral CPU PyTorch and hugging face examples, based on #9661 and #9671	2023-12-14 10:35:11 +08:00
ZehuaCao	877229f3be	[LLM]Add Yi-34B-AWQ to verified AWQ model. (#9676 ) * verfiy Yi-34B-AWQ * update	2023-12-14 09:55:47 +08:00
binbin Deng	68a4be762f	remove disco mixtral, update oneapi version (#9671 )	2023-12-13 23:24:59 +08:00
ZehuaCao	503880809c	verfiy codeLlama (#9668 )	2023-12-13 15:39:31 +08:00
Heyang Sun	c64e2248ef	fix str returned by get_int_from_str rather than expected int (#9667 )	2023-12-13 11:01:21 +08:00
binbin Deng	bf1bcf4a14	add official Mixtral model support (#9663 )	2023-12-12 22:27:07 +08:00
binbin Deng	2fe38b4b9b	LLM: add mixtral GPU examples (#9661 )	2023-12-12 20:26:36 +08:00
ZehuaCao	45721f3473	verfiy llava (#9649 )	2023-12-11 14:26:05 +08:00
Heyang Sun	9f02f96160	[LLM] support for Yi AWQ model (#9648 )	2023-12-11 14:07:34 +08:00
Yina Chen	70f5e7bf0d	Support peft LoraConfig (#9636 ) * support peft loraconfig * use testcase to test * fix style * meet comments	2023-12-08 16:13:03 +08:00
binbin Deng	499100daf1	LLM: Add solution to fix `oneccl` related error (#9630 )	2023-12-08 10:51:55 +08:00
ZehuaCao	6eca8a8bb5	update transformer version (#9631 )	2023-12-08 09:36:00 +08:00
Heyang Sun	3811cf43c9	[LLM] update AWQ documents (#9623 ) * [LLM] update AWQ and verified models' documents * refine * refine links * refine	2023-12-07 16:02:20 +08:00
Jason Dai	51b668f229	Update GGUF readme (#9611 )	2023-12-06 18:21:54 +08:00
dingbaorong	a7bc89b3a1	remove q4_1 in gguf example (#9610 ) * remove q4_1 * fixes	2023-12-06 16:00:05 +08:00
Yina Chen	404e101ded	QALora example (#9551 ) * Support qa-lora * init * update * update * update * update * update * update merge * update * fix style & update scripts * update * address comments * fix typo * fix typo --------- Co-authored-by: Yang Wang <yang3.wang@intel.com>	2023-12-06 15:36:21 +08:00
dingbaorong	89069d6173	Add gpu gguf example (#9603 ) * add gpu gguf example * some fixes * address kai's comments * address json's comments	2023-12-06 15:17:54 +08:00
Ziteng Zhang	aeb77b2ab1	Add minimum Qwen model version (#9606 )	2023-12-06 11:49:14 +08:00
Heyang Sun	4e70e33934	[LLM] code and document for distributed qlora (#9585 ) * [LLM] code and document for distributed qlora * doc * refine for gradient checkpoint * refine * Update alpaca_qlora_finetuning_cpu.py * Update alpaca_qlora_finetuning_cpu.py * Update alpaca_qlora_finetuning_cpu.py * add link in doc	2023-12-06 09:23:17 +08:00
Zheng, Yi	d154b38bf9	Add llama2 gpu low memory example (#9514 ) * Add low memory example * Minor fixes * Update readme.md	2023-12-05 17:29:48 +08:00
Jason Dai	06febb5fa7	Update readme for FP8/FP4 inference examples (#9601 )	2023-12-05 15:59:03 +08:00
dingbaorong	a66fbedd7e	add gpu more data types example (#9592 ) * add gpu more data types example * add int8	2023-12-05 15:45:38 +08:00
Jinyi Wan	b721138132	Add cpu and gpu examples for BlueLM (#9589 ) * Add cpu int4 example for BlueLM * addexample optimize_model cpu for bluelm * add example gpu int4 blueLM * add example optimiza_model GPU for bluelm * Fixing naming issues and BigDL package version. * Fixing naming issues... * Add BlueLM in README.md "Verified Models"	2023-12-05 13:59:02 +08:00
Guancheng Fu	8b00653039	fix doc (#9599 )	2023-12-05 13:49:31 +08:00
Wang, Jian4	ed0dc57c6e	LLM: Add cpu qlora support other models guide (#9567 ) * use bf16 flag * add using baichuan model * update merge * remove * update	2023-12-01 11:18:04 +08:00
Jason Dai	bda404fc8f	Update readme (#9575 )	2023-11-30 22:45:52 +08:00
Yishuo Wang	66f5b45f57	[LLM] add a llama2 gguf example (#9553 )	2023-11-30 16:37:17 +08:00
Wang, Jian4	a0a80d232e	LLM: Add qlora cpu distributed readme (#9561 ) * init readme * add distributed guide * update	2023-11-30 13:42:30 +08:00
Qiyuan Gong	d85a430a8c	Uing bigdl-llm-init instead of bigdl-nano-init (#9558 ) * Replace `bigdl-nano-init` with `bigdl-llm-init`. * Install `bigdl-llm` instead of `bigdl-nano`. * Remove nano in README.	2023-11-30 10:10:29 +08:00
binbin Deng	4ff2ca9d0d	LLM: fix loss error on Arc (#9550 )	2023-11-29 15:16:18 +08:00
Wang, Jian4	b824754256	LLM: Update for cpu qlora mpirun (#9548 )	2023-11-29 10:56:17 +08:00
Guancheng Fu	963a5c8d79	Add vLLM-XPU version's README/examples (#9536 ) * test * test * fix last kv cache * add xpu readme * remove numactl for xpu example * fix link error * update max_num_batched_tokens logic * add explaination * add xpu environement version requirement * refine gpu memory * fix * fix style	2023-11-28 09:44:03 +08:00
Guancheng Fu	b6c3520748	Remove xformers from vLLM-CPU (#9535 )	2023-11-27 11:21:25 +08:00
binbin Deng	2b9c7d2a59	LLM: quick fix alpaca qlora finetuning script (#9534 )	2023-11-27 11:04:27 +08:00
binbin Deng	6bec0faea5	LLM: support Mistral AWQ models (#9520 )	2023-11-24 16:20:22 +08:00
Jason Dai	b3178d449f	Update README.md (#9525 )	2023-11-23 21:45:20 +08:00
Jason Dai	82898a4203	Update GPU example README (#9524 )	2023-11-23 21:20:26 +08:00
Jason Dai	064848028f	Update README.md (#9523 )	2023-11-23 21:16:21 +08:00
Guancheng Fu	bf579507c2	Integrate vllm (#9310 ) * done * Rename structure * add models * Add structure/sampling_params,sequence * add input_metadata * add outputs * Add policy,logger * add and update * add parallelconfig back * core/scheduler.py * Add llm_engine.py * Add async_llm_engine.py * Add tested entrypoint * fix minor error * Fix everything * fix kv cache view * fix * fix * fix * format&refine * remove logger from repo * try to add token latency * remove logger * Refine config.py * finish worker.py * delete utils.py * add license * refine * refine sequence.py * remove sampling_params.py * finish * add license * format * add license * refine * refine * Refine line too long * remove exception * so dumb style-check * refine * refine * refine * refine * refine * refine * add README * refine README * add warning instead error * fix padding * add license * format * format * format fix * Refine vllm dependency (#1) vllm dependency clear * fix licence * fix format * fix format * fix * adapt LLM engine * fix * add license * fix format * fix * Moving README.md to the correct position * Fix readme.md * done * guide for adding models * fix * Fix README.md * Add new model readme * remove ray-logic * refactor arg_utils.py * remove distributed_init_method logic * refactor entrypoints * refactor input_metadata * refactor model_loader * refactor utils.py * refactor models * fix api server * remove vllm.stucture * revert by txy 1120 * remove utils * format * fix license * add bigdl model * Refer to a specfic commit * Change code base * add comments * add async_llm_engine comment * refine * formatted * add worker comments * add comments * add comments * fix style * add changes --------- Co-authored-by: xiangyuT <xiangyu.tian@intel.com> Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com> Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>	2023-11-23 16:46:45 +08:00
Heyang Sun	48fbb1eb94	support ccl (MPI) distributed mode in alpaca_qlora_finetuning_cpu (#9507 )	2023-11-23 10:58:09 +08:00
Heyang Sun	11fa5a8a0e	Fix QLoRA CPU dispatch_model issue about accelerate (#9506 )	2023-11-23 08:41:25 +08:00
Heyang Sun	1453046938	install bigdl-llm in deepspeed cpu inference example (#9508 )	2023-11-23 08:39:21 +08:00
binbin Deng	86743fb57b	LLM: fix transformers version in CPU finetuning example (#9511 )	2023-11-22 15:53:07 +08:00
binbin Deng	1a2129221d	LLM: support resume from checkpoint in Alpaca QLoRA (#9502 )	2023-11-22 13:49:14 +08:00
Ruonan Wang	076d106ef5	LLM: GPU QLoRA update to bf16 to accelerate gradient checkpointing (#9499 ) * update to bf16 to accelerate gradient checkpoint * add utils and fix ut	2023-11-21 17:08:36 +08:00
binbin Deng	b7ae572ac3	LLM: update Alpaca QLoRA finetuning example on GPU (#9492 )	2023-11-21 14:22:19 +08:00
Wang, Jian4	c5cb3ab82e	LLM : Add CPU alpaca qlora example (#9469 ) * init * update xpu to cpu * update * update readme * update example * update * add refer * add guide to train different datasets * update readme * update	2023-11-21 09:19:58 +08:00
binbin Deng	96fd26759c	LLM: fix QLoRA finetuning example on CPU (#9489 )	2023-11-20 14:31:24 +08:00
binbin Deng	3dac21ac7b	LLM: add more example usages about alpaca qlora on different hardware (#9458 )	2023-11-17 09:56:43 +08:00
Heyang Sun	921b263d6a	update deepspeed install and run guide in README (#9441 )	2023-11-17 09:11:39 +08:00
Yina Chen	d5263e6681	Add awq load support (#9453 ) * Support directly loading GPTQ models from huggingface * fix style * fix tests * change example structure * address comments * fix style * init * address comments * add examples * fix style * fix style * fix style * fix style * update * remove * meet comments * fix style --------- Co-authored-by: Yang Wang <yang3.wang@intel.com>	2023-11-16 14:06:25 +08:00
Ruonan Wang	0f82b8c3a0	LLM: update qlora example (#9454 ) * update qlora example * fix loss=0	2023-11-15 09:24:15 +08:00
Yang Wang	51d07a9fd8	Support directly loading gptq models from huggingface (#9391 ) * Support directly loading GPTQ models from huggingface * fix style * fix tests * change example structure * address comments * fix style * address comments	2023-11-13 20:48:12 -08:00
Heyang Sun	da6bbc8c11	fix deepspeed dependencies to install (#9400 ) * remove reductant parameter from deepspeed install * Update install.sh * Update install.sh	2023-11-13 16:42:50 +08:00
Zheng, Yi	9b5d0e9c75	Add examples for Yi-6B (#9421 )	2023-11-13 10:53:15 +08:00
Wang, Jian4	ac7fbe77e2	Update qlora readme (#9416 )	2023-11-12 19:29:29 +08:00
Zheng, Yi	0674146cfb	Add cpu and gpu examples of distil-whisper (#9374 ) * Add distil-whisper examples * Fixes based on comments * Minor fixes --------- Co-authored-by: Ariadne330 <wyn2000330@126.com>	2023-11-10 16:09:55 +08:00
Ziteng Zhang	ad81b5d838	Update qlora README.md (#9422 )	2023-11-10 15:19:25 +08:00
Heyang Sun	b23b91407c	fix llm-init on deepspeed missing lib (#9419 )	2023-11-10 13:51:24 +08:00
dingbaorong	36fbe2144d	Add CPU examples of fuyu (#9393 ) * add fuyu cpu examples * add gpu example * add comments * add license * remove gpu example * fix inference time	2023-11-09 15:29:19 +08:00
binbin Deng	54d95e4907	LLM: add alpaca qlora finetuning example (#9276 )	2023-11-08 16:25:17 +08:00
binbin Deng	97316bbb66	LLM: highlight transformers version requirement in mistral examples (#9380 )	2023-11-08 16:05:03 +08:00
Heyang Sun	af94058203	[LLM] Support CPU deepspeed distributed inference (#9259 ) * [LLM] Support CPU Deepspeed distributed inference * Update run_deepspeed.py * Rename * fix style * add new codes * refine * remove annotated codes * refine * Update README.md * refine doc and example code	2023-11-06 17:56:42 +08:00
Jin Qiao	e6b6afa316	LLM: add aquila2 model example (#9356 )	2023-11-06 15:47:39 +08:00
Yining Wang	9377b9c5d7	add CodeShell CPU example (#9345 ) * add CodeShell CPU example * fix some problems	2023-11-03 13:15:54 +08:00
Zheng, Yi	63411dff75	Add cpu examples of WizardCoder (#9344 ) * Add wizardcoder example * Minor fixes	2023-11-02 20:22:43 +08:00
dingbaorong	2e3bfbfe1f	Add internlm_xcomposer cpu examples (#9337 ) * add internlm-xcomposer cpu examples * use chat * some fixes * add license * address shengsheng's comments * use demo.jpg	2023-11-02 15:50:02 +08:00
Jin Qiao	97a38958bd	LLM: add CodeLlama CPU and GPU examples (#9338 ) * LLM: add codellama CPU pytorch examples * LLM: add codellama CPU transformers examples * LLM: add codellama GPU transformers examples * LLM: add codellama GPU pytorch examples * LLM: add codellama in readme * LLM: add LLaVA link	2023-11-02 15:34:25 +08:00
Zheng, Yi	63b2556ce2	Add cpu examples of skywork (#9340 )	2023-11-02 15:10:45 +08:00
dingbaorong	f855a864ef	add llava gpu example (#9324 ) * add llava gpu example * use 7b model * fix typo * add in README	2023-11-02 14:48:29 +08:00
Wang, Jian4	149146004f	LLM: Add qlora finetunning CPU example (#9275 ) * add qlora finetunning example * update readme * update example * remove merge.py and update readme	2023-11-02 09:45:42 +08:00
Cengguang Zhang	9f3d4676c6	LLM: Add qwen-vl gpu example (#9290 ) * create qwen-vl gpu example. * add readme. * fix. * change input figure and update outputs. * add qwen-vl pytorch model gpu example. * fix. * add readme.	2023-11-01 11:01:39 +08:00
Jin Qiao	96f8158fe2	LLM: adjust dolly v2 GPU example README (#9318 )	2023-11-01 09:50:22 +08:00
Jin Qiao	c44c6dc43a	LLM: add chatglm3 examples (#9305 )	2023-11-01 09:50:05 +08:00
Ruonan Wang	d383ee8efb	LLM: update QLoRA example about accelerate version(#9314 )	2023-10-31 13:54:38 +08:00
dingbaorong	ee5becdd61	use coco image in Qwen-VL (#9298 ) * use coco image * add output * address yuwen's comments	2023-10-30 14:32:35 +08:00
Yang Wang	8838707009	Add deepspeed autotp example readme (#9289 ) * Add deepspeed autotp example readme * change word	2023-10-27 13:04:38 -07:00
dingbaorong	f053688cad	add cpu example of LLaVA (#9269 ) * add LLaVA cpu example * Small text updates * update link --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2023-10-27 18:59:20 +08:00
Zheng, Yi	7f2ad182fd	Minor Fixes of README (#9294 )	2023-10-27 18:25:46 +08:00
Zheng, Yi	1bff54a378	Display demo.jpg n the README.md of HuggingFace Transformers Agent (#9293 ) * Display demo.jpg * remove demo.jpg	2023-10-27 18:00:03 +08:00
Zheng, Yi	a4a1dec064	Add a cpu example of HuggingFace Transformers Agent (use vicuna-7b-v1.5) (#9284 ) * Add examples of HF Agent * Modify folder structure and add link of demo.jpg * Fixes of readme * Merge applications and Applications	2023-10-27 17:14:12 +08:00
Guoqiong Song	aa319de5e8	Add streaming-llm using llama2 on CPU (#9265 ) Enable streaming-llm to let model take infinite inputs, tested on desktop and SPR10	2023-10-27 01:30:39 -07:00
Yang Wang	067c7e8098	Support deepspeed AutoTP (#9230 ) * Support deepspeed * add test script * refactor convert * refine example * refine * refine example * fix style * refine example and adapte latest ipex * fix style	2023-10-24 23:46:28 -07:00
Yining Wang	a6a8afc47e	Add qwen vl CPU example (#9221 ) * eee * add examples on CPU and GPU * fix * fix * optimize model examples * add Qwen-VL-Chat CPU example * Add Qwen-VL CPU example * fix optimize problem * fix error * Have updated, benchmark fix removed from this PR * add generate API example * Change formats in qwen-vl example * Add CPU transformer int4 example for qwen-vl * fix repo-id problem and add Readme * change picture url * Remove unnecessary file --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2023-10-25 13:22:12 +08:00
dingbaorong	5a2ce421af	add cpu and gpu examples of flan-t5 (#9171 ) * add cpu and gpu examples of flan-t5 * address yuwen's comments * Add explanation why we add modules to not convert * Refine prompt and add a translation example * Add a empty line at the end of files * add examples of flan-t5 using optimize_mdoel api * address bin's comments * address binbin's comments * add flan-t5 in readme	2023-10-24 15:24:01 +08:00
Yining Wang	4a19f50d16	phi-1_5 CPU and GPU examples (#9173 ) * eee * add examples on CPU and GPU * fix * fix * optimize model examples * have updated * Warmup and configs added * Update two tables	2023-10-24 15:08:04 +08:00
Xin Qiu	0c5055d38c	add position_ids and fuse embedding for falcon (#9242 ) * add position_ids for falcon * add cpu * add cpu * add license	2023-10-24 09:58:20 +08:00
Jin Qiao	a3b664ed03	LLM: add GPU More-Data-Types and Save/Load example (#9199 )	2023-10-18 13:13:45 +08:00
Jin Qiao	d946bd7c55	LLM: add CPU More-Data-Types and Save-Load examples (#9179 )	2023-10-17 14:38:52 +08:00
Ruonan Wang	c0497ab41b	LLM: support kv_cache optimization for Qwen-VL-Chat (#9193 ) * dupport qwen_vl_chat * fix style	2023-10-17 13:33:56 +08:00
Yang Wang	7a2de00b48	Fixes for xpu Bf16 training (#9156 ) * Support bf16 training * Use a stable transformer version * remove env * fix style	2023-10-14 21:28:59 -07:00
Jin Qiao	db7f938fdc	LLM: add replit and starcoder to gpu pytorch model example (#9154 )	2023-10-13 15:44:17 +08:00
Jin Qiao	797b156a0d	LLM: add dolly-v1 and dolly-v2 to gpu pytorch model example (#9153 )	2023-10-13 15:43:35 +08:00
Jin Qiao	f754ab3e60	LLM: add baichuan and baichuan2 to gpu pytorch model example (#9152 )	2023-10-13 13:44:31 +08:00
JIN Qiao	1a1ddc4144	LLM: Add Replit CPU and GPU example (#9028 )	2023-10-12 13:42:14 +08:00
JIN Qiao	d74834ff4c	LLM: add gpu pytorch-models example llama2 and chatglm2 (#9142 )	2023-10-12 13:41:48 +08:00
binbin Deng	995b0f119f	LLM: update some gpu examples (#9136 )	2023-10-11 14:23:56 +08:00
binbin Deng	2ad67a18b1	LLM: add mistral examples (#9121 )	2023-10-11 13:38:15 +08:00
Guoqiong Song	e8c5645067	add LLM example of aquila on GPU (#9056 ) * aquila, dolly-v1, dolly-v2, vacuna	2023-10-10 17:01:35 -07:00
binbin Deng	5e9962b60e	LLM: update example layout (#9046 )	2023-10-09 15:36:39 +08:00
Yang Wang	88565c76f6	add export merged model example (#9018 ) * add export merged model example * add sources * add script * fix style	2023-10-04 21:18:52 -07:00
Ruonan Wang	b943d73844	LLM: refactor kv cache (#9030 ) * refactor utils * meet code review; update all models * small fix	2023-09-21 21:28:03 +08:00
Ruonan Wang	bf51ec40b2	LLM: Fix empty cache (#9024 ) * fix * fix * update example	2023-09-21 17:16:07 +08:00
binbin Deng	edb225530b	add bark (#9016 )	2023-09-21 12:24:58 +08:00
JinBridge	48b503c630	LLM: add example of aquila (#9006 ) * LLM: add example of aquila * LLM: replace AquilaChat with Aquila * LLM: shorten prompt of aquila example	2023-09-20 15:52:56 +08:00
Yang Wang	c88f6ec457	Experiment XPU QLora Finetuning (#8937 ) * Support xpu finetuning * support xpu finetuning * fix style * fix style * fix style * refine example * add readme * refine readme * refine api * fix fp16 * fix example * refactor * fix style * fix compute type * add qlora * refine training args * fix example * fix style * fast path forinference * address comments * refine readme * revert lint	2023-09-19 10:15:44 -07:00
Jason Dai	51518e029d	Update llm readme (#9005 )	2023-09-19 20:01:33 +08:00
Ruonan Wang	249386261c	LLM: add Baichuan2 cpu example (#9002 ) * add baichuan2 cpu examples * add link * update prompt	2023-09-19 18:08:30 +08:00
binbin Deng	c1d25a51a8	LLM: add `optimize_model` example for bert (#8975 )	2023-09-18 16:18:35 +08:00
Ruonan Wang	cabe7c0358	LLM: add baichuan2 example for arc (#8994 ) * add baichuan2 examples * add link * small fix	2023-09-18 14:32:27 +08:00
JinBridge	c12b8f24b6	LLM: add use_cache=True for all gpu examples (#8971 )	2023-09-15 09:54:38 +08:00
binbin Deng	be29c75c18	LLM: refactor gpu examples (#8963 ) * restructure * change to hf-transformers-models/	2023-09-13 14:47:47 +08:00
Ruonan Wang	4de73f592e	LLM: add gpu example of chinese-llama-2-7b (#8960 ) * add gpu example of chinese -llama2 * update model name and link * update name	2023-09-13 10:16:51 +08:00
binbin Deng	2d81521019	LLM: add `optimize_model` examples for llama2 and chatglm (#8894 ) * add llama2 and chatglm optimize_model examples * update default usage * update command and some descriptions * move folder and remove general_int4 descriptions * change folder name	2023-09-12 10:36:29 +08:00
Yuwen Hu	ca35c93825	[LLM] Fix langchain UT (#8929 ) * Change dependency version for langchain uts * Downgrade pandas version instead; and update example readme accordingly	2023-09-08 13:51:04 +08:00
Zhao Changmin	8bc1d8a17c	LLM: Fix discards in `optimize_model` with non-hf models and add openai whisper example (#8877 ) * openai-whisper	2023-09-07 10:35:59 +08:00
Yina Chen	bfc71fbc15	Add known issue in arc voice assistant example (#8902 ) * add known issue in voice assistant example * update cpu	2023-09-07 09:28:26 +08:00
Yina Chen	74a2c2ddf5	Update optimize_model=True in llama2 chatglm2 arc examples (#8878 ) * add optimize_model=True in llama2 chatglm2 examples * add ipex optimize in gpt-j example	2023-09-05 10:35:37 +08:00
Zhao Changmin	9c652fbe95	LLM: Whisper long segment recognize example (#8826 ) * LLM: Long segment recognize example	2023-08-31 16:41:25 +08:00
Yina Chen	3462fd5c96	Add arc gpt-j example (#8840 )	2023-08-30 10:31:24 +08:00

425 commits