Replace Llama 7b to Llama2-7b in README.md (#9055)

* Replace Llama 7b with Llama2-7b in README.md

  Need to replace the base model with Llama2-7b, as we are operating on Llama2 here.

* Replace Llama 7b to Llama2-7b in README.md

  A "Llama 7b" in the first line was missed.

* Update architecture graph

---------

Co-authored-by: Heyang Sun <60865256+Uxito-Ada@users.noreply.github.com>

parent cc84ed70b3
commit a717352c59

1 changed file with 5 additions and 5 deletions

@@ -1,16 +1,16 @@

## Run BF16-Optimized Lora Finetuning on Kubernetes with OneCCL

[Alpaca Lora](https://github.com/tloen/alpaca-lora/tree/main) uses [low-rank adaptation](https://arxiv.org/pdf/2106.09685.pdf) to speed up the finetuning of the base model [Llama2-7b](https://huggingface.co/meta-llama/Llama-2-7b), and tries to reproduce the standard Alpaca, a general-purpose finetuned LLM. It runs on top of Hugging Face transformers with a PyTorch backend, which natively requires expensive GPU resources and significant training time.

By contrast, BigDL provides a CPU optimization to accelerate the LoRA finetuning of Llama2-7b through mixed-precision and distributed training. Specifically, [Intel OneCCL](https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html), an available Hugging Face backend, speeds up PyTorch computation with the BF16 datatype on CPUs, while [Intel MPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html) enables parallel processing on Kubernetes.
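
To make these pieces concrete, below is a minimal, hypothetical sketch of BF16 autocast plus oneCCL-backed distributed data parallel in plain PyTorch. It is not the launcher shipped with this chart; it assumes `oneccl_bindings_for_pytorch` is installed and that `MASTER_ADDR`, `MASTER_PORT`, `RANK`, and `WORLD_SIZE` are provided by the MPI launcher.

```python
# Hypothetical sketch: BF16 autocast + oneCCL-backed DDP on CPU.
import os

import torch
import torch.distributed as dist
import oneccl_bindings_for_pytorch  # noqa: F401 -- registers the "ccl" backend

# Rank and world size are normally injected by mpirun / the MPI operator.
dist.init_process_group(
    backend="ccl",
    rank=int(os.environ.get("RANK", "0")),
    world_size=int(os.environ.get("WORLD_SIZE", "1")),
)

# Tiny stand-in model; in practice this would be the LoRA-wrapped Llama2-7b.
model = torch.nn.Linear(16, 16)
ddp_model = torch.nn.parallel.DistributedDataParallel(model)

# BF16 mixed precision on CPU via autocast.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    output = ddp_model(torch.randn(4, 16))
    loss = output.float().pow(2).mean()

loss.backward()  # gradients are synchronized across workers by DDP over oneCCL
```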

The architecture is illustrated below:

*(architecture graph)*

As shown above, BigDL builds its MPI training on the [Kubeflow MPI operator](https://github.com/kubeflow/mpi-operator/tree/master), which encapsulates the deployment as an MPIJob CRD and assists users in constructing an MPI worker cluster on Kubernetes, handling public key distribution, SSH connections, and log collection.
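
As an illustration of what the operator manages, here is a hedged sketch that submits a minimal MPIJob through the official `kubernetes` Python client. The job name, image, command, and namespace are placeholders; in this project the Helm chart renders the real spec for you.

```python
# Hypothetical sketch: creating a minimal MPIJob CRD object programmatically.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
api = client.CustomObjectsApi()

mpijob = {
    "apiVersion": "kubeflow.org/v2beta1",
    "kind": "MPIJob",
    "metadata": {"name": "lora-finetune-demo"},  # placeholder name
    "spec": {
        "slotsPerWorker": 1,
        "mpiReplicaSpecs": {
            "Launcher": {
                "replicas": 1,
                "template": {"spec": {"containers": [{
                    "name": "launcher",
                    "image": "bigdl-lora-finetune:placeholder",  # placeholder image
                    # Placeholder command; the chart wires up the real entrypoint.
                    "command": ["mpirun", "-n", "2", "python", "finetune.py"],
                }]}},
            },
            "Worker": {
                "replicas": 2,
                "template": {"spec": {"containers": [{
                    "name": "worker",
                    "image": "bigdl-lora-finetune:placeholder",  # placeholder image
                }]}},
            },
        },
    },
}

api.create_namespaced_custom_object(
    group="kubeflow.org", version="v2beta1",
    namespace="default", plural="mpijobs", body=mpijob,
)
```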

Now, let's deploy a Lora finetuning job to create an LLM from Llama2-7b.

**Note: Please make sure you already have an available Kubernetes infrastructure and NFS shared storage, and have installed the [Helm CLI](https://helm.sh/docs/helm/helm_install/) for Kubernetes job submission.**

@@ -22,7 +22,7 @@ Follow [here](https://github.com/kubeflow/mpi-operator/tree/master#installation)

Follow [here](https://github.com/intel-analytics/BigDL/tree/main/docker/llm/finetune/lora/docker#prepare-bigdl-image-for-lora-finetuning) to prepare the BigDL Lora Finetuning image in your cluster.

As finetuning starts from a base model, first download the [Llama2-7b model from the public download site of Hugging Face](https://huggingface.co/meta-llama/Llama-2-7b). Then, download the [cleaned alpaca data](https://raw.githubusercontent.com/tloen/alpaca-lora/main/alpaca_data_cleaned_archive.json), which contains all kinds of general knowledge and has already been cleaned. Next, move the downloaded files to a shared directory on your NFS server.
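
If you prefer to script this step, the following sketch fetches both files with `huggingface_hub` and `urllib`; `/mnt/nfs` is a placeholder for your NFS-backed directory. Note that Llama2-7b is a gated model, so you must accept Meta's license on Hugging Face and authenticate first (e.g. via `huggingface-cli login`).

```python
# Hypothetical helper: fetch the base model and training data into an NFS share.
import urllib.request
from pathlib import Path

from huggingface_hub import snapshot_download

nfs_dir = Path("/mnt/nfs")  # placeholder: your NFS-backed shared directory
nfs_dir.mkdir(parents=True, exist_ok=True)

# Download the Llama2-7b snapshot (requires prior license acceptance and login).
snapshot_download(
    repo_id="meta-llama/Llama-2-7b",
    local_dir=nfs_dir / "Llama-2-7b",
)

# Download the cleaned alpaca dataset referenced above.
urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/tloen/alpaca-lora/main/alpaca_data_cleaned_archive.json",
    str(nfs_dir / "alpaca_data_cleaned_archive.json"),
)
```
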
### 3. Deploy through Helm Chart