Update gzip command in FashionMNIST tutorial (#7298)
parent 8e4e643f53
commit 73ceb1ca92

2 changed files with 7 additions and 11 deletions
@@ -64,7 +64,7 @@ kubectl describe pod <driver-pod-name>
 
 ### 1.3 Load Data from Volumes
 
-When you are running programs on K8s, please load data from [Volumes](https://kubernetes.io/docs/concepts/storage/volumes/) accessible to all K8s pods. We use Network File Systems (NFS) with path `/bigdl/nfsdata` in this tutorial as an example.
+When you are running programs on K8s, please load data from [Volumes](https://kubernetes.io/docs/concepts/storage/volumes/) accessible to all K8s pods. We use Network File Systems (NFS) with path `/bigdl/nfsdata` in this tutorial as an example. You are recommended to put your working directory in the Volume (NFS) as well.
 
 To load data from Volumes, please set the corresponding Volume configurations for spark using `--conf` option in Spark scripts or specifying `conf` in `init_orca_context`. Here we list the configurations for using NFS as Volume.
 
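The Volume configurations mentioned in the hunk above (Spark `--conf` entries, or the `conf` argument of `init_orca_context`) can be sketched as a small helper. This is a minimal sketch, assuming the NFS share is exposed through a PersistentVolumeClaim; the claim name `nfsvolumeclaim` and the helper name `nfs_volume_conf` are illustrative, not from the tutorial:

```python
# Sketch: build Spark-on-K8s volume configurations for mounting an NFS-backed
# PersistentVolumeClaim into both driver and executor pods.
# Assumptions: a PVC named "nfsvolumeclaim" exists and is backed by the NFS server.
def nfs_volume_conf(claim_name="nfsvolumeclaim", mount_path="/bigdl/nfsdata"):
    conf = {}
    for role in ("driver", "executor"):
        prefix = f"spark.kubernetes.{role}.volumes.persistentVolumeClaim.{claim_name}"
        conf[f"{prefix}.options.claimName"] = claim_name
        conf[f"{prefix}.mount.path"] = mount_path
    return conf

conf = nfs_volume_conf()
# Each entry could be rendered as a `--conf key=value` flag for spark-submit,
# or the whole dict passed as `conf=conf` to `init_orca_context`.
```

Passing the same mount for driver and executor keeps the NFS path identical on every pod, which is what lets all workers read `/bigdl/nfsdata` with one path.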
@@ -112,8 +112,6 @@ def train_data_creator(config, batch_size):
     return trainloader
 ```
 
-You are recommended to put your working directory in the Volume (NFS) as well.
-
 
 ---
 ## 2. Create BigDL K8s Container
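For context, the `train_data_creator(config, batch_size)` signature in this hunk follows Orca's data-creator pattern: a function that builds a data loader on each worker and returns it. Here is a dependency-free stand-in that only illustrates the shape; the fake dataset and the `n_samples` config key are invented for this sketch, and the real tutorial returns a PyTorch `DataLoader`:

```python
# Illustrative stand-in for a data-creator function: same (config, batch_size)
# signature as the tutorial, but the "dataset" is just a list of integers.
def train_data_creator(config, batch_size):
    data = list(range(config.get("n_samples", 10)))
    # Slice the data into fixed-size batches, like a DataLoader would.
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    return batches

loader = train_data_creator({"n_samples": 10}, batch_size=4)
# → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The point of the pattern is that the creator runs on each worker, so the data loading code (and its paths, such as the NFS mount) must be visible from every pod.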
@@ -213,12 +211,11 @@ Please download the Fashion-MNIST dataset manually on your __Develop Node__ and
 # PyTorch official dataset download link
 git clone https://github.com/zalandoresearch/fashion-mnist.git
 
-# Move the dataset to NFS under the folder FashionMNIST/raw
-mv /path/to/fashion-mnist/data/fashion/* /bigdl/nfsdata/dataset/FashionMNIST/raw
+# Copy the dataset files to the folder FashionMNIST/raw in NFS
+cp /path/to/fashion-mnist/data/fashion/* /bigdl/nfsdata/dataset/FashionMNIST/raw
 
 # Extract FashionMNIST archives
-# May need to upgrade gzip before running the command
-gzip -dk /bigdl/nfsdata/dataset/FashionMNIST/raw/*
+gzip -d /bigdl/nfsdata/dataset/FashionMNIST/raw/*
 ```
 
 In the given example, you can specify the argument `--remote_dir` to be the directory on NFS for the Fashion-MNIST dataset. The directory should contain `FashionMNIST/raw/train-images-idx3-ubyte` and `FashionMNIST/raw/t10k-images-idx3`.
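The switch from `gzip -dk` to `gzip -d` also explains the dropped "upgrade gzip" comment: `-k` (keep the input files) first appeared in gzip 1.6, while plain `-d` works with any gzip and replaces each `.gz` archive with its decompressed file. A self-contained demo in a scratch directory (the fake file content is illustrative):

```shell
# Create a fake archive under FashionMNIST/raw, then extract it the way the
# updated tutorial does. `gzip -d` removes the .gz file after decompressing;
# the older `-dk` kept it, but required gzip >= 1.6.
tmp=$(mktemp -d)
mkdir -p "$tmp/FashionMNIST/raw"
echo "fake idx data" > "$tmp/FashionMNIST/raw/train-images-idx3-ubyte"
gzip "$tmp/FashionMNIST/raw/train-images-idx3-ubyte"   # produces ...-ubyte.gz
gzip -d "$tmp/FashionMNIST/raw/"*                      # extract all archives
ls "$tmp/FashionMNIST/raw"                             # prints train-images-idx3-ubyte
```

Because `-d` deletes the archives after extraction, rerunning the command on an already-extracted directory fails on the missing `.gz` files; that is harmless for a one-time dataset setup like this one.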
@@ -117,12 +117,11 @@ First, download the Fashion-MNIST dataset manually on your __Client Node__. Note
 # PyTorch official dataset download link
 git clone https://github.com/zalandoresearch/fashion-mnist.git
 
-# Move the dataset under the folder FashionMNIST/raw
-mv /path/to/fashion-mnist/data/fashion/* /path/to/local/data/FashionMNIST/raw
+# Copy the dataset files to the folder FashionMNIST/raw
+cp /path/to/fashion-mnist/data/fashion/* /path/to/local/data/FashionMNIST/raw
 
 # Extract FashionMNIST archives
-# May need to upgrade gzip before running the command
-gzip -dk /bigdl/nfsdata/dataset/FashionMNIST/raw/*
+gzip -d /path/to/local/data/FashionMNIST/raw/*
 ```
 Then upload it to a distributed storage. Sample command to upload data to HDFS is as follows:
 ```bash