Update gzip command in FashionMNIST tutorial (#7298)
Parent: 8e4e643f53. Commit: 73ceb1ca92. 2 changed files with 7 additions and 11 deletions.
````diff
@@ -64,7 +64,7 @@ kubectl describe pod <driver-pod-name>
 ### 1.3 Load Data from Volumes
 
-When you are running programs on K8s, please load data from [Volumes](https://kubernetes.io/docs/concepts/storage/volumes/) accessible to all K8s pods. We use a Network File System (NFS) with the path `/bigdl/nfsdata` in this tutorial as an example.
+When you are running programs on K8s, please load data from [Volumes](https://kubernetes.io/docs/concepts/storage/volumes/) accessible to all K8s pods. We use a Network File System (NFS) with the path `/bigdl/nfsdata` in this tutorial as an example. We recommend putting your working directory in the Volume (NFS) as well.
 
 To load data from Volumes, please set the corresponding Volume configurations for Spark using the `--conf` option in Spark scripts, or by specifying `conf` in `init_orca_context`. Here we list the configurations for using NFS as the Volume.
````
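For context, the Volume configurations referred to above are the Spark-on-K8s volume properties, which must be set for both the driver and executor pods. Below is a minimal `spark-submit` sketch, assuming an NFS-backed PersistentVolumeClaim named `nfsvolumeclaim` and a hypothetical `train.py` entry point; the same key-value pairs can equally be passed as the `conf` dict of `init_orca_context`:

```bash
# Sketch only: "nfsvolumeclaim" and train.py are assumed names, not fixed ones.
spark-submit \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl/nfsdata \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl/nfsdata \
  local:///bigdl/nfsdata/train.py
```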
````diff
@@ -112,8 +112,6 @@ def train_data_creator(config, batch_size):
     return trainloader
 ```
 
-You are recommended to put your working directory in the Volume (NFS) as well.
-
 ---
 
 ## 2. Create BigDL K8s Container
````
````diff
@@ -213,12 +211,11 @@ Please download the Fashion-MNIST dataset manually on your __Develop Node__ and
 # PyTorch official dataset download link
 git clone https://github.com/zalandoresearch/fashion-mnist.git
 
-# Move the dataset to NFS under the folder FashionMNIST/raw
-mv /path/to/fashion-mnist/data/fashion/* /bigdl/nfsdata/dataset/FashionMNIST/raw
+# Copy the dataset files to the folder FashionMNIST/raw in NFS
+cp /path/to/fashion-mnist/data/fashion/* /bigdl/nfsdata/dataset/FashionMNIST/raw
 
 # Extract FashionMNIST archives
-# May need to upgrade gzip before running the command
-gzip -dk /bigdl/nfsdata/dataset/FashionMNIST/raw/*
+gzip -d /bigdl/nfsdata/dataset/FashionMNIST/raw/*
 ```
 
 In the given example, you can specify the argument `--remote_dir` to be the directory on NFS for the Fashion-MNIST dataset. The directory should contain `FashionMNIST/raw/train-images-idx3-ubyte` and `FashionMNIST/raw/t10k-images-idx3-ubyte`.
````
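A note on the gzip change itself: `-dk` decompresses while keeping the `.gz` archives, but the `-k`/`--keep` flag was only added in gzip 1.6, which is why the old instructions warned about upgrading gzip; plain `-d` works on any gzip version and simply removes the archives after extraction. A quick way to confirm the extraction worked, assuming the NFS layout above:

```bash
# List the extracted files; FashionMNIST loaders expect these four ubyte files.
ls /bigdl/nfsdata/dataset/FashionMNIST/raw
# Expected output:
#   t10k-images-idx3-ubyte   t10k-labels-idx1-ubyte
#   train-images-idx3-ubyte  train-labels-idx1-ubyte
```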
|
|
|
|||
|
|
````diff
@@ -117,12 +117,11 @@ First, download the Fashion-MNIST dataset manually on your __Client Node__. Note
 # PyTorch official dataset download link
 git clone https://github.com/zalandoresearch/fashion-mnist.git
 
-# Move the dataset under the folder FashionMNIST/raw
-mv /path/to/fashion-mnist/data/fashion/* /path/to/local/data/FashionMNIST/raw
+# Copy the dataset files to the folder FashionMNIST/raw
+cp /path/to/fashion-mnist/data/fashion/* /path/to/local/data/FashionMNIST/raw
 
 # Extract FashionMNIST archives
-# May need to upgrade gzip before running the command
-gzip -dk /bigdl/nfsdata/dataset/FashionMNIST/raw/*
+gzip -d /path/to/local/data/FashionMNIST/raw/*
 ```
 Then upload it to a distributed storage. Sample command to upload data to HDFS is as follows:
 ```bash
````
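For orientation, a typical upload of the extracted folder to HDFS might look like the sketch below; the target path `/data` and the namenode address are assumptions, not the tutorial's exact command:

```bash
# Sketch only: replace <namenode>:<port> and /data with your cluster's values.
hdfs dfs -mkdir -p hdfs://<namenode>:<port>/data
hdfs dfs -put /path/to/local/data/FashionMNIST hdfs://<namenode>:<port>/data
```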