Update gzip command in FashionMNIST tutorial (#7298)

Kai Huang 2023-01-17 16:54:38 +08:00 committed by GitHub
parent 8e4e643f53
commit 73ceb1ca92
2 changed files with 7 additions and 11 deletions


@@ -64,7 +64,7 @@ kubectl describe pod <driver-pod-name>
### 1.3 Load Data from Volumes
-When you are running programs on K8s, please load data from [Volumes](https://kubernetes.io/docs/concepts/storage/volumes/) accessible to all K8s pods. We use Network File Systems (NFS) with path `/bigdl/nfsdata` in this tutorial as an example.
+When you are running programs on K8s, please load data from [Volumes](https://kubernetes.io/docs/concepts/storage/volumes/) accessible to all K8s pods. We use Network File Systems (NFS) with path `/bigdl/nfsdata` in this tutorial as an example. You are recommended to put your working directory in the Volume (NFS) as well.
To load data from Volumes, please set the corresponding Volume configurations for Spark using the `--conf` option in Spark scripts or by specifying `conf` in `init_orca_context`. Here we list the configurations for using NFS as the Volume.
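As a minimal sketch of the Volume configurations described above, the snippet below builds a `conf` dict of the standard Spark-on-K8s persistentVolumeClaim keys that could be passed to `init_orca_context`; the claim name `nfsvolumeclaim` and the mount path are assumptions for illustration, not values mandated by the tutorial.

```python
# Sketch: Spark-on-K8s Volume conf entries for an NFS-backed
# PersistentVolumeClaim. The volume name "nfsvolumeclaim" is an
# assumed example; replace it with your actual claim name.
volume_name = "nfsvolumeclaim"
mount_path = "/bigdl/nfsdata"

conf = {}
for role in ("driver", "executor"):
    prefix = f"spark.kubernetes.{role}.volumes.persistentVolumeClaim.{volume_name}"
    conf[f"{prefix}.options.claimName"] = volume_name
    conf[f"{prefix}.mount.path"] = mount_path

# These entries could then be supplied, e.g.:
# init_orca_context(cluster_mode="k8s-client", conf=conf, ...)
```

The same key/value pairs can equivalently be passed on the command line as repeated `--conf key=value` options in Spark scripts.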
@@ -112,8 +112,6 @@ def train_data_creator(config, batch_size):
return trainloader
```
-You are recommended to put your working directory in the Volume (NFS) as well.
---
## 2. Create BigDL K8s Container
@@ -213,12 +211,11 @@ Please download the Fashion-MNIST dataset manually on your __Develop Node__ and
# PyTorch official dataset download link
git clone https://github.com/zalandoresearch/fashion-mnist.git
-# Move the dataset to NFS under the folder FashionMNIST/raw
-mv /path/to/fashion-mnist/data/fashion/* /bigdl/nfsdata/dataset/FashionMNIST/raw
+# Copy the dataset files to the folder FashionMNIST/raw in NFS
+cp /path/to/fashion-mnist/data/fashion/* /bigdl/nfsdata/dataset/FashionMNIST/raw
# Extract FashionMNIST archives
-# May need to upgrade gzip before running the command
-gzip -dk /bigdl/nfsdata/dataset/FashionMNIST/raw/*
+gzip -d /bigdl/nfsdata/dataset/FashionMNIST/raw/*
```
In the given example, you can specify the argument `--remote_dir` to be the directory on NFS for the Fashion-MNIST dataset. The directory should contain `FashionMNIST/raw/train-images-idx3-ubyte` and `FashionMNIST/raw/t10k-images-idx3-ubyte`.
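Since a missing or still-compressed file under `FashionMNIST/raw` is an easy mistake at this step, a hypothetical helper like the one below (not part of the tutorial) could verify the extracted layout before submitting the job; the file names are the standard FashionMNIST/MNIST idx files.

```python
import os

# Standard extracted FashionMNIST file names (idx format).
REQUIRED = (
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
)

def missing_fashionmnist_files(data_dir):
    """Return the required files absent from <data_dir>/FashionMNIST/raw."""
    raw = os.path.join(data_dir, "FashionMNIST", "raw")
    return [f for f in REQUIRED if not os.path.isfile(os.path.join(raw, f))]
```

An empty return value means the directory passed as `--remote_dir` has all four extracted files in place.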


@@ -117,12 +117,11 @@ First, download the Fashion-MNIST dataset manually on your __Client Node__. Note
# PyTorch official dataset download link
git clone https://github.com/zalandoresearch/fashion-mnist.git
-# Move the dataset under the folder FashionMNIST/raw
-mv /path/to/fashion-mnist/data/fashion/* /path/to/local/data/FashionMNIST/raw
+# Copy the dataset files to the folder FashionMNIST/raw
+cp /path/to/fashion-mnist/data/fashion/* /path/to/local/data/FashionMNIST/raw
# Extract FashionMNIST archives
-# May need to upgrade gzip before running the command
-gzip -dk /bigdl/nfsdata/dataset/FashionMNIST/raw/*
+gzip -d /path/to/local/data/FashionMNIST/raw/*
```
Then upload it to a distributed storage. Sample command to upload data to HDFS is as follows:
```bash