Update gzip command in FashionMNIST tutorial (#7298)

Kai Huang 2023-01-17 16:54:38 +08:00 committed by GitHub
parent 8e4e643f53
commit 73ceb1ca92
2 changed files with 7 additions and 11 deletions


@@ -64,7 +64,7 @@ kubectl describe pod <driver-pod-name>
### 1.3 Load Data from Volumes
-When you are running programs on K8s, please load data from [Volumes](https://kubernetes.io/docs/concepts/storage/volumes/) accessible to all K8s pods. We use Network File Systems (NFS) with path `/bigdl/nfsdata` in this tutorial as an example.
+When you are running programs on K8s, please load data from [Volumes](https://kubernetes.io/docs/concepts/storage/volumes/) accessible to all K8s pods. We use Network File Systems (NFS) with path `/bigdl/nfsdata` in this tutorial as an example. You are recommended to put your working directory in the Volume (NFS) as well.
To load data from Volumes, please set the corresponding Volume configurations for Spark using the `--conf` option in Spark scripts or by specifying `conf` in `init_orca_context`. Here we list the configurations for using NFS as the Volume.
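As a minimal sketch of the Volume configurations described above, the snippet below builds a `conf` dict of the standard Spark-on-K8s persistentVolumeClaim keys that could be passed to `init_orca_context`; the claim name `nfsvolumeclaim` and the mount path are assumptions for illustration, not values mandated by the tutorial.

```python
# Sketch: Spark-on-K8s Volume conf entries for an NFS-backed
# PersistentVolumeClaim. The volume name "nfsvolumeclaim" is an
# assumed example; replace it with your actual claim name.
volume_name = "nfsvolumeclaim"
mount_path = "/bigdl/nfsdata"

conf = {}
for role in ("driver", "executor"):
    prefix = f"spark.kubernetes.{role}.volumes.persistentVolumeClaim.{volume_name}"
    conf[f"{prefix}.options.claimName"] = volume_name
    conf[f"{prefix}.mount.path"] = mount_path

# These entries could then be supplied, e.g.:
# init_orca_context(cluster_mode="k8s-client", conf=conf, ...)
```

The same key/value pairs can equivalently be passed on the command line as repeated `--conf key=value` options in Spark scripts.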
@@ -112,8 +112,6 @@ def train_data_creator(config, batch_size):
return trainloader
```
-You are recommended to put your working directory in the Volume (NFS) as well.
---
## 2. Create BigDL K8s Container
@@ -213,12 +211,11 @@ Please download the Fashion-MNIST dataset manually on your __Develop Node__ and
# PyTorch official dataset download link
git clone https://github.com/zalandoresearch/fashion-mnist.git
-# Move the dataset to NFS under the folder FashionMNIST/raw
-mv /path/to/fashion-mnist/data/fashion/* /bigdl/nfsdata/dataset/FashionMNIST/raw
+# Copy the dataset files to the folder FashionMNIST/raw in NFS
+cp /path/to/fashion-mnist/data/fashion/* /bigdl/nfsdata/dataset/FashionMNIST/raw
# Extract FashionMNIST archives
-# May need to upgrade gzip before running the command
-gzip -dk /bigdl/nfsdata/dataset/FashionMNIST/raw/*
+gzip -d /bigdl/nfsdata/dataset/FashionMNIST/raw/*
```
In the given example, you can specify the argument `--remote_dir` to be the directory on NFS for the Fashion-MNIST dataset. The directory should contain `FashionMNIST/raw/train-images-idx3-ubyte` and `FashionMNIST/raw/t10k-images-idx3-ubyte`.
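Since a missing or still-compressed file under `FashionMNIST/raw` is an easy mistake at this step, a hypothetical helper like the one below (not part of the tutorial) could verify the extracted layout before submitting the job; the file names are the standard FashionMNIST/MNIST idx files.

```python
import os

# Standard extracted FashionMNIST file names (idx format).
REQUIRED = (
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
)

def missing_fashionmnist_files(data_dir):
    """Return the required files absent from <data_dir>/FashionMNIST/raw."""
    raw = os.path.join(data_dir, "FashionMNIST", "raw")
    return [f for f in REQUIRED if not os.path.isfile(os.path.join(raw, f))]
```

An empty return value means the directory passed as `--remote_dir` has all four extracted files in place.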


@@ -117,12 +117,11 @@ First, download the Fashion-MNIST dataset manually on your __Client Node__. Note
# PyTorch official dataset download link
git clone https://github.com/zalandoresearch/fashion-mnist.git
-# Move the dataset under the folder FashionMNIST/raw
-mv /path/to/fashion-mnist/data/fashion/* /path/to/local/data/FashionMNIST/raw
+# Copy the dataset files to the folder FashionMNIST/raw
+cp /path/to/fashion-mnist/data/fashion/* /path/to/local/data/FashionMNIST/raw
# Extract FashionMNIST archives
-# May need to upgrade gzip before running the command
-gzip -dk /bigdl/nfsdata/dataset/FashionMNIST/raw/*
+gzip -d /path/to/local/data/FashionMNIST/raw/*
```
Then upload it to a distributed storage. Sample command to upload data to HDFS is as follows:
```bash