diff --git a/docs/readthedocs/source/doc/Orca/Tutorial/k8s.md b/docs/readthedocs/source/doc/Orca/Tutorial/k8s.md
index f9ed5d42..5f17e851 100644
--- a/docs/readthedocs/source/doc/Orca/Tutorial/k8s.md
+++ b/docs/readthedocs/source/doc/Orca/Tutorial/k8s.md
@@ -64,7 +64,7 @@ kubectl describe pod

 ### 1.3 Load Data from Volumes

-When you are running programs on K8s, please load data from [Volumes](https://kubernetes.io/docs/concepts/storage/volumes/) accessible to all K8s pods. We use Network File Systems (NFS) with path `/bigdl/nfsdata` in this tutorial as an example.
+When you are running programs on K8s, please load data from [Volumes](https://kubernetes.io/docs/concepts/storage/volumes/) accessible to all K8s pods. We use Network File Systems (NFS) with path `/bigdl/nfsdata` in this tutorial as an example. You are recommended to put your working directory in the Volume (NFS) as well.

 To load data from Volumes, please set the corresponding Volume configurations for spark using `--conf` option in Spark scripts or specifying `conf` in `init_orca_context`. Here we list the configurations for using NFS as Volume.

@@ -112,8 +112,6 @@ def train_data_creator(config, batch_size):
     return trainloader
 ```

-You are recommended to put your working directory in the Volume (NFS) as well.
-
 ---

 ## 2. Create BigDL K8s Container
@@ -213,12 +211,11 @@ Please download the Fashion-MNIST dataset manually on your __Develop Node__ and
 # PyTorch official dataset download link
 git clone https://github.com/zalandoresearch/fashion-mnist.git

-# Move the dataset to NFS under the folder FashionMNIST/raw
-mv /path/to/fashion-mnist/data/fashion/* /bigdl/nfsdata/dataset/FashionMNIST/raw
+# Copy the dataset files to the folder FashionMNIST/raw in NFS
+cp /path/to/fashion-mnist/data/fashion/* /bigdl/nfsdata/dataset/FashionMNIST/raw

 # Extract FashionMNIST archives
-# May need to upgrade gzip before running the command
-gzip -dk /bigdl/nfsdata/dataset/FashionMNIST/raw/*
+gzip -d /bigdl/nfsdata/dataset/FashionMNIST/raw/*
 ```

 In the given example, you can specify the argument `--remote_dir` to be the directory on NFS for the Fashion-MNIST dataset. The directory should contain `FashionMNIST/raw/train-images-idx3-ubyte` and `FashionMNIST/raw/t10k-images-idx3`.
diff --git a/docs/readthedocs/source/doc/Orca/Tutorial/yarn.md b/docs/readthedocs/source/doc/Orca/Tutorial/yarn.md
index e2d7e576..050fe511 100644
--- a/docs/readthedocs/source/doc/Orca/Tutorial/yarn.md
+++ b/docs/readthedocs/source/doc/Orca/Tutorial/yarn.md
@@ -117,12 +117,11 @@ First, download the Fashion-MNIST dataset manually on your __Client Node__. Note
 # PyTorch official dataset download link
 git clone https://github.com/zalandoresearch/fashion-mnist.git

-# Move the dataset under the folder FashionMNIST/raw
-mv /path/to/fashion-mnist/data/fashion/* /path/to/local/data/FashionMNIST/raw
+# Copy the dataset files to the folder FashionMNIST/raw
+cp /path/to/fashion-mnist/data/fashion/* /path/to/local/data/FashionMNIST/raw

 # Extract FashionMNIST archives
-# May need to upgrade gzip before running the command
-gzip -dk /bigdl/nfsdata/dataset/FashionMNIST/raw/*
+gzip -d /path/to/local/data/FashionMNIST/raw/*
 ```
 Then upload it to a distributed storage. Sample command to upload data to HDFS is as follows:
 ```bash