Fix libhdfs known issue doc (#5257)

* fix doc
* minor
2022-08-01 20:28:36 +08:00 · cda8967416
commit cda8967416
parent df2a7dcef0
1 changed files with 10 additions and 6 deletions
--- a/docs/readthedocs/source/doc/Orca/Overview/known_issues.md
+++ b/docs/readthedocs/source/doc/Orca/Overview/known_issues.md
@@ -4,12 +4,11 @@

 ### **OSError: Unable to load libhdfs: ./libhdfs.so: cannot open shared object file: No such file or directory**

-This error occurs while running Orca with `yarn-client` mode on Cloudera, where PyArrow failed to locate `libhdfs.so` in default path of `$HADOOP_HOME/lib/native`. To solve this, we need to set the path of `libhdfs.so` in Cloudera to the environment variable of `ARROW_LIBHDFS_DIR` on spark executors. 
+This error occurs while running Orca TF2 Estimator with spark backend for YARN on Cloudera, where PyArrow fails to locate `libhdfs.so` in default path of `$HADOOP_HOME/lib/native`. 
+To solve this issue, you need to set the path of `libhdfs.so` in Cloudera to the environment variable of `ARROW_LIBHDFS_DIR` on Spark driver and executors with the following steps:

-You could follow below steps:
-
-1. use `locate libhdfs.so` to find `libhdfs.so`
-2. `export ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64` (replace with the result of locate libhdfs.so)
+1. Run `locate libhdfs.so` on the client node to find `libhdfs.so`
+2. `export ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64` (replace with the result of `locate libhdfs.so` in your environment)
 3. If you are using `init_orca_context(cluster_mode="yarn-client")`: 
   ```
   conf = {"spark.executorEnv.ARROW_LIBHDFS_DIR": "/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64"}
@@ -17,7 +16,12 @@ You could follow below steps:
   ```
   If you are using `init_orca_context(cluster_mode="spark-submit")`:
   ```
-   spark-submit --conf "spark.executorEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64"
+   # For yarn-client mode
+   spark-submit --conf spark.executorEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64
+
+   # For yarn-cluster mode
+   spark-submit --conf spark.executorEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64 \
+                --conf spark.yarn.appMasterEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64
   ```

 ## **Orca Context Issues**
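
Note on the libhdfs change above: the `conf` dictionary and the `spark-submit --conf` flags carry the same two Spark properties, so they can be generated from a single path. The following is a minimal sketch, not part of the commit. The helper names `build_libhdfs_conf` and `to_spark_submit_args` are hypothetical, and passing the dictionary as `conf=` to `init_orca_context` is an assumption, since the diff elides that call.

```python
from typing import Dict, List


def build_libhdfs_conf(libhdfs_dir: str, yarn_cluster: bool = False) -> Dict[str, str]:
    """Build Spark properties that expose ARROW_LIBHDFS_DIR to the executors.

    Hypothetical helper. When yarn_cluster is True, the variable is also set
    for the YARN application master, mirroring the yarn-cluster example.
    """
    conf = {"spark.executorEnv.ARROW_LIBHDFS_DIR": libhdfs_dir}
    if yarn_cluster:
        conf["spark.yarn.appMasterEnv.ARROW_LIBHDFS_DIR"] = libhdfs_dir
    return conf


def to_spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render the properties as spark-submit arguments (hypothetical helper)."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Replace this path with the result of `locate libhdfs.so` in your environment.
libhdfs_dir = "/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64"

client_conf = build_libhdfs_conf(libhdfs_dir)
cluster_args = to_spark_submit_args(build_libhdfs_conf(libhdfs_dir, yarn_cluster=True))

# Assumed usage, not shown in the diff:
# init_orca_context(cluster_mode="yarn-client", conf=client_conf)
```

Keeping the path in one variable avoids the executor and application master settings drifting apart when the Cloudera parcel version changes.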