Fix libhdfs known issue doc (#5257)

* fix doc

* minor
Kai Huang 2022-08-01 20:28:36 +08:00 committed by GitHub
parent df2a7dcef0
commit cda8967416

@@ -4,12 +4,11 @@
 ### **OSError: Unable to load libhdfs: ./libhdfs.so: cannot open shared object file: No such file or directory**
-This error occurs while running Orca with `yarn-client` mode on Cloudera, where PyArrow failed to locate `libhdfs.so` in default path of `$HADOOP_HOME/lib/native`. To solve this, we need to set the path of `libhdfs.so` in Cloudera to the environment variable of `ARROW_LIBHDFS_DIR` on spark executors.
-You could follow below steps:
-1. use `locate libhdfs.so` to find `libhdfs.so`
-2. `export ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64` (replace with the result of locate libhdfs.so)
+This error occurs while running Orca TF2 Estimator with spark backend for YARN on Cloudera, where PyArrow fails to locate `libhdfs.so` in default path of `$HADOOP_HOME/lib/native`.
+To solve this issue, you need to set the path of `libhdfs.so` in Cloudera to the environment variable of `ARROW_LIBHDFS_DIR` on Spark driver and executors with the following steps:
+1. Run `locate libhdfs.so` on the client node to find `libhdfs.so`
+2. `export ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64` (replace with the result of `locate libhdfs.so` in your environment)
 3. If you are using `init_orca_context(cluster_mode="yarn-client")`:
 ```
 conf = {"spark.executorEnv.ARROW_LIBHDFS_DIR": "/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64"}
@@ -17,7 +16,12 @@ You could follow below steps:
 ```
 If you are using `init_orca_context(cluster_mode="spark-submit")`:
 ```
-spark-submit --conf "spark.executorEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64"
+# For yarn-client mode
+spark-submit --conf spark.executorEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64
+# For yarn-cluster mode
+spark-submit --conf spark.executorEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64 \
+    --conf spark.yarn.appMasterEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64
 ```
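
The steps above can also be scripted. The following is a minimal sketch and not part of Orca: the helper names `find_libhdfs_dir`, `libhdfs_spark_conf` and `to_spark_submit_args` are hypothetical, and it assumes (as the steps above describe) that executors pick up the path from `spark.executorEnv.ARROW_LIBHDFS_DIR` and that yarn-cluster mode additionally needs `spark.yarn.appMasterEnv.ARROW_LIBHDFS_DIR`.

```python
import os


def find_libhdfs_dir(search_roots):
    """Return the first directory under search_roots containing libhdfs.so, else None.

    Hypothetical pure-Python stand-in for running `locate libhdfs.so`.
    """
    for root in search_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if "libhdfs.so" in filenames:
                return dirpath
    return None


def libhdfs_spark_conf(libhdfs_dir, yarn_cluster=False):
    """Build the Spark conf entries that expose ARROW_LIBHDFS_DIR."""
    conf = {"spark.executorEnv.ARROW_LIBHDFS_DIR": libhdfs_dir}
    if yarn_cluster:
        # Assumption from the steps above: in yarn-cluster mode the
        # application master needs the variable as well.
        conf["spark.yarn.appMasterEnv.ARROW_LIBHDFS_DIR"] = libhdfs_dir
    return conf


def to_spark_submit_args(conf):
    """Turn a conf dict into a flat list of `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args
```

For example, `find_libhdfs_dir(["/opt/cloudera/parcels"])` would search the Cloudera parcel directory used in the example path above, and `libhdfs_spark_conf(path)` returns a dict with the same shape as the `conf` shown in step 3. The driver-side `export ARROW_LIBHDFS_DIR=...` from step 2 is still required separately.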
## **Orca Context Issues**