Add Orca Known issues (#4940)

2022-06-24 16:45:41 +08:00
commit 8f51ed4cf1
parent 145216bfc1
2 changed files with 22 additions and 0 deletions
--- a/docs/readthedocs/source/doc/Orca/Overview/known_issues.md
+++ b/docs/readthedocs/source/doc/Orca/Overview/known_issues.md
@@ -0,0 +1,21 @@
+# Orca Known Issues
+
+## **Estimator Issues**
+
+### **OSError: Unable to load libhdfs: ./libhdfs.so: cannot open shared object file: No such file or directory**
+
+This error occurs when running Orca in `yarn-client` mode on Cloudera, where PyArrow fails to locate `libhdfs.so` in the default path `$HADOOP_HOME/lib/native`. To fix it, set the `ARROW_LIBHDFS_DIR` environment variable on the Spark executors to the directory that contains `libhdfs.so` in your Cloudera installation.
+
+You can follow the steps below:
+
+1. Use `locate libhdfs.so` to find the directory that contains `libhdfs.so`.
+2. Run `export ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64` (replace the path with the directory returned by `locate libhdfs.so`).
+3. If you are using `init_orca_context(cluster_mode="yarn-client")`:
+   ```
+   conf = {"spark.executorEnv.ARROW_LIBHDFS_DIR": "/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64"}
+   init_orca_context(cluster_mode="yarn-client", conf=conf)
+   ```
+   If you are using `init_orca_context(cluster_mode="spark-submit")`:
+   ```
+   spark-submit --conf "spark.executorEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64"
+   ```
--- a/docs/readthedocs/source/index.rst
+++ b/docs/readthedocs/source/index.rst
@@ -71,6 +71,7 @@ BigDL Documentation
   doc/Orca/Overview/distributed-training-inference.md
   doc/Orca/Overview/distributed-tuning.md
   doc/Ray/Overview/ray.md
+   doc/Orca/Overview/known_issues.md

 .. toctree::
   :maxdepth: 1