# Orca Known Issues

## Estimator Issues

### OSError: Unable to load libhdfs: ./libhdfs.so: cannot open shared object file: No such file or directory
This error occurs while running the Orca TF2 Estimator with the Spark backend for YARN on Cloudera, where PyArrow fails to locate `libhdfs.so` in the default path `$HADOOP_HOME/lib/native`.

To solve this issue, you need to set the path of `libhdfs.so` in Cloudera as the `ARROW_LIBHDFS_DIR` environment variable on the Spark driver and executors with the following steps:
- Run `locate libhdfs.so` on the client node to find `libhdfs.so`.

- `export ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64` (replace with the result of `locate libhdfs.so` in your environment).

- If you are using `init_orca_context(cluster_mode="yarn-client")`:

  ```python
  conf = {"spark.executorEnv.ARROW_LIBHDFS_DIR": "/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64"}
  init_orca_context(cluster_mode="yarn", conf=conf)
  ```

  If you are using `init_orca_context(cluster_mode="spark-submit")`:

  ```bash
  # For yarn-client mode
  spark-submit --conf spark.executorEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64

  # For yarn-cluster mode
  spark-submit --conf spark.executorEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64 \
               --conf spark.yarn.appMasterEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64
  ```
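As a quick sanity check on a node, you can confirm that PyArrow now finds `libhdfs.so`. This is a minimal sketch, assuming PyArrow >= 2.0 (which provides `pyarrow.fs.HadoopFileSystem`) and reusing the example Cloudera path from above; substitute your own `locate libhdfs.so` result:

```python
import os

# Must be set before PyArrow loads libhdfs; this path is the example
# Cloudera location from above and may differ in your environment.
os.environ["ARROW_LIBHDFS_DIR"] = "/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64"

import pyarrow.fs

# Raises the same "Unable to load libhdfs" OSError if the path is still wrong.
fs = pyarrow.fs.HadoopFileSystem(host="default")
print(fs.get_file_info(pyarrow.fs.FileSelector("/")))
```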
## Orca Context Issues

### Exception: Failed to read dashbord log: [Errno 2] No such file or directory: '/tmp/ray/.../dashboard.log'
This error occurs when initializing an Orca context with `init_ray_on_spark=True`. We have not located the root cause of this problem, but it might be caused by an atypical Python environment. You can follow the steps below to work around it:
- If you only need to use functions in Ray (e.g. `bigdl.orca.learn` with `backend="ray"`, `bigdl.orca.automl` for PyTorch/TensorFlow models, `bigdl.chronos.autots` for time series models' auto-tuning), you can use Ray as the first-class runtime (see the sketch after this list):

  1. Start a Ray cluster with `ray start --head`. If you already have a Ray cluster started, please directly jump to step 2.

  2. Initialize an Orca context with `runtime="ray"` and `init_ray_on_spark=False`; please refer to the detailed information here.

  3. If you are using `bigdl.orca.automl` or `bigdl.chronos.autots` on a single node, please set:

     ```python
     ray_ctx = OrcaContext.get_ray_context()
     ray_ctx.is_local = True
     ```
- If you really need to use Ray on Spark, please install bigdl-orca under a conda environment (see the setup sketch after this list); for detailed information, please refer to here.
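A minimal sketch of the Ray-first workflow in the first option, assuming your BigDL version accepts `runtime="ray"` with an `address` argument in `init_orca_context` (the address below is a placeholder for the one printed by `ray start --head`):

```python
from bigdl.orca import init_orca_context, stop_orca_context, OrcaContext

# Connect to the existing Ray cluster instead of launching Ray on Spark;
# "localhost:6379" is a placeholder address.
init_orca_context(runtime="ray", address="localhost:6379")

# Only needed when running bigdl.orca.automl or bigdl.chronos.autots
# on a single node, as described in step 3 above.
ray_ctx = OrcaContext.get_ray_context()
ray_ctx.is_local = True

# ... run your Ray-backed training or auto-tuning workload here ...

stop_orca_context()
```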
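For the second option, a minimal sketch of the conda-based setup; the environment name and Python version are illustrative, so check the linked installation guide for supported versions:

```bash
# Create and activate an isolated conda environment
# (the name and Python version here are illustrative).
conda create -n bigdl python=3.7
conda activate bigdl

# Install bigdl-orca with Ray support.
pip install bigdl-orca[ray]
```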