From 985aec4425b4868255e63a42120f39a4af7fb2c0 Mon Sep 17 00:00:00 2001
From: Cengguang Zhang
Date: Mon, 8 Aug 2022 18:10:29 +0800
Subject: [PATCH] Orca: add grpc error to orca known issues (#5309)

* feat: add grpc error to orca known issues
* refactor: update short name and style.
* refactor: refine error explanation
* Update known_issues.md
* Update known_issues.md
---
 .../source/doc/Orca/Overview/known_issues.md | 20 ++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/docs/readthedocs/source/doc/Orca/Overview/known_issues.md b/docs/readthedocs/source/doc/Orca/Overview/known_issues.md
index 626c025c..ff996fd5 100644
--- a/docs/readthedocs/source/doc/Orca/Overview/known_issues.md
+++ b/docs/readthedocs/source/doc/Orca/Overview/known_issues.md
@@ -12,18 +12,32 @@ To solve this issue, you need to set the path of `libhdfs.so` in Cloudera to the
 3. If you are using `init_orca_context(cluster_mode="yarn-client")`:
    ```
    conf = {"spark.executorEnv.ARROW_LIBHDFS_DIR": "/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64"}
-   init_orca_context(cluster_mode="yarn", conf=conf)
+   init_orca_context(cluster_mode="yarn-client", conf=conf)
    ```
    If you are using `init_orca_context(cluster_mode="spark-submit")`:
    ```
    # For yarn-client mode
    spark-submit --conf spark.executorEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64
-   
+
    # For yarn-cluster mode
    spark-submit --conf spark.executorEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64 \
    --conf spark.yarn.appMasterEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64
    ```
 
+### **UnknownError: Could not start gRPC server**
+
+This error can occur when running the Orca TF2 Estimator with the Spark backend, possibly because a previous PySpark TensorFlow job was not cleaned up completely: Spark reuses Python workers by default, so a stale gRPC server from an earlier task may still hold its port. You can retry later, or set the Spark configuration `spark.python.worker.reuse=false` in your application so that each task starts in a fresh Python worker.
+
+If you are using `init_orca_context(cluster_mode="yarn-client")`:
+   ```
+   conf = {"spark.python.worker.reuse": "false"}
+   init_orca_context(cluster_mode="yarn-client", conf=conf)
+   ```
+   If you are using `init_orca_context(cluster_mode="spark-submit")`:
+   ```
+   spark-submit --conf spark.python.worker.reuse=false
+   ```
+
 ## **Orca Context Issues**
 
 ### **Exception: Failed to read dashbord log: [Errno 2] No such file or directory: '/tmp/ray/.../dashboard.log'**
@@ -42,4 +56,4 @@ You could follow below steps to workaround:
    ray_ctx.is_local=True
    ```
 
-2. If you really need to use ray on spark, please install bigdl-orca under a conda environment. Detailed information please refer to [here](./orca.html).
\ No newline at end of file
+2. If you really need to use Ray on Spark, please install bigdl-orca in a conda environment. For detailed information, please refer to [here](./orca.html).
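
A note for readers applying the first hunk: the doc gives the yarn-cluster settings only in spark-submit form. Below is a minimal sketch of the equivalent conf-dict form for `init_orca_context`, assuming the same Cloudera parcel path used in the doc; it is an illustration, not part of the patch.

```
# Illustrative sketch (not part of the patch): conf-dict equivalent of the
# yarn-cluster spark-submit flags shown above for the libhdfs.so issue.
from bigdl.orca import init_orca_context

libhdfs_dir = "/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64"
conf = {
    # Executors need ARROW_LIBHDFS_DIR so pyarrow can locate libhdfs.so.
    "spark.executorEnv.ARROW_LIBHDFS_DIR": libhdfs_dir,
    # In yarn-cluster mode the driver runs inside the Application Master,
    # so the variable must be set there as well.
    "spark.yarn.appMasterEnv.ARROW_LIBHDFS_DIR": libhdfs_dir,
}
init_orca_context(cluster_mode="yarn-cluster", conf=conf)
```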
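
Similarly, for the gRPC section this patch adds, here is a minimal sketch of how `spark.python.worker.reuse=false` fits into a complete Orca TF2 Estimator job with the Spark backend. It assumes bigdl-orca with TF2 support is installed; `model_creator` and the commented-out `fit` call are toy placeholders, not part of the patch.

```
# Illustrative sketch (not part of the patch): applying the gRPC workaround
# in a full Orca TF2 Estimator job with the Spark backend.
from bigdl.orca import init_orca_context, stop_orca_context
from bigdl.orca.learn.tf2 import Estimator

# A fresh Python worker per task prevents a stale TensorFlow gRPC server
# from a previous task from keeping its port occupied.
conf = {"spark.python.worker.reuse": "false"}
sc = init_orca_context(cluster_mode="yarn-client", conf=conf)

def model_creator(config):
    # Toy Keras model; replace with your own.
    import tensorflow as tf
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="adam", loss="mse")
    return model

est = Estimator.from_keras(model_creator=model_creator, backend="spark")
# est.fit(...) with your Spark DataFrame would follow here.
stop_orca_context()
```

As a sanity check, `sc.getConf().get("spark.python.worker.reuse")` on the SparkContext returned by `init_orca_context` should return `"false"`.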