Docs: update orca known issues (#5395)

* docs: update known issues

* fix: add example to known issue

* fix: add example to known issue

* fix: fix typo

* fix: modify issue description.

* fix: fix typo.

* fix: fix typo.
Cengguang Zhang 2022-08-15 18:20:45 +08:00 committed by GitHub
parent a440a05b0c
commit 2c0424e964


@ -4,7 +4,7 @@
### **OSError: Unable to load libhdfs: ./libhdfs.so: cannot open shared object file: No such file or directory**
This error occurs while running Orca TF2 Estimator with YARN on Cloudera, where PyArrow fails to locate `libhdfs.so` in the default path of `$HADOOP_HOME/lib/native`.
To solve this issue, you need to point the environment variable `ARROW_LIBHDFS_DIR` at the Cloudera path of `libhdfs.so` on the Spark driver and executors with the following steps:
1. Run `locate libhdfs.so` on the client node to find `libhdfs.so`.
@ -24,7 +24,7 @@ To solve this issue, you need to set the path of `libhdfs.so` in Cloudera to the
```
--conf spark.yarn.appMasterEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64
```
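If you create the Spark context with `init_orca_context` instead of `spark-submit`, the same environment variable can be passed programmatically through its `conf` argument. A minimal sketch, assuming the example Cloudera parcel path shown above (replace it with the directory reported by `locate libhdfs.so` on your cluster):

```python
# Example path from the steps above; use the directory reported by
# `locate libhdfs.so` on your own cluster.
libhdfs_dir = "/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64"

# Spark properties that export ARROW_LIBHDFS_DIR to the executors and to
# the YARN application master (which hosts the driver in cluster mode).
conf = {
    "spark.executorEnv.ARROW_LIBHDFS_DIR": libhdfs_dir,
    "spark.yarn.appMasterEnv.ARROW_LIBHDFS_DIR": libhdfs_dir,
}

# from bigdl.orca import init_orca_context
# sc = init_orca_context(cluster_mode="yarn-client", conf=conf)
```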
### **UnknownError: Could not start gRPC server**
This error occurs while running Orca TF2 Estimator with the spark backend, which may be because the previous PySpark TensorFlow job was not cleaned up completely. You can retry later, or set the Spark config `spark.python.worker.reuse=false` in your application.
@ -38,7 +38,34 @@ If you are using `init_orca_context(cluster_mode="yarn-client")`:
```
spark-submit --conf spark.python.worker.reuse=false
```
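The same setting can also be applied without `spark-submit` by passing it to `init_orca_context`. A minimal sketch, assuming its `conf` argument accepts plain Spark properties:

```python
# Disable Python worker reuse so each task gets a fresh worker process,
# avoiding leftover state from a previous PySpark TensorFlow job.
conf = {"spark.python.worker.reuse": "false"}

# from bigdl.orca import init_orca_context
# sc = init_orca_context(cluster_mode="yarn-client", conf=conf)
```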
## **Orca Context Issues**
### **RuntimeError: Inter op parallelism cannot be modified after initialization**
This error occurs if you build your TensorFlow model on the driver rather than on workers. You should build the complete model in `model_creator` which runs on each worker node. You can refer to the following examples:
**Wrong Example**
```
model = ...

def model_creator(config):
    model.compile(...)
    return model

estimator = Estimator.from_keras(model_creator=model_creator, ...)
...
```
**Correct Example**
```
def model_creator(config):
    model = ...
    model.compile(...)
    return model

estimator = Estimator.from_keras(model_creator=model_creator, ...)
...
```
### **Exception: Failed to read dashbord log: [Errno 2] No such file or directory: '/tmp/ray/.../dashboard.log'**