Add Ray arguments in init_orca_context to Doc page (#5686)
* add ray docs
* fix
* fix
* fix style
* minor
This commit is contained in:
parent
103546d8fe
commit
dcff01690b
2 changed files with 15 additions and 1 deletion
@@ -36,6 +36,20 @@ from bigdl.orca import init_orca_context
sc = init_orca_context(cluster_mode="yarn-client", cores=4, memory="10g", num_nodes=2, init_ray_on_spark=True)
```
You can pass the following RayOnSpark related arguments to `init_orca_context` for Ray configurations:
- `redis_port`: The redis port for the Ray head node. A random port is picked if not specified.
- `redis_password`: The password for redis. Defaults to Ray's default password if not specified.
- `object_store_memory`: The memory size for the Ray object store, given as a string in bytes (b), kilobytes (k), megabytes (m) or gigabytes (g), for example "50b", "100k", "250m" or "30g".
- `verbose`: True for more logs when starting Ray. Default is False.
- `env`: The environment variable dict for running Ray processes. Default is None.
- `extra_params`: The key-value dict of extra options for launching Ray, for example `extra_params={"dashboard-port": "11281", "temp-dir": "/tmp/ray/"}`.
- `include_webui`: Whether to include the web UI when starting Ray. Default is True.
- `system_config`: The key-value dict for overriding RayConfig defaults, mainly for testing purposes. An example of system_config could be: `{"object_spilling_config":"{\"type\":\"filesystem\", \"params\":{\"directory_path\":\"/tmp/spill\"}}"}`.
- `num_ray_nodes`: The number of Ray processes to start across the cluster. For Spark local mode, you don't need to specify this value. For Spark cluster mode, it defaults to the number of Spark executors. If spark.executor.instances can't be detected in your SparkContext, you need to specify this explicitly. It is recommended that num_ray_nodes be no larger than the number of Spark executors, so that there are enough resources in your cluster.
- `ray_node_cpu_cores`: The number of available cores for each Ray process. For Spark local mode, it defaults to the number of Spark local cores. For Spark cluster mode, it defaults to the number of cores for each Spark executor. If spark.executor.cores or spark.cores.max can't be detected in your SparkContext, you need to specify this explicitly. It is recommended that ray_node_cpu_cores be no larger than the number of cores for each Spark executor, so that there are enough resources in your cluster.
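The arguments above can be sketched as a plain dict of keyword arguments for `init_orca_context` (a hedged example; all values below are illustrative, not recommendations):

```python
# Hedged sketch: collecting the RayOnSpark-related arguments described above
# into a dict. The concrete values (ports, sizes, paths) are made up for
# illustration only.
ray_args = {
    "redis_port": 11280,            # fixed redis port for the Ray head node
    "object_store_memory": "4g",    # Ray object store size, e.g. "250m", "30g"
    "verbose": True,                # print more logs when starting Ray
    "extra_params": {"dashboard-port": "11281", "temp-dir": "/tmp/ray/"},
    "num_ray_nodes": 2,             # should not exceed the number of Spark executors
    "ray_node_cpu_cores": 2,        # should not exceed cores per Spark executor
}

# In a real program (requires bigdl-orca and a running YARN cluster):
# from bigdl.orca import init_orca_context
# sc = init_orca_context(cluster_mode="yarn-client", cores=4, memory="10g",
#                        num_nodes=2, init_ray_on_spark=True, **ray_args)
```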
By default, the Ray cluster is launched using Spark barrier execution mode; you can turn it off via the configurations of `OrcaContext`:
```python
@@ -33,7 +33,7 @@ elif cluster_mode == "yarn": # For Hadoop/YARN cluster
sc = init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1, init_ray_on_spark=True)
```
This is the only place where you need to specify local or distributed mode. See [here](./../../Ray/Overview/ray.md#initialize) for more RayOnSpark related arguments when you call `init_orca_context`.
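The "only place" claim can be sketched as selecting the `init_orca_context` arguments from a single `cluster_mode` flag (the argument names come from the snippets above; the dispatch helper itself is hypothetical):

```python
# Hypothetical helper: choose init_orca_context keyword arguments from a
# single cluster_mode flag. Argument names mirror the doc's snippets; the
# helper itself is a sketch, not part of the BigDL API.
def orca_kwargs(cluster_mode):
    if cluster_mode == "local":
        return {"cluster_mode": "local", "cores": 2}
    elif cluster_mode == "yarn":
        return {"cluster_mode": "yarn", "num_nodes": 2, "cores": 2,
                "memory": "10g", "driver_memory": "10g", "driver_cores": 1,
                "init_ray_on_spark": True}
    raise ValueError("unsupported cluster_mode: " + cluster_mode)

# In a real program (requires bigdl-orca):
# from bigdl.orca import init_orca_context
# sc = init_orca_context(**orca_kwargs("yarn"))
```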
By default, the Ray cluster is launched using Spark barrier execution mode; you can turn it off via the configurations of `OrcaContext`:
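The code snippet for this is truncated in the diff above. As a hedged sketch only (assuming `OrcaContext` exposes a `barrier_mode` attribute, which is an assumption here and not confirmed by this diff), turning off barrier execution mode might look like:

```python
# Configuration fragment (requires bigdl-orca; assumes the barrier_mode
# attribute exists on OrcaContext, which is an assumption in this sketch):
from bigdl.orca import OrcaContext

OrcaContext.barrier_mode = False  # launch Ray without Spark barrier execution mode
```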