# Installation

---
## Prepare the environment

You can follow the commands in this section to install Java and conda before installing BigDL Orca.

### Install Java

You need to download and install JDK in your environment, and properly set the environment variable `JAVA_HOME`. JDK 8 is highly recommended.

```bash
# For Ubuntu
sudo apt-get install openjdk-8-jre
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/

# For CentOS
su -c "yum install java-1.8.0-openjdk"
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-1.el7_9.x86_64/jre

export PATH=$PATH:$JAVA_HOME/bin
java -version  # Verify the version of JDK.
```

### Install Anaconda

We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the Python environment. You can follow the steps below to install conda:

```bash
# Download the Anaconda installation script
wget -P /tmp https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh

# Execute the script to install conda
bash /tmp/Anaconda3-2020.02-Linux-x86_64.sh

# Run this command in your terminal to activate conda
source ~/.bashrc
```

Then create a Python environment for BigDL Orca:

```bash
conda create -n py37 python=3.7  # "py37" is the conda environment name; you can use any name you like.
conda activate py37
```

---
## Install BigDL Orca

### To use basic Orca features

You can install Orca in your created conda environment for distributed data processing, training and inference with the following command:

```bash
pip install bigdl-orca  # For the official release version
```

or for the nightly build version, use:

```bash
pip install --pre --upgrade bigdl-orca  # For the latest nightly build version
```

Note that installing Orca will automatically install `bigdl-dllib`, `bigdl-tf`, `bigdl-math`, `packaging`, `filelock`, `pyzmq` and their dependencies if they are not detected in your conda environment.

### To additionally use RayOnSpark

If you wish to run [RayOnSpark](ray.md) or the [sklearn-style Estimator APIs in Orca](distributed-training-inference.md) with the "ray" backend, use the extra key `[ray]` during the installation above:

```bash
pip install bigdl-orca[ray]  # For the official release version
```

or for the nightly build version, use:

```bash
pip install --pre --upgrade bigdl-orca[ray]  # For the latest nightly build version
```

Note that with the extra key `[ray]`, `pip` will automatically install the additional dependencies for RayOnSpark, including `ray[default]==1.9.2`, `aiohttp==3.8.1`, `async-timeout==4.0.1`, `aioredis==1.3.1`, `hiredis==2.0.0`, `prometheus-client==0.11.0`, `psutil` and `setproctitle`.

### To additionally use AutoML

If you wish to run AutoML, use the extra key `[automl]` during the installation above:

```bash
pip install bigdl-orca[automl]  # For the official release version
```

or for the nightly build version, use:

```bash
pip install --pre --upgrade bigdl-orca[automl]  # For the latest nightly build version
```

Note that with the extra key `[automl]`, `pip` will automatically install the additional dependencies for distributed hyper-parameter tuning, including `ray[tune]==1.9.2`, `scikit-learn`, `tensorboard` and `xgboost`, together with the dependencies given by the extra key `[ray]`.

- To use [Pytorch AutoEstimator](distributed-tuning.md#pytorch-autoestimator), you need to install Pytorch with `pip install torch==1.8.1`.
- To use [TensorFlow/Keras AutoEstimator](distributed-tuning.md#tensorflow-keras-autoestimator), you need to install TensorFlow with `pip install tensorflow==1.15.0`.

### To install Orca for Spark3

By default, Orca is built on top of Spark 2.4.6 (with pyspark==2.4.6 as a dependency). If you want to install Orca built on top of Spark 3.1.3 (with pyspark==3.1.3 as a dependency), use the following commands instead:

```bash
# For the official release version
pip install bigdl-orca-spark3
pip install bigdl-orca-spark3[ray]
pip install bigdl-orca-spark3[automl]

# For the latest nightly build version
pip install --pre --upgrade bigdl-orca-spark3
pip install --pre --upgrade bigdl-orca-spark3[ray]
pip install --pre --upgrade bigdl-orca-spark3[automl]
```

__Note__: You should only install Orca built on top of __ONE__ Spark version, not both. If you want to switch the Spark version, please [**uninstall**](#to-uninstall-orca) Orca cleanly before reinstalling.

### To uninstall Orca

```bash
# For the default Orca built on top of Spark 2.4.6
pip uninstall bigdl-orca bigdl-dllib bigdl-tf bigdl-math bigdl-core

# For Orca built on top of Spark 3.1.3
pip uninstall bigdl-orca-spark3 bigdl-dllib-spark3 bigdl-tf bigdl-math bigdl-core
```

__Note__: If necessary, you also need to manually uninstall `pyspark` and other [dependencies](https://github.com/intel-analytics/BigDL/tree/main/python/requirements/orca) introduced by Orca.
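
### Verify the installation

After installation, you can run a quick sanity check in Python to confirm that Orca and its Spark dependency work in your conda environment. The snippet below is a minimal sketch that initializes and stops a local Orca context; the `cores` and `memory` values are illustrative and can be adjusted for your machine.

```python
# Minimal sanity check: run inside the activated conda environment (e.g. "py37").
from bigdl.orca import init_orca_context, stop_orca_context

# Start Orca on the local machine; 2 cores and 2g of memory are example values.
sc = init_orca_context(cluster_mode="local", cores=2, memory="2g")
print(sc.version)  # Prints the version of the underlying Spark.

# Always stop the context when you are done.
stop_orca_context()
```

If this runs without errors and prints the expected Spark version (2.4.6 for the default package, or 3.1.3 for `bigdl-orca-spark3`), the installation is working.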