dding3 0f20dabda3 add dllib quickstart (#3416 )

* add dllib quickstart
* fix init_nncontext issue

2021-11-07 21:21:49 +08:00


DLlib Quickstarts


This guide demonstrates how to use the DLlib Keras-style API and the DLlib NNClassifier for classification.

Step 0: Prepare Environment

We recommend using conda to prepare the environment. Please refer to the install guide for more details.

conda create -n my_env python=3.7 # "my_env" is the conda environment name; use any name you like.
conda activate my_env
pip install bigdl-dllib
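After installation, it can help to confirm that the packages are visible from the active environment. The snippet below is a minimal sketch using only the Python standard library; it assumes that `bigdl-dllib` exposes a top-level module named `bigdl` and that Spark is available as `pyspark`. The helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumed top-level module names; adjust them if your installation differs.
    for name in ("bigdl", "pyspark"):
        print(name, "found" if is_installed(name) else "missing")
```

If either module is reported as missing, check that the conda environment created above is the one currently activated.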

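Step 1: Data loading using Spark DataFrame

# `spark` is assumed to be an existing SparkSession and `path` the location of the CSV file.
df = spark.read.csv(path, sep=',', inferSchema=True).toDF("num_times_pregnant", "plasma_glucose", "blood_pressure", "skin_fold_thickness", "2-hour_insulin", "body_mass_index", "diabetes_pedigree_function", "age", "class")

Step 2: Data processing using Spark API

We assemble the eight feature columns into a single vector column with the Spark API, derive the label column, and split the data into training and validation sets.

from pyspark.ml.feature import VectorAssembler
from pyspark.sql.functions import lit
from pyspark.sql.types import DoubleType

# Combine the eight feature columns into one "features" vector column.
vecAssembler = VectorAssembler(outputCol="features")
vecAssembler.setInputCols(["num_times_pregnant", "plasma_glucose", "blood_pressure", "skin_fold_thickness", "2-hour_insulin", "body_mass_index", "diabetes_pedigree_function", "age"])
train_df = vecAssembler.transform(df)

# Cast "class" to double and add 1 to form the "label" column; keep only the model inputs.
changedTypedf = train_df.withColumn("label", train_df["class"].cast(DoubleType())+lit(1))\
    .select("features", "label")
# Randomly split the rows: roughly 90% for training and 10% for validation.
(trainingDF, validationDF) = changedTypedf.randomSplit([0.9, 0.1])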
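The label shift and the random split can be illustrated without Spark. The following is a plain-Python sketch, not Spark's implementation: it assumes the class column holds 0 or 1, and it assigns each row to a split by an independent random draw, which is why the resulting sizes are only approximately 90% and 10%. The helper names `to_label` and `random_split` and the generated rows are hypothetical.

```python
import random


def to_label(class_value):
    """Mirror the cast-to-double plus 1 step: 0 -> 1.0, 1 -> 2.0."""
    return float(class_value) + 1.0


def random_split(rows, weights, seed=None):
    """Assign each row to one split, with probability proportional to its weight."""
    rng = random.Random(seed)
    total = float(sum(weights))
    # Cumulative upper bounds of each split on the interval [0, 1).
    bounds = []
    running = 0.0
    for weight in weights:
        running += weight / total
        bounds.append(running)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()
        for index, upper in enumerate(bounds):
            if draw < upper:
                splits[index].append(row)
                break
        else:
            # Guard against floating-point rounding in the last bound.
            splits[-1].append(row)
    return splits


# Hypothetical rows: (row id, label), with classes alternating between 0 and 1.
rows = [(i, to_label(i % 2)) for i in range(1000)]
training, validation = random_split(rows, [0.9, 0.1], seed=42)
print(len(training), len(validation))
```

In Spark, `randomSplit` also accepts an optional `seed` argument, which is useful when the same split must be reproduced across runs.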

Step 3: Define classification model using DLlib Keras-style API

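# Input layer: each sample is a vector of 8 features.
x1 = Input(shape=(8,))
# Two hidden fully connected layers with ReLU activations.
dense1 = Dense(12, activation='relu')(x1)
dense2 = Dense(8, activation='relu')(dense1)
# Output layer: 2 units with no activation.
dense3 = Dense(2)(dense2)
model = Model(x1, dense3)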
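To get a sense of the model's size, the number of trainable parameters can be worked out by hand. This sketch assumes every Dense layer has one weight per input-output pair plus one bias per output unit; the helper `dense_params` is hypothetical and does not call DLlib.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one fully connected layer."""
    return n_in * n_out + n_out


# Layer widths from the model above: 8 inputs -> 12 -> 8 -> 2 outputs.
layer_sizes = [8, 12, 8, 2]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # [108, 104, 18] 230
```

Under these assumptions the network has 230 parameters, which is small enough to train quickly on a single machine.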

Step 4: Create and fit NNClassifier

# Wrap the model with its loss criterion; [8] is the size of each feature vector.
classifier = NNClassifier(model, CrossEntropyCriterion(), [8]) \
    .setOptimMethod(Adam()) \
    .setBatchSize(32) \
    .setMaxEpoch(150)

nnModel = classifier.fit(trainingDF)
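The last Dense layer has no activation, so the criterion receives raw scores. The sketch below shows the loss for a single sample in plain Python, assuming the criterion computes a log-softmax followed by negative log-likelihood and that labels are 1-based (which matches the `+ 1` applied when building the label column). The function `cross_entropy` is an illustrative stand-in, not the DLlib implementation.

```python
import math


def cross_entropy(scores, label):
    """Cross-entropy of raw scores against a 1-based class label."""
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum_exp - scores[label - 1]


# Equal scores: the model is undecided, so the loss is log(2).
print(cross_entropy([0.0, 0.0], 1))
# A higher score for the true class gives a lower loss.
print(cross_entropy([2.0, 0.0], 1))
```

The loss shrinks as the score of the true class grows relative to the other class, which is the quantity the optimizer reduces during fitting.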

Step 5: Evaluate the trained model

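# Apply the trained model to the validation set; the result gains a "prediction" column.
predictionDF = nnModel.transform(validationDF).cache()
# Show a random sample of roughly 10% of the predictions.
predictionDF.sample(False, 0.1).show()

from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Compare the "prediction" column with the "label" column and compute accuracy.
evaluator = MulticlassClassificationEvaluator(
    labelCol="label", predictionCol="prediction", metricName="accuracy")
accuracy = evaluator.evaluate(predictionDF)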
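The accuracy metric is the fraction of rows whose prediction equals the label. A minimal plain-Python sketch of that calculation follows; the helper `accuracy_of` and the sample pairs are hypothetical and only illustrate the arithmetic.

```python
def accuracy_of(pairs):
    """Fraction of (label, prediction) pairs in which the two values match."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("accuracy is undefined for an empty set of predictions")
    correct = sum(1 for label, prediction in pairs if label == prediction)
    return correct / len(pairs)


# Hypothetical (label, prediction) pairs using the 1-based labels from this guide.
sample = [(1.0, 1.0), (2.0, 1.0), (2.0, 2.0), (1.0, 1.0)]
print(accuracy_of(sample))  # 0.75
```

Because the validation set is a random 10% of the data, the reported accuracy can vary from run to run.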
