ipex-llm/docs/readthedocs/source/doc/DLlib/QuickStart/dllib-quickstart.md
dding3 0f20dabda3 add dllib quickstart (#3416)
* add dllib quickstart
* fix init_nncontext issue
2021-11-07 21:21:49 +08:00


DLlib Quickstarts




In this guide, we demonstrate how to use the DLlib Keras-style API and the DLlib NNClassifier to train and evaluate a classification model.

Step 0: Prepare Environment

We recommend using conda to prepare the environment. Please refer to the install guide for more details.

conda create -n my_env python=3.7 # "my_env" is conda environment name, you can use any name you like.
conda activate my_env
pip install bigdl-dllib

Step 1: Data loading and processing using Spark DataFrame

# `spark` is assumed to be an active SparkSession and `path` the location of the CSV file.
df = spark.read.csv(path, sep=',', inferSchema=True).toDF("num_times_pregrant", "plasma_glucose", "blood_pressure", "skin_fold_thickness", "2-hour_insulin", "body_mass_index", "diabetes_pedigree_function", "age", "class")

We then use the Spark API to assemble the eight feature columns into a single vector column, build a label column, and split the data into training and validation sets.

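from pyspark.ml.feature import VectorAssembler
from pyspark.sql.functions import lit
from pyspark.sql.types import DoubleType

# Combine the eight input columns into a single "features" vector column.
vecAssembler = VectorAssembler(outputCol="features")
vecAssembler.setInputCols(["num_times_pregrant", "plasma_glucose", "blood_pressure", "skin_fold_thickness", "2-hour_insulin", "body_mass_index", "diabetes_pedigree_function", "age"])
train_df = vecAssembler.transform(df)

# Cast "class" to double and add 1 so that the labels are 1-based.
changedTypedf = train_df.withColumn("label", train_df["class"].cast(DoubleType())+lit(1))\
    .select("features", "label")
# Randomly split the rows: roughly 90% for training and 10% for validation.
(trainingDF, validationDF) = changedTypedf.randomSplit([0.9, 0.1])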
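
To make the preprocessing concrete, here is a plain-Python sketch (no Spark required) of what these steps do to each row: collect the eight feature columns into one vector, shift the class to a 1-based label, and assign each row to a split by a random draw. The sample row and the helper names `to_features_and_label` and `random_split` are illustrative assumptions, not part of DLlib or Spark. Spark's `randomSplit` also samples row by row, so the 90/10 proportions hold only approximately.

```python
import random

FEATURE_COLUMNS = [
    "num_times_pregrant", "plasma_glucose", "blood_pressure",
    "skin_fold_thickness", "2-hour_insulin", "body_mass_index",
    "diabetes_pedigree_function", "age",
]

def to_features_and_label(row):
    """Mimic VectorAssembler plus the label shift for one row (a dict)."""
    features = [float(row[name]) for name in FEATURE_COLUMNS]
    label = float(row["class"]) + 1.0  # 0 -> 1.0, 1 -> 2.0
    return features, label

def random_split(rows, weights, seed=42):
    """Assign each row to a split by one uniform draw against cumulative weights."""
    total = sum(weights)
    bounds, running = [], 0.0
    for w in weights:
        running += w / total
        bounds.append(running)
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()
        for i, bound in enumerate(bounds):
            # The last split catches any draw not claimed by an earlier one.
            if draw < bound or i == len(bounds) - 1:
                splits[i].append(row)
                break
    return splits

# Illustrative row only; the values are an assumption for this sketch.
sample_row = dict(zip(FEATURE_COLUMNS, [6, 148, 72, 35, 0, 33.6, 0.627, 50]))
sample_row["class"] = 1
features, label = to_features_and_label(sample_row)

# 1000 toy rows with alternating classes, split roughly 90/10.
rows = [to_features_and_label({**sample_row, "class": i % 2}) for i in range(1000)]
training, validation = random_split(rows, [0.9, 0.1])
print(len(features), label, len(training), len(validation))
```

Because each row is drawn independently, the two split sizes vary from run to run unless a seed is fixed.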

Step 3: Define the classification model using the DLlib Keras-style API

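# Import paths assume the bigdl-dllib package layout.
from bigdl.dllib.keras.layers import *
from bigdl.dllib.keras.models import *

# 8 input features -> two hidden ReLU layers (12 and 8 units) -> 2 output scores, one per class.
x1 = Input(shape=(8,))
dense1 = Dense(12, activation='relu')(x1)
dense2 = Dense(8, activation='relu')(dense1)
dense3 = Dense(2)(dense2)
model = Model(x1, dense3)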
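
As a quick sanity check on this architecture, the number of trainable parameters can be counted by hand. The sketch below assumes each Dense layer holds a weight matrix plus one bias per output unit (the usual Keras convention); `dense_param_count` is a hypothetical helper, not a DLlib function.

```python
def dense_param_count(n_in, n_out, use_bias=True):
    """Weights (n_in * n_out) plus, optionally, one bias per output unit."""
    return n_in * n_out + (n_out if use_bias else 0)

layer_sizes = [8, 12, 8, 2]  # input -> dense1 -> dense2 -> dense3
per_layer = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)
print(per_layer, total_params)  # [108, 104, 18] 230
```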

Step 4: Create and fit the NNClassifier

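# Import paths assume the bigdl-dllib package layout.
from bigdl.dllib.nnframes import *
from bigdl.dllib.nn.criterion import *
from bigdl.dllib.optim.optimizer import *

# Wrap the model as a Spark ML estimator; [8] is the size of each feature vector.
classifier = NNClassifier(model, CrossEntropyCriterion(), [8]) \
    .setOptimMethod(Adam()) \
    .setBatchSize(32) \
    .setMaxEpoch(150)

# Train on the training split; fit returns a model that can transform DataFrames.
nnModel = classifier.fit(trainingDF)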
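
The model's last layer outputs two raw scores per row, and the criterion turns them into a loss. The following plain-Python sketch shows softmax cross-entropy for a single row with a 1-based label, which is why 1 was added to the "class" column when building the label. It is written from the standard definition and assumes that DLlib's CrossEntropyCriterion follows it with 1-based targets; `cross_entropy` is an illustrative helper, not the library implementation.

```python
import math

def cross_entropy(scores, label):
    """Softmax cross-entropy for one row; `label` is 1-based (1..len(scores))."""
    max_score = max(scores)  # subtract the max for numerical stability
    log_sum_exp = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_sum_exp - scores[label - 1]

uniform_loss = cross_entropy([0.0, 0.0], 1)     # equal scores: loss is ln(2)
confident_loss = cross_entropy([5.0, -5.0], 1)  # correct and confident: close to 0
wrong_loss = cross_entropy([5.0, -5.0], 2)      # confident but wrong: close to 10
print(uniform_loss, confident_loss, wrong_loss)
```

The loss grows with the gap between the score of the true class and the highest competing score, so confident mistakes are penalized most.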

Step 5: Evaluate the trained model

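from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Add a "prediction" column to the validation set and cache it for reuse.
predictionDF = nnModel.transform(validationDF).cache()
# Show a random sample (about 10%) of the predictions.
predictionDF.sample(False, 0.1).show()

# Compare the "prediction" column against "label" to compute accuracy.
evaluator = MulticlassClassificationEvaluator(
    labelCol="label", predictionCol="prediction", metricName="accuracy")
accuracy = evaluator.evaluate(predictionDF)
print("Accuracy = %g" % accuracy)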
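
The accuracy metric is the fraction of rows whose prediction equals the label. A minimal plain-Python sketch with made-up labels and predictions (the values are illustrative, not results from this model, and `accuracy_score` is a hypothetical helper):

```python
def accuracy_score(labels, predictions):
    """Fraction of positions where the prediction equals the label."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

toy_labels = [1.0, 2.0, 2.0, 1.0]
toy_predictions = [1.0, 2.0, 1.0, 1.0]
toy_accuracy = accuracy_score(toy_labels, toy_predictions)
print(toy_accuracy)  # 0.75
```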