Module 3: Getting Started With BigDL


Learning Objectives
You will be able to:

▪ Understand the BigDL runtime model
▪ Explore options for running BigDL in local mode and distributed mode
▪ Use BigDL utilities
▪ Set up and run BigDL

BigDL on Apache Spark*

▪ BigDL utilizes the Apache Spark* runtime

BigDL on Apache Spark*

▪ BigDL runs as standard Apache Spark* jobs
  - No changes to Apache Spark* required
▪ Each iteration of training runs as an Apache Spark* job

Running BigDL

                  Native          Docker*                      Cloud
Pre-Reqs          Linux* / Mac*   Linux / Mac / Windows* Pro   Supported cloud platforms
Recommended For   Developers      Users                        Users
Ease of use       Easy / Medium   Easy                         Easy / Medium

Running BigDL: Natively

Requirements
- JDK 8
- Apache Spark* v1.6 or v2.x (2.2/2.3 or later recommended)
- BigDL

Apache Spark* Run Modes

- Local mode: used to develop applications locally on a laptop
- Distributed mode: used to run the application in production on a cluster

Apache Spark* Run Mode: Local

Used for developing programs on your laptop
Work with a small subset of data that will fit on the laptop

Apache Spark* Run Mode: Distributed

Once the code is ready, we can deploy it to the cluster
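For illustration, a minimal sketch of how the master URL chosen at startup selects the run mode. It mirrors the `create_spark_conf()` call used in the 'Hello World' example later in this module; the cluster URL is a placeholder, not a real host:

from bigdl.util.common import *
from pyspark import SparkContext

# local mode: develop on a laptop, using all local cores
conf_local = create_spark_conf().setMaster("local[*]")

# distributed mode: point the same application at a Spark cluster (placeholder URL)
conf_cluster = create_spark_conf().setMaster("spark://spark-master-host:7077")

sc = SparkContext.getOrCreate(conf=conf_local)   # switch to conf_cluster for production
init_engine()                                    # prepare the BigDL environment either way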

Running BigDL Natively on Apache Spark*

# 3.1: running Spark Shell with BigDL

# install BigDL
export BIGDL_HOME="/path/to/bigDL"

# install Spark
export SPARK_HOME="/path/to/spark"

# option 1 : start the shell in local mode
${BIGDL_HOME}/bin/spark-shell-with-bigdl.sh --master local[*]

# option 2 : start the shell in distributed mode
${BIGDL_HOME}/bin/spark-shell-with-bigdl.sh --master spark://spark-master-host:7077

BigDL and Jupyter*

Jupyter* notebooks allow quick development of applications
Jupyter* supports Apache Spark* as a kernel
We add the BigDL libraries before launching Jupyter*

[Diagram: Jupyter* → Apache Spark* kernel + BigDL libraries]

Running Jupyter* With BigDL

# 3.2: running Jupyter with BigDL

# install BigDL
export BIGDL_HOME="/path/to/bigDL"

# install Spark
export SPARK_HOME="/path/to/spark"

# start Jupyter in local mode
${BIGDL_HOME}/bin/jupyter-with-bigdl.sh --master local[*]

# open the Jupyter home page in a browser


Running BigDL on Docker*

▪ This is the recommended approach for users of BigDL
▪ The Docker* container has all dependencies installed and ready to run
▪ See the lab notes for information on the official BigDL Docker* image

Running BigDL on Docker*
# 3.3 - running BigDL on Docker
## TODO : replace pointers from ElephantScale --> Intel

# Step 1 : download the Docker image
docker pull elephantscale/bigdl # TODO

# Step 2 : download bigdl-labs
git clone https://github.com/elephantscale/bigdl-labs # TODO

# Step 3 : run Docker
cd bigdl-labs
./run-bigdl-docker.sh elephantscale/bigdl # TODO

Running BigDL in the Cloud

BigDL can be run on the following cloud platforms:
- Amazon Web Services (AWS)*
- Google Cloud*
- Microsoft Azure*
- IBM Cloud*
- Other (Ali*, KingSoft*)

BigDL: 'Hello World' – Python*

# 3.4 : hello world

from bigdl.util.common import *
from pyspark import SparkContext
from bigdl.nn.layer import *
import bigdl.version

# create SparkContext with BigDL configuration
sc = SparkContext.getOrCreate(conf=create_spark_conf().setMaster("local[*]"))
init_engine() # prepare the BigDL environment
print("BigDL version : ", bigdl.version.__version__) # get the current BigDL version
linear = Linear(2, 3) # try to create a Linear layer



Layers

Here are the layer types in BigDL:
- Simple Layer
- Convolution Layer
- Pooling Layer
- Recurrent Layer
- Recursive Layer
- Sparse Layer
- Padding Layer
- Normalization Layer
- Dropout Layer
- Distance Layer
- Embedding Layer
- Merge/Split Layer
- Math Layer

Layers: Input Layer

Just passes inputs through without change
Not needed with the sequential container

# 3.7 - Input Layer
from bigdl.nn.layer import Input
import numpy as np

module = Input()
input = np.random.rand(3,2)
print("input:\n", input)

output = module.element().forward(input)
print("output:\n", output)

Sample output:
creating: createInput
input:
[[0.34014864 0.77003297]
 [0.4424559  0.63356693]
 [0.90753477 0.969264  ]]
output:
[[0.34014863 0.77003294]
 [0.4424559  0.6335669 ]
 [0.9075348  0.96926403]]

Layers: Echo Layer (Utility)

Good for debugging
Prints activations and gradients in the topology

## 3.10 - Echo layer
from bigdl.nn.layer import Echo
import numpy as np

input = np.random.rand(3,2)
print("input:\n", input)

echo = Echo()
output = echo.forward(input)
print("output:\n", output)

Sample output:
input:
[[0.97711168 0.92684554]
 [0.62456399 0.22271812]
 [0.29430283 0.44846653]]
creating: createEcho
output:
[[0.9771117  0.92684555]
 [0.624564   0.22271812]
 [0.29430282 0.44846654]]

Layers: Linear

Provides a linear transformation to the data (see the sketch below)
- Fully connected: y = Wx + b
- Also used for hidden layers
- Same as `layers.Dense` in Keras*
A Linear layer plus an activation function can perform a nonlinear transformation
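To make the y = Wx + b claim concrete, here is a small sketch; reading the layer's weight matrix and bias back with `get_weights()` is an assumption about the BigDL Python API, so treat it as illustrative rather than as the course's method:

import numpy as np
from bigdl.nn.layer import Linear

layer = Linear(4, 3)                  # 4 inputs -> 3 outputs
x = np.random.rand(4).astype("float32")

W, b = layer.get_weights()            # assumed helper: returns [weight (3x4), bias (3,)]
manual = W.dot(x) + b                 # y = Wx + b computed by hand

print(layer.forward(x))               # BigDL's forward pass
print(manual)                         # should match up to float precision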



Layers: Linear

from bigdl.nn.layer import Sequential, Linear, ReLU, LogSoftMax

model = Sequential()
# Hidden layer with ReLU
model.add(Linear(4, 4))
model.add(ReLU())
# Output layer
model.add(Linear(4, 3))
model.add(LogSoftMax())

Layers: Linear Hidden Layers

Densely connected hidden layers are also linear
Every layer performs a single linear transformation

Reverse Layer

This flips / reverses the input on an axis
Usually used for formatting the data properly

from bigdl.nn.layer import Reverse
import numpy as np

module = Reverse(dimension=1)
input = np.random.rand(3,2)
print("input:\n", input)

output = module.forward(input)
print("output:\n", output)

Sample output:
creating: createReverse
input:
[[0.34014864 0.77003297]
 [0.4424559  0.63356693]
 [0.90753477 0.969264  ]]
output:
[[0.9075348  0.96926403]
 [0.4424559  0.6335669 ]
 [0.34014863 0.77003294]]

Reshape Layer

Reshapes according to new dimensions
Often used for 2-D to 1-D transformation in image recognition
(between convolutional layers and fully connected layers)

# Reshape layer
from bigdl.nn.layer import Reshape
import numpy as np

input = np.random.rand(3,2)
print("input:\n", input)

reshape = Reshape([6])   # flatten the 3x2 input into a single row of 6 values
output = reshape.forward(input)
print("output:\n", output)

Sample output:
creating: createReshape
input:
[[0.97711168 0.92684554]
 [0.62456399 0.22271812]
 [0.29430283 0.44846653]]
output:
[0.9771117  0.92684555 0.624564   0.22271812 0.29430282 0.44846654]

Activations

The following activation functions are supported in BigDL:
• SoftMax*
• SIGMOID*
• Tanh
• ReLU
• Leaky ReLU
• And more

Activation Function: SIGMOID*

Comes from logistic regression
Squashes input values to an output between 0 and 1 (a probability)
SIGMOID* was the oldest / first activation function used
It has since been eclipsed by other activation functions that produce better results

Activation: SIGMOID*

## 3.13 - Activation : Sigmoid
from bigdl.nn.layer import Sigmoid
import numpy as np

layer = Sigmoid()
input = np.array([-100, -2, -1, 0, 1, 2, 100])
output = layer.forward(input)

# pretty print
import pandas as pd
print(pd.DataFrame({'input': input, 'sigmoid': output}).to_string(index=False))

Sample output:
creating: createSigmoid
 input   sigmoid
  -100  0.000000
    -2  0.119203
    -1  0.268941
     0  0.500000
     1  0.731059
     2  0.880797
   100  1.000000

Activation Function: Tanh

The Tanh function ranges from -1 to +1 (the SIGMOID* function ranges from 0 to +1)
So Tanh can deal with negative inputs better than SIGMOID*

Activation: Tanh

## 3.14 - Activation : Tanh
from bigdl.nn.layer import Tanh
import numpy as np

layer = Tanh()
input = np.array([-100, -2, -1, 0, 1, 2, 100])
output = layer.forward(input)

# pretty print
import pandas as pd
print(pd.DataFrame({'input': input, 'tanh': output}).to_string(index=False))

Sample output:
creating: createTanh
 input      tanh
  -100 -1.000000
    -2 -0.964028
    -1 -0.761594
     0  0.000000
     1  0.761594
     2  0.964028
   100  1.000000

Activation Function: Rectified Linear Unit (ReLU)

ReLU is a very simple and effective activation function
If the input is above zero, it is passed through unchanged
If the input is below zero, it is clipped to zero
ReLU is the current state of the art, proven to work on many different datasets
ReLU is also very fast to compute!

Activation: ReLU

## 3.12 - Activation : ReLU
from bigdl.nn.layer import ReLU
import numpy as np

relu = ReLU(ip=False)
input = np.array([-1, 0, 1, 2])
output = relu.forward(input)

# pretty print
import pandas as pd
print(pd.DataFrame({'input': input, 'ReLU': output}).to_string(index=False))

Sample output:
creating: createReLU
 input  ReLU
    -1   0.0
     0   0.0
     1   1.0
     2   2.0

Activation: SoftMax*

The SoftMax* function is applied to an n-dimensional input Tensor, rescaling the elements so that the n-dimensional output Tensor lies in the range (0, 1)
The outputs are probabilities that add up to 1.0

digit (outcome)   0     1    2    3    4    5    6    7    8     9
Probability       0.8   0    0    0    0    0    0    0    0.1   0.1
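For illustration only (plain NumPy, not BigDL), a minimal sketch of computing a softmax over hypothetical per-digit scores and checking that the outputs lie in (0, 1) and sum to 1.0:

import numpy as np

def softmax(x):
    # subtract the max for numerical stability, then normalize the exponentials
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([5.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 3.0])  # hypothetical scores for digits 0-9
probs = softmax(scores)
print(probs)         # every value is in (0, 1)
print(probs.sum())   # 1.0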
Activation: SoftMax*

## 3.11 - Activation : SoftMax
from bigdl.nn.layer import SoftMax
import numpy as np

layer = SoftMax()
input = np.ones(3) * 10
grad_output = np.array([1.0, 0.0, 0.0])   # would be used for a backward pass
output = layer.forward(input)

# pretty print
import pandas as pd
print(pd.DataFrame({'input': input, 'softmax': output}).to_string(index=False))

Sample output:
creating: createSoftMax
 input   softmax
  10.0  0.333333
  10.0  0.333333
  10.0  0.333333

IRIS* Dataset

We will use IRIS* – a well-known dataset – to illustrate these concepts (a loading sketch follows the lists below)

Input (4 features):
- Petal length
- Petal width
- Sepal length
- Sepal width

Output classification (3 classes):
- IRIS* setosa
- IRIS* virginica
- IRIS* versicolor
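A minimal loading sketch, not taken from the course labs: it assumes scikit-learn is available for the IRIS data and reuses the SparkContext `sc` from the 'Hello World' example; `Sample.from_ndarray` wraps each row as a BigDL training sample:

import numpy as np
from sklearn.datasets import load_iris        # assumption: scikit-learn is installed
from bigdl.util.common import Sample

iris = load_iris()                            # 150 rows, 4 features, labels 0/1/2
features = iris.data.astype("float32")
labels = iris.target + 1                      # BigDL class labels are 1-based

# distribute the rows as an RDD of BigDL Samples (sc: the SparkContext created earlier)
samples = sc.parallelize(
    [Sample.from_ndarray(f, np.array([l])) for f, l in zip(features, labels)]
)
print(samples.count())                        # 150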
Containers

Containers are used to organize layers
Containers are derived from the abstract class `Container`

Container examples:
- Sequential Container
- Functional Container
Sequential Container

Used when all of the layers feed into the next
Example: feedforward neural networks
RNN/LSTM is OK if there is no recurrence between layers

# 3.5 - Sequential container
from bigdl.nn.layer import Sequential
from bigdl.nn.layer import Linear

seq = Sequential()
seq.add(Linear(10, 25)) # add a Linear layer

Sequential Container: Branches

Sequential containers can have branches; the topology must still be feedforward (see the sketch below)
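A minimal sketch of a branched but still feedforward topology; it assumes BigDL's `Concat` container, which applies several sub-modules to the same input and concatenates their outputs along a dimension:

from bigdl.nn.layer import Sequential, Concat, Linear, ReLU

# two branches that both see the same 10-dimensional input
branch1 = Sequential().add(Linear(10, 20)).add(ReLU())
branch2 = Sequential().add(Linear(10, 5)).add(ReLU())

# Concat runs both branches and joins their outputs along dimension 2
# (dimension 1 is the batch dimension)
branches = Concat(2).add(branch1).add(branch2)

model = Sequential()
model.add(branches)        # 20 + 5 = 25 features per sample
model.add(Linear(25, 3))   # merge back into a single feedforward path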


Functional Container

Also known as the graph container
The graph container allows any network topology
- Graph of nodes
- Can allow cycles (RNN)

Example: Linear -> SIGMOID* -> SoftMax*


Functional Container

Linear -> SIGMOID* -> SoftMax*

## 3.6 - Functional Container
from bigdl.nn.layer import Linear
from bigdl.nn.layer import Sigmoid
from bigdl.nn.layer import SoftMax
from bigdl.nn.layer import Model

linear = Linear(10, 15)()
sigmoid = Sigmoid()(linear)
softmax = SoftMax()(sigmoid)
model = Model([linear], [softmax])

Lab 3.1: Getting Started With BigDL

Overview:
- Getting started with BigDL environment

Run time:
- 15 mins

Instructions
- Follow lab instructions
Lab 3.2: Testing BigDL Environment

Overview:
- Testing BigDL environment

Run time:
- 10 mins

Instructions
- Follow lab instructions

Lab 3.3: Data Loading and Exploration Using Apache Spark*

Overview:
- Use Apache Spark* to load data, clean it up, and perform preliminary analysis

Run time:
- 30 mins

Instructions
- Follow lab instructions



Summary

We learned about:
- How BigDL and Spark* work together
- How to run BigDL in Docker*
- Layers and Containers in BigDL
- How to get started with BigDL

*Other names and brands may be claimed as the property of others.


Resources

- https://bigdl-project.github.io/0.7.0/

- https://github.com/intel-analytics/BigDL
