ANN Model Calculation Example with 5 Data Points
Let's work through the mathematics of a simple Artificial Neural Network (ANN) on a very small
dataset with just 5 data points.
We will trace a feedforward network with a single hidden layer through one of the most popular
training methods: backpropagation with gradient descent.
Step 1: Set Up a Simple Dataset and Network Architecture
Consider the following dataset with 5 data points, each with 2 features, and a binary target variable:
| Input x1 | Input x2 | Target y |
| -------- | -------- | -------- |
| 0        | 0        | 0        |
| 0        | 1        | 1        |
| 1        | 0        | 1        |
| 1        | 1        | 0        |
| 0.5      | 0.5      | 1        |
Let's build an ANN with:
1. 2 inputs.
2. 2 neurons in the hidden layer (with ReLU activation).
3. 1 output neuron (with sigmoid activation for binary classification).
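To make the setup concrete, here is a minimal NumPy sketch of this dataset and architecture; the names X, y and the size variables are illustrative choices, not part of the original example:

```python
import numpy as np

# The 5-point dataset from the table above: 2 features per point, binary target
X = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [0.5, 0.5]])
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0])

# Architecture: 2 inputs -> 2 hidden neurons (ReLU) -> 1 output neuron (sigmoid)
n_inputs, n_hidden, n_outputs = 2, 2, 1
```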
Step 2: Initialize Weights and Biases
Randomly initialize weights and biases for each layer:
1. Hidden Layer Weights W1: a 2x2 matrix, with each element representing the weight for each input
to each neuron in the hidden layer.
W1 = [[0.1, -0.2],
      [0.4,  0.3]]
2. Hidden Layer Biases b1: a 1x2 vector.
b1 = [0.0, 0.1]
3. Output Layer Weights W2: a 2x1 matrix for weights from the hidden layer to the output neuron.
W2 = [ 0.3,
      -0.4]
4. Output Layer Bias b2: a single value.
b2 = -0.1
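As a concrete sketch, the same initial values can be written as NumPy arrays (in practice they would be drawn randomly; these exact numbers simply match the example):

```python
import numpy as np

# Hidden layer parameters: W1 is 2x2 (inputs x hidden neurons), b1 is 1x2
W1 = np.array([[0.1, -0.2],
               [0.4,  0.3]])
b1 = np.array([0.0, 0.1])

# Output layer parameters: W2 is 2x1 (hidden neurons x output), b2 is a scalar
W2 = np.array([0.3, -0.4])
b2 = -0.1
```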
Step 3: Forward Pass Calculation
We will go through the forward pass for the first data point, (0, 0) with target y = 0.
Step 3.1: Hidden Layer Calculations
1. Input to Hidden Layer Neurons:
z1 = x . W1 + b1
For input x = [0, 0]:
z1 = [0, 0] . [[0.1, -0.2], [0.4, 0.3]] + [0.0, 0.1] = [0.0, 0.1]
2. Apply Activation Function (ReLU) for Hidden Layer Outputs:
a1 = ReLU(z1) = ReLU([0.0, 0.1]) = [0.0, 0.1]
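A quick numerical check of Step 3.1 (a sketch using the weights initialized in Step 2; NumPy's @ operator is the dot product):

```python
import numpy as np

x = np.array([0.0, 0.0])        # first data point
W1 = np.array([[0.1, -0.2],
               [0.4,  0.3]])
b1 = np.array([0.0, 0.1])

z1 = x @ W1 + b1                # -> [0.0, 0.1]
a1 = np.maximum(0.0, z1)        # ReLU -> [0.0, 0.1]
print(z1, a1)
```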
Step 3.2: Output Layer Calculations
1. Input to Output Neuron:
z2 = a1 . W2 + b2
z2 = [0.0, 0.1] . [0.3, -0.4] + (-0.1) = -0.04 - 0.1 = -0.14
2. Apply Sigmoid Activation Function for Output:
y_hat = sigma(z2) = 1 / (1 + e^(0.14)) approx 0.465
So, the predicted output for input (0, 0) is approximately 0.465.
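Step 3.2 can be checked the same way, feeding in the hidden activations a1 = [0.0, 0.1] from Step 3.1 (again a sketch, with the sigmoid written out inline):

```python
import numpy as np

a1 = np.array([0.0, 0.1])            # hidden layer output from Step 3.1
W2 = np.array([0.3, -0.4])
b2 = -0.1

z2 = a1 @ W2 + b2                    # 0.0*0.3 + 0.1*(-0.4) - 0.1 = -0.14
y_hat = 1.0 / (1.0 + np.exp(-z2))    # sigmoid(-0.14) ~= 0.465
print(z2, y_hat)
```

Running it should print -0.14 and roughly 0.465, matching the hand calculation.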
Step 4: Loss Calculation (Binary Cross-Entropy)
For binary cross-entropy loss:
Loss = - (y * log(y_hat) + (1 - y) * log(1 - y_hat))
With target y = 0 and y_hat approx 0.465:
Loss = -log(1 - 0.465) = -log(0.535) approx 0.625
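The same loss value can be reproduced numerically; the small eps term is an optional guard against log(0) and is not part of the formula above:

```python
import numpy as np

y, y_hat = 0.0, 0.465          # target and prediction from Step 3
eps = 1e-12                    # numerical safety only
loss = -(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
print(loss)                    # ~0.625
```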
Step 5: Backpropagation and Gradient Descent
To complete one training step, we calculate the gradient of the loss with respect to each weight and bias via the chain rule (backpropagation), then adjust each parameter in the direction that reduces the loss, scaled by the learning rate (gradient descent). Repeating this forward pass, loss calculation, and update over all 5 data points for many epochs trains the network.
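As a sketch of what one such update looks like for this particular network (sigmoid output with binary cross-entropy, ReLU hidden layer, plain gradient descent; the learning rate of 0.1 is an illustrative choice, not part of the example above):

```python
import numpy as np

# Parameters and the first data point, as initialized above
x = np.array([0.0, 0.0])
y = 0.0
W1 = np.array([[0.1, -0.2],
               [0.4,  0.3]])
b1 = np.array([0.0, 0.1])
W2 = np.array([0.3, -0.4])
b2 = -0.1
lr = 0.1  # illustrative learning rate (an assumption, not from the example)

# Forward pass (Step 3)
z1 = x @ W1 + b1
a1 = np.maximum(0.0, z1)
z2 = a1 @ W2 + b2
y_hat = 1.0 / (1.0 + np.exp(-z2))

# Backward pass: with a sigmoid output and binary cross-entropy,
# the error at the output simplifies to dLoss/dz2 = y_hat - y
dz2 = y_hat - y
dW2 = a1 * dz2              # gradient w.r.t. output weights
db2 = dz2                   # gradient w.r.t. output bias
da1 = W2 * dz2              # propagate error back to hidden activations
dz1 = da1 * (z1 > 0)        # ReLU derivative: 1 where z1 > 0, else 0
dW1 = np.outer(x, dz1)      # gradient w.r.t. hidden weights
db1 = dz1                   # gradient w.r.t. hidden biases

# Gradient-descent update
W1 -= lr * dW1
b1 -= lr * db1
W2 -= lr * dW2
b2 -= lr * db2
```

The convenient dLoss/dz2 = y_hat - y term is the standard simplification that falls out of pairing a sigmoid output with binary cross-entropy, which is why that combination is so common for binary classification.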