
Commit 03a168d

Update and rename IntroToCNNs.md to intro-to-cnn.md
1 parent 814aead commit 03a168d


contrib/machine-learning/IntroToCNNs.md renamed to contrib/machine-learning/intro-to-cnn.md

Lines changed: 17 additions & 58 deletions
@@ -1,30 +1,5 @@
 # Understanding Convolutional Neural Networks (CNN)
 
-## Table of Contents
-<details>
-<summary>Click to expand</summary>
-
-- [Introduction](#introduction)
-- [CNN Architecture](#cnn-architecture)
-  - <details>
-    <summary><a href='#convolutional-layer'>Convolutional Layer</a></summary>
-
-    - [Input Shape](#input-shape)
-    - [Stride](#strides)
-    - [Padding](#padding)
-    - [Filter](#filters)
-    - [Output](#output)
-
-    </details>
-
-  - [Pooling Layer](#pooling-layer)
-  - [Flatten Layer](#flatten-layer)
-  - [Dropout Layer](#dropout-layer)
-
-- [Implementation](#implementation)
-
-</details>
-
 ## Introduction
 Convolutional Neural Networks (CNNs) are a specialized type of artificial neural network designed primarily for processing structured grid data like images. CNNs are particularly powerful for tasks involving image recognition, classification, and computer vision. They have revolutionized these fields, outperforming traditional neural networks by leveraging their unique architecture to capture spatial hierarchies in images.
 
@@ -52,9 +27,7 @@ The convolutional layer is the core building block of a CNN. The layer's paramet
 
 #### Input Shape
 The dimensions of the input image, including the number of channels (e.g., 3 for RGB images & 1 for Grayscale images).
-<div align='center'>
-<img src='assets/cnn-input_shape.png' width='800'></img>
-</div>
+![image](assets/cnn-input_shape.png)
 
 - The input matrix is a binary image of handwritten digits,
 where '1' marks the pixels containing the digit (ink/grayscale area) and '0' marks the background pixels (empty space).
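For readers following along in code, a minimal sketch of this (height, width, channels) convention; the array names and sizes here are illustrative, not taken from the tutorial:

```python
import numpy as np

# Illustrative shapes using the (height, width, channels) convention
grayscale_digit = np.zeros((28, 28, 1))  # 1 channel, e.g. a binary handwritten digit
rgb_image = np.zeros((32, 32, 3))        # 3 channels for an RGB image

print(grayscale_digit.shape)  # (28, 28, 1)
print(rgb_image.shape)        # (32, 32, 3)
```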
@@ -63,9 +36,7 @@ where '1' marks the pixels containing the digit (ink/grayscale area) and '0' mar
 
 #### Strides
 The step size with which the filter moves across the input image.
-<div align='center'>
-<img src='assets/cnn-strides.png' width='800'></img>
-</div>
+![image](assets/cnn-strides.png)
 
 - This visualization helps you understand how the filter (kernel) moves across the input matrix with stride values of (3,3) and (2,2).
 - A stride of 1 means the filter moves one step at a time, ensuring it covers the entire input matrix.
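A small sketch of how the stride changes which positions a 3x3 filter visits on the 7x5 input used in these examples (the `window_origins` helper is hypothetical):

```python
def window_origins(height, width, k, stride):
    """Top-left coordinates a k x k filter visits for a given (row, col) stride."""
    sh, sw = stride
    return [(i, j)
            for i in range(0, height - k + 1, sh)
            for j in range(0, width - k + 1, sw)]

# 7x5 input, 3x3 filter
print(len(window_origins(7, 5, 3, (1, 1))))  # 15 positions -> a 5x3 output map
print(len(window_origins(7, 5, 3, (2, 2))))  # 6 positions  -> a 3x2 output map
```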
@@ -74,47 +45,39 @@ The step size with which the filter moves across the input image.
 
 #### Padding
 Determines whether the output size is the same as the input size ('same') or reduced ('valid').
-<div align='center'>
-<img src='assets/cnn-padding.png' width='800'></img>
-</div>
+![image](assets/cnn-padding.png)
 
 - `Same` padding is preferred in earlier layers to preserve spatial and edge information, as it can help the network learn more detailed features.
 - Choose `valid` padding when focusing on the central input region or requiring specific output dimensions.
 - The padding value can be determined by $ {(f - 1) \over 2} $, where $f$ is the filter size.
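A one-line check of that rule (a hypothetical `same_padding` helper, assuming an odd filter size and stride 1):

```python
def same_padding(f):
    """Padding per side so a stride-1 convolution preserves the spatial size (odd f)."""
    return (f - 1) // 2

print(same_padding(3))  # 1
print(same_padding(5))  # 2
```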
 
 #### Filters
 Small matrices that slide over the input data to extract features.
-<div align='center'>
-<img src='assets/cnn-filters.png' width='800'></img>
-</div>
+![image](assets/cnn-filters.png)
 
 - The first filter detects closed loops within the input image, which is highly relevant for recognizing digits with circular or oval shapes, such as '0', '6', '8', or '9'.
 - The next filter helps in detecting vertical lines, crucial for identifying digits like '1', '4', '7', and parts of other digits that contain vertical strokes.
 - The last filter detects diagonal lines in the input image, useful for identifying the slashes present in digits like '1', '7', or parts of '4' and '9'.
 
-<br>
-
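To make the filter idea concrete, here is a small NumPy sketch that applies a hand-made vertical-line detector to a tiny binary image (the kernel values and the `convolve2d_valid` helper are illustrative, not the tutorial's own code):

```python
import numpy as np

# A tiny binary image with a vertical stroke in the middle column
image = np.array([[0, 0, 1, 0, 0]] * 5)

# A hand-made 3x3 filter that responds strongly to vertical lines
kernel = np.array([[-1, 2, -1],
                   [-1, 2, -1],
                   [-1, 2, -1]])

def convolve2d_valid(img, k):
    """Naive 'valid' cross-correlation: no padding, stride 1."""
    kh, kw = k.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

print(convolve2d_valid(image, kernel))
# The largest responses line up with the vertical stroke in the input.
```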
 #### Output
 A set of feature maps that represent the presence of different features in the input.
-<div align='center'>
-<img src='assets/cnn-ouputs.png' width='800'></img>
-</div>
+![image](assets/cnn-ouputs.png)
 
 - With no padding and a stride of 1, the 3x3 filter moves one step at a time across the 7x5 input matrix. The filter can only move within the original boundaries of the input, resulting in a smaller 5x3 output matrix. This configuration is useful when you want to reduce the spatial dimensions of the feature map while preserving the exact spatial relationships between features.
 - By adding zero padding to the input matrix, it is expanded to 9x7, allowing the 3x3 filter to "fit" fully on the edges and corners. With a stride of 1, the filter still moves one step at a time, but now the output matrix is the same size (7x5) as the original input. Same padding is often preferred in early layers of a CNN to preserve spatial information and avoid rapid feature map shrinkage.
 - Without padding, the 3x3 filter operates within the original input matrix boundaries, but now it moves two steps at a time (stride 2). This significantly reduces the output matrix size to 3x2. Larger strides are employed to decrease computational cost and the output size, which can be beneficial in speeding up the training process and preventing overfitting. However, they might miss some finer details due to the larger jumps.
 - The output dimension of a convolutional layer is given by $$ n_{out} = \left\lfloor {n_{in} + (2 \cdot p) - k \over s} \right\rfloor + 1 $$
-where, <br>
-n<sub>in</sub> = number of input features <br>
-p = padding <br>
-k = kernel size <br>
+where,
+n<sub>in</sub> = number of input features
+p = padding
+k = kernel size
 s = stride
 
-- Also, the number of trainable parameters for each layer is given by, $ (n_c \cdot [k \cdot k] \cdot f) + f $
-<br> where, <br>
-n<sub>c</sub> = number of input channels <br>
-k x k = kernel size <br>
-f = number of filters <br>
+- Also, the number of trainable parameters for each layer is given by, $ (n_c \cdot [k \cdot k] \cdot f) + f $
+where,
+n<sub>c</sub> = number of input channels
+k x k = kernel size
+f = number of filters
 an additional f is added for bias
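A quick numeric check of both formulas against the three configurations described above (a standalone sketch; `conv_output_size` and `conv_trainable_params` are illustrative helper names):

```python
import math

def conv_output_size(n_in, k, p, s):
    """Output size along one dimension: floor((n_in + 2p - k) / s) + 1."""
    return math.floor((n_in + 2 * p - k) / s) + 1

# 7x5 input, 3x3 filter
print(conv_output_size(7, 3, 0, 1), conv_output_size(5, 3, 0, 1))  # 5 3 -> 'valid', stride 1
print(conv_output_size(7, 3, 1, 1), conv_output_size(5, 3, 1, 1))  # 7 5 -> 'same', stride 1
print(conv_output_size(7, 3, 0, 2), conv_output_size(5, 3, 0, 2))  # 3 2 -> 'valid', stride 2

def conv_trainable_params(n_c, k, f):
    """(n_c * k * k * f) weights plus f bias terms."""
    return n_c * k * k * f + f

print(conv_trainable_params(n_c=1, k=3, f=3))  # 30
```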

 ### Pooling Layer
@@ -135,18 +98,14 @@ Pooling layers reduce the dimensionality of each feature map while retaining the
 The flatten layer converts the 2D matrix data to a 1D vector, which can be fed into a fully connected (dense) layer.
 - **Input Shape:** The 2D feature maps from the previous layer.
 - **Output:** A 1D vector that represents the same data in a flattened format.
-<div align='center'>
-<img src='assets/cnn-flattened.png' width='800'></img>
-</div>
+![image](assets/cnn-flattened.png)
 
 ### Dropout Layer
 Dropout is a regularization technique to prevent overfitting in neural networks by randomly setting a fraction of input units to zero at each update during training time.
 - **Input Shape:** The data from the previous layer.
 - **Dropout Rate:** The fraction of units to drop (e.g., 0.5 for 50% dropout).
 - **Output:** The same shape as the input, with some units set to zero.
-<div align='center'>
-<img src='assets/cnn-dropout.png' width='800'></img>
-</div>
+![image](assets/cnn-dropout.png)
 
 - The zeroed values represent the dropped units.
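A minimal NumPy sketch of the flatten and dropout behaviour described here (illustrative only; it follows the simple "set units to zero" description above and omits the rescaling most frameworks apply during training):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Flatten: a 2D feature map becomes a 1D vector
feature_map = rng.random((4, 4))
flattened = feature_map.flatten()   # shape (16,)

# Dropout: zero out a fraction of the units at random
def dropout(x, rate=0.5):
    keep_mask = rng.random(x.shape) >= rate
    return np.where(keep_mask, x, 0.0)

print(dropout(flattened, rate=0.5))  # roughly half the entries become 0
```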

@@ -263,4 +222,4 @@ dropout_output = cnn_model.dropout(flattened_output, dropout_rate=0.3)
 print("\nDropout Output:\n", dropout_output)
 ```
 
-Feel free to play around with the parameters!
+Feel free to play around with the parameters!
