Convolutional Neural Networks (CNNs) are a specialized type of artificial neural network designed primarily for processing structured grid data like images. CNNs are particularly powerful for tasks involving image recognition, classification, and computer vision. They have revolutionized these fields, outperforming traditional neural networks by leveraging their unique architecture to capture spatial hierarchies in images.
### Convolutional Layer

The convolutional layer is the core building block of a CNN. The layer's parameters consist of a set of learnable filters (kernels), which are convolved across the input to produce feature maps.
#### Input Shape
The dimensions of the input image, including the number of channels (e.g., 3 for RGB images and 1 for grayscale images).
For example, in a digit-recognition task, individual filters can specialize in detecting distinct strokes (a runnable sketch of three such filters follows this list):

- The first filter aims to detect closed loops within the input image, which is highly relevant for recognizing digits with circular or oval shapes, such as '0', '6', '8', or '9'.
- The next filter helps in detecting vertical lines, crucial for identifying digits like '1', '4', '7', and parts of other digits that contain vertical strokes.
- The last filter detects diagonal lines in the input image, useful for identifying the slanted strokes present in digits like '1', '7', or parts of '4' and '9'.
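As a concrete sketch, such detectors can be written and applied with NumPy and SciPy. The filter values below are illustrative assumptions for demonstration, not weights from a trained network:

```python
import numpy as np
from scipy.signal import correlate2d  # CNN "convolution" is cross-correlation, not flipped convolution

# Illustrative hand-crafted 3x3 filters (values are assumptions for demonstration)
loop_filter = np.array([[ 1,  1,  1],
                        [ 1, -8,  1],
                        [ 1,  1,  1]])        # responds to closed contours / ring-like patterns
vertical_filter = np.array([[-1,  2, -1],
                            [-1,  2, -1],
                            [-1,  2, -1]])    # responds to vertical strokes
diagonal_filter = np.array([[ 2, -1, -1],
                            [-1,  2, -1],
                            [-1, -1,  2]])    # responds to top-left -> bottom-right diagonals

# A crude 7x5 "digit" with a single vertical stroke, like a '1'
image = np.zeros((7, 5))
image[:, 2] = 1.0

feature_map = correlate2d(image, vertical_filter, mode="valid")
print(feature_map.shape)  # (5, 3) -- a 3x3 filter over a 7x5 input, no padding, stride 1
print(feature_map.max())  # strongest response lies along the stroke
```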
#### Output
A set of feature maps that represent the presence of different features in the input. The spatial size of each feature map depends on the padding and stride; the following examples use a 7x5 input and a 3x3 filter:
- With no padding and a stride of 1, the 3x3 filter moves one step at a time across the 7x5 input matrix. The filter can only move within the original boundaries of the input, resulting in a smaller 5x3 output matrix. This configuration is useful when you want to reduce the spatial dimensions of the feature map while preserving the exact spatial relationships between features.
- By adding zero padding to the input matrix, it is expanded to 9x7, allowing the 3x3 filter to "fit" fully on the edges and corners. With a stride of 1, the filter still moves one step at a time, but now the output matrix is the same size (7x5) as the original input. Same padding is often preferred in early layers of a CNN to preserve spatial information and avoid rapid feature map shrinkage.
- Without padding, the 3x3 filter operates within the original input matrix boundaries, but now it moves two steps at a time (stride 2). This significantly reduces the output matrix size to 3x2. Larger strides are employed to decrease computational cost and the output size, which can be beneficial in speeding up the training process and preventing overfitting. However, they might miss some finer details due to the larger jumps.
- The output dimension of a convolutional layer is given by $$ n_{out} = \left\lfloor {n_{in} + (2 \cdot p) - k \over s} \right\rfloor + 1 $$
where, <br>
n<sub>in</sub> = number of input features <br>
p = padding <br>
k = kernel size <br>
s = stride
- Also, the number of trainable parameters for each convolutional layer is given by $ (n_c \cdot [k \cdot k] \cdot f) + f $
where, <br>
n<sub>c</sub> = number of input channels <br>
k x k = kernel size <br>
f = number of filters <br>
an additional f is added because each filter has one bias term
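As a quick sanity check, both formulas can be evaluated in plain Python. The sketch below reproduces the 5x3, 7x5, and 3x2 output sizes from the examples above; the final line (32 filters of size 3x3 over an RGB input) is an illustrative assumption:

```python
import math

def conv_output_size(n_in: int, k: int, p: int = 0, s: int = 1) -> int:
    """n_out = floor((n_in + 2p - k) / s) + 1, applied per spatial dimension."""
    return math.floor((n_in + 2 * p - k) / s) + 1

# 7x5 input, 3x3 filter -- the three configurations described above:
print(conv_output_size(7, 3),      conv_output_size(5, 3))       # 5 3  (no padding, stride 1)
print(conv_output_size(7, 3, p=1), conv_output_size(5, 3, p=1))  # 7 5  (zero padding, stride 1)
print(conv_output_size(7, 3, s=2), conv_output_size(5, 3, s=2))  # 3 2  (no padding, stride 2)

def conv_param_count(n_c: int, k: int, f: int) -> int:
    """(n_c * k * k * f) weights plus f biases, one bias per filter."""
    return (n_c * k * k * f) + f

# e.g., 32 filters of size 3x3 over a 3-channel (RGB) input:
print(conv_param_count(n_c=3, k=3, f=32))  # 896
```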
### Pooling Layer
Pooling layers reduce the dimensionality of each feature map while retaining the most important information.
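For instance, 2x2 max pooling with stride 2 keeps only the largest activation in each non-overlapping 2x2 window, halving each spatial dimension. A minimal hand-rolled NumPy sketch (an illustration, not a library routine):

```python
import numpy as np

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2; odd trailing rows/columns are dropped."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]  # trim to even dimensions
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 0],
                 [7, 2, 9, 8],
                 [0, 1, 3, 4]])
print(max_pool_2x2(fmap))
# [[6 4]
#  [7 9]]
```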
### Flatten Layer

The flatten layer converts the 2D matrix data to a 1D vector, which can be fed into a fully connected (dense) layer.
**Input Shape:** The 2D feature maps from the previous layer.
**Output:** A 1D vector that represents the same data in a flattened format.
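In array terms this is just a reshape; a minimal sketch, assuming a single 5x3 feature map:

```python
import numpy as np

feature_map = np.arange(15).reshape(5, 3)  # a 5x3 feature map from the previous layer
flat = feature_map.reshape(-1)             # 1D vector of length 5 * 3 = 15
print(flat.shape)                          # (15,)
```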
### Dropout Layer

Dropout is a regularization technique that prevents overfitting in neural networks by randomly setting a fraction of the input units to zero at each update during training.
**Input Shape:** The data from the previous layer.
**Dropout Rate:** The fraction of units to drop (e.g., 0.5 for 50% dropout).
**Output:** The same shape as the input, with some units set to zero.
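A minimal sketch of inverted dropout, the variant most modern frameworks implement: during training a random mask zeroes roughly `rate` of the units and scales the survivors by 1/(1-rate) so the expected activation is unchanged, while at inference the layer is a no-op:

```python
import numpy as np

def dropout(x: np.ndarray, rate: float = 0.5, training: bool = True) -> np.ndarray:
    """Inverted dropout: zero a `rate` fraction of units, scale the rest by 1/(1-rate)."""
    if not training:
        return x                    # identity at inference time
    mask = np.random.rand(*x.shape) >= rate
    return x * mask / (1.0 - rate)  # scaling keeps the expected activation unchanged

activations = np.ones((2, 4))
print(dropout(activations, rate=0.5))  # about half the entries are 0.0, the rest are 2.0
```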