## Introduction
Convolutional Neural Networks (CNNs) are a specialized type of artificial neural network designed primarily for processing structured grid data like images. CNNs are particularly powerful for tasks involving image recognition, classification, and computer vision. They have revolutionized these fields, outperforming traditional neural networks by leveraging their unique architecture to capture spatial hierarchies in images.
### Why CNNs are Superior to Traditional Neural Networks
1. **Localized Receptive Fields**: CNNs use convolutional layers that apply filters to local regions of the input image. This localized connectivity ensures that the network learns spatial hierarchies and patterns, such as edges and textures, which are essential for image recognition tasks.
2. **Parameter Sharing**: In CNNs, the same filter (set of weights) is used across different parts of the input, significantly reducing the number of parameters compared to the fully connected layers of traditional neural networks. This not only lowers the computational cost but also mitigates the risk of overfitting (see the parameter-count sketch after this list).
3. **Translation Invariance**: Due to the shared weights and pooling operations, CNNs are inherently invariant to translations of the input image. This means that they can recognize objects even when they appear in different locations within the image.
4. **Hierarchical Feature Learning**: CNNs automatically learn a hierarchy of features, from low-level features like edges to high-level features like shapes and objects. Traditional neural networks, on the other hand, require manual feature extraction, which is less effective and more time-consuming.
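
To make the parameter-sharing point concrete, here is a minimal back-of-the-envelope sketch. The 28x28 grayscale input, 128 hidden units, and 32 filters are illustrative choices, not values from this tutorial:

```python
# Illustrative parameter count: fully connected vs. convolutional layer.
height, width, channels = 28, 28, 1   # a small grayscale image
hidden_units = 128                    # hypothetical dense layer size
num_filters, k = 32, 3                # 32 filters of size 3x3

# Dense layer: every input pixel connects to every hidden unit (+ biases).
dense_params = (height * width * channels) * hidden_units + hidden_units

# Conv layer: one k x k weight set per filter, shared across all
# spatial positions (+ one bias per filter).
conv_params = (channels * k * k) * num_filters + num_filters

print(f"Dense layer parameters: {dense_params:,}")  # 100,480
print(f"Conv layer parameters:  {conv_params:,}")   # 320
```

The convolutional layer needs orders of magnitude fewer parameters because its weights are reused at every position of the image.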
### Use Cases of CNNs
- **Image Classification**: Identifying objects within an image (e.g., classifying a picture as containing a cat or a dog).
- **Object Detection**: Detecting and locating objects within an image (e.g., finding faces in a photo).
- **Image Segmentation**: Partitioning an image into segments or regions (e.g., dividing an image into different objects and background).
### Let's start by understanding the basic architecture of CNNs.
## CNN Architecture
Convolutional layers, pooling layers, and fully connected layers are the main building blocks that CNNs use to automatically and adaptively learn spatial hierarchies of features through backpropagation.
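
As a rough sketch of how these building blocks stack together, here is a minimal model (assuming TensorFlow/Keras is available; the layer sizes are arbitrary illustrative choices, not a recommended architecture):

```python
# A minimal CNN: convolution -> pooling, repeated, then a dense classifier.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),               # e.g., a grayscale image
    layers.Conv2D(32, (3, 3), activation="relu"),  # convolutional layer
    layers.MaxPooling2D((2, 2)),                   # pooling layer
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),        # fully connected output
])
model.summary()  # prints output shape and parameter count per layer
```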
### Convolutional Layer
The dimensions of the input image, including the number of channels (e.g., 3 for RGB).
- This visualization will help you understand how the filter (kernel) moves across the input matrix with stride values of (3,3) and (2,2).
- A stride of 1 means the filter moves one step at a time, ensuring it covers the entire input matrix.
- However, with larger strides (like 3 or 2 in this example), the filter may not cover all elements, potentially missing some information.
- While this might seem like a drawback, higher strides are often used to reduce computational cost and decrease the output size, which can be beneficial in speeding up the training process and preventing overfitting. (The short sketch after this list shows which columns a filter visits at each stride.)
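
A small pure-Python sketch of this trade-off (the 11-column input width is an arbitrary example): it lists the starting positions a width-3 filter visits at each stride, and which columns it never covers.

```python
# Which input columns does a width-3 filter cover at each stride?
input_width, k = 11, 3

for stride in (1, 2, 3):
    starts = range(0, input_width - k + 1, stride)
    covered = {c for s in starts for c in range(s, s + k)}
    missed = sorted(set(range(input_width)) - covered)
    print(f"stride {stride}: {len(starts)} positions, missed columns: {missed}")
# stride 1: 9 positions, missed columns: []
# stride 2: 5 positions, missed columns: []
# stride 3: 3 positions, missed columns: [9, 10]
```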
#### Padding
Determines whether the output size is the same as the input size ('same') or reduced ('valid').
- `Same` padding is preferred in earlier layers to preserve spatial and edge information, as it can help the network learn more detailed features.
- Choose `valid` padding when focusing on the central input region or requiring specific output dimensions.
- For `same` padding with a stride of 1 and an odd filter size, the padding value is given by $\frac{f - 1}{2}$, where $f$ is the filter size.
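
A quick sketch checking this rule (the 28-pixel input and the filter sizes are illustrative; it uses the output-size formula given later in this article):

```python
# 'Same' padding for stride 1: pad by (f - 1) // 2 so output size == input size.
def same_padding(f: int) -> int:
    return (f - 1) // 2

n_in, stride = 28, 1
for f in (3, 5, 7):
    p = same_padding(f)
    n_out = (n_in + 2 * p - f) // stride + 1
    print(f"filter {f}x{f}: pad {p} -> output {n_out} (input {n_in})")
# filter 3x3: pad 1 -> output 28 (input 28)
# filter 5x5: pad 2 -> output 28 (input 28)
# filter 7x7: pad 3 -> output 28 (input 28)
```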
#### Filters
Small matrices that slide over the input data to extract features.
- The first filter aims to detect closed loops within the input image, which is highly relevant for recognizing digits with circular or oval shapes, such as '0', '6', '8', or '9'.
- The next filter helps in detecting vertical lines, crucial for identifying digits like '1', '4', '7', and parts of other digits that contain vertical strokes (a toy example of this filter in action follows this list).
- The last filter detects diagonal lines in the input image, useful for identifying the slashes present in digits like '1', '7', or parts of '4' and '9'.
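
The following toy NumPy sketch slides a hand-crafted vertical-line filter (kernel values chosen for illustration) over a tiny binary image; as in most deep learning libraries, the sliding operation is implemented as cross-correlation:

```python
# Sliding a 3x3 vertical-line filter over a tiny binary image.
import numpy as np

image = np.array([
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
], dtype=float)  # a vertical stroke in column 1

kernel = np.array([
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
], dtype=float)  # responds strongly to vertical lines

h = image.shape[0] - kernel.shape[0] + 1  # valid (no-padding) output height
w = image.shape[1] - kernel.shape[1] + 1  # valid (no-padding) output width
out = np.zeros((h, w))
for i in range(h):
    for j in range(w):
        out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

print(out)  # [[ 6. -3.]
            #  [ 6. -3.]] -- strongest where the stroke aligns with the filter
```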
<br>
A set of feature maps that represent the presence of different features in the image.
- With no padding and a stride of 1, the 3x3 filter moves one step at a time across the 7x5 input matrix. The filter can only move within the original boundaries of the input, resulting in a smaller 5x3 output matrix. This configuration is useful when you want to reduce the spatial dimensions of the feature map while preserving the exact spatial relationships between features.
- By adding zero padding to the input matrix, it is expanded to 9x7, allowing the 3x3 filter to "fit" fully on the edges and corners. With a stride of 1, the filter still moves one step at a time, but now the output matrix is the same size (7x5) as the original input. Same padding is often preferred in early layers of a CNN to preserve spatial information and avoid rapid feature map shrinkage.
- Without padding, the 3x3 filter operates within the original input matrix boundaries, but now it moves two steps at a time (stride 2). This significantly reduces the output matrix size to 3x2. Larger strides are employed to decrease computational cost and the output size, which can be beneficial in speeding up the training process and preventing overfitting. However, they might miss some finer details due to the larger jumps.
- The output dimension of a convolutional layer is given by $$ n_{out} = \left\lfloor \frac{n_{in} + (2 \cdot p) - k}{s} \right\rfloor + 1 $$
where, <br>
n<sub>in</sub> = number of input features <br>
p = padding <br>
k = kernel size <br>
s = stride
- Also, the number of trainable parameters for each layer is given by $(n_c \cdot k \cdot k \cdot f) + f$
<br> where, <br>
n<sub>c</sub> = number of input channels <br>
k x k = kernel size <br>
f = number of filters <br>
an additional f is added for the bias of each filter (both formulas are evaluated in the sketch below)
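
As a sanity check, here is a small sketch that evaluates both formulas, reproducing the 7x5 examples discussed above:

```python
# Evaluate the output-size and parameter-count formulas from this section.
def conv_output_size(n_in: int, p: int, k: int, s: int) -> int:
    return (n_in + 2 * p - k) // s + 1  # floor division implements the floor

# 7x5 input, 3x3 filter -- the three configurations discussed above:
print(conv_output_size(7, 0, 3, 1), conv_output_size(5, 0, 3, 1))  # 5 3
print(conv_output_size(7, 1, 3, 1), conv_output_size(5, 1, 3, 1))  # 7 5
print(conv_output_size(7, 0, 3, 2), conv_output_size(5, 0, 3, 2))  # 3 2

def conv_params(n_c: int, k: int, f: int) -> int:
    return n_c * k * k * f + f  # weights plus one bias per filter

print(conv_params(n_c=3, k=3, f=32))  # 896 for an RGB input with 32 3x3 filters
```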
### Pooling Layer
Pooling layers reduce the dimensionality of each feature map while retaining the most critical information. The most common form of pooling is max pooling.
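
A minimal NumPy sketch of 2x2 max pooling with stride 2 (the 4x4 feature map is an illustrative example):

```python
# 2x2 max pooling with stride 2: keep the largest value in each window.
import numpy as np

feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
], dtype=float)

# Split the map into 2x2 blocks, then take the maximum inside each block.
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6. 8.]
#  [3. 4.]]
```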