Activation functions
So what does an artificial neuron do? Simply put, it calculates a
“weighted sum” of its inputs, adds a bias, and then decides
whether it should be “fired” or not (yeah, right, an activation
function does this, but let’s go with the flow for a moment).
So consider a neuron.
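In symbols, the neuron first computes something like Y = Σ(weight × input) + bias, and
the activation function then decides what to do with Y. Here is a minimal sketch in
Python (the input values, weights, and bias are made-up numbers, purely for illustration):

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0])   # inputs to the neuron
w = np.array([0.8, 0.1, -0.4])   # one weight per input
b = 0.25                          # bias

# The weighted sum plus bias: this is the value the activation function acts on
Y = np.dot(w, x) + b
print(Y)
```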
Step function
The first thing that comes to mind is: how about a threshold-based
activation function? If the value of Y is above a certain
threshold, declare the neuron activated. If it’s less than the threshold, then say
it’s not. Hmm, great. This could work!
Well, what we just described is a “step function”; see the figure below.
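As a rough sketch of this idea, a threshold-based (step) activation in Python could
look like the following; the threshold value of 0 is an arbitrary choice:

```python
def step(y, threshold=0.0):
    """Step activation: fire (1) if the weighted sum exceeds the threshold, else 0."""
    return 1 if y > threshold else 0

print(step(0.7))   # 1 -> the neuron is "activated"
print(step(-0.3))  # 0 -> the neuron is not activated
```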
For a single-layer network, each output is an activation function applied to a
weighted sum of the inputs plus a bias, e.g. y1 = g1(Σ_i w_i1 x_i + b1) and
y2 = g2(Σ_i w_i2 x_i + b2).
When the activation functions g1 and g2 are identity activation
functions, the single-layer neural net is equivalent to a linear
regression model. Similarly, if g1 and g2are logistic activation
functions, then the single-layer neural net is equivalent to
logistic regression. Because of this correspondence between
single-layer neural networks and linear and logistic regression,
single-layer neural networks are rarely used in place of linear
and logistic regression.
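To make the correspondence concrete, here is a small NumPy sketch (not from the
original text; the weights and inputs are invented) showing that a single output with
an identity activation has the same form as linear regression, while a logistic
activation gives the logistic-regression form:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, -0.5])    # one input example
w = np.array([0.3, -0.7, 1.1])    # weights of the single layer
b = 0.05                           # bias

z = np.dot(w, x) + b               # weighted sum, as in the previous section

y_linear = z                       # identity activation -> linear regression form
y_logistic = sigmoid(z)            # logistic activation -> logistic regression form

print(y_linear, y_logistic)
```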
The next most complicated neural network is one with two
layers. This extra layer is referred to as a hidden layer. In
general, there is no restriction on the number of hidden layers.
However, it has been shown mathematically that a two-layer
neural network can approximate any differentiable function to
arbitrary accuracy, provided the number of perceptrons in the
hidden layer is unlimited.
However, increasing the number of perceptrons increases the
number of weights that must be estimated in the network, which
in turn increases the execution time for the network. Instead of
increasing the number of perceptrons in the hidden layers to
improve accuracy, it is sometimes better to add additional
hidden layers, which typically reduces both the total number of
network weights and the computational time. However, in
practice, it is uncommon to see neural networks with more than
two or three hidden layers.
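As an illustrative sketch (the layer sizes and the tanh/identity activations are my own
assumptions, not taken from the text), a forward pass through a two-layer network with
one hidden layer might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_outputs = 4, 8, 2        # arbitrary sizes for illustration

# Weights and biases for the hidden layer and the output layer
W1, b1 = rng.normal(size=(n_hidden, n_inputs)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_outputs, n_hidden)), np.zeros(n_outputs)

def forward(x):
    h = np.tanh(W1 @ x + b1)   # hidden layer: weighted sums passed through tanh
    y = W2 @ h + b2            # output layer: identity activation here
    return y

print(forward(rng.normal(size=n_inputs)))
```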
Recurrent networks
Recurrent Neural Networks (RNNs) are popular models that have shown great promise in many NLP
tasks. But despite their recent popularity, I’ve only found a limited number of resources that thoroughly
explain how RNNs work and how to implement them. That’s what this tutorial is about. It’s a multi-part
series in which I’m planning to cover the following:
I’m assuming that you are somewhat familiar with basic Neural Networks. If
you’re not, you may want to head over to Implementing A Neural Network
From Scratch, which guides you through the ideas and implementation
behind non-recurrent networks.
What are RNNs?
You can think of the hidden state s_t as the memory of the network. s_t captures
information about what happened in all the previous time steps. The output o_t at
step t is calculated solely based on the memory at time t. As briefly mentioned
above, it’s a bit more complicated in practice because typically s_t can’t capture
information from too many time steps ago.
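In the standard vanilla-RNN formulation (a common convention, with f a nonlinearity
such as tanh), the hidden state and output are computed as s_t = f(U x_t + W s_{t-1})
and o_t = softmax(V s_t).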
Unlike a traditional deep neural network, which uses different parameters at
each layer, an RNN shares the same parameters (U, V, W above) across all steps. This
reflects the fact that we are performing the same task at each step, just with different
inputs. This greatly reduces the total number of parameters we need to learn.
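As a rough sketch (the dimensions and the tanh nonlinearity are assumptions; the
parameter names U, V, W follow the notation above), a forward pass over a sequence
could look like this. Note that the same U, V, and W are reused at every time step:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 5, 16, 3   # arbitrary sizes for illustration

# One set of parameters, shared across every time step
U = rng.normal(size=(hidden_dim, input_dim))
W = rng.normal(size=(hidden_dim, hidden_dim))
V = rng.normal(size=(output_dim, hidden_dim))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(inputs):
    """inputs: a list of input vectors x_1 ... x_T; returns the outputs o_1 ... o_T."""
    s = np.zeros(hidden_dim)               # s_0: the initial hidden state
    outputs = []
    for x in inputs:
        s = np.tanh(U @ x + W @ s)         # s_t = tanh(U x_t + W s_{t-1})
        outputs.append(softmax(V @ s))     # o_t = softmax(V s_t)
    return outputs

print(forward([rng.normal(size=input_dim) for _ in range(4)])[-1])
```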
The above diagram has outputs at each time step, but depending on the task this
may not be necessary. For example, when predicting the sentiment of a sentence we
may only care about the final output, not the sentiment after each word. Similarly,
we may not need inputs at each time step. The main feature of an RNN is its hidden
state, which captures some information about a sequence.
RNNs have shown great success in many NLP tasks. At this point I should
mention that the most commonly used type of RNNs are LSTMs, which are
much better at capturing long-term dependencies than vanilla RNNs are. But
don’t worry, LSTMs are essentially the same thing as the RNN we will develop
in this tutorial; they just have a different way of computing the hidden state.
We’ll cover LSTMs in more detail in a later post. Here are some example
applications of RNNs in NLP (by no means an exhaustive list).