Answerkey

KONGU ENGINEERING COLLEGE, PERUNDURAI 638 060
CONTINUOUS ASSESSMENT TEST II

20ADH02 Internet of Things and Machine Learning
PART - A
1. Binary [CO2] [K1]
Multiclass
2. Instead of depending upon one single model, we can use a group of models (ensemble) to make a [CO2] [K2]
prediction or classification decision. This form of learning is called ensemble learning.
Ex: Voting Classifier, Boosting, Bagging
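A minimal sketch of the voting idea, assuming three hypothetical base models that are simple threshold rules (they stand in for trained classifiers and are not part of the original answer):

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the largest number of models."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical base models: each maps a 2-feature sample to a class label 0 or 1.
models = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]

def ensemble_predict(x):
    """Hard voting: collect one prediction per model, then take the majority."""
    return majority_vote([model(x) for model in models])

print(ensemble_predict((0.9, 0.2)))
```

With an odd number of binary voters there is always a strict majority, so no tie-breaking rule is needed in this sketch.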
3. Lasso regularization: In this case, N is the L1 norm. It uses the modulus of the weights as the penalty term [CO2] [K2]
N:
$N = \sum_i |W_i|$
Ridge regularization: In this case, N is the (squared) L2 norm, given by the following:
$N = \sum_i W_i^2$
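The two penalty terms can be sketched in a few lines of Python. The function names and the regularization strength `lam` are illustrative assumptions, not taken from the original answer:

```python
def l1_penalty(weights):
    """Lasso penalty: sum of the absolute values of the weights."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Ridge penalty: sum of the squared weights."""
    return sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam, kind="l2"):
    """Add lam times the chosen penalty to the unregularized loss."""
    penalty = l1_penalty(weights) if kind == "l1" else l2_penalty(weights)
    return data_loss + lam * penalty

weights = [0.5, -2.0, 0.0, 1.5]
print(l1_penalty(weights), l2_penalty(weights))
```

The L1 penalty grows linearly with each weight, which tends to push small weights to exactly zero, while the L2 penalty grows quadratically and mainly shrinks large weights.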
4. • Sigmoid [CO3] [K1]
• Hyperbolic Tangent
• ReLU
• Softmax
• Leaky ReLU
• ELU
• Threshold
(any four)
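For reference, the listed activation functions can be written out in plain Python as below. This is a sketch using the usual textbook definitions; the default `alpha` values are common choices and are assumptions here:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small slope alpha for negative inputs instead of a hard zero
    return x if x > 0 else alpha * x

def elu(x, alpha=1.0):
    # Smooth exponential curve for negative inputs
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def threshold(x):
    # Step function: fires (1) for non-negative input, otherwise 0
    return 1.0 if x >= 0 else 0.0

def softmax(xs):
    # Subtract the maximum for numerical stability before exponentiating
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

Softmax differs from the others in that it acts on a whole vector of scores and returns values that sum to 1.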
5. The process of learning involves adapting the weights such that a predefined loss function (L) [CO3] [K1]
decreases. If we update the weights in the direction opposite to the gradient of the loss function with
respect to the weights, then, for a sufficiently small learning rate, the loss function decreases with
each update. This algorithm is called the gradient descent algorithm.
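A minimal sketch of the update rule, assuming a one-variable linear model y = w*x + b trained with a mean squared error loss (the model, data, and hyperparameters are illustrative and not part of the original answer):

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by repeatedly stepping against the gradient of the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move opposite to the gradient, scaled by the learning rate
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Data generated from y = 2x + 1
w, b = gradient_descent([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w, b)
```

If the learning rate is too large, the updates overshoot the minimum and the loss can increase instead of decrease.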
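6. If $\hat{y}_j$ is the output of our single neuron for the input vector X and $y_j$ is the output we desire for output [CO3] [K1]
neuron j, then the MSE error is mathematically expressed as
$L_{MSE} = \frac{1}{M} \sum_{j=1}^{M} (y_j - \hat{y}_j)^2$
where M is the number of outputs over which the squared error is averaged.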
7. • The size of the filters [CO3] [K1]
• The number of filters in the layer
• The number of pixels the filter strides through the image (the stride)
• The padding to be used while convolving
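These four hyperparameters determine the shape of a convolutional layer's output. The sketch below uses the standard output-size formula floor((n + 2p - f) / s) + 1; the function names are illustrative:

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Output length along one dimension for input size n and filter size f."""
    return (n + 2 * padding - f) // stride + 1

def conv_layer_output_shape(height, width, n_filters, f, stride=1, padding=0):
    """Output shape (height, width, channels) of a layer with square f x f filters."""
    return (
        conv_output_size(height, f, stride, padding),
        conv_output_size(width, f, stride, padding),
        n_filters,  # one output channel per filter
    )

# A 28 x 28 image passed through 32 filters of size 5 x 5, stride 1, no padding
print(conv_layer_output_shape(28, 28, 32, 5))
```

With a 3 x 3 filter, stride 1, and padding 1, the spatial size is unchanged, which is why that combination is often called "same" padding.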
8. In truncated BPTT, the data is processed one time step at a time and the BPTT weight update is [CO3] [K1]
performed periodically for a fixed number of time steps.
1. Present a sequence of K1 time steps of input and output pairs to the network
2. Calculate and accumulate the errors across K2 time steps by unrolling the network
3. Update the weights by rolling up the network
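The sketch below shows only the scheduling part of the procedure: when an update happens and which time steps it backpropagates through. It assumes the common reading in which an update is made after every k1 forward steps and the error is propagated back through at most the last k2 steps; the function name is hypothetical and no gradients are computed here:

```python
def tbptt_schedule(seq_len, k1, k2):
    """Return (start, end) windows of time steps used for each weight update.

    After every k1 forward steps, the network is unrolled over the
    most recent k2 steps (fewer at the start of the sequence).
    """
    schedule = []
    for t in range(1, seq_len + 1):
        if t % k1 == 0:
            start = max(0, t - k2)
            schedule.append((start, t))
    return schedule

# Sequence of 6 steps, update every 2 steps, backpropagate through 4 steps
print(tbptt_schedule(6, 2, 4))
```

Choosing k2 larger than k1 makes consecutive windows overlap, which lets each update see a longer history at the cost of more computation.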
9. [CO3] [K1]
10. Denoising Autoencoders: [CO3] [K2]
• Help the network learn how to denoise an input.
• A denoising autoencoder learns from a corrupted (noisy) input; we feed the encoder network
the noisy input, and the reconstructed image from the decoder is compared with the original
noise-free input.
Variational Autoencoders:
• VAEs can be used to generate images.
• VAEs have an additional stochastic layer; this layer, after the encoder network, samples the
data using a Gaussian distribution, and the one after the decoder network samples the data
using a Bernoulli distribution.
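The data handling behind both variants can be sketched without a neural network library. The helper names are hypothetical, the noise level is an arbitrary choice, and the Gaussian sampling uses the usual reparameterization z = mu + sigma * eps, which is an assumption about how the stochastic layer is implemented:

```python
import math
import random

def add_noise(x, noise_std=0.1, rng=None):
    """Corrupt a clean input with Gaussian noise, clipped to the range [0, 1]."""
    rng = rng or random.Random(0)
    return [min(1.0, max(0.0, v + rng.gauss(0.0, noise_std))) for v in x]

def reconstruction_error(reconstruction, clean):
    """Mean squared error between the decoder output and the clean input."""
    return sum((r - c) ** 2 for r, c in zip(reconstruction, clean)) / len(clean)

def sample_latent(mu, log_var, rng=None):
    """Draw z from a Gaussian with the mean and log-variance given by the encoder."""
    rng = rng or random.Random(0)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

clean = [0.0, 0.5, 1.0]
noisy = add_noise(clean)  # this is what the encoder of a denoising autoencoder sees
# The training target stays the clean input, not the noisy one:
print(reconstruction_error(noisy, clean))
```

In a denoising autoencoder the loss would be computed between the decoder's output for `noisy` and `clean`; here the noisy input itself stands in for the reconstruction purely to show the call.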
PART - B
11. i) Logistic regression is an old classification technique. It provides the probability of an event (10) [CO2] [K2]
taking place, given an input value. The events are represented as categorical dependent
variables, and the probability of a particular dependent variable being 1 is given by the
logistic (sigmoid) function, the inverse of the logit:
$P(Y = 1 \mid X = x) = \frac{1}{1 + e^{-(b + W^T x)}}$
Example:
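# Import the modules
# This example uses the TensorFlow 1.x graph API (placeholders and sessions).
# Under TensorFlow 2.x, replace the tensorflow import with:
#   import tensorflow.compat.v1 as tf
#   tf.disable_v2_behavior()
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
# %matplotlib inline  (uncomment when running in a Jupyter notebook)

# Load the data; the last column holds the quality score
filename = 'winequality-red.csv'
df = pd.read_csv(filename, sep=';')
columns = df.columns
X, quality = df[columns[0:-1]], df[columns[-1]]

# Scale every feature to the range [0, 1]
scaler = MinMaxScaler()
X_new = scaler.fit_transform(X)

# Bin the quality score into three class labels: 3 for scores below 3,
# 2 for scores from 3 up to 6.5, and 1 for scores of 6.5 and above.
# The labels are computed from the original scores in a single pass so that
# an earlier assignment is not overwritten by a later one.
Y = pd.Series(np.select([quality < 3, quality < 6.5], [3, 2], default=1),
              index=quality.index)
Y_new = pd.get_dummies(Y).values.astype(np.float32)  # One-hot encode

X_train, X_test, Y_train, y_test = \
    train_test_split(X_new, Y_new, test_size=0.4, random_state=333)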
class LogisticRegressor:
    def __init__(self, d, n, lr=0.001):
        # Placeholders for input-output training data
        # (d input features, n one-hot encoded classes)
        self.X = tf.placeholder(tf.float32, shape=[None, d], name='input')
        self.Y = tf.placeholder(tf.float32, name='output')
        # Variables for weight and bias
        self.b = tf.Variable(tf.zeros(n), dtype=tf.float32)
        self.W = tf.Variable(tf.random_normal([d, n]), dtype=tf.float32)
        # The logistic regression model. Softmax normalizes the n class
        # scores into probabilities, as the one-hot cross-entropy loss requires.
        h = tf.matmul(self.X, self.W) + self.b
        self.Ypred = tf.nn.softmax(h)
        # Loss function: cross-entropy, averaged over the training samples
        self.loss = tf.reduce_mean(
            -tf.reduce_sum(self.Y * tf.log(self.Ypred), axis=1),
            name='cross-entropy-loss')
        # Gradient descent with learning rate lr to minimize the loss
        optimizer = tf.train.GradientDescentOptimizer(lr)
        self.optimize = optimizer.minimize(self.loss)
        # Initializing variables
        init_op = tf.global_variables_initializer()
        self.sess = tf.Session()
        self.sess.run(init_op)

    def fit(self, X, Y, epochs=500):
        # Full-batch training; returns the loss recorded at every epoch
        total = []
        for i in range(epochs):
            _, l = self.sess.run([self.optimize, self.loss],
                                 feed_dict={self.X: X, self.Y: Y})
            total.append(l)
            if i % 1000 == 0:
                print('Epoch {0}/{1}: Loss {2}'.format(i, epochs, l))
        return total

    def predict(self, X):
        # Class probabilities for each row of X
        return self.sess.run(self.Ypred, feed_dict={self.X: X})

    def get_weights(self):
        return self.sess.run([self.W, self.b])
12. i) (10) [CO3] [K3]
import tensorflow as tf  # TensorFlow 1.x graph API
from sklearn.utils import shuffle  # used in train()

class MLP:
    def __init__(self, n_input=2, n_hidden=4, n_output=1,
                 act_func=[tf.nn.elu, tf.sigmoid], learning_rate=0.001):
        self.n_input = n_input  # Number of inputs to the network
        self.act_fn = act_func  # [hidden activation, output activation]
        seed = 123
        self.X = tf.placeholder(tf.float32, name='X', shape=[None, n_input])
        self.y = tf.placeholder(tf.float32, name='Y')
        # Build the graph for a network with one hidden layer
        # Hidden layer
        self.W1 = tf.Variable(tf.random_normal([n_input, n_hidden],
                              stddev=2, seed=seed), name="weights")
        self.b1 = tf.Variable(tf.random_normal([1, n_hidden], seed=seed),
                              name="bias")
        tf.summary.histogram("Weights_Layer_1", self.W1)
        tf.summary.histogram("Bias_Layer_1", self.b1)
        # Output layer
        self.W2 = tf.Variable(tf.random_normal([n_hidden, n_output],
                              stddev=2, seed=0), name="weights")
        self.b2 = tf.Variable(tf.random_normal([1, n_output], seed=seed),
                              name="bias")
        tf.summary.histogram("Weights_Layer_2", self.W2)
        tf.summary.histogram("Bias_Layer_2", self.b2)
        # Forward pass
        activity = tf.matmul(self.X, self.W1) + self.b1
        h1 = self.act_fn[0](activity)
        activity = tf.matmul(h1, self.W2) + self.b2
        self.y_hat = self.act_fn[1](activity)
        # MSE loss plus an L2 penalty on the hidden-layer weights
        # (regularization strength 0.6)
        error = self.y - self.y_hat
        self.loss = tf.reduce_mean(tf.square(error)) \
            + 0.6 * tf.nn.l2_loss(self.W1)
        self.opt = tf.train.GradientDescentOptimizer(
            learning_rate=learning_rate).minimize(self.loss)
        tf.summary.scalar("loss", self.loss)
        init = tf.global_variables_initializer()
        self.sess = tf.Session()
        self.sess.run(init)
        # Summaries for TensorBoard are written to logs/
        self.merge = tf.summary.merge_all()
        self.writer = tf.summary.FileWriter("logs/",
                                            graph=tf.get_default_graph())

    def train(self, X, Y, X_val, Y_val, epochs=100):
        epoch = 0
        X, Y = shuffle(X, Y)
        loss = []
        loss_val = []
        while epoch < epochs:
            # Run the optimizer for the training set
            merge, _, l = self.sess.run([self.merge, self.opt, self.loss],
                                        feed_dict={self.X: X, self.y: Y})
            # Evaluate the loss on the validation set (no weight update)
            l_val = self.sess.run(self.loss,
                                  feed_dict={self.X: X_val, self.y: Y_val})
            loss.append(l)
            loss_val.append(l_val)
            self.writer.add_summary(merge, epoch)
            if epoch % 10 == 0:
                print("Epoch {}/{} training loss: {} Validation loss {}".
                      format(epoch, epochs, l, l_val))
            epoch += 1
        return loss, loss_val

    def predict(self, X):
        return self.sess.run(self.y_hat, feed_dict={self.X: X})

# Usage: X_train, Y_train, X_val and y_val are assumed to be 2-D NumPy arrays
# prepared beforehand; their construction is not shown in this answer
_, d = X_train.shape
_, n = Y_train.shape
model = MLP(n_input=d, n_hidden=15, n_output=n)
loss, loss_val = model.train(X_train, Y_train, X_val, y_val, 6000)
13. i) LSTM: (10) [CO3] [K1]
The Forget Gate f(.) controls how much of the previous cell state (the long-term memory, c) is
retained in the present time step.
The Input Gate i(.) controls how much new information, computed from the current input and the
previous short-term memory h, is written into the cell state.
The Output Gate o(.) controls how much of the cell state is exposed as the updated short-term
memory h, which is also the output of the cell.
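A single time step of the standard LSTM formulation can be sketched with scalar inputs and states. The parameter names in the dictionary `p` are hypothetical, and real layers use weight matrices rather than scalars:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step for scalar input x, hidden state h and cell state c."""
    # Each gate looks at the current input and the previous hidden state
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])  # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])  # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])  # output gate
    # Candidate value that could be written into the cell state
    c_tilde = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])
    # Forget part of the old cell state, then add the gated candidate
    c = f * c_prev + i * c_tilde
    # The output gate decides how much of the cell state becomes the new h
    h = o * math.tanh(c)
    return h, c

# With all parameters at zero, every gate evaluates to 0.5
zero_params = {k: 0.0 for k in
               ["wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wc", "uc", "bc"]}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p=zero_params)
print(h, c)
```

In the all-zero case the forget gate halves the previous cell state and the candidate contributes nothing, which makes the role of each gate easy to check by hand.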
GRU:
14. i) Algorithm: (10) [CO3] [K2]

1. Apply the input to the network
2. Propagate the input forward and calculate the output of the network
3. Calculate the loss at the output, and then using the preceding expressions,
calculate weight updates for output layer neuron
4. Using the weighted errors at output layers, calculate the weight updates for
hidden layer
5. Update all the weights
6. Repeat the steps for other training examples
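The steps above can be sketched for a very small network. This is an illustration under stated assumptions: two inputs, one hidden layer, a single output neuron, sigmoid activations everywhere, a squared-error loss, and no bias terms; the function name and the weights are hypothetical:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, y, W1, W2, lr=0.5):
    """One backpropagation update; W1[k][i] is input i -> hidden k, W2[k] is hidden k -> output."""
    # Steps 1-2: apply the input and propagate it forward
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y_hat = sigmoid(sum(w * hk for w, hk in zip(W2, h)))
    # Step 3: loss at the output and the error term of the output neuron
    loss = 0.5 * (y - y_hat) ** 2
    delta_out = (y - y_hat) * y_hat * (1.0 - y_hat)
    # Step 4: hidden error terms from the weighted output error
    delta_hidden = [delta_out * W2[k] * h[k] * (1.0 - h[k]) for k in range(len(h))]
    # Step 5: update all the weights
    for k in range(len(h)):
        W2[k] += lr * delta_out * h[k]
        for i in range(len(x)):
            W1[k][i] += lr * delta_hidden[k] * x[i]
    return loss

W1 = [[0.1, -0.2], [0.3, 0.4]]
W2 = [0.5, -0.5]
# Step 6 would loop over many training examples; one example is reused here
losses = [train_step([1.0, 0.0], 1.0, W1, W2) for _ in range(200)]
print(losses[0], losses[-1])
```

The hidden error terms are computed before the output weights are changed, so that both layers are updated using the same forward pass.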
The loss function at the output neuron
The weight connecting hidden neuron k to the output neuron j would be given as follows:
Applying the chain rule of differentiation
The weight update connecting input neuron i to the hidden neuron k of hidden layer n can be
written as the following:
Again applying the chain rule
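The expressions referred to above are not reproduced in this copy. A sketch of the standard derivation is given below, under assumed notation: $y_j$ is the desired output, $\hat{y}_j$ the network output, $h$ the activity (weighted input) of a neuron, $g$ the activation function, $O$ the output of a neuron in the previous layer, and $\eta$ the learning rate. The layer index n is omitted for brevity.

```latex
% Loss at the output neurons
L = \frac{1}{2} \sum_{j} \left( y_j - \hat{y}_j \right)^2

% Update for the weight from hidden neuron k to output neuron j
\Delta W_{jk} = -\eta \frac{\partial L}{\partial W_{jk}}
  = -\eta \, \frac{\partial L}{\partial \hat{y}_j}
            \frac{\partial \hat{y}_j}{\partial h_j}
            \frac{\partial h_j}{\partial W_{jk}}
  = \eta \left( y_j - \hat{y}_j \right) g'(h_j) \, O_k

% Update for the weight from input neuron i to hidden neuron k
\Delta W_{ki} = -\eta \frac{\partial L}{\partial W_{ki}}
  = -\eta \sum_{j} \frac{\partial L}{\partial \hat{y}_j}
                   \frac{\partial \hat{y}_j}{\partial h_j}
                   \frac{\partial h_j}{\partial O_k}
                   \frac{\partial O_k}{\partial h_k}
                   \frac{\partial h_k}{\partial W_{ki}}
  = \eta \sum_{j} \left( y_j - \hat{y}_j \right) g'(h_j) \, W_{jk} \, g'(h_k) \, O_i
```

The sum over j in the hidden-layer update reflects that hidden neuron k influences the loss through every output neuron it connects to.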
