20ADH02 Internet of Things and Machine Learning

1. Binary [CO2] [K1]


2. Instead of depending upon one single model, we can use a group of models (ensemble) to make a [CO2] [K2]
prediction or classification decision. This form of learning is called ensemble learning.
Ex: Voting Classifier, Boosting, Bagging
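The voting idea can be sketched in a few lines of NumPy. This is only an illustration of hard (majority) voting, assuming each model has already produced integer class labels; the function name `majority_vote` and the sample votes are hypothetical.

```python
import numpy as np

def majority_vote(predictions):
    """Hard voting over an array of shape (n_models, n_samples) of integer labels."""
    predictions = np.asarray(predictions)
    n_classes = predictions.max() + 1
    # Count the votes each class receives for every sample (one column per sample)
    counts = np.apply_along_axis(np.bincount, 0, predictions, minlength=n_classes)
    # Pick the class with the most votes; ties go to the lowest class label
    return counts.argmax(axis=0)

# Three hypothetical models voting on four samples
votes = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [1, 1, 1, 0]]
ensemble_pred = majority_vote(votes)
print(ensemble_pred)
```

Bagging and boosting differ in how the individual models are trained, but the final combination step often looks similar to this.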

3. Lasso regularization: In this case, N is the L1 norm. It uses the modulus of the weights as the penalty term: [CO2] [K2]
$N = \sum_i |W_i|$

Ridge regularization: In this case, N is the (squared) L2 norm, given by the following:
$N = \sum_i W_i^2$
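A minimal sketch of the two penalty terms, assuming the penalty N is added to the data loss scaled by a regularization strength. The names `lam` and `data_loss` and their values are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def l1_penalty(w):
    """Lasso penalty: sum of the absolute values of the weights."""
    return float(np.sum(np.abs(w)))

def l2_penalty(w):
    """Ridge penalty: sum of the squared weights."""
    return float(np.sum(np.square(w)))

w = np.array([0.5, -2.0, 0.0, 1.5])
lam = 0.1        # hypothetical regularization strength
data_loss = 1.0  # placeholder for the unregularized loss

lasso_loss = data_loss + lam * l1_penalty(w)
ridge_loss = data_loss + lam * l2_penalty(w)
print(lasso_loss, ridge_loss)
```

The L1 penalty grows linearly with each weight, which tends to push small weights to exactly zero, while the squared penalty mainly shrinks large weights.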

4. • Sigmoid [CO3] [K1]
• Hyperbolic Tangent
• ReLU
• Softmax
• Leaky ReLU
• Threshold (any four)
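For reference, the listed activation functions can be written with NumPy as follows. This is a sketch; the leak slope `alpha` and the threshold value `theta` are assumed defaults.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small slope alpha for negative inputs instead of a hard zero
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    # Subtract the maximum for numerical stability before exponentiating
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def threshold(x, theta=0.0):
    # Step function: 1 where the input reaches theta, else 0
    return (np.asarray(x) >= theta).astype(float)

x = np.array([-2.0, 0.0, 3.0])
for fn in (sigmoid, tanh, relu, leaky_relu, softmax, threshold):
    print(fn.__name__, fn(x))
```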
5. The process of learning involves adapting the weights such that a predefined loss function (L) [CO3] [K1]
reduces. If we update the weights in the direction opposite to the gradient of the loss function with
respect to the weights, using a sufficiently small step size (learning rate), the loss function decreases
with each update. This algorithm is called the gradient descent algorithm.
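The update rule can be illustrated on a simple quadratic loss. This is a minimal sketch: the loss, the target point, and the learning rate are assumptions chosen so that the result is easy to check.

```python
import numpy as np

def gradient_descent(grad_fn, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient: w <- w - lr * dL/dw."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * grad_fn(w)
    return w

# Hypothetical loss L(w) = ||w - target||^2 with gradient 2 * (w - target)
target = np.array([3.0, -1.0])

def loss(w):
    return float(np.sum((np.asarray(w) - target) ** 2))

def grad(w):
    return 2.0 * (w - target)

w_final = gradient_descent(grad, w0=[0.0, 0.0])
print(w_final, loss(w_final))
```

With this learning rate each step shrinks the distance to the minimum by a constant factor; a much larger rate would overshoot and the loss could grow instead.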

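6. If yj is the output of our single neuron for the input vector X and ŷj is the output we desire for output [CO3] [K1]
neuron j, then the MSE error is mathematically expressed as
$L_{MSE} = \frac{1}{M} \sum_{j=1}^{M} (\hat{y}_j - y_j)^2$
where M is the number of terms averaged over.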

7. • The size of the filters [CO3] [K1]
• The number of filters in the layer
• The number of pixels the filter strides through the image
• The padding to be used while convolving
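These four hyperparameters together determine the shape of the layer's output. The sketch below uses the standard relation out = floor((in + 2 * padding - filter) / stride) + 1 and assumes square filters and symmetric zero padding; the function names are illustrative.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size along one dimension of a convolution."""
    return (input_size + 2 * padding - filter_size) // stride + 1

def conv_layer_output_shape(height, width, filter_size, n_filters, stride=1, padding=0):
    """Output shape (height, width, channels); one channel per filter."""
    out_h = conv_output_size(height, filter_size, stride, padding)
    out_w = conv_output_size(width, filter_size, stride, padding)
    return (out_h, out_w, n_filters)

# A 28x28 image convolved with 32 filters of size 5x5
print(conv_layer_output_shape(28, 28, filter_size=5, n_filters=32))
print(conv_layer_output_shape(28, 28, filter_size=5, n_filters=32, padding=2))
print(conv_layer_output_shape(28, 28, filter_size=3, n_filters=32, stride=2, padding=1))
```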
8. In truncated BPTT, the data is processed one time step at a time and the BPTT weight update is [CO3] [K1]
performed periodically for a fixed number of time steps.
1. Present a sequence of K1 time steps of input and output pairs to the network
2. Calculate and accumulate the errors across K2 time steps by unrolling the network
3. Update the weights by rolling up the network
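The timing of the updates can be sketched without a full network. The generator below only computes when an update happens and which window of time steps is unrolled; `tbptt_schedule` is a hypothetical helper, and the forward and backward passes themselves are left out.

```python
def tbptt_schedule(seq_len, k1, k2):
    """Yield (t, start, end) for each truncated-BPTT update.

    After every k1 processed time steps (t counts steps seen so far), the
    network is unrolled over the most recent k2 steps, i.e. over the
    half-open index range [start, end), and the weights are updated.
    """
    for t in range(1, seq_len + 1):
        # The forward pass for time step t - 1 would run here
        if t % k1 == 0:
            yield (t, max(0, t - k2), t)

updates = list(tbptt_schedule(seq_len=10, k1=3, k2=4))
print(updates)
```

Choosing k2 larger than k1 makes consecutive windows overlap, so each update still sees some earlier context.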
9. [CO3] [K1]

10. Denoising Autoencoders: [CO3] [K2]

• Help the network learn how to denoise an input.
• A denoising autoencoder learns from a corrupted (noisy) input; we feed the encoder network
the noisy input, and the reconstructed image from the decoder is compared with the original
noise-free input.
Variational Autoencoders:
• VAEs can be used to generate images.
• VAEs have an additional stochastic layer; this layer, after the encoder network, samples the
data using a Gaussian distribution, and the one after the decoder network samples the data
using a Bernoulli distribution.
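The two ideas can be sketched in NumPy without building a full network. Everything here is an illustrative assumption: the Gaussian corruption with `noise_std`, the mean-squared reconstruction loss, the identity stand-in for a trained autoencoder, and the reparameterized Gaussian sampling in `sample_latent`.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, noise_std=0.3):
    """Add Gaussian noise to inputs scaled to [0, 1] and clip back into range."""
    noisy = x + noise_std * rng.standard_normal(x.shape)
    return np.clip(noisy, 0.0, 1.0)

def reconstruction_loss(x_clean, x_reconstructed):
    """Denoising objective: the target is the clean input, not the noisy one."""
    return float(np.mean((x_clean - x_reconstructed) ** 2))

def sample_latent(mean, log_var):
    """Gaussian sampling layer of a VAE: z = mean + std * eps, eps ~ N(0, 1)."""
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * log_var) * eps

x_clean = rng.random((4, 8))   # four hypothetical inputs with eight features
x_noisy = corrupt(x_clean)     # this is what the encoder would receive
x_hat = x_noisy                # stand-in for decoder(encoder(x_noisy))
loss = reconstruction_loss(x_clean, x_hat)
z = sample_latent(np.zeros(2), np.zeros(2))
print(loss, z)
```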
Part – B

11. i) Logistic regression is a long-established classification technique. It provides the probability of an event (10) [CO2] [K2]
taking place, given an input value. The events are represented as categorical dependent
variables, and the probability of a particular dependent variable being 1 is given by the
logistic function (the inverse of the logit):
$P(Y = 1 \mid X = x) = \frac{1}{1 + e^{-(W^T x + b)}}$

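# Import the modules
# Note: this listing uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session)
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
# Jupyter notebook magic; remove this line when running as a plain script
%matplotlib inline

# Load the red wine quality data (semicolon-separated CSV)
filename = 'winequality-red.csv'
df = pd.read_csv(filename, sep=';')
columns = df.columns.values

# Features are all columns except the last; the last column is the quality score
X, Y = df[columns[0:-1]], df[columns[-1]].copy()
scaler = MinMaxScaler()
X_new = scaler.fit_transform(X)

# Convert the quality score into two classes: 2 for [3, 6.5), 1 for >= 6.5
Y.loc[(Y<6.5) & (Y>=3 )] = 2
Y.loc[(Y>=6.5)] = 1
Y_new = pd.get_dummies(Y) # One hot encode
X_train, X_test, Y_train, y_test = \
    train_test_split(X_new, Y_new, test_size=0.4, random_state=333)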
class LogisticRegressor:
    def __init__(self, d, n, lr=0.001):
        # Placeholders for input-output training data
        self.X = tf.placeholder(tf.float32,
                                shape=[None, d], name='input')
        self.Y = tf.placeholder(tf.float32, name='output')
        # Variables for weight and bias
        self.b = tf.Variable(tf.zeros(n), dtype=tf.float32)
        self.W = tf.Variable(tf.random_normal([d, n]),
                             dtype=tf.float32)
        # The Logistic Regression Model
        h = tf.matmul(self.X, self.W) + self.b
        self.Ypred = tf.nn.sigmoid(h)
        # Loss function: cross-entropy between labels and predictions
        self.loss = cost = tf.reduce_mean(-tf.reduce_sum(
            self.Y * tf.log(self.Ypred),
            axis=1), name='cross-entropy-loss')
        # Gradient descent with learning rate lr to minimize the loss
        optimizer = tf.train.GradientDescentOptimizer(lr)
        self.optimize = optimizer.minimize(self.loss)
        # Initializing Variables
        init_op = tf.global_variables_initializer()
        self.sess = tf.Session()
        self.sess.run(init_op)

    def fit(self, X, Y, epochs=500):
        # Run full-batch training and record the loss of every epoch
        total = []
        for i in range(epochs):
            _, l = self.sess.run([self.optimize, self.loss],
                                 feed_dict={self.X: X, self.Y: Y})
            total.append(l)
            if i % 1000 == 0:
                print('Epoch {0}/{1}: Loss {2}'.format(i, epochs, l))
        return total

    def predict(self, X):
        return self.sess.run(self.Ypred, feed_dict={self.X: X})

    def get_weights(self):
        return self.sess.run([self.W, self.b])
12. i) (10) [CO3] [K3]
# Note: this listing uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session)
from sklearn.utils import shuffle  # needed by MLP.train

class MLP:
    def __init__(self, n_input=2, n_hidden=4, n_output=1,
                 act_func=[tf.nn.elu, tf.sigmoid], learning_rate=0.001):
        self.n_input = n_input  # Number of inputs to the network
        self.act_fn = act_func
        seed = 123
        self.X = tf.placeholder(tf.float32, name='X', shape=[None, n_input])
        self.y = tf.placeholder(tf.float32, name='Y')
        # Build the graph for a network with one hidden layer
        # Hidden layer
        self.W1 = tf.Variable(tf.random_normal([n_input, n_hidden],
                              stddev=2, seed=seed), name="weights")
        self.b1 = tf.Variable(tf.random_normal([1, n_hidden], seed=seed),
                              name="bias")
        tf.summary.histogram("Bias_Layer_1", self.b1)
        # Output Layer
        self.W2 = tf.Variable(tf.random_normal([n_hidden, n_output],
                              stddev=2, seed=0), name="weights")
        self.b2 = tf.Variable(tf.random_normal([1, n_output], seed=seed),
                              name="bias")
        tf.summary.histogram("Bias_Layer_2", self.b2)
        # Forward pass: hidden activation, then output activation
        activity = tf.matmul(self.X, self.W1) + self.b1
        h1 = self.act_fn[0](activity)
        activity = tf.matmul(h1, self.W2) + self.b2
        self.y_hat = self.act_fn[1](activity)
        error = self.y - self.y_hat
        # Mean squared error plus an L2 penalty on the hidden-layer weights
        self.loss = tf.reduce_mean(tf.square(error)) \
            + 0.6*tf.nn.l2_loss(self.W1)
        self.opt = tf.train.GradientDescentOptimizer(
            learning_rate=learning_rate).minimize(self.loss)
        init = tf.global_variables_initializer()
        self.sess = tf.Session()
        self.sess.run(init)
        self.merge = tf.summary.merge_all()
        self.writer = tf.summary.FileWriter("logs/",
                                            graph=tf.get_default_graph())

    def train(self, X, Y, X_val, Y_val, epochs=100):
        epoch = 0
        X, Y = shuffle(X, Y)
        loss = []
        loss_val = []
        while epoch < epochs:
            # Run the optimizer for the training set
            merge, _, l = self.sess.run([self.merge, self.opt, self.loss],
                                        feed_dict={self.X: X, self.y: Y})
            # Evaluate the loss on the validation set (no weight update)
            l_val = self.sess.run(self.loss, feed_dict=
                                  {self.X: X_val, self.y: Y_val})
            loss.append(l)
            loss_val.append(l_val)
            self.writer.add_summary(merge, epoch)
            if epoch % 10 == 0:
                print("Epoch {}/{} training loss: {} Validation loss {}".
                      format(epoch, epochs, l, l_val))
            epoch += 1
        return loss, loss_val

    def predict(self, X):
        return self.sess.run(self.y_hat, feed_dict={self.X: X})

_, d = X_train.shape
_, n = Y_train.shape
model = MLP(n_input=d, n_hidden=15, n_output=n)
loss, loss_val = model.train(X_train, Y_train, X_val, y_val, 6000)

13. i) LSTM: (10) [CO3] [K1]

The Forget Gate f(.) controls the amount of short-term memory, h, to be remembered for
further flow in the present time step.

The Input Gate i(.) controls the amount of input and working memory influencing the output
of the cell.

The Output Gate o(.) controls the amount of information that's used for updating the
short-term memory.
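A single LSTM time step can be sketched in NumPy to show where each gate acts. This sketch assumes the commonly used formulation, in which the forget and input gates update a cell state c and the output gate produces the new hidden state h. The weight layout, gate ordering, and sizes are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4 * hidden, input + hidden), b has shape (4 * hidden,).

    Gate order inside W and b (an assumption of this sketch): forget, input,
    output, candidate.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0:hidden])               # forget gate
    i = sigmoid(z[hidden:2 * hidden])      # input gate
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate
    g = np.tanh(z[3 * hidden:])            # candidate values
    c = f * c_prev + i * g                 # updated cell state
    h = o * np.tanh(c)                     # updated hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 5
W = rng.standard_normal((4 * n_hidden, n_in + n_hidden)) * 0.1
b = np.zeros(4 * n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.standard_normal((6, n_in)):  # a hypothetical sequence of six inputs
    h, c = lstm_step(x, h, c, W, b)
print(h)
```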


14. i) Algorithm: (10) [CO3] [K2]

1. Apply the input to the network
2. Propagate the input forward and calculate the output of the network
3. Calculate the loss at the output, and then using the preceding expressions,
calculate weight updates for output layer neuron
4. Using the weighted errors at output layers, calculate the weight updates for
hidden layer
5. Update all the weights
6. Repeat the steps for other training examples
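The six steps can be sketched for a network with one hidden layer. This is a minimal NumPy illustration, assuming sigmoid activations, a squared-error loss, and a constant input of 1 in place of bias terms; the AND data set and all names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, y_desired, W1, W2, lr=0.5):
    """One backpropagation step for a single example; updates W1 and W2 in place."""
    # Steps 1-2: apply the input and propagate it forward
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    # Step 3: loss at the output and gradient for the output-layer weights
    loss = 0.5 * np.sum((y_desired - y) ** 2)
    delta_out = (y - y_desired) * y * (1.0 - y)
    grad_W2 = np.outer(delta_out, h)
    # Step 4: weighted output errors give the hidden-layer gradient
    delta_hidden = (W2.T @ delta_out) * h * (1.0 - h)
    grad_W1 = np.outer(delta_hidden, x)
    # Step 5: update all the weights
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2
    return float(loss)

rng = np.random.default_rng(0)
# Logical AND; the last input column is a constant 1 acting as a bias input
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
Y = np.array([[0], [0], [0], [1]], dtype=float)
W1 = rng.standard_normal((4, 3)) * 0.5
W2 = rng.standard_normal((1, 4)) * 0.5

epoch_losses = []
for epoch in range(500):
    total = 0.0
    # Step 6: repeat for the other training examples
    for x, y in zip(X, Y):
        total += train_step(x, y, W1, W2)
    epoch_losses.append(total)
print(epoch_losses[0], epoch_losses[-1])
```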

The loss function at the output neuron

The weight connecting hidden neuron k to the output neuron j would be given as follows:

Applying the chain rule of differentiation

The weight update connecting input neuron i to the hidden neuron k of hidden layer n can be
written as the following:

Again applying the chain rule
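One standard way to write the expressions referred to above is sketched here, under stated assumptions: a squared-error loss, activation function $g$, learning rate $\eta$, desired output $\hat{y}_j$ and actual output $y_j = g(h_j)$ of output neuron $j$, output $O_k = g(h_k)$ of hidden neuron $k$, and input $O_i$ from neuron $i$. The symbols $h_j$, $h_k$, $O_k$, $O_i$ and $\eta$ are notation introduced for this sketch.

```latex
% Loss function at the output neurons
L = \frac{1}{2} \sum_j (\hat{y}_j - y_j)^2

% Update for the weight connecting hidden neuron k to output neuron j
\Delta W_{jk} = -\eta \frac{\partial L}{\partial W_{jk}}

% Chain rule, with h_j = \sum_k W_{jk} O_k and y_j = g(h_j)
\frac{\partial L}{\partial W_{jk}}
  = \frac{\partial L}{\partial y_j}\,
    \frac{\partial y_j}{\partial h_j}\,
    \frac{\partial h_j}{\partial W_{jk}}
  = -(\hat{y}_j - y_j)\, g'(h_j)\, O_k

% Update for the weight connecting input neuron i to hidden neuron k
\Delta W_{ki} = -\eta \frac{\partial L}{\partial W_{ki}}

% Chain rule again, with h_k = \sum_i W_{ki} O_i and O_k = g(h_k)
\frac{\partial L}{\partial W_{ki}}
  = \sum_j \frac{\partial L}{\partial y_j}\,
    \frac{\partial y_j}{\partial h_j}\,
    \frac{\partial h_j}{\partial O_k}\,
    \frac{\partial O_k}{\partial h_k}\,
    \frac{\partial h_k}{\partial W_{ki}}
  = -\Big[ \sum_j (\hat{y}_j - y_j)\, g'(h_j)\, W_{jk} \Big]\, g'(h_k)\, O_i
```

The bracketed sum is the weighted error propagated back from the output layer, which is why the hidden-layer update in step 4 depends on the output-layer errors.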

