Week2_PytorchIntro.ipynb - Colaboratory

Outline

PyTorch
What are tensors
Initialising, slicing, reshaping tensors
Numpy and PyTorch interfacing
GPU support for PyTorch + Enabling GPUs on Google Colab
Speed comparisons, Numpy -- PyTorch -- PyTorch on GPU
Autodiff concepts and application
Writing a basic learning loop using autograd
Exercises

1. PyTorch is one of the more recent deep learning frameworks; it was developed by a team at
Facebook and open-sourced on GitHub in 2017.
2. TensorFlow is an open-source deep learning framework created by developers at Google
and released in 2015.
3. PyTorch is an optimized deep learning tensor library based on Python and Torch, used for
applications running on GPUs and CPUs. PyTorch is often favoured over other deep
learning frameworks such as TensorFlow and Keras because it uses dynamic computation
graphs and is thoroughly Pythonic.
4. The two main features of PyTorch are:

Tensor computation (similar to NumPy) with strong GPU (Graphics Processing Unit)
acceleration support
Automatic differentiation for creating and training deep neural networks

5. While PyTorch started off as a framework focused on research, beginning with the 1.0
release a set of production-oriented features were added that today make PyTorch an
ideal end-to-end platform from research to large-scale production.
6. PyTorch models can be deployed in Python via a REST API, for example with Flask (a
minimal sketch is shown below).
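
Below is a minimal, illustrative sketch of such a Flask deployment (not part of this notebook); the model path "model.pt", the route name, and the payload format are assumptions made for the example.

import torch
from flask import Flask, request, jsonify

app = Flask(__name__)
model = torch.jit.load("model.pt")  # assumption: a TorchScript model saved earlier
model.eval()

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()                      # e.g. {"inputs": [[...], ...]}
    x = torch.tensor(data["inputs"], dtype=torch.float32)
    with torch.no_grad():                          # inference only, no autograd bookkeeping
        y = model(x)
    return jsonify({"outputs": y.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)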

import torch
import numpy as np
import matplotlib.pyplot as plt

Initialise tensors


Tensors are multi-dimensional arrays. In addition, tensors keep track of how they were computed
from other tensors, which provides the bookkeeping needed when computing gradients in the
reverse order (backpropagation).

PyTorch provides us with a data structure called a Tensor, which is very similar to NumPy’s
ndarray. But unlike the latter, tensors can tap into the resources of a GPU to significantly speed
up matrix operations.
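
As a quick illustration (a small sketch, not from the notebook), basic creation looks almost identical in the two libraries:

n = np.ones((3, 2))    # NumPy ndarray
t = torch.ones(3, 2)   # PyTorch tensor
print(n.shape, t.shape)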

Representation of Tensors


1. Scalar
2. 1D tensor
3. 2D tensor

x = torch.ones(3, 2) # 3 rows, 2 columns, i.e. shape [3, 2]; the default dtype is float


print(x)
print(x.shape)
x = torch.zeros(3, 2) # tensor of zeros with size [3,2]
print(x)
x = torch.rand(3, 2)
print(x)

tensor([[1., 1.],
[1., 1.],
[1., 1.]])
torch.Size([3, 2])
tensor([[0., 0.],
[0., 0.],
[0., 0.]])
tensor([[0.8817, 0.6066],
[0.4131, 0.7344],
[0.0328, 0.0881]])

x = torch.empty(3, 2) # create an uninitialised tensor; it holds whatever values happen to be in memory


print(x)
y = torch.zeros_like(x) # create a tensor of zeros with the same shape (and dtype) as x
print(y)

tensor([[2.9350e-15, 4.5125e-41],
[2.9350e-15, 4.5125e-41],
[3.6659e-30, 4.5125e-41]])
tensor([[0., 0.],
[0., 0.],
[0., 0.]])

x = torch.linspace(0, 1, steps=5) # create a tensor of size [5]


# 1-D tensor of size 5 with 5 linearly spaced values from 0 to 1
print(x)

tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])

x = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]])
# we can also create a tensor from (nested) lists
# this creates a tensor of shape [3, 2]
print(x.shape)

torch.Size([3, 2])

Slicing tensors


print(x.size())
print(x[:, 1])
print(x[:, 1].shape)

# all rows, column index 1 (the second column); the result is a 1-D tensor of size 3
# first row, all the elements
print(x[0, :])

torch.Size([3, 2])
tensor([2, 4, 6])
torch.Size([3])
tensor([1, 2])

y = x[1, 1]
# indexing a single element still returns a tensor
# use item() to get the single Python value out of it
# only a one-element tensor can be converted to a scalar

print(y)
print(y.item())
z = x[1:]
print(z)
print(z[0].item()) # error: only a one-element tensor can be converted to a scalar

tensor(4)
4
tensor([[3, 4],
[5, 6]])
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-7-8255e4500429> in <cell line: 10>()
8 z=x[1:]
9 print(z)
---> 10 print(z[0].item()) # error only one item tensor can be converted to
scalar

RuntimeError: a Tensor with 2 elements cannot be converted to Scalar
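
For tensors with more than one element, .tolist() (or .numpy()) can be used instead of .item(); a small sketch:

print(z[0].tolist())  # [3, 4] -- converts the tensor row to a Python list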


Reshaping tensors
print(x)
# to reshape the tensor use view on it
y = x.view(2, 3)
print(y)

tensor([[1, 2],
[3, 4],
[5, 6]])
tensor([[1, 2, 3],
[4, 5, 6]])
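
view can also infer one dimension when -1 is passed; a small sketch (not from the notebook):

y = x.view(6)      # flatten into a 1-D tensor with 6 elements
z = x.view(-1, 3)  # same as x.view(2, 3); the first dimension is inferred
print(y.shape, z.shape)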

Simple Tensor Operations


x = torch.ones([3, 2]) # create a 3-by-2 tensor of ones
y = torch.ones([3, 2])
z = x + y
print(z)
z = x - y
print(z)
z = x * y
print(z)

tensor([[2., 2.],
[2., 2.],
[2., 2.]])
tensor([[0., 0.],
[0., 0.],
[0., 0.]])
tensor([[1., 1.],
[1., 1.],
[1., 1.]])

z = y.add(x) # out-of-place: returns a new tensor, y itself is unchanged
print(z)
print(y)

tensor([[2., 2.],
[2., 2.],
[2., 2.]])
tensor([[1., 1.],
[1., 1.],
[1., 1.]])

z = y.add_(x) # in-place (note the trailing underscore): modifies y itself
print(z)
print(y)

tensor([[2., 2.],
[2., 2.],
[2., 2.]])
tensor([[2., 2.],
[2., 2.],
[2., 2.]])
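
This is a general PyTorch naming convention: methods ending in an underscore (add_, mul_, zero_, ...) modify the tensor in place, while the versions without it return a new tensor. A small illustrative sketch (not from the notebook):

t = torch.ones(2)
t.mul(3)    # out-of-place: returns a new tensor, t is still [1., 1.]
t.mul_(3)   # in-place: t becomes [3., 3.]
print(t)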

Numpy <> PyTorch


x_np = x.numpy()
print(type(x), type(x_np))
print(x_np)

<class 'torch.Tensor'> <class 'numpy.ndarray'>


[[1. 1.]
[1. 1.]
[1. 1.]]

a = np.random.randn(5)
print(a)
a_pt = torch.from_numpy(a)
print(type(a), type(a_pt))
print(a_pt)

[-1.07924499 -0.44339283 1.61918516 -0.01045001 -0.15773694]


<class 'numpy.ndarray'> <class 'torch.Tensor'>
tensor([-1.0792, -0.4434, 1.6192, -0.0105, -0.1577], dtype=torch.float64)

np.add(a, 1, out=a)
print(a)
print(a_pt)
a=a+1
print(a)
print(a_pt)

[-0.07924499 0.55660717 2.61918516 0.98954999 0.84226306]


tensor([-0.0792, 0.5566, 2.6192, 0.9895, 0.8423], dtype=torch.float64)
[0.92075501 1.55660717 3.61918516 1.98954999 1.84226306]
tensor([-0.0792, 0.5566, 2.6192, 0.9895, 0.8423], dtype=torch.float64)
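
This happens because torch.from_numpy shares memory with the NumPy array: the in-place np.add(a, 1, out=a) is visible through a_pt, whereas a = a + 1 rebinds a to a brand-new array and leaves the shared buffer (and hence a_pt) untouched. The same sharing holds in the other direction for CPU tensors; a small sketch (not from the notebook):

t = torch.ones(3)
n = t.numpy()   # n shares memory with t (CPU tensors only)
t.add_(1)       # the in-place change is visible through n
print(n)        # [2. 2. 2.]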

%%time
for i in range(1000):
    a = np.random.randn(100,100)
    b = np.random.randn(100,100)
    c = np.matmul(a, b)

CPU times: user 696 ms, sys: 773 µs, total: 697 ms
Wall time: 697 ms

%%time
for i in range(1000):
    a = torch.randn([100, 100])
    b = torch.randn([100, 100])
    c = torch.matmul(a, b)

CPU times: user 308 ms, sys: 0 ns, total: 308 ms


Wall time: 329 ms
%%time
for i in range(10):
    a = np.random.randn(10000,10000)
    b = np.random.randn(10000,10000)
    c = a + b

CPU times: user 54.6 s, sys: 4.41 s, total: 59 s


Wall time: 59 s

%%time
for i in range(10):
    a = torch.randn([10000, 10000])
    b = torch.randn([10000, 10000])
    c = a + b

CPU times: user 14.6 s, sys: 6.47 s, total: 21 s


Wall time: 20.1 s

CUDA support


print(torch.cuda.device_count())

print(torch.cuda.device(0))
print(torch.cuda.get_device_name(0))

<torch.cuda.device object at 0x7dc9537a4c70>


Tesla T4

CUDA (Compute Unified Device Architecture) is a parallel computing platform and application
programming interface (API) model created by Nvidia. When CUDA was first introduced, the name
was an acronym for Compute Unified Device Architecture, but Nvidia subsequently dropped the
common use of the acronym.
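
A common device-agnostic pattern (a small sketch, not from the notebook) is to pick the device once and reuse it everywhere:

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
x = torch.ones(3, 2, device=device)  # created directly on the chosen device
y = torch.ones(3, 2).to(device)      # or moved there after creation
print(x.device, y.device)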

cuda0 = torch.device('cuda:0') # 0 is the ordinal (index) of the CUDA device


# A torch.device contains a device type ('cpu' or 'cuda') and an optional device ordinal. If the
# ordinal is not present, the object represents the current device for that device type;
# e.g., a torch.Tensor constructed with device 'cuda' is
# equivalent to 'cuda:X' where X is the result of torch.cuda.current_device()

a = torch.ones(3, 2, device=cuda0)
b = torch.ones(3, 2, device=cuda0)
c = a + b
print(c)

tensor([[2., 2.],
[2., 2.],
[2., 2.]], device='cuda:0')

print(a)
tensor([[1., 1.],
[1., 1.],
[1., 1.]], device='cuda:0')

%%time
for i in range(10):
    a = np.random.randn(10000,10000)
    b = np.random.randn(10000,10000)
    np.add(b, a)

CPU times: user 53.5 s, sys: 4.31 s, total: 57.8 s


Wall time: 57.8 s

%%time
for i in range(10):
    a_cpu = torch.randn([10000, 10000])
    b_cpu = torch.randn([10000, 10000])
    b_cpu.add_(a_cpu)

CPU times: user 14.5 s, sys: 4.2 s, total: 18.7 s


Wall time: 18 s

%%time
for i in range(10):
    a = torch.randn([10000, 10000], device=cuda0)
    b = torch.randn([10000, 10000], device=cuda0)
    b.add_(a)

CPU times: user 2.18 ms, sys: 3.89 ms, total: 6.07 ms
Wall time: 16.2 ms

%%time
for i in range(10):
    a = np.random.randn(10000,10000)
    b = np.random.randn(10000,10000)
    np.matmul(b, a)

CPU times: user 12min 58s, sys: 16.7 s, total: 13min 15s
Wall time: 8min 10s

%%time
for i in range(10):
    a_cpu = torch.randn([10000, 10000])
    b_cpu = torch.randn([10000, 10000])
    torch.matmul(a_cpu, b_cpu)

CPU times: user 3min 27s, sys: 8.23 s, total: 3min 35s
Wall time: 3min 34s

%%time
for i in range(10):
    a = torch.randn([10000, 10000], device=cuda0)
    b = torch.randn([10000, 10000], device=cuda0)
    torch.matmul(a, b)

CPU times: user 25.6 ms, sys: 16.8 ms, total: 42.4 ms
Wall time: 118 ms
Autodiff

torch.tensor() (and factory functions such as torch.ones and torch.rand) take a few parameters:

data (array_like) – the returned tensor copies data.

dtype (torch.dtype, optional) – the desired data type of the returned tensor. Default: if None,
the dtype is inferred from data.

device (torch.device, optional) – the desired device of the returned tensor. Default: if None,
the current device for the default tensor type is used.

requires_grad (bool, optional) – whether autograd should record operations on the returned tensor.
Default: False.
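
For example, a small sketch illustrating these parameters:

t = torch.tensor([[1., 2.], [3., 4.]],
                 dtype=torch.float32,   # explicit dtype instead of letting it be inferred
                 device='cpu',          # or 'cuda:0' when a GPU is available
                 requires_grad=True)    # autograd will track operations on t
print(t.dtype, t.device, t.requires_grad)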

Important points:

1. Given an image, the network learns representations from it. Learning is done by adjusting
the weights and biases to fit the actual data, which captures how important each feature is.
2. In the forward pass, the inputs are combined with the weights (and a bias), a non-linear
function is applied, and the output is fed to the next layer.
3. Through backpropagation we find the gradient of the loss function and update the
weights accordingly.
4. The gradient of the loss with respect to a weight is computed using the chain rule,
taking into account all the paths influenced by that weight.
5. Tensors together with PyTorch's autograd feature make computing these chained
gradients much simpler.


x = torch.ones([3, 2], requires_grad=True) # requires_grad=True tells autograd to track operations on this tensor


print(x) # in the output requires_grad is set to True, meaning gradients with respect to x will be computed

tensor([[1., 1.],
[1., 1.],
[1., 1.]], requires_grad=True)

y = x + 5 # y is a linear expression of the tensor x


print(y) # here we build the forward pass: take the input x and perform some operation on it
# in the output, grad_fn tells us that y was produced by a differentiable function of x
# PyTorch is doing bookkeeping here: it is building the computational graph

tensor([[6., 6.],
[6., 6.],
[6., 6.]], grad_fn=<AddBackward0>)

z = y*y + 1 # one more forward operation: take the input y and compute z
# observe in the output that this is also a differentiable function with respect to y
# PyTorch can compute the gradients of these functions by building up the computational graph
print(z)

tensor([[37., 37.],
[37., 37.],
[37., 37.]], grad_fn=<AddBackward0>)

t = torch.sum(z) # important: backward() can be called directly only on scalar outputs,
# so we reduce z to a single output value
print(t)

tensor(222., grad_fn=<SumBackward0>)

t.backward() # now we want the gradient of t with respect to x


# this is done by calling backward() on t; the gradient of x is then available in x.grad

print(x.grad)

tensor([[12., 12.],
[12., 12.],
[12., 12.]])

$t = \sum_i z_i, \quad z_i = y_i^2 + 1, \quad y_i = x_i + 5$

$\dfrac{\partial t}{\partial x_i} = \dfrac{\partial z_i}{\partial x_i} = \dfrac{\partial z_i}{\partial y_i}\dfrac{\partial y_i}{\partial x_i} = 2 y_i \times 1$

At $x = 1$, $y = 6$, so $\dfrac{\partial t}{\partial x_i} = 12$
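
A quick numeric check of this result in code (a small sketch, not from the notebook):

print(2 * (x + 5))  # 2 * y; every entry is 12, matching x.grad above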

x = torch.ones([3, 2], requires_grad=True)


y = x + 5
r = 1/(1 + torch.exp(-y))
print(r)
s = torch.sum(r)
s.backward()
print(x.grad)

tensor([[0.9975, 0.9975],
[0.9975, 0.9975],
[0.9975, 0.9975]], grad_fn=<MulBackward0>)
tensor([[0.0025, 0.0025],
[0.0025, 0.0025],
[0.0025, 0.0025]])
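
Here $r = \sigma(y)$ is the sigmoid of $y$, so $\dfrac{\partial s}{\partial x_i} = \sigma(y_i)\,(1 - \sigma(y_i)) \approx 0.9975 \times 0.0025 \approx 0.0025$ at $y_i = 6$, which matches the values in x.grad.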

x = torch.ones([3, 2], requires_grad=True)


y = x + 5
r = 1/(1 + torch.exp(-y))
a = torch.ones([3, 2])
r.backward(a)
print(x.grad)
tensor([[0.0025, 0.0025],
[0.0025, 0.0025],
[0.0025, 0.0025]])

$\dfrac{\partial s}{\partial x} = \dfrac{\partial s}{\partial r} \cdot \dfrac{\partial r}{\partial x}$

For the code above, the argument a passed to r.backward(a) represents $\dfrac{\partial s}{\partial r}$ (calling backward on a non-scalar output computes a vector-Jacobian product), and x.grad then directly gives $\dfrac{\partial s}{\partial x}$.

Autodiff example


x = torch.randn([20, 1], requires_grad=True)
# an example of how this works for neural networks
# x has requires_grad=True, meaning gradients with respect to x can be computed
# y is a linear equation of the form y = mx + c; this is the true relation we want to learn
y = 3*x - 2

w = torch.tensor([1.], requires_grad=True)
b = torch.tensor([1.], requires_grad=True)

y_hat = w*x + b

loss = torch.sum((y_hat - y)**2) # loss: sum of squared errors

print(loss)

tensor(211.0147, grad_fn=<SumBackward0>)

loss.backward()

print(w.grad, b.grad) # the gradients are far from zero because no learning has happened yet

tensor([-52.4902]) tensor([105.6830])

Do it in a loop
learning_rate = 0.01

w = torch.tensor([1.], requires_grad=True)
b = torch.tensor([1.], requires_grad=True)

print(w.item(), b.item()) # to print the scalar value of the w tensor and b tensor

for i in range(10):

    x = torch.randn([20, 1]) # input data

    y = 3*x - 2 # true function

    y_hat = w*x + b # model

    loss = torch.sum((y_hat - y)**2) # loss

    loss.backward() # compute the gradient of the loss with respect to w and b

    with torch.no_grad():
        # the weight updates are not part of the forward pass and should not be tracked by autograd;
        # torch.no_grad() stops the bookkeeping (building of the computational graph) for these updates

        w -= learning_rate * w.grad
        b -= learning_rate * b.grad

        w.grad.zero_() # reset the gradients to zero, since backward() accumulates them across iterations
        b.grad.zero_()

    print(w.item(), b.item()) # item() on a one-element tensor returns its scalar value

# the output converges towards w = 3 and b = -2

1.0 1.0
1.8122749328613281 -0.23074424266815186
2.006035327911377 -0.9175978899002075
2.6289658546447754 -1.311774730682373
2.785614013671875 -1.601137399673462
2.916221857070923 -1.7670209407806396
2.9244179725646973 -1.8520290851593018
2.9346094131469727 -1.9059644937515259
2.9583282470703125 -1.9466989040374756
2.9707868099212646 -1.966143012046814
2.980755567550659 -1.9801193475723267

Do it for a large problem


%%time
# %%time is an IPython cell magic that reports how long the cell takes to run
# (the Colab VM also accepts Linux shell commands)

learning_rate = 0.001
N = 10000000
epochs = 200

w = torch.rand([N], requires_grad=True)
b = torch.ones([1], requires_grad=True)

# print(torch.mean(w).item(), b.item())

for i in range(epochs):

    x = torch.randn([N])
    y = torch.dot(3*torch.ones([N]), x) - 2

    y_hat = torch.dot(w, x) + b
    loss = torch.sum((y_hat - y)**2)

    loss.backward()

    with torch.no_grad():
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad

        w.grad.zero_()
        b.grad.zero_()
    # print(torch.mean(w).item(), b.item())

CPU times: user 26.2 s, sys: 22.5 s, total: 48.8 s


Wall time: 48 s

%%time
learning_rate = 0.001
N = 10000000
epochs = 200

w = torch.rand([N], requires_grad=True, device=cuda0) # parameters we need gradients for, created directly on the GPU


# the device is chosen via the device argument; it takes a string such as 'cuda:0' or 'cpu', or a torch.device object
b = torch.ones([1], requires_grad=True, device=cuda0)

# print(torch.mean(w).item(), b.item())

for i in range(epochs):

    x = torch.randn([N], device=cuda0)
    y = torch.dot(3*torch.ones([N], device=cuda0), x) - 2

    y_hat = torch.dot(w, x) + b
    loss = torch.sum((y_hat - y)**2)

    loss.backward()

    with torch.no_grad():
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad

        w.grad.zero_()
        b.grad.zero_()

#print(torch.mean(w).item(), b.item())

CPU times: user 319 ms, sys: 259 ms, total: 578 ms
Wall time: 644 ms
