04 Pytorch Custom Datasets

The document provides an overview of using custom datasets with PyTorch, including how to load, prepare, and visualize data for model training. It discusses various PyTorch domain libraries for different data types and outlines methods for improving model performance through data augmentation and transfer learning. Additionally, it highlights key considerations for predicting on custom data, ensuring the model is properly set up for inference.


Custom datasets with PyTorch
Where can you get help?
“If in doubt, run the code”

• Follow along with the code


• Try it for yourself
• Press SHIFT + CMD + SPACE to read the docstring
• Search for it
• Try again
• Ask

https://www.github.com/mrdbourke/pytorch-deep-learning/discussions
“What is a custom dataset?”
“I’ve got my own dataset, can I build a model with PyTorch to predict on it?”

Yes.
PyTorch Domain Libraries

Different domain libraries contain data loading functions for different data sources:

• TorchVision: "Is this a photo of pizza, steak or sushi?"
• TorchText: "Are these reviews positive or negative?" (⭐⭐⭐⭐⭐)
• TorchAudio: "What song is playing?"
• TorchRec: "How do we recommend similar products?" (Source: movielens.org)
PyTorch Domain Libraries

Problem space            Pre-built datasets and functions
Vision                   torchvision.datasets
Text                     torchtext.datasets
Audio                    torchaudio.datasets
Recommendation system    torchrec.datasets
Bonus                    TorchData*

*TorchData contains many different helper functions for loading data and is currently in beta as of April 2022.
What we're going to build
FoodVision Mini 🍕

Load data → Build a model → Predict with the model

We're going to write code to load images of food (our own custom dataset for FoodVision Mini), build a model and predict with it:

• Load data: torchvision.transforms, torch.utils.data.Dataset, torch.utils.data.DataLoader
• Build a model: torch.nn, torch.nn.Module, torch.optim, torchvision.models, torchmetrics, torch.utils.tensorboard
• Predict with the model: torch.save, torch.load

See more: https://pytorch.org/tutorials/beginner/ptcheat.html


What we’re going to cover
(broadly)
• Getting a custom dataset with PyTorch

• Becoming one with the data (preparing and visualising)

• Transforming data for use with a model

• Loading custom data with pre-built functions and custom functions

• Building FoodVision Mini to classify 🍕🥩🍣 images

• Comparing models with and without data augmentation

• Making predictions on custom data

👩‍🍳 👩‍🔬
(we’ll be cooking up lots of code!)

How:
Let’s code!
Source: @mrdbourke Twitter
Standard image classification data format

Your own data format will depend on what you're working on.

The premise remains: write code to get your data into tensors for use with PyTorch.
What is data augmentation?

Looking at the same image but from different perspective(s)*, to artificially increase the diversity of a dataset.

Original → Rotate → Shift → Zoom

*Note: There are many more kinds of data augmentation, such as cropping, replacing and shearing. This slide only demonstrates a few.
PyTorch State of the Art Recipe

Research on how best to train models comes out often, and state-of-the-art (SOTA) methods are always changing.

Source: Training state-of-the-art computer vision models with torchvision, from the PyTorch blog.
Loss curves (a way to evaluate your model's performance over time)

*There are more combinations of these; to see them, check out Google's Interpreting Loss Curves guide.
Dealing with overfitting

Method to improve a model (reduce overfitting) and what it does:

• Get more data: Gives a model more of a chance to learn patterns between samples (e.g. if a model is performing poorly on images of pizza, show it more images of pizza).

• Data augmentation: Increase the diversity of your training dataset without collecting more data (e.g. take your photos of pizza and randomly rotate them 30°). Increased diversity forces a model to learn more generalisable patterns.

• Better data: Not all data samples are created equally. Removing poor samples from, or adding better samples to, your dataset can improve your model's performance.

• Use transfer learning: Take a model's pre-learned patterns from one problem and tweak them to suit your own problem. For example, take a model trained on pictures of cars to recognise pictures of trucks.

• Simplify your model: If the current model is already overfitting the training data, it may be too complicated a model. This means it's learning the patterns of the data too well and isn't able to generalise to unseen data. One way to simplify a model is to reduce the number of layers it uses or the number of hidden units in each layer.

• Use learning rate decay: Slowly decrease the learning rate as a model trains. This is akin to reaching for a coin at the back of a couch: the closer you get, the smaller your steps. Likewise with the learning rate, the closer you get to convergence, the smaller you'll want your weight updates to be.

• Use early stopping: Early stopping stops model training *before* it begins to overfit. As in, say the model's loss has stopped decreasing for the past 10 epochs (this number is arbitrary), you may want to stop the training there and go with the model weights that had the lowest loss (10 epochs prior).
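The learning rate decay idea above can be sketched with `torch.optim.lr_scheduler.StepLR` (the model, `step_size` and `gamma` below are arbitrary illustrative choices):

```python
import torch

# A toy model/optimizer; StepLR halves the learning rate every 10 epochs.
model = torch.nn.Linear(10, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    # ...training step would go here (forward pass, loss, backward)...
    optimizer.step()
    scheduler.step()  # decay the learning rate on a fixed schedule

print(optimizer.param_groups[0]["lr"])  # 0.0125 (0.1 halved three times)
```

Other schedulers (e.g. `ExponentialLR`, `CosineAnnealingLR`) implement the same "smaller steps as you converge" idea with different curves.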
Dealing with underfitting

Method to improve a model (reduce underfitting) and what it does:

• Add more layers/units to your model: If your model is underfitting, it may not have enough capability to *learn* the required patterns/weights/representations of the data to be predictive. One way to add more predictive power to your model is to increase the number of hidden layers/units within those layers.

• Tweak the learning rate: Perhaps your model's learning rate is too high to begin with, so it's trying to update its weights too much each epoch and in turn not learning anything. In this case, you might lower the learning rate and see what happens.

• Train for longer: Sometimes a model just needs more time to learn representations of data. If you find in your smaller experiments that your model isn't learning anything, letting it train for more epochs may result in better performance.

• Use transfer learning: Take a model's pre-learned patterns from one problem and tweak them to suit your own problem. For example, take a model trained on pictures of cars to recognise pictures of trucks.

• Use less regularization: Perhaps your model is underfitting because you're trying to prevent overfitting too much. Holding back on regularization techniques can help your model fit the data better.
Predicting on custom data (3 things to make sure of…)

1. Data in the right datatype: same as the model's input (e.g. torch.float32).
2. Data on the same device as the model: is the model on the GPU? The data needs to be on the GPU too.
3. Data in the correct shape: add a batch dimension and rearrange if needed. For example, an original custom image of shape [64, 64, 3] ([None, 64, 64, 3] batched, NHWC) becomes shape [None, 3, 64, 64] (NCHW) to match the model's input.
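The three checks above can be sketched end to end (the tiny Sequential model here is a placeholder, not FoodVision Mini):

```python
import torch

# Stand-in custom image: uint8, HWC layout, as image files are often loaded.
img = torch.randint(0, 256, (64, 64, 3), dtype=torch.uint8)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 64 * 64, 3),
).to(device)

x = img.type(torch.float32) / 255.0   # 1. right datatype (scaled to [0, 1])
x = x.permute(2, 0, 1).unsqueeze(0)   # 3. right shape: HWC -> CHW -> NCHW
x = x.to(device)                      # 2. same device as the model

model.eval()
with torch.inference_mode():
    logits = model(x)
print(logits.shape)  # torch.Size([1, 3]), one logit per class
```

Forgetting any one of the three typically raises a dtype, device or shape mismatch error at the first forward pass.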
