Lab Manual ML
Enrollment No.: 09
Ayush Dumka
BSc. LL.B
iv. Variables: Special tensors that maintain state across executions, essential for
machine learning models.
v. Operations (Ops): Functions that perform computations on tensors, such as
addition, multiplication, etc.
vi. Layers and Models: High-level abstractions for building neural networks,
including prebuilt layers, model classes, and loss functions.
vii. Estimators: High-level API for training and evaluating machine learning
models, simplifying the implementation process.
viii. Data Pipeline: Tools for loading, preprocessing, and feeding data into models,
including datasets and iterators.
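A minimal sketch of these concepts, assuming TensorFlow 2.x; the tensor values, layer sizes, and random data below are illustrative assumptions (Estimators are omitted here):

import tensorflow as tf

# Variable: a tensor whose value persists and can be updated during training
w = tf.Variable(tf.random.normal([3, 1]), name="weights")

# Operations: functions that compute on tensors
a = tf.constant([[1.0, 2.0, 3.0]])
y = tf.matmul(a, w) + 1.0  # matrix multiplication and element-wise addition

# Layers and Models: high-level building blocks for neural networks
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Data Pipeline: a tf.data dataset feeds batches into the model
ds = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal([32, 3]), tf.random.normal([32, 1]))).batch(8)
model.fit(ds, epochs=1, verbose=0)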
Loss Functions:
Optimizers:
11 TIME SERIES: -
12 SCALAR: -
DISTRIBUTIONS: -
GRAPHS: -
Tensor Operations: -
Arithmetic Operations:
torch.add(a, b): Element-wise addition of two tensors.
torch.sub(a, b): Element-wise subtraction of two tensors.
torch.mul(a, b): Element-wise multiplication of two tensors.
torch.div(a, b): Element-wise division of two tensors.
Matrix Operations:
torch.matmul(a, b): Matrix multiplication of two tensors.
Reduction Operations:
torch.sum(tensor): Computes the sum of all elements in the tensor.
torch.mean(tensor): Computes the mean (average) of all elements in the tensor.
torch.std(tensor): Computes the standard deviation of all elements in the tensor.
torch.var(tensor): Computes the variance of all elements in the tensor.
torch.min(tensor): Returns the minimum value in the tensor.
torch.max(tensor): Returns the maximum value in the tensor.
Comparison Operations:
torch.eq(a, b): Element-wise equality comparison between two tensors.
torch.ne(a, b): Element-wise inequality comparison between two tensors.
torch.gt(a, b): Element-wise "greater than" comparison between two tensors.
torch.ge(a, b): Element-wise "greater than or equal" comparison between two
tensors.
torch.lt(a, b): Element-wise "less than" comparison between two tensors.
torch.le(a, b): Element-wise "less than or equal" comparison between two
tensors.
Reshaping Operations:
tensor.view(shape): Returns a new tensor with the same data as the original
tensor but with a different shape; view() is called as a method on the tensor.
torch.reshape(tensor, shape): Similar to view(), but can handle more flexible
reshaping.
torch.transpose(tensor, dim0, dim1): Swaps the specified dimensions of the
tensor.
torch.flatten(tensor): Flattens the input tensor into a one-dimensional tensor.
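A brief sketch exercising a few of the operations listed above; the tensor values are arbitrary examples:

import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[5.0, 6.0], [7.0, 8.0]])

# Arithmetic and matrix operations
print(torch.add(a, b))       # element-wise addition
print(torch.matmul(a, b))    # matrix multiplication

# Reduction operations
print(torch.sum(a), torch.mean(a), torch.max(a))

# Comparison operations
print(torch.gt(a, b))        # element-wise "greater than"

# Reshaping operations
print(a.view(4))             # view the 2x2 tensor as a flat tensor of 4 elements
print(torch.transpose(a, 0, 1))
print(torch.flatten(b))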
Loss Functions:
torch.nn.MSELoss(): Mean Squared Error Loss, often used for regression tasks.
torch.nn.CrossEntropyLoss(): Cross-entropy loss, commonly used for
classification tasks.
torch.nn.NLLLoss(): Negative Log Likelihood Loss, used for classification tasks,
often in conjunction with log_softmax.
torch.nn.BCELoss(): Binary Cross Entropy Loss, used for binary classification
tasks.
torch.nn.BCEWithLogitsLoss(): Combines a sigmoid layer and BCELoss in a single
class; this is numerically more stable than applying a plain Sigmoid followed
by BCELoss.
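A minimal sketch of these loss functions in use; the predictions, logits, and targets are arbitrary examples:

import torch
import torch.nn as nn

# Regression: Mean Squared Error between predictions and targets
mse = nn.MSELoss()
print(mse(torch.tensor([2.5, 0.0, 2.0]), torch.tensor([3.0, -0.5, 2.0])))

# Multi-class classification: CrossEntropyLoss takes raw logits and class indices
ce = nn.CrossEntropyLoss()
logits = torch.randn(4, 3)           # 4 samples, 3 classes
labels = torch.tensor([0, 2, 1, 0])
print(ce(logits, labels))

# Binary classification: BCEWithLogitsLoss combines sigmoid and BCE for stability
bce_logits = nn.BCEWithLogitsLoss()
print(bce_logits(torch.randn(4), torch.tensor([1.0, 0.0, 1.0, 0.0])))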
Optimizers:
torch.optim.SGD(params, lr): Stochastic Gradient Descent optimizer.
torch.optim.Adam(params, lr): Adam optimizer, an adaptive learning-rate
optimization algorithm well suited to training deep neural networks.
Utility Modules:
torch.nn.Module(): Base class for all neural network modules. Your models
should subclass this class.
torch.nn.Sequential(*args): A sequential container. Modules will be added to it
in the order they are passed in the constructor. A Sequential module runs its
registered modules in sequence.
torch.nn.DataParallel(module): Implements data parallelism at the module level,
which is useful for distributing computations across multiple GPUs.
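A minimal training-step sketch combining an optimizer with these utility modules; the layer sizes and the random batch are illustrative assumptions:

import torch
import torch.nn as nn

# A small model built from nn.Module components via nn.Sequential
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))

# Optimizers take the model parameters and a learning rate
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # SGD alternative

loss_fn = nn.MSELoss()
x, y = torch.randn(8, 10), torch.randn(8, 1)  # illustrative random batch

optimizer.zero_grad()         # clear old gradients
loss = loss_fn(model(x), y)   # forward pass and loss
loss.backward()               # backpropagate
optimizer.step()              # update parameters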
Implement neural networks in PyTorch with 4 and 3 hidden layers, and adjust the
architecture to observe how the number of layers affects model accuracy, thereby
assessing the trade-off between model complexity and performance.
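One possible sketch of such an experiment, assuming flattened 28x28 inputs and 10 output classes; make_mlp is a hypothetical helper that builds an MLP with a configurable number of hidden layers, and both variants would then be trained identically and their test accuracies compared:

import torch.nn as nn

def make_mlp(hidden_layers, in_features=784, hidden_size=128, num_classes=10):
    # Build an MLP with the requested number of hidden layers
    layers, size = [], in_features
    for _ in range(hidden_layers):
        layers += [nn.Linear(size, hidden_size), nn.ReLU()]
        size = hidden_size
    layers.append(nn.Linear(size, num_classes))
    return nn.Sequential(*layers)

model_3 = make_mlp(3)  # 3 hidden layers
model_4 = make_mlp(4)  # 4 hidden layers
# Train both with the same data, loss, and optimizer, then compare test accuracy.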
Output: -
9. Tune Hyperparameters
Grid Search/Random Search: Use methods like grid search or random
search to find the best hyperparameters.
Cross-Validation: Perform cross-validation to ensure the model
generalizes well to unseen data.
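A minimal sketch of grid search with cross-validation using scikit-learn; the SVC model, parameter grid, and Iris dataset are illustrative choices:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Each hyperparameter combination is scored with 5-fold cross-validation
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)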
The PyTorch DataLoader handles batching, shuffling, and parallel data loading,
making the data feeding process efficient and easy to use. Here are some key
aspects of the DataLoader and why it is useful:
1. Efficiency:
o By handling batching, shuffling, and parallel loading, the DataLoader makes
data loading more efficient. This helps in utilizing the GPU/CPU resources
effectively, leading to faster training times.
2. Simplifies Code:
o The DataLoader simplifies the code needed for data handling. Instead of
writing custom loops for batching and shuffling, you can rely on the
DataLoader to handle these tasks, making your code cleaner and easier to
maintain.
3. Flexibility:
o The ability to use custom datasets and apply a variety of transformations
makes the DataLoader highly flexible. This is particularly useful in research
and development, where datasets and preprocessing steps can vary
significantly.
4. Consistency:
o Using a DataLoader ensures that the data feeding process is consistent and
reproducible. This is important for experiments where you need to ensure that
the data pipeline does not introduce variability.
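A minimal DataLoader sketch; the random tensors, batch size, and worker count are illustrative assumptions:

import torch
from torch.utils.data import TensorDataset, DataLoader

# Wrap tensors in a Dataset (illustrative random data)
features = torch.randn(100, 4)
labels = torch.randint(0, 2, (100,))
dataset = TensorDataset(features, labels)

# The DataLoader handles batching, shuffling, and (optionally) parallel loading
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=2)

for batch_x, batch_y in loader:
    pass  # each iteration yields one shuffled mini-batch for training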
Data Transformations: -
Data transformations are used to preprocess and augment data on-the-fly while
loading it. PyTorch provides the torchvision.transforms module with a variety of
built-in transformations for images.
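A minimal sketch of an image transformation pipeline; the chosen transforms and the single-channel normalization values are illustrative:

from torchvision import transforms

# Transformations are applied to each image as it is loaded
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),            # simple augmentation
    transforms.ToTensor(),                        # PIL image -> tensor in [0, 1]
    transforms.Normalize(mean=[0.5], std=[0.5]),  # single-channel example
])

# A dataset can apply the transform on the fly, e.g.:
# from torchvision.datasets import MNIST
# train_set = MNIST(root="data", train=True, download=True, transform=transform)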
3. Efficiency
Optimized Pipelines: TFDS is designed to work efficiently with
TensorFlow’s tf.data API, enabling the creation of performant and
scalable data pipelines.
Caching and Shuffling: Built-in support for caching, shuffling, and
other common data preprocessing steps, which can significantly speed
up model training.
4. Reproducibility
Versioning: Datasets in TFDS are versioned, ensuring that the same
dataset can be reliably used in different experiments, which is crucial for
reproducibility in research.
5. Extensibility
Custom Datasets: Users can create and integrate their own datasets with
TFDS, allowing for flexible extension and use of custom data sources.
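A minimal sketch of loading a TFDS dataset into a tf.data pipeline; the dataset name (mnist), shuffle buffer, and batch size are illustrative assumptions:

import tensorflow as tf
import tensorflow_datasets as tfds

# Load a versioned dataset as (image, label) pairs
ds = tfds.load("mnist", split="train", as_supervised=True)

ds = (ds
      .cache()                        # built-in caching
      .shuffle(10_000)                # shuffling
      .batch(32)
      .prefetch(tf.data.AUTOTUNE))    # overlap data loading with training

for images, labels in ds.take(1):
    print(images.shape, labels.shape)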
The Natural Language Toolkit (NLTK) is a powerful library for working with
human language data (text) in Python. Here is an explanation of its main features
and functions:
Tokenization
Tokenization is the process of splitting text into smaller parts, such as words or
sentences.
Stemming: -
Stemming reduces words to their root form by removing suffixes. It helps in
normalizing text.
nltk.stem.PorterStemmer: Uses the Porter stemming algorithm, which is a
common stemming technique.
Lemmatization: -
Lemmatization reduces words to their base or dictionary form (lemma), considering
the context.
nltk.stem.WordNetLemmatizer: Uses WordNet, a lexical database, for
lemmatization.
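A minimal sketch of stemming and lemmatization with NLTK; the example words are arbitrary, and the commented outputs are the expected results for these particular words:

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet")  # lexical database needed by the lemmatizer (one-time)

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("running"))                   # 'run'
print(stemmer.stem("studies"))                   # 'studi' (crude suffix stripping)
print(lemmatizer.lemmatize("studies"))           # 'study' (dictionary form)
print(lemmatizer.lemmatize("running", pos="v"))  # 'run' (with verb context)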
POS Tagging: -
Part-of-speech (POS) tagging assigns word types (e.g., noun, verb) to each word in a
text.
nltk.pos_tag(): Tags each word with its part of speech.
Stopwords: -
Stopwords are common words (e.g., "the", "and") that are often removed in text
processing.
nltk.corpus.stopwords.words(): Provides lists of stopwords for different
languages.
Text Similarity: -
Measures how similar two pieces of text are.
nltk.edit_distance(): Calculates the edit distance (Levenshtein distance)
between two strings.
Frequency Distributions: -
Frequency distributions show the frequency of each item in a dataset.
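A minimal sketch of POS tagging, stopword removal, edit distance, and a frequency distribution with NLTK; the sample sentence and word pair are arbitrary examples:

import nltk
from nltk import word_tokenize, pos_tag, FreqDist, edit_distance
from nltk.corpus import stopwords

nltk.download("punkt")                        # tokenizer models (one-time)
nltk.download("averaged_perceptron_tagger")   # POS tagger model (one-time)
nltk.download("stopwords")                    # stopword lists (one-time)

text = "The quick brown fox jumps over the lazy dog"
tokens = word_tokenize(text)

print(pos_tag(tokens))                     # (word, POS tag) pairs
print([w for w in tokens if w.lower() not in stopwords.words("english")])
print(edit_distance("kitten", "sitting"))  # Levenshtein distance -> 3
print(FreqDist(tokens).most_common(3))     # most frequent tokens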
Text Classification: -
Text classification assigns predefined categories to text.
nltk.NaiveBayesClassifier: Implements a Naive Bayes classifier, a simple
probabilistic classifier based on Bayes' theorem.
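A minimal sketch of training nltk.NaiveBayesClassifier; the hand-crafted feature dictionaries and labels are purely illustrative:

import nltk

# Each training example is a (feature_dict, label) pair
train = [
    ({"contains_free": True,  "contains_meeting": False}, "spam"),
    ({"contains_free": True,  "contains_meeting": False}, "spam"),
    ({"contains_free": False, "contains_meeting": True},  "ham"),
    ({"contains_free": False, "contains_meeting": True},  "ham"),
]

classifier = nltk.NaiveBayesClassifier.train(train)
print(classifier.classify({"contains_free": True, "contains_meeting": False}))  # 'spam'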
21. Tokenization.
Tokenization is the process of converting text into individual units called tokens,
such as words, subwords, or characters; it is an essential step in preparing textual
data for machine learning models. In practice, this involves splitting sentences
into tokens using libraries such as NLTK or spaCy.
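A minimal tokenization sketch using NLTK (spaCy would work similarly); the sample text is arbitrary:

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt")  # tokenizer models (one-time)

text = "Tokenization splits text into units. Models consume these tokens."

print(sent_tokenize(text))  # sentence tokens
print(word_tokenize(text))  # word tokens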
Applications of TensorBoard:
Monitoring Metrics: Visualize metrics like loss and accuracy during model training
to track performance over time.
Model Graph Visualization: Display the computational graph of the model to
understand its structure and architecture.
Histograms and Distributions: Examine histograms of weights, biases, and other
tensors to analyze their distributions and detect potential issues.
Embedding Visualizations: Project high-dimensional embeddings to 2D or 3D space
to explore and understand the data representations.
Hyperparameter Tuning: Compare different runs and experiments to find the best
hyperparameter settings.
Debugging: Identify and diagnose issues in the model's training process by
visualizing gradients, weights, and other metrics.
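A minimal sketch of logging a scalar metric to TensorBoard from PyTorch via SummaryWriter; the run directory name and the placeholder loss values are illustrative assumptions:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/experiment_1")  # log directory (illustrative name)

for step in range(100):
    fake_loss = 1.0 / (step + 1)                      # placeholder metric
    writer.add_scalar("train/loss", fake_loss, step)  # appears under Scalars

writer.close()
# Launch the dashboard with:  tensorboard --logdir runs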