Lab Manual ML


Machine Learning

Machine Learning Practical

Enrollment No.: 09

Ayush Dumka

BSc. LL.B

101FLBSBL2122009

1. Build a Naive Bayes classifier to predict the type of crime based on the given features.

To build a Naive Bayes classifier for predicting crime types, use a probabilistic classification algorithm that assumes independence between the features. For each crime type, it computes the posterior probability given the observed features using Bayes' theorem, and the crime type with the highest posterior probability is taken as the prediction.
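A minimal sketch with scikit-learn is shown below (it is not the original notebook); the file crime.csv and its crime_type label column are assumptions made for illustration.
Code: -
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder
from sklearn.naive_bayes import CategoricalNB
from sklearn.metrics import accuracy_score

df = pd.read_csv("crime.csv")  # hypothetical dataset of crimes
enc = OrdinalEncoder()
X = enc.fit_transform(df.drop(columns=["crime_type"]))  # encode categorical features
y = df["crime_type"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# min_categories tells the model about every category seen by the encoder
nb = CategoricalNB(min_categories=[len(c) for c in enc.categories_])
nb.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, nb.predict(X_test)))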


2. Build a decision tree to predict the type of crime based on the given features.

To construct a decision tree for crime prediction, use a hierarchical tree structure in which each internal node tests a feature and each leaf node assigns a class label. The data is partitioned recursively on the feature that maximizes information gain, and a prediction is made by following the decision path from the root to a leaf.
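As an illustration, a minimal decision-tree sketch with scikit-learn is given below, assuming the same hypothetical crime.csv file used above.
Code: -
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("crime.csv")  # hypothetical dataset of crimes
X = OrdinalEncoder().fit_transform(df.drop(columns=["crime_type"]))
y = df["crime_type"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# criterion="entropy" makes each split maximize information gain
tree = DecisionTreeClassifier(criterion="entropy", max_depth=5)
tree.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, tree.predict(X_test)))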


3. Build a k-means clustering model on housing data.

To implement k-means clustering on housing data, use an unsupervised machine learning algorithm that partitions the data into k clusters based on similarity in feature space. The cluster centroids are updated iteratively to minimize the within-cluster sum of squares, segmenting the housing data into distinct groups for analysis or prediction.
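A minimal sketch with scikit-learn follows; housing.csv and its column names (area, price, bedrooms) are assumptions for illustration.
Code: -
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

df = pd.read_csv("housing.csv")  # hypothetical housing dataset
features = df[["area", "price", "bedrooms"]]  # assumed numeric columns
scaled = StandardScaler().fit_transform(features)  # put features on a common scale

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df["cluster"] = kmeans.fit_predict(scaled)  # assign each house to a cluster
print(df["cluster"].value_counts())
print("Within-cluster sum of squares (inertia):", kmeans.inertia_)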


4. Implement a basic TensorFlow program.
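A minimal basic TensorFlow program might look like the sketch below (an illustrative example, not the original screenshot): it creates two constant tensors and runs a few operations on them.
Code: -
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])

print("Sum:\n", tf.add(a, b))         # element-wise addition
print("Product:\n", tf.matmul(a, b))  # matrix multiplication
print("Mean of a:", tf.reduce_mean(a).numpy())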

5. What is a tensor? Identify the parameters of a tensor.


A tensor is a mathematical object used to represent multidimensional data and the relationships between them. It generalizes scalars and vectors to higher dimensions and can model complex interactions between multiple variables. Tensors are central to fields such as physics, engineering, computer science, and machine learning because they can capture and manipulate multidimensional information.
Parameters of a Tensor
The parameters that define a tensor are:
(a) Rank (or order): the number of dimensions, or modes, of a tensor; it indicates how many indices are needed to specify an element. A scalar (a single number) is a rank-0 tensor, a vector (a one-dimensional array) is a rank-1 tensor, a matrix (a two-dimensional array) is a rank-2 tensor, and higher-rank tensors represent more complex data structures.
(b) Shape: the size of each dimension, represented as a tuple of integers. A vector with 5 elements has shape (5,), a matrix with 3 rows and 4 columns has shape (3, 4), and a 3D tensor with dimensions 2x3x4 has shape (2, 3, 4).
(c) Components: the individual elements within the tensor. The total number of components is the product of the sizes in its shape; for example, a 2x2 matrix has 4 components.
(d) Basis: the reference system used to define the tensor's components; in a given basis, the components are expressed relative to that reference frame. For a vector in 3D space, the basis could be the standard unit vectors along the x, y, and z axes.
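The short sketch below illustrates rank, shape, number of components, and data type in TensorFlow.
Code: -
import tensorflow as tf

t = tf.constant([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])  # a rank-3 tensor
print("Rank:", tf.rank(t).numpy())                     # 3
print("Shape:", t.shape)                               # (2, 2, 2)
print("Number of components:", tf.size(t).numpy())     # 2 * 2 * 2 = 8
print("Data type:", t.dtype)                           # int32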

6. What are the main components of tensorflow?


The main components of TensorFlow include:
i. Tensors: The primary data structure, representing multidimensional arrays.
ii. Graphs: Dataflow graphs represent computations, where nodes are operations
and edges are tensors.
iii. Sessions: Environments for executing graphs, allowing for the evaluation of
operations and tensors.

iv. Variables: Special tensors that maintain state across executions, essential for
machine learning models.
v. Operations (Ops): Functions that perform computations on tensors, such as
addition, multiplication, etc.
vi. Layers and Models: High-level abstractions for building neural networks,
including layers, models, and loss functions.
vii. Estimators: High-level API for training and evaluating machine learning
models, simplifying the implementation process.
viii. Data Pipeline: Tools for loading, preprocessing, and feeding data into models,
including datasets and iterators.

7. How do you create a constant tensor in TensorFlow?

A constant tensor is created with the tf.constant function, which defines a tensor with fixed values that do not change during execution of the graph. Here's how you can create a constant tensor:
Steps to Create a Constant Tensor: -
i. First, you need to import TensorFlow. Typically, TensorFlow is imported with
the alias tf.
ii. Use the tf.constant function to create the tensor. You can specify the values,
shape, and data type.
Code: -
import tensorflow as tf

# Create a scalar constant tensor
scalar = tf.constant(5)
print("Scalar Tensor:", scalar)

# Create a 1-D constant tensor (vector)
vector = tf.constant([1, 2, 3, 4, 5])
print("1-D Tensor (Vector):", vector)

# Create a 2-D constant tensor (matrix)
matrix = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("2-D Tensor (Matrix):", matrix)

# Create a 3-D constant tensor
tensor_3d = tf.constant([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print("3-D Tensor:", tensor_3d)

# Specify the data type explicitly
float_tensor = tf.constant([1.0, 2.0, 3.0], dtype=tf.float32)
print("Float Tensor:", float_tensor)

8. Explore TensorFlow functions.


Tensor Creation and Manipulation:
 tf.constant(): Creates a constant, immutable tensor with fixed values. Useful
for defining static data that won't change during computation.
 tf.Variable(): Creates a trainable variable tensor, which can be modified during
training. Essential for building machine learning models with parameters that
learn from data.
 tf.convert_to_tensor(): Converts a Python object (like a list or NumPy array)
into a TensorFlow tensor. Makes various data types compatible with
TensorFlow operations.

Array and Tensor Operations:

 tf.reshape(): Reshapes a tensor into a new shape without changing its elements. Useful for formatting tensors for specific operations.
 tf.concat(): Concatenates tensors along a given axis. Combines multiple
tensors into a single one.
 tf.slice(): Extracts a sub-tensor from a larger tensor. Selects specific portions
of a tensor for further operations.
 tf.split(): Splits a tensor into multiple smaller tensors along a given axis.
Divides a tensor into desired chunks.
 tf.gather(): Gathers elements from a tensor based on specified indices. Selects
elements at specific positions.
 tensor.dtype: The data type attribute of a tensor. Provides information about the type of data stored in the tensor (e.g., float32, int32).

Neural Network Functions:


 tf.keras.layers.Dense(): Creates a fully connected layer, a common building block in neural networks that performs linear transformations.
 tf.keras.layers.Conv2D(): Creates a convolutional layer, essential for
processing spatial data like images. Applies filters to extract features.
 tf.keras.layers.MaxPooling2D(): Creates a max pooling layer, used in
convolutional neural networks to reduce dimensionality and capture
important features.
 tf.keras.layers.Flatten(): Flattens a tensor, typically used to convert a
multidimensional tensor (like an image) into a one-dimensional vector before
feeding it into a fully connected layer.
 tf.keras.Model(): Creates a computational model by composing layers.
Defines the overall architecture of your neural network.

Loss Functions:

 tf.keras.losses.MeanSquaredError(): Calculates the mean squared error between predicted and actual values. A common loss function for regression tasks.

Optimizers:

 tf.keras.optimizers.Adam(): Implements the Adam optimization algorithm, a widely used method for updating model parameters during training based on calculated gradients.

Additional TensorFlow Functions:

 tf.function: Decorates a Python function to convert it into a TensorFlow graph for improved performance and potential deployment.
 tf.GradientTape: Creates a gradient tape to record operations for calculating
gradients, essential for training machine learning models.
 tf.summary and tf.TensorBoard: Tools for creating and visualizing
summaries (logs) of training and evaluation metrics for debugging and
monitoring.
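The brief sketch below exercises a few of the functions listed above (constants, variables, reshaping, concatenation, tf.function, and GradientTape); the values used are arbitrary.
Code: -
import tensorflow as tf

x = tf.constant([[1., 2.], [3., 4.]])
v = tf.Variable([[1., 1.], [1., 1.]])   # trainable variable tensor
y = tf.reshape(x, (4,))                 # reshape the 2x2 matrix into a vector
z = tf.concat([x, x], axis=0)           # concatenate along the row axis

@tf.function                            # compile the function into a TensorFlow graph
def squared_error(a, b):
    return tf.reduce_sum((a - b) ** 2)

with tf.GradientTape() as tape:
    loss = squared_error(v, x)
grad = tape.gradient(loss, v)           # gradient of the loss with respect to v
print(y.numpy(), z.shape, loss.numpy(), grad.numpy())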

9. Data handling and preprocessing

(a) Loading Data:


tf.data.Dataset.from_tensor_slices(): Creates a dataset from a tensor or a tuple of tensors.
tf.data.TFRecordDataset(): Reads data records from TFRecord files.
tf.keras.preprocessing.image_dataset_from_directory(): Loads images from a
directory into a dataset, optionally performing preprocessing like resizing and
rescaling.
(b) Preprocessing:
tf.image.resize(): Resizes images to a specified size.
tf.image.random_flip_left_right(): Randomly flips images horizontally.
tf.image.random_flip_up_down(): Randomly flips images vertically.
tf.keras.preprocessing.image.ImageDataGenerator(): Generates batches of tensor
image data with real-time data augmentation.
(c) Text Preprocessing:
tf.keras.preprocessing.text.Tokenizer(): Tokenizes text documents into word-level or
character-level tokens.
(d) Training and Evaluation:
tf.keras.Model.compile(): Configures the model for training by specifying the
optimizer, loss function, and metrics.
tf.keras.Model.fit(): Trains the model on a dataset for a fixed number of epochs.
tf.keras.Model.evaluate(): Evaluates the model on a dataset.
(e) Saving Models:
tf.keras.Model.save(): Saves the entire model to a file, including the model
architecture, weights, and training configuration.
(f) TensorBoard:
tf.keras.callbacks.TensorBoard(): Callback to log metrics and visualize training/validation metrics using TensorBoard.
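A small end-to-end sketch of such a pipeline is given below, using synthetic NumPy data in place of a real dataset and an assumed save path.
Code: -
import numpy as np
import tensorflow as tf

features = np.random.rand(100, 4).astype("float32")  # synthetic features
labels = np.random.randint(0, 2, size=(100,))         # synthetic binary labels

dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
           .shuffle(buffer_size=100)
           .batch(16))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(dataset, epochs=3)       # train on the batched dataset
model.evaluate(dataset)            # evaluate (on the same data, for illustration only)
model.save("my_model.h5")          # hypothetical save path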

10. TensorBoard exercise.

Practical 1 - To train a neural network model on the MNIST dataset of handwritten digits and to log the training process using TensorBoard for visualization and analysis.
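A minimal version of this exercise could look like the sketch below; the log directory name is an assumption, and the original notebook may differ.
Code: -
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixel values to [0, 1]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

tb_callback = tf.keras.callbacks.TensorBoard(log_dir="logs/mnist")  # assumed log directory
model.fit(x_train, y_train, epochs=5,
          validation_data=(x_test, y_test), callbacks=[tb_callback])
# Launch the dashboard with:  tensorboard --logdir logs/mnist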


TIME SERIES: -


SCALARS: -


DISTRIBUTIONS: -

GRAPHS: -


Practical 2 – Implement TensorBoard using the CIFAR-10 dataset.


11. TensorFlow image processing.

TensorFlow provides a versatile toolset for image processing, offering pre-built models and APIs that streamline the development of image-based applications such as image classification, object detection, and image generation, backed by robust documentation and community support for practical implementation.


12. TensorFlow exercise.


13. Tensor creation.


Following are the ways of Tensor Creation: -

 torch.tensor(data): Creates a tensor from a Python list or sequence. It can accept data in various forms like lists, tuples, and NumPy arrays.
 torch.zeros(size): Creates a tensor filled with zeros, with the specified shape.
 torch.ones(size): Creates a tensor filled with ones, with the specified shape.
 torch.arange(start, end, step): Creates a tensor with values ranging from start to
end with a step size of step.
 torch.eye(n): Creates a 2-D tensor with ones on the diagonal and zeros
elsewhere, also known as an identity matrix.
 torch.empty(size): Creates an uninitialized tensor of the specified shape. The
values are whatever was in memory at the time, so this should be initialized
properly before use.
 torch.full(size, fill_value): Creates a tensor filled with the specified value and
shape.
 torch.rand(size): Creates a tensor filled with random numbers from a uniform
distribution on the interval [0, 1).
 torch.randint(low, high, size): Creates a tensor filled with random integers
between the specified low (inclusive) and high (exclusive) values.
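A short sketch exercising several of these creation functions:
Code: -
import torch

a = torch.tensor([[1, 2], [3, 4]])   # from a Python list
z = torch.zeros((2, 3))              # 2x3 tensor of zeros
r = torch.arange(0, 10, 2)           # tensor([0, 2, 4, 6, 8])
i = torch.eye(3)                     # 3x3 identity matrix
u = torch.rand((2, 2))               # uniform random values in [0, 1)
print(a.shape, z.shape, r, i, u)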

Tensor Operations: -
Arithmetic Operations:
 torch.add(a, b): Element-wise addition of two tensors.
 torch.sub(a, b): Element-wise subtraction of two tensors.
 torch.mul(a, b): Element-wise multiplication of two tensors.
 torch.div(a, b): Element-wise division of two tensors.

Matrix Operations:
 torch.matmul(a, b): Matrix multiplication of two tensors.


Reduction Operations:
 torch.sum(tensor): Computes the sum of all elements in the tensor.
 torch.mean(tensor): Computes the mean (average) of all elements in the tensor.
 torch.std(tensor): Computes the standard deviation of all elements in the tensor.
 torch.var(tensor): Computes the variance of all elements in the tensor.
 torch.min(tensor): Returns the minimum value in the tensor.
 torch.max(tensor): Returns the maximum value in the tensor.

Comparison Operations:
 torch.eq(a, b): Element-wise equality comparison between two tensors.
 torch.ne(a, b): Element-wise inequality comparison between two tensors.
 torch.gt(a, b): Element-wise "greater than" comparison between two tensors.
 torch.ge(a, b): Element-wise "greater than or equal" comparison between two
tensors.
 torch.lt(a, b): Element-wise "less than" comparison between two tensors.
 torch.le(a, b): Element-wise "less than or equal" comparison between two
tensors.

Reshaping Operations:
 tensor.view(shape): Returns a new tensor with the same data as the original tensor but with a different shape.
 torch.reshape(tensor, shape): Similar to view(), but can handle more flexible
reshaping.
 torch.transpose(tensor, dim0, dim1): Swaps the specified dimensions of the
tensor.
 torch.flatten(tensor): Flattens the input tensor into a one-dimensional tensor.

Concatenation and Splitting:

 torch.cat(tensors, dim): Concatenates a sequence of tensors along the specified dimension.
 torch.split(tensor, split_size_or_sections): Splits the tensor into chunks. If
split_size_or_sections is an integer, it splits the tensor into chunks of that size. If
it's a list, it splits the tensor into chunks along the specified sections.
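A short sketch combining arithmetic, matrix, reduction, reshaping, and concatenation operations:
Code: -
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.ones((2, 2))

print(torch.add(a, b))                 # element-wise addition
print(torch.matmul(a, b))              # matrix multiplication
print(torch.mean(a), torch.max(a))     # reductions
print(a.view(4), torch.flatten(a))     # reshaping
print(torch.cat([a, b], dim=0).shape)  # concatenation -> shape (4, 2)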


Neural Network Modules


Layers:
 torch.nn.Linear(in_features, out_features): A linear transformation layer that
applies a linear transformation to the incoming data: y = x*W^T + b.
 torch.nn.Conv2d(in_channels, out_channels, kernel_size): A 2D convolution
layer for processing 2D data such as images.
 torch.nn.Conv1d(in_channels, out_channels, kernel_size): A 1D convolution
layer for processing 1D data such as time series.
 torch.nn.Conv3d(in_channels, out_channels, kernel_size): A 3D convolution
layer for processing 3D data such as volumetric data.
 torch.nn.RNN(input_size, hidden_size, num_layers): A recurrent neural
network layer that processes sequences of data.
 torch.nn.ReLU(): Applies the rectified linear unit function element-wise:
ReLU(x) = max(0, x).
 torch.nn.Sigmoid(): Applies the sigmoid function element-wise: Sigmoid(x) =
1 / (1 + exp(-x)).
 torch.nn.Tanh(): Applies the hyperbolic tangent function element-wise:
Tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)).

Loss Functions:
 torch.nn.MSELoss(): Mean Squared Error Loss, often used for regression tasks.
 torch.nn.CrossEntropyLoss(): Cross-entropy loss, commonly used for
classification tasks.
 torch.nn.NLLLoss(): Negative Log Likelihood Loss, used for classification tasks
often in conjunction with log_softmax.
 torch.nn.BCELoss(): Binary Cross Entropy Loss, used for binary classification
tasks.
 torch.nn.BCEWithLogitsLoss(): Combines a sigmoid layer and the BCELoss in
one single class. This is numerically more stable than using a plain Sigmoid
followed by a BCELoss.


Optimizers:
 torch.optim.SGD(params, lr): Stochastic Gradient Descent optimizer.
 torch.optim.Adam(params, lr): Adam optimizer, which is an adaptive learning
rate optimization algorithm that's been designed specifically for training deep
neural networks.

Utility Modules:
 torch.nn.Module(): Base class for all neural network modules. Your models
should subclass this class.
 torch.nn.Sequential(*args): A sequential container. Modules will be added to it
in the order they are passed in the constructor. A Sequential module runs its
registered modules in sequence.
 torch.nn.DataParallel(module): Implements data parallelism at the module level
which can be useful for distributing computations across multiple GPUs.
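The sketch below wires a few of these modules into a tiny model and runs one training step on synthetic data.
Code: -
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 32),   # input features -> hidden layer
    nn.ReLU(),
    nn.Linear(32, 1),    # hidden layer -> single output
)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

x = torch.randn(8, 10)   # a batch of 8 synthetic samples
y = torch.randn(8, 1)
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()          # backpropagate the gradients
optimizer.step()         # update the parameters
print("Loss:", loss.item())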


14. Implement neural networks using PyTorch with 4 and 3 hidden layers. Compare the accuracy of both models.

Implement neural networks using PyTorch with 4 and 3 hidden layers, adjusting the architecture to observe how varying the number of layers affects model accuracy, thereby assessing the trade-off between model complexity and performance.
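A minimal sketch of the two architectures is given below (the layer sizes and the input dimension of 784 are assumptions); both models would then be trained with the same data, loss, and optimizer so their test accuracies can be compared.
Code: -
import torch.nn as nn

def make_mlp(hidden_layers, in_features=784, hidden=64, classes=10):
    layers, size = [], in_features
    for _ in range(hidden_layers):                     # stack the requested hidden layers
        layers += [nn.Linear(size, hidden), nn.ReLU()]
        size = hidden
    layers.append(nn.Linear(size, classes))            # output layer
    return nn.Sequential(*layers)

model_3 = make_mlp(3)   # model with 3 hidden layers
model_4 = make_mlp(4)   # model with 4 hidden layers
print(model_3)
print(model_4)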


Output: -


15. What are the main differences between PyTorch and TensorFlow?

S.No | PyTorch | TensorFlow
1 | It was developed by Facebook. | It was developed by Google.
2 | It was made using the Torch library. | It is based on Theano, a Python library.
3 | It works on a dynamic graph concept. | It believes in a static graph concept.
4 | PyTorch has fewer features compared to TensorFlow. | It has higher-level functionality and provides a broad spectrum of choices to work with.
5 | PyTorch uses a simple API which saves the entire weights of the model. | It has the major benefit that the whole graph can be saved as a protocol buffer.
6 | It is comparatively less supportive for deployments. | It is more supportive for embedded and mobile deployments compared to PyTorch.
7 | It is easy to learn and understand. | It is comparatively hard to learn.
8 | It requires the user to move everything onto the device. | Default settings are well defined in TensorFlow.
9 | It has a dynamic computational process. | It requires the use of a debugger tool.
10 | Some of its features/libraries are PYRO, Horizon, CheXNet, etc. | Some of its features/libraries are Sonnet, Ludwig, Magenta, etc.


16. Explain the steps involved in building a neural network model.


Building a neural network model involves several key steps, from defining the
problem to deploying the model. Here's a detailed overview of the process:

1. Define the Problem


 Determine what data you need to achieve your objective and how you will
obtain it.

2. Collect and Prepare Data


 Data Collection: Gather the data from various sources. This could involve
scraping data, using public datasets, or collecting data through
experiments.
 Data Cleaning: Handle missing values, remove duplicates, and correct
errors in the data.
 Data Preprocessing: Normalize or standardize the data, convert
categorical data to numerical, and split the data into training, validation,
and test sets.
 Data Augmentation: Optionally, augment the data to artificially increase
the size of the dataset and improve model robustness (commonly used in
image processing).

3. Choose a Model Architecture


 Model Type: Decide on the type of neural network suitable for the task
(e.g., CNN for image data, RNN for sequential data).
 Layer Design: Choose the number and types of layers (e.g., fully
connected, convolutional, recurrent) and the number of neurons/filters in
each layer.
 Activation Functions: Select appropriate activation functions (e.g.,
ReLU, Sigmoid, Tanh) for the neurons.

4. Initialize the Model


 Weight Initialization: Initialize the weights of the neural network. Common methods include random initialization, He initialization, and Xavier initialization.
 Bias Initialization: Initialize the biases, often starting with zeros.

5. Compile the Model


 Loss Function: Choose a loss function appropriate for the task (e.g.,
cross-entropy loss for classification, mean squared error for regression).
 Optimizer: Select an optimizer to update the weights during training (e.g.,
SGD, Adam, RMSprop).
 Metrics: Decide on evaluation metrics to monitor during training (e.g.,
accuracy for classification tasks).

6. Train the Model


 Forward Pass: Pass the training data through the network to get
predictions.
 Loss Calculation: Compute the loss by comparing the predictions with
the ground truth.
 Backward Pass (Backpropagation): Calculate the gradients of the loss
with respect to the network parameters.
 Parameter Update: Update the network parameters using the optimizer.
 Epochs and Batches: Repeat the forward and backward passes for a
specified number of epochs, and use mini-batches of data to improve
training efficiency and stability.

7. Validate the Model


 Validation Set: Evaluate the model on a separate validation set after each
epoch to monitor its performance and tune hyperparameters.
 Early Stopping: Optionally use early stopping to prevent overfitting by
halting training when validation performance stops improving.

8. Evaluate the Model


 Test Set: Once training is complete, evaluate the final model on a test set
to assess its performance.


 Metrics Analysis: Analyze the performance metrics (accuracy, precision, recall, F1-score, etc.) to understand the model's effectiveness.

9. Tune Hyperparameters
 Grid Search/Random Search: Use methods like grid search or random
search to find the best hyperparameters.
 Cross-Validation: Perform cross-validation to ensure the model
generalizes well to unseen data.

10. Deploy the Model


 Model Export: Save the trained model in a suitable format (e.g., HDF5,
ONNX).
 Integration: Integrate the model into the production environment, which
could involve embedding it into an application, setting up an API, or
deploying it on a server.
 Monitoring: Continuously monitor the model’s performance in the real
world and update it as necessary.

11. Maintain and Update the Model


 Retraining: Periodically retrain the model with new data to maintain
its performance.
 Performance Monitoring: Keep track of the model’s performance
over time to detect any degradation.
 Model Versioning: Manage different versions of the model to keep
track of improvements and changes.
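As an illustration of steps 5-10, the sketch below compiles, trains (with early stopping), evaluates, and saves a small Keras model on synthetic data; the file name model.h5 is an assumption.
Code: -
import numpy as np
import tensorflow as tf

X = np.random.rand(500, 10).astype("float32")   # synthetic features
y = np.random.randint(0, 2, size=(500,))         # synthetic binary labels

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])  # step 5

early_stop = tf.keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True)
model.fit(X, y, epochs=20, batch_size=32,
          validation_split=0.2, callbacks=[early_stop])  # steps 6-7
model.evaluate(X, y)                                     # step 8
model.save("model.h5")                                   # step 10 (hypothetical path)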

17. What is a DataLoader in PyTorch, and why is it useful?

A DataLoader in PyTorch is a utility that provides an efficient and flexible way to load data during the training and evaluation of neural networks. It abstracts the complexity involved in data loading and provides features such as batching, shuffling, and parallel data loading, making the data feeding process efficient and easy to use. Here are some key aspects of the DataLoader and why it is useful:

Key Features of DataLoader: -


1. Batching:
o The DataLoader can automatically divide the dataset into smaller batches, which
are fed into the neural network one at a time. This is essential for efficient
training, as it helps to make better use of GPU memory and can improve the
convergence of the training process.
2. Shuffling:
o Shuffling the dataset is important to ensure that the model does not learn the
order of the data, which can lead to overfitting. The DataLoader can shuffle the
data before each epoch, providing more randomness to the training process.
3. Parallel Data Loading:
o Loading large datasets can be a bottleneck if done serially. The DataLoader
supports multi-threaded data loading, which can significantly speed up the
process, especially when working with large datasets or complex data
transformations.
4. Customizable:
o You can define your own dataset by subclassing torch.utils.data.Dataset and
then use the DataLoader to load data from this custom dataset. This allows for
great flexibility in how data is prepared and fed into the model.
5. Transforms:
o The DataLoader works seamlessly with torchvision.transforms, which are
common data preprocessing steps like normalization, cropping, flipping, and
more. These transformations can be applied on-the-fly as data is loaded.

Why DataLoader is Useful?

1. Efficiency:
o By handling batching, shuffling, and parallel loading, the DataLoader makes
data loading more efficient. This helps in utilizing the GPU/CPU resources
effectively, leading to faster training times.
2. Simplifies Code:


o The DataLoader simplifies the code needed for data handling. Instead of
writing custom loops for batching and shuffling, you can rely on the
DataLoader to handle these tasks, making your code cleaner and easier to
maintain.
3. Flexibility:
o The ability to use custom datasets and apply a variety of transformations
makes the DataLoader highly flexible. This is particularly useful in research
and development, where datasets and preprocessing steps can vary
significantly.
4. Consistency:
o Using a DataLoader ensures that the data feeding process is consistent and
reproducible. This is important for experiments where you need to ensure that
the data pipeline does not introduce variability.
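A minimal usage sketch with synthetic data:
Code: -
import torch
from torch.utils.data import TensorDataset, DataLoader

features = torch.randn(100, 4)          # synthetic features
labels = torch.randint(0, 2, (100,))    # synthetic labels
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=16, shuffle=True)   # batching + shuffling

for batch_x, batch_y in loader:          # iterate over mini-batches
    print(batch_x.shape, batch_y.shape)  # torch.Size([16, 4]) torch.Size([16])
    break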

18. What are custom datasets and data transformations in PyTorch? How do you implement them?
In PyTorch, custom datasets and data transformations allow you to handle and
preprocess your data in flexible and powerful ways. Here's a detailed explanation of
both concepts and how to implement them:
Custom Datasets: -
PyTorch provides the Dataset class as an abstract base class for representing datasets.
You can create your own dataset by subclassing torch.utils.data.Dataset and
implementing the following methods:

1. __len__: Returns the size of the dataset.


2. __getitem__: Fetches a data sample for a given index.

Here’s an example of creating a custom dataset:
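The sketch below is an illustrative custom dataset (the class name and the synthetic data are assumptions, since the original screenshot is not reproduced here):
Code: -
import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, features, labels, transform=None):
        self.features = features              # e.g. a tensor or NumPy array
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.labels)               # size of the dataset

    def __getitem__(self, idx):
        sample = self.features[idx]
        if self.transform:
            sample = self.transform(sample)   # optional on-the-fly preprocessing
        return sample, self.labels[idx]

dataset = MyDataset(torch.randn(50, 4), torch.randint(0, 2, (50,)))
print(len(dataset), dataset[0])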


Data Transformations: -
Data transformations are used to preprocess and augment data on-the-fly while
loading it. PyTorch provides the torchvision.transforms module with a variety of
built-in transformations for images.

Here’s how to apply transformations:
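An illustrative sketch (using the built-in MNIST dataset as a stand-in for the original data):
Code: -
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.Resize((64, 64)),                  # resize every image
    transforms.RandomHorizontalFlip(),            # simple augmentation
    transforms.ToTensor(),                        # convert PIL image to a tensor
    transforms.Normalize(mean=[0.5], std=[0.5]),  # normalize pixel values
])

dataset = datasets.MNIST(root="data", train=True, download=True, transform=transform)
loader = DataLoader(dataset, batch_size=32, shuffle=True)
images, labels = next(iter(loader))
print(images.shape, labels.shape)                 # torch.Size([32, 1, 64, 64]) torch.Size([32])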

19. What is TensorFlow Datasets (TFDS) and why is it useful?


TensorFlow Datasets (TFDS) is a collection of ready-to-use datasets and a library for downloading, preparing, and loading these datasets in a standard format. It is particularly useful for TensorFlow users but can also be utilized with other machine learning frameworks. Here’s a detailed look at what TFDS is and why it’s beneficial:

What is TensorFlow Datasets (TFDS)?


1. A Collection of Datasets: A curated list of datasets covering a wide range of
domains such as image classification, natural language processing, object
detection, and more.
2. Standardized Interface: A consistent API to load and preprocess datasets,
ensuring a uniform format across different datasets.
3. Integration with TensorFlow: Seamless integration with TensorFlow’s
tf.data API, facilitating efficient data pipelines and preprocessing.

Why is TFDS Useful?


1. Ease of Access to Datasets
 Ready-to-Use: TFDS provides a large collection of datasets that are pre-
defined and easily accessible. Users don’t have to worry about finding,
downloading, and formatting data.
 One-Click Download and Preparation: With a single command, TFDS
handles the download, extraction, and preparation of datasets, saving time
and reducing boilerplate code.
2. Standardization
 Consistent Format: All datasets are provided in a standardized format
(typically as tf.data.Dataset objects), which simplifies data handling and
ensures compatibility with TensorFlow’s data processing tools.
 Metadata and Documentation: Each dataset comes with detailed
documentation and metadata, including information about the data splits,
feature descriptions, and citation details.


3. Efficiency
 Optimized Pipelines: TFDS is designed to work efficiently with
TensorFlow’s tf.data API, enabling the creation of performant and
scalable data pipelines.
 Caching and Shuffling: Built-in support for caching, shuffling, and
other common data preprocessing steps, which can significantly speed
up model training.
4. Reproducibility
 Versioning: Datasets in TFDS are versioned, ensuring that the same
dataset can be reliably used in different experiments, which is crucial for
reproducibility in research.
5. Extensibility
 Custom Datasets: Users can create and integrate their own datasets with
TFDS, allowing for flexible extension and use of custom data sources.
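A minimal loading sketch with TFDS (using mnist as an example dataset):
Code: -
import tensorflow as tf
import tensorflow_datasets as tfds

ds, info = tfds.load("mnist", split="train", as_supervised=True, with_info=True)
print(info.features)                               # metadata describing the dataset

ds = (ds.map(lambda img, label: (tf.cast(img, tf.float32) / 255.0, label))
        .shuffle(1024)
        .batch(32)
        .prefetch(tf.data.AUTOTUNE))
for images, labels in ds.take(1):
    print(images.shape, labels.shape)              # (32, 28, 28, 1) (32,)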

20. Explore the functions of NLP with examples.

The Natural Language Toolkit (NLTK) is a powerful library for working with human language data (text) in Python. Its main features and functions are explained below:
Tokenization
Tokenization is the process of splitting text into smaller parts, such as words or
sentences.

 nltk.word_tokenize(): This function breaks down a piece of text into individual words.


 nltk.sent_tokenize(): This function splits text into sentences.

Stemming: -
Stemming reduces words to their root form by removing suffixes. It helps in
normalizing text.
 nltk.stem.PorterStemmer: Uses the Porter stemming algorithm, which is a
common stemming technique.

 nltk.stem.LancasterStemmer: Uses the Lancaster stemming algorithm, known for being more aggressive.

 nltk.stem.SnowballStemmer: Supports multiple languages for stemming.

Lemmatization: -
Lemmatization reduces words to their base or dictionary form (lemma), considering
the context.
 nltk.stem.WordNetLemmatizer: Uses WordNet, a lexical database, for
lemmatization.


POS Tagging: -
Part-of-speech (POS) tagging assigns word types (e.g., noun, verb) to each word in a
text.
 nltk.pos_tag(): Tags each word with its part of speech.

Named Entity Recognition (NER): -


NER identifies and classifies named entities (e.g., people, organizations) in text.
 nltk.ne_chunk(): Identifies named entities in a text.

Stopwords: -
Stopwords are common words (e.g., "the", "and") that are often removed in text
processing.
 nltk.corpus.stopwords.words(): Provides lists of stopwords for different
languages.

Text Similarity: -
Measures how similar two pieces of text are.
 nltk.edit_distance(): Calculates the edit distance (Levenshtein distance)
between two strings.

Frequency Distributions: -
Frequency distributions show the frequency of each item in a dataset.


 nltk.FreqDist(): Computes the frequency distribution of items

Text Classification: -
Text classification assigns predefined categories to text.
 nltk.NaiveBayesClassifier: Implements a Naive Bayes classifier, a simple
probabilistic classifier based on Bayes' theorem.
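The short sketch below exercises several of these functions on a sample sentence; note that the exact nltk.download() resource names can vary between NLTK versions.
Code: -
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords

for pkg in ["punkt", "wordnet", "stopwords", "averaged_perceptron_tagger"]:
    nltk.download(pkg)                                        # fetch required resources

text = "The striped bats are hanging on their feet for best."
tokens = nltk.word_tokenize(text)
print([PorterStemmer().stem(w) for w in tokens])              # stemming
print([WordNetLemmatizer().lemmatize(w) for w in tokens])     # lemmatization
print(nltk.pos_tag(tokens))                                   # POS tagging
print([w for w in tokens if w.lower() not in stopwords.words("english")])  # stopword removal
print(nltk.FreqDist(tokens).most_common(3))                   # frequency distribution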

21. Tokenization.
Tokenization is the process of converting text into individual units called tokens, such
as words, subwords, or characters, essential for preparing textual data for machine
learning models. Practically, this involves splitting sentences into tokens using
libraries like NLTK or spaCy for natural language processing tasks.
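A minimal sketch of word- and sentence-level tokenization with NLTK (resource names may vary by version):
Code: -
import nltk
nltk.download("punkt")

text = "Tokenization splits text into tokens. It is the first step in most NLP pipelines."
print(nltk.sent_tokenize(text))   # sentence tokens
print(nltk.word_tokenize(text))   # word tokens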


22. Implement sentiment analysis on the given dataset.


To implement sentiment analysis on a given dataset, start by preprocessing the text data through tokenization, cleaning, and normalization. Then train a machine learning model, such as logistic regression, Naive Bayes, or a neural network, on the labelled dataset with sentiment labels (positive, negative, neutral).
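A minimal sketch with scikit-learn is shown below; the tiny hand-labelled texts stand in for the actual dataset, which is not reproduced here.
Code: -
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["I loved this film", "Terrible and boring", "Great acting", "Worst movie ever"]  # assumed samples
labels = ["positive", "negative", "positive", "negative"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)                      # train on labelled sentiment data
print(model.predict(["What a great movie"]))  # expected: ['positive']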


23. NLP text processing exercise.


For NLP text processing, clean the text by removing punctuation and converting it to
lowercase, then tokenize, stem, or lemmatize the words. Finally, convert the tokens
into numerical features using methods like TF-IDF or word embeddings for further
analysis or modeling.
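A minimal sketch of this cleaning, tokenizing, stemming, and vectorizing pipeline (the two sample sentences are placeholders):
Code: -
import re
import nltk
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("punkt")
docs = ["Machine Learning is FUN!", "Text processing cleans, tokenizes and vectorizes text."]

def preprocess(doc):
    doc = re.sub(r"[^a-z\s]", "", doc.lower())                        # lowercase, strip punctuation
    return " ".join(PorterStemmer().stem(t) for t in nltk.word_tokenize(doc))

cleaned = [preprocess(d) for d in docs]
tfidf = TfidfVectorizer().fit_transform(cleaned)                       # numerical TF-IDF features
print(cleaned)
print(tfidf.shape)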


24. Information extraction.


Information extraction involves automatically retrieving specific, structured
information from unstructured text sources, such as identifying and categorizing
entities (names, dates, places), relationships, and events using techniques like named
entity recognition (NER) and relation extraction. This process is essential for
transforming vast amounts of textual data into organized, usable insights for various
applications.
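A minimal named-entity-recognition sketch with NLTK (resource names may vary by version, and the sentence is a placeholder):
Code: -
import nltk
for pkg in ["punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"]:
    nltk.download(pkg)                          # fetch required resources

sentence = "Barack Obama visited Microsoft headquarters in Seattle in 2015."
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)                   # part-of-speech tags
tree = nltk.ne_chunk(tagged)                    # chunk named entities
for subtree in tree:
    if hasattr(subtree, "label"):
        print(subtree.label(), " ".join(word for word, tag in subtree))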


25. Extractive summarization.


Extractive summarization involves creating a concise summary of a text by selecting
and extracting the most relevant sentences or phrases directly from the original
document. This technique relies on algorithms to identify key points and maintain the
original context and meaning, making it useful for quickly understanding the main
ideas without reading the entire text.
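One simple way to do this is frequency-based sentence scoring, sketched below (an illustrative approach, not necessarily the one used in the original notebook):
Code: -
import nltk
from nltk.corpus import stopwords

nltk.download("punkt")
nltk.download("stopwords")

def summarize(text, n_sentences=2):
    sentences = nltk.sent_tokenize(text)
    words = [w.lower() for w in nltk.word_tokenize(text)
             if w.isalpha() and w.lower() not in stopwords.words("english")]
    freq = nltk.FreqDist(words)
    # score each sentence by the frequencies of the words it contains
    scores = {s: sum(freq[w.lower()] for w in nltk.word_tokenize(s)) for s in sentences}
    return " ".join(sorted(sentences, key=scores.get, reverse=True)[:n_sentences])

print(summarize("Extractive summarization picks key sentences from a document. "
                "It scores sentences by the importance of their words. "
                "The highest scoring sentences form the summary."))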


26. Explain the application of TensorBoard.

Applications of TensorBoard:
 Monitoring Metrics: Visualize metrics like loss and accuracy during model training
to track performance over time.
 Model Graph Visualization: Display the computational graph of the model to
understand its structure and architecture.
 Histograms and Distributions: Examine histograms of weights, biases, and other
tensors to analyze their distributions and detect potential issues.
 Embedding Visualizations: Project high-dimensional embeddings to 2D or 3D space
to explore and understand the data representations.
 Hyperparameter Tuning: Compare different runs and experiments to find the best
hyperparameter settings.
 Debugging: Identify and diagnose issues in the model's training process by
visualizing gradients, weights, and other metrics.
