Types of Digital Data
Course objective
After going through this course, the participant will be able to
In various business scenarios, we can ask some pointed questions about the data. From the
given options, please select the correct one.
Problem statement
We need to transmit a message over the network: “PREPARE to
NEGOTIATE”.
Message transmission requires encryption at transmitter and
decryption at receiver end.
To encrypt and decrypt, we need to use a confidential piece of
information, usually referred to as a key.
Step 2: The message is split into a sequence of 3x1 vectors:
Step 3: A 3 x 3 encoding matrix is used to encrypt the message
vectors:
Step 5: After multiplication, the original enumerated matrix will be
obtained. The original message can now be decoded from this matrix.
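The encryption and decryption steps above can be sketched in Python. This is only an illustration under stated assumptions, not the worked example from the course: the enumeration (space = 0, A = 1, ..., Z = 26), the 3 x 3 encoding matrix KEY, its inverse KEY_INV, and all function names are hypothetical choices. The matrix was picked to have determinant 1 so that its inverse contains only integers.

```python
# Assumed enumeration: space -> 0, A -> 1, ..., Z -> 26 (not from the course text).
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Hypothetical 3 x 3 encoding matrix (the key) and its integer inverse.
KEY = [[-3, -3, -4], [0, 1, 1], [4, 3, 4]]
KEY_INV = [[1, 0, 1], [4, 4, 3], [-4, -3, -3]]


def mat_vec(matrix, vector):
    """Multiply a 3 x 3 matrix by a 3 x 1 vector."""
    return [sum(matrix[i][j] * vector[j] for j in range(3)) for i in range(3)]


def to_vectors(message):
    """Enumerate the message and split it into 3 x 1 vectors (padded with spaces)."""
    numbers = [ALPHABET.index(ch) for ch in message.upper()]
    numbers += [0] * (-len(numbers) % 3)
    return [numbers[i:i + 3] for i in range(0, len(numbers), 3)]


def encrypt(message):
    """Multiply every message vector by the encoding matrix."""
    return [mat_vec(KEY, v) for v in to_vectors(message)]


def decrypt(vectors):
    """Multiply every encrypted vector by the inverse matrix and map back to letters."""
    numbers = [n for v in vectors for n in mat_vec(KEY_INV, v)]
    return "".join(ALPHABET[n] for n in numbers).rstrip()


cipher = encrypt("PREPARE TO NEGOTIATE")
print(cipher)           # list of encrypted 3 x 1 vectors
print(decrypt(cipher))  # the receiver recovers the original message
```

Only someone who knows KEY_INV can undo the multiplication, which is why the matrix plays the role of the confidential key.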
Problem statement
Currents I1, I2 and I3 need to be determined for the following electrical
network:
Solution
Step 1: The equations for current are written based on Kirchhoff’s Law.
Step 2: These equations are converted into a matrix.
Step 3: The matrix is solved to get the values of the currents.
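The three steps can be sketched in Python as follows. The network diagram is not reproduced here, so the three equations below are hypothetical stand-ins for equations obtained from Kirchhoff's laws (one junction equation and two loop equations). The solve function is a plain Gauss-Jordan elimination written for illustration, not a method prescribed by the course.

```python
from fractions import Fraction

# Hypothetical equations (assumed, since the network figure is unavailable):
#    I1 -  I2 +  I3 = 0     (junction)
#   4I1 +  I2       = 8     (loop 1)
#          I2 + 4I3 = 16    (loop 2)
COEFFICIENTS = [[1, -1, 1], [4, 1, 0], [0, 1, 4]]
CONSTANTS = [0, 8, 16]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination using exact fractions.

    Assumes the system has a unique solution; a singular matrix makes
    the pivot search fail with StopIteration.
    """
    n = len(a)
    # Build the augmented matrix [a | b].
    m = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(a, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot becomes 1.
        m[col] = [x / m[col][col] for x in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[-1] for row in m]


i1, i2, i3 = solve(COEFFICIENTS, CONSTANTS)
print(i1, i2, i3)
```

For these assumed equations the currents come out as I1 = 1, I2 = 4 and I3 = 3, which can be checked by substituting them back into each equation.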
Problem statement
Five visitors of a social networking site are linked with each other as
depicted by the directed graph G below:
How can we use these relationships to extract more information about them
and predict their proposed activities?
Solution
Step 1: These relationships can be converted into a relationship chart in
which “1” indicates related and “0” indicates unrelated:
Step 2: From the chart created in the previous step, the adjacency matrix
for the directed graph is:
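A minimal Python sketch of these two steps is given below. The graph G is not reproduced here, so the links in EDGES are hypothetical, as are the visitor names. The degree counts and the squared matrix are shown as examples of additional information that can be read from an adjacency matrix; they are an illustration, not necessarily the analysis intended by the course.

```python
# Hypothetical directed links: (a, b) means visitor a is linked to visitor b.
VISITORS = ["V1", "V2", "V3", "V4", "V5"]
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 0)]


def adjacency_matrix(n, edges):
    """Entry [a][b] is 1 if a is linked to b, otherwise 0."""
    matrix = [[0] * n for _ in range(n)]
    for src, dst in edges:
        matrix[src][dst] = 1
    return matrix


def mat_mul(a, b):
    """Multiply two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = adjacency_matrix(len(VISITORS), EDGES)

# Row sums: how many visitors each visitor links to.
out_degree = [sum(row) for row in A]
# Column sums: how many visitors link to each visitor.
in_degree = [sum(col) for col in zip(*A)]
# Entry [a][b] of A x A counts the two-step paths from a to b,
# for example "a is linked to someone who is linked to b".
A2 = mat_mul(A, A)

for name, row in zip(VISITORS, A):
    print(name, row)
```

A visitor with a high in-degree is linked to by many others, and a non-zero entry in A2 points to an indirect connection that could become a direct one.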
Types of Learning
A machine is taught to identify various fruits by building a model with the help of images.
A new set of images is given to this model as test data so that it can classify different fruits.
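The idea of training on labeled examples and then classifying new ones can be sketched with a very small nearest-neighbour classifier. Everything concrete here is an assumption made for illustration: each image is reduced to a hypothetical average (R, G, B) colour, the fruit names and values are made up, and a real image model would use far richer features.

```python
# Hypothetical training data: average (R, G, B) colour of an image -> fruit label.
TRAINING = [
    ((220, 30, 30), "apple"),
    ((200, 20, 40), "apple"),
    ((240, 230, 40), "banana"),
    ((250, 220, 60), "banana"),
    ((250, 150, 20), "orange"),
    ((240, 140, 30), "orange"),
]


def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def classify(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, label = min(TRAINING, key=lambda item: dist2(item[0], features))
    return label


# Hypothetical test data: images the model has not seen before.
TEST_IMAGES = [(210, 25, 35), (245, 225, 50), (245, 145, 25)]
predictions = [classify(features) for features in TEST_IMAGES]
print(predictions)
```

The labels in TRAINING are what make this supervised learning: the machine is told which fruit each training example shows.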
There is a basket filled with fresh fruits. The machine's task is to
group fruits of the same type based on their colors.
Here, unlike in supervised learning, the machine is given no prior
knowledge in the form of labeled examples. So how will it arrange fruits
of the same type based on their colors?
Clustering
The machine has identified four clusters of fruits: red, green, yellow, and orange.
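Grouping fruits by colour without labels can be sketched with k-means clustering. This is a minimal illustration under assumptions: the (R, G, B) values are made up, the number of clusters is fixed at four, and the farthest-point initialisation is a simple deterministic choice used here for reproducibility, not something the course specifies.

```python
# Hypothetical unlabeled data: average (R, G, B) colour of each fruit in the basket.
FRUIT_COLORS = [
    (220, 30, 30), (40, 180, 50), (240, 230, 40), (250, 150, 20),
    (200, 20, 40), (60, 200, 40), (250, 220, 60), (240, 140, 30),
]


def dist2(p, q):
    """Squared Euclidean distance between two colours."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def initial_centroids(points, k):
    """Start with the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    return centroids


def assign(points, centroids):
    """Give each point the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            for p in points]


def kmeans(points, k, iterations=10):
    centroids = initial_centroids(points, k)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = mean(members)
    return assign(points, centroids)


labels = kmeans(FRUIT_COLORS, k=4)
print(labels)  # fruits with the same number belong to the same cluster
```

The cluster numbers carry no meaning by themselves; naming a cluster "red" or "green" is an interpretation added afterwards by a person.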
Reinforcement learning
Solving a given business problem draws on several blocks of the Data Science
stack, which are tightly coupled with each other.
Core algorithms need to be written in some programming language for
implementation.
Most algorithms use the basic concepts of linear algebra.
Statistical computations need to be done on the given data.
Available data in structured, unstructured and semi-structured form
needs to be managed through various data management systems.
Computer Science provides us with the necessary programming languages,
database management systems, statistical analysis and machine learning tools.
*Ref. Practical Data Science with R by Nina Zumel and John Mount, Manning, Shelter Island
Country Bank of India feels that it is losing too much money to bad loans
and wants to reduce its losses.
Data Science should be able to help the bank reduce its losses from bad
loans, say by X%.
Why does the business organization want the project in the first place? What
do they lack, and what do they need?
What are they doing to solve the problem now, and why isn’t that good
enough?
What resources will we need? What kind of data is available? Is domain
expertise available within the team? What are the computational resources
available/required?
How does the business organization plan to deploy the derived results? What
are the constraints that have to be met for successful deployment?
This step encompasses identifying the data you need, exploring it, and
conditioning it to be suitable for analysis. This stage is often the most
time-consuming step in the process. This step helps find answers to these
questions:
In this step, we use statistics and machine learning to extract useful insights
from the data in order to achieve our goals. The most common data science
modeling tasks are as follows:
Is the model accurate enough for our needs? Does it generalize well?
Does it perform better than “the obvious guess”? Better than the estimate we
currently use?
If the answer to any of these questions is no, it's time to go back to the
previous step, or to reconsider whether the selected data supports the goal we
are trying to achieve.
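The comparison against "the obvious guess" can be sketched in Python. The loan outcomes and model predictions below are made up for illustration, and accuracy is used as the only metric for simplicity; a real project would choose metrics that match the business goal. The obvious guess is taken here to be "always predict the most common outcome", which is one possible reading.

```python
from collections import Counter

# Hypothetical data: actual loan outcomes and a model's predictions for them.
ACTUAL = ["good", "good", "bad", "good", "good",
          "bad", "good", "good", "good", "bad"]
PREDICTED = ["good", "good", "bad", "good", "bad",
             "bad", "good", "good", "good", "good"]


def accuracy(actual, predicted):
    """Fraction of cases where the prediction matches the actual outcome."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# The obvious guess (assumed): always predict the most common outcome.
majority_label, _ = Counter(ACTUAL).most_common(1)[0]
baseline_predictions = [majority_label] * len(ACTUAL)

model_accuracy = accuracy(ACTUAL, PREDICTED)
baseline_accuracy = accuracy(ACTUAL, baseline_predictions)

print("model:", model_accuracy, "baseline:", baseline_accuracy)
if model_accuracy <= baseline_accuracy:
    print("The model does not beat the obvious guess; revisit the previous step.")
```

In practice the comparison should be made on data that was not used to build the model, since that is what shows whether the model generalizes.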
Once we have a model that meets our success criteria, we’ll present our
results to our project sponsor and other stakeholders.
We must also document the model for those in the organization who are
responsible for using, running, and maintaining the model once it is
deployed.
Different audiences require different kinds of information. For example, a
business-oriented audience may want to understand the impact of our
findings in terms of business metrics.
References
http://www.wolfram.com/
https://en.wikipedia.org/wiki/Data_science
https://en.wikipedia.org/wiki/Statistics
https://en.wikipedia.org/wiki/Probability
http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning
https://en.wikipedia.org/wiki/Machine_learning
https://www.r-project.org/
https://cran.r-project.org/
http://www.nist.gov/
http://hortonworks.com/
www.techtarget.com
http://www.businessdictionary.com/
“Think Stats - Probability and Statistics for Programmers” by Allen B. Downey,
Green Tea Press, Needham, Massachusetts
http://robotics.stanford.edu/~ronnyk/glossary.html
Reinforcement Learning - An Introduction, Richard S. Sutton and Andrew G.
Barto, A Bradford Book. The MIT Press, Cambridge, Massachusetts, London,
England
Practical Data Science with R by Nina Zumel and John Mount, Manning,
Shelter Island