ETI-CH1 Notes
Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. The term may also be applied to any machine that exhibits traits associated with a human mind, such as learning and problem-solving.
Speech recognition: some intelligent systems are capable of hearing and comprehending language in terms of sentences and their meanings while a human talks to them. They can handle different accents, slang words, background noise, changes in a human's voice due to a cold, and so on.
Handwriting recognition: handwriting recognition software reads text written on paper with a pen or on a screen with a stylus. It can recognize the shapes of the letters and convert them into editable text.
Intelligent robots: robots are able to perform tasks given by a human. They have sensors to detect physical data from the real world, such as light, heat, temperature, movement, sound, bumps, and pressure. They have efficient processors, multiple sensors, and large memory to exhibit intelligence. In addition, they are capable of learning from their mistakes and can adapt to a new environment.
To understand some of the deeper concepts, such as data mining, natural language processing, and driving software, you need to know the three basic AI concepts: machine learning, deep learning, and neural networks.
It's likely that you've interacted with some form of AI in your day-to-day activities. If you use Gmail, for example, you may enjoy the automatic e-mail filtering feature. If you own a smartphone, you probably fill out a calendar with the help of Siri, Cortana, or Bixby. If you own a newer vehicle, perhaps you've benefited from a driver-assist feature while driving.
In the simplest terms, machines are given a large number of trial examples for a certain task. As they work through these trials, machines learn and adapt their strategy to achieve the goal.
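This trial-and-adapt loop can be sketched in a few lines of Python. It is a toy illustration only (the target value and the 0.1 learning rate are made-up assumptions, and real systems learn far more complex strategies): the machine starts with a poor guess and nudges it after every trial based on feedback.

```python
# Toy sketch of learning from trials: the learner adjusts its guess a
# little after each trial, using feedback on how wrong it currently is.
# (The target value and the 0.1 learning rate are illustrative assumptions.)
target = 42          # the correct answer, hidden from the learner in practice
guess = 0.0          # initial strategy: a bad guess

for _ in range(100):             # run many trials
    feedback = target - guess    # how far off the current strategy is
    guess += 0.1 * feedback      # adapt a small step in the right direction

print(round(guess))  # 42
```

After enough trials the guess converges on the target, which is the essence of learning by example.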
A well-known example of this AI concept is Quick, Draw!, a Google-hosted game in which humans draw simple pictures in under 20 seconds while a machine-learning algorithm tries to guess the drawing. More than 15 million people have contributed more than 50 million drawings to the app.
How do we get machines to learn more than just a specific task? What if we want them to take what they have learned from analyzing photographs and use that knowledge to analyze different data sets? This requires computer scientists to formulate general-purpose learning algorithms that help machines learn more than just one task.
One famous example of deep learning in action is Google's AlphaGo project, written in Lua, C++, and Python. The AlphaGo AI was able to beat professional Go players, a feat thought impossible given the game's incredible complexity and its reliance on focused practice and human intuition to master.
Neural networks follow a natural model
Deep learning is often made possible by artificial neural networks, which imitate neurons,
or brain cells. Artificial neural networks were inspired by things we find in our own biology.
The neural net models use math and computer science principles to mimic the processes of
the human brain, allowing for more general learning.
An artificial neural network tries to simulate the processes of densely interconnected brain
cells, but instead of being built from biology, these neurons, or nodes, are built from code.
Essentially, if a unit of information reaches a certain threshold, then it is able to pass to the next layer. In order to learn from experience, machines compare a neural network's outputs with the expected results, then modify connections, weights, and thresholds based on the differences between them.
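The threshold-and-weight-update idea above can be made concrete with a single artificial neuron in Python. This is a minimal sketch using the classic perceptron update rule; the AND task, the 0.5 threshold, and the learning rate are illustrative choices, not something prescribed by these notes.

```python
# A minimal single-neuron sketch with a step-threshold activation,
# trained with the classic perceptron update rule (illustrative setup).
def neuron(inputs, weights, threshold):
    """Fire (return 1) only if the weighted sum reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def train_step(inputs, weights, threshold, target, rate=0.1):
    """Compare the output to the target and nudge each weight
    in the direction that reduces the difference."""
    error = target - neuron(inputs, weights, threshold)
    return [w + rate * error * x for x, w in zip(inputs, weights)]

# Learn the logical AND function from four labelled examples.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights = [0.0, 0.0]
for _ in range(20):                      # repeat the trials many times
    for inputs, target in data:
        weights = train_step(inputs, weights, 0.5, target)

print([neuron(x, weights, 0.5) for x, _ in data])  # [0, 0, 0, 1]
```

After training, the neuron fires only when both inputs are 1, showing how comparing outputs to targets and adjusting weights produces learned behavior.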
Strong AI or ASI.
Strong AI is a futuristic concept that, until now, has existed only as the premise of sci-fi movies. Strong AI would be the ultimate dominator, as it would enable machines to design self-improvements and outclass humanity. It would construct cognitive abilities, feelings, and emotions in machines better than ours.
Reactive machines.
Reactive machines are the most basic type of unsupervised AI. This means that they cannot form memories or use past experiences to influence present decisions; they can only react to currently existing situations – hence "reactive."
Reactive machines have no concept of the world and therefore cannot function beyond
the simple tasks for which they are programmed. A characteristic of reactive machines is
that no matter the time or place, these machines will always behave the way they were
programmed. There is no growth with reactive machines, only stagnation in recurring
actions and behaviors.
Limited memory.
Limited memory comprises supervised AI systems that derive knowledge from experimental data or real-life events. Unlike reactive machines, limited-memory systems learn from the past by observing actions or data fed to them to create a good-fit model.
Although limited memory builds on observational data in conjunction with pre-
programmed data the machines already contain, these sample pieces of information are
fleeting. An existing form of limited memory is autonomous vehicles.
Theory of mind.
As the name suggests, theory of mind is a technique for passing the baton of your ideas, decisions, and thought patterns to computers. While some machines currently exhibit humanlike capabilities, none are yet fully capable of holding conversations at human standards. Even the most advanced robot in the world lacks the emotional intelligence factor (sounding and behaving like a human).
This future class of machine ability would include understanding that people have
thoughts and emotions that affect behavioral output and thus influence a “theory of
mind” machine’s thought process. Social interaction is a key facet of human interaction.
So to make theory of mind machines tangible, the AI systems that control these robots would have to identify, understand, retain, and remember emotional responses.
These machines can process human commands and adapt them to their learning centers
to understand the rules of basic communication and interactions. Theory of mind is an
advanced form of proposed artificial intelligence that would require machines to
thoroughly acknowledge rapid shifts in emotional and behavioral patterns in humans.
Harmonizing interactions at this level will require a lot of testing and abstract thinking.
Example:
Some elements of theory of mind AI currently exist or have existed in the recent past. Two notable examples are the robots Kismet and Sophia, created in 2000 and 2016, respectively.
Self-awareness.
Self-aware AI involves machines that have human-level consciousness. This form of AI does not currently exist, but it would be considered the most advanced form of artificial intelligence known to man.
Facets of self-aware AI include the ability not only to recognize and replicate humanlike actions, but also to think for itself, have desires, and understand its feelings. Self-aware AI, in essence, is an advancement and extension of theory of mind AI. Where theory of mind focuses only on comprehension and replication of human practices, self-aware AI takes it a step further by implying that it can and will have self-guided thoughts and reactions.
Type 1: reactive machines. These AI systems have no memory and are task specific. An example is Deep Blue, the IBM chess program that beat Garry Kasparov in the 1990s. Deep Blue can identify pieces on the chessboard and make predictions, but because it has no memory, it cannot use past experiences to inform future ones.
Type 2: limited memory. These AI systems have memory, so they can use past experiences to inform future decisions. Some of the decision-making functions in self-driving cars are designed this way.
Type 3: theory of mind. Theory of mind is a psychology term. When applied to AI, it means that the system would have the social intelligence to understand emotions. This type of AI will be able to infer human intentions and predict behavior, a necessary skill for AI systems to become integral members of human teams.
Type 4: self-awareness. In this category, AI systems have a sense of self, which gives them consciousness. Machines with self-awareness understand their own current state. This type of AI does not yet exist.
What are the types of data visualization?
The most common data visualization types are scatter plots, bar charts, heat maps, line graphs, pie charts, area charts, choropleth maps and histograms.
What are the main types of data visualization?
The main types of data visualization include charts, graphs and maps in the form of line charts, bar graphs, tree charts, dual-axis charts, mind maps, funnel charts and heatmaps.
Tables: These consist of rows and columns used to compare variables. Tables can show a great deal of information in a structured way, but they can also overwhelm users who are simply looking for high-level trends.
Pie charts and stacked bar charts: These graphs are divided into sections that represent parts of a whole. They provide a simple way to organize data and compare the size of each component to one another.
Line charts and area charts: These visuals show change in one or more quantities by
plotting a series of data points over time and are frequently used within predictive
analytics. Line graphs utilize lines to demonstrate these changes while area charts
connect data points with line segments, stacking variables on top of one another and
using color to distinguish between variables.
Histograms: This graph plots a distribution of numbers using a bar chart (with no
spaces between the bars), representing the quantity of data that falls within a
particular range. This visual makes it easy for an end user to identify outliers within a
given dataset.
Scatter plots: These visuals are beneficial in revealing the relationship between two variables, and they are commonly used within regression data analysis. However, they can sometimes be confused with bubble charts, which are used to visualize three variables via the x-axis, the y-axis, and the size of the bubble.
Heat maps: These graphical representations are helpful in visualizing behavioral data by location. This can be a location on a map, or even a webpage.
Tree maps: These display hierarchical data as a set of nested shapes, typically rectangles. Tree maps are great for comparing the proportions between categories via their area size.
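To see what the histogram entry above computes before anything is drawn, this short sketch counts how many values fall into each equal-width range (the data values are invented for illustration; the lone far bin exposes the outlier 9):

```python
# A small sketch of what a histogram computes before it is drawn:
# counting how many data values fall into each equal-width range (bin).
def bin_counts(values, n_bins):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # the maximum value belongs in the last bin, not one past it
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts

data = [1, 2, 2, 3, 3, 3, 4, 4, 9]   # made-up values; 9 is an outlier
print(bin_counts(data, 4))           # [3, 5, 0, 1] -> the isolated last bin exposes 9
```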
Whenever we visualize data, we take data values and convert them in a systematic and
logical way into the visual elements that make up the final graphic. Even though there
are many different types of data visualizations, and on first glance a scatter plot, a pie
chart, and a heatmap don’t seem to have much in common, all these visualizations can
be described with a common language that captures how data values are turned into
blobs of ink on paper or colored pixels on screen. The key insight is the following: All data
visualizations map data values into quantifiable features of the resulting graphic. We
refer to these features as aesthetics.
Figure 2.1: Commonly used aesthetics in data visualization: position, shape, size, color,
line width, line type. Some of these aesthetics can represent both continuous and
discrete data (position, size, line width, color) while others can usually only represent
discrete data (shape, line type).
Type of variable: quantitative/numerical continuous
Examples: 1.3, 5.7, 83, 1.5x10^-2
Appropriate scale: continuous
Description: Arbitrary numerical values. These can be integers, rational numbers, or real numbers.
2.2 Scales map data values onto aesthetics
To map data values onto aesthetics, we need to specify which data values correspond to
which specific aesthetics values. For example, if our graphic has an x axis, then we need
to specify which data values fall onto particular positions along this axis. Similarly, we
may need to specify which data values are represented by particular shapes or colors.
This mapping between data values and aesthetics values is created via scales. A scale
defines a unique mapping between data and aesthetics (Figure 2.2). Importantly, a scale
must be one-to-one, such that for each specific data value there is exactly one aesthetics
value and vice versa. If a scale isn’t one-to-one, then the data visualization becomes
ambiguous.
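The one-to-one requirement can be expressed directly in code. In this sketch, a scale is just a dictionary from data values to aesthetic values; the particular positions, shapes, and hex colors are illustrative assumptions echoing Figure 2.2.

```python
# Each scale maps the data values 1-4 onto aesthetic values, as in Figure 2.2.
# (The specific positions, shape names, and colors are illustrative choices.)
position_scale = {1: 0.25, 2: 0.50, 3: 0.75, 4: 1.00}
shape_scale    = {1: "circle", 2: "square", 3: "diamond", 4: "triangle"}
color_scale    = {1: "#1b9e77", 2: "#d95f02", 3: "#7570b3", 4: "#e7298a"}

def is_one_to_one(scale):
    """A scale is unambiguous only if no two data values share an aesthetic value."""
    return len(set(scale.values())) == len(scale)

print(all(is_one_to_one(s) for s in (position_scale, shape_scale, color_scale)))  # True
```

A scale like `{1: "red", 2: "red"}` would fail this check, and the resulting visualization would be ambiguous for exactly the reason given above.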
Figure 2.2: Scales link data values to aesthetics. Here, the numbers 1 through 4 have
been mapped onto a position scale, a shape scale, and a color scale. For each scale, each
number corresponds to a unique position, shape, or color and vice versa.
Let’s put things into practice. We can take the dataset shown in Table 2.2, map
temperature onto the y axis, day of the year onto the x axis, location onto color, and
visualize these aesthetics with solid lines. The result is a standard line plot showing the
temperature normals at the four locations as they change during the year (Figure 2.3).
Figure 2.3: Daily temperature normals for four selected locations in the U.S. Temperature
is mapped to the y axis, day of the year to the x axis, and location to line color. Data
source: NOAA.
Figure 2.3 is a fairly standard visualization for a temperature curve and likely the
visualization most data scientists would intuitively choose first. However, it is up to us
which variables we map onto which scales. For example, instead of mapping temperature
onto the y axis and location onto color, we can do the opposite. Because now the key
variable of interest (temperature) is shown as color, we need to show sufficiently large
colored areas for the color to convey useful information (Stone, Albers Szafir, and Setlur
2014). Therefore, for this visualization I have chosen squares instead of lines, one for
each month and location, and I have colored them by the average temperature normal
for each month (Figure 2.4).
Figure 2.4: Monthly normal mean temperatures for four locations in the U.S. Data source:
NOAA
I would like to emphasize that Figure 2.4 uses two position scales (month along the x axis
and location along the y axis) but neither is a continuous scale. Month is an ordered
factor with 12 levels and location is an unordered factor with four levels. Therefore, the
two position scales are both discrete. For discrete position scales, we generally place the
different levels of the factor at an equal spacing along the axis. If the factor is ordered (as
is here the case for month), then the levels need to be placed in the appropriate order. If
the factor is unordered (as is here the case for location), then the order is arbitrary, and
we can choose any order we want. I have ordered the locations from overall coldest
(Chicago) to overall hottest (Death Valley) to generate a pleasant staggering of colors.
However, I could have chosen any other order and the figure would have been equally
valid.
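A discrete position scale of the kind just described is simply an evenly spaced assignment of factor levels to positions along the axis. In this sketch the ordering of locations follows the coldest-to-hottest choice made in the text; Chicago and Death Valley come from the text, while the two middle locations are assumed for illustration.

```python
# Sketch of a discrete position scale: place each factor level at an
# equal spacing (positions 1, 2, 3, ...) in the chosen order.
def discrete_scale(levels):
    return {level: i + 1 for i, level in enumerate(levels)}

# Unordered factor: the order is our choice, here coldest to hottest.
# (Chicago and Death Valley are from the text; the middle two are assumed.)
locations = ["Chicago", "San Diego", "Houston", "Death Valley"]
print(discrete_scale(locations))
# {'Chicago': 1, 'San Diego': 2, 'Houston': 3, 'Death Valley': 4}
```

Any permutation of the unordered factor would yield an equally valid scale; only the equal spacing is essential.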
Both Figures 2.3 and 2.4 used three scales in total, two position scales and one color
scale. This is a typical number of scales for a basic visualization, but we can use more
than three scales at once. Figure 2.5 uses five scales, two position scales, one color scale,
one size scale, and one shape scale, and all scales represent a different variable from the
dataset.
Figure 2.5: Fuel efficiency versus displacement, for 32 cars (1973–74 models). This figure uses five
separate scales to represent data: (i) the x axis (displacement); (ii) the y axis (fuel
efficiency); (iii) the color of the data points (power); (iv) the size of the data points
(weight); and (v) the shape of the data points (number of cylinders). Four of the five
variables displayed (displacement, fuel efficiency, power, and weight) are numerical
continuous. The remaining one (number of cylinders) can be considered to be either
numerical discrete or qualitative ordered. Data source: Motor Trend, 1974.
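The five simultaneous scales in this figure can be sketched as a function that maps one data record onto five aesthetic values. The car record below uses plausible 1974-era values and an assumed cylinders-to-shape assignment; both are illustrative, not taken from the actual dataset.

```python
# Map one car record onto five aesthetics: x, y, color, size, and shape.
CYLINDER_SHAPES = {4: "circle", 6: "triangle", 8: "square"}  # assumed shape scale

def point_spec(car):
    return {
        "x": car["displacement"],                     # position scale (x axis)
        "y": car["mpg"],                              # position scale (y axis)
        "color": car["hp"],                           # color scale (power)
        "size": car["weight"],                        # size scale (weight)
        "shape": CYLINDER_SHAPES[car["cylinders"]],   # shape scale (cylinders)
    }

# Illustrative record, not a row from the real dataset.
car = {"displacement": 160, "mpg": 21.0, "hp": 110, "weight": 2.62, "cylinders": 6}
print(point_spec(car)["shape"])  # triangle
```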
Cartesian coordinates
The most widely used coordinate system for data visualization is the 2D Cartesian coordinate system, where each location is uniquely specified by an x and a y value. The x and y axes run orthogonally to each other, and data values are placed at even spacing along both axes.
Nonlinear axes
In a Cartesian coordinate system, the grid lines along an axis are spaced evenly both in
data units and in the resulting visualization. We refer to the position scales in these
coordinate systems as linear. While linear scales generally provide an accurate
representation of the data, there are scenarios where nonlinear scales are preferred. In
a nonlinear scale, even spacing in data units corresponds to uneven spacing in the
visualization, or conversely even spacing in the visualization corresponds to uneven
spacing in data units.
The most commonly used nonlinear scale is the logarithmic scale or log scale for short.
Log scales are linear in multiplication, such that a unit step on the scale corresponds to
multiplication with a fixed value.
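The multiplicative property of log scales is easy to check numerically; this small sketch uses only the standard library.

```python
import math

# Sketch: on a log scale, equally spaced axis positions correspond to
# multiplication by a fixed factor in data units.
ticks = [1, 10, 100, 1000]               # evenly spaced on a log10 axis
positions = [math.log10(t) for t in ticks]
steps = [round(b - a, 9) for a, b in zip(positions, positions[1:])]
print(steps)  # [1.0, 1.0, 1.0] -> each x10 step in data is one unit of position
```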
Color scales
There are three fundamental use cases for color in data visualizations: (i) we can use
color to distinguish groups of data from each other; (ii) we can use color to represent
data values; and (iii) we can use color to highlight. The types of colors we use and the
way in which we use them are quite different for these three cases.
4.1 Color as a tool to distinguish
We frequently use color as a means to distinguish discrete items or groups that do not
have an intrinsic order, such as different countries on a map or different manufacturers
of a certain product. In this case, we use a qualitative color scale. Such a scale contains a
finite set of specific colors that are chosen to look clearly distinct from each other while
also being equivalent to each other. The second condition requires that no one color should stand out relative to the others. In addition, the colors should not create the impression of an order, as would be the case with a sequence of colors that get successively lighter.
Such colors would create an apparent order among the items being colored, which by
definition have no order.
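One common recipe for building such a qualitative scale, sketched here with Python's standard library, is to take evenly spaced hues at a fixed lightness and saturation, so the colors are clearly distinct but none stands out or implies an order. The particular lightness and saturation values below are arbitrary choices.

```python
import colorsys

# Sketch of a qualitative color scale: evenly spaced hues at the same
# lightness and saturation, so every color is distinct but none stands
# out or suggests an order. (Lightness 0.55 and saturation 0.65 are
# arbitrary illustrative values.)
def qualitative_palette(n):
    rgb = [colorsys.hls_to_rgb(i / n, 0.55, 0.65) for i in range(n)]
    return ["#%02x%02x%02x" % tuple(round(255 * c) for c in color) for color in rgb]

print(qualitative_palette(4))
```

Because only the hue varies, no single color appears lighter or darker than the rest, satisfying the equivalence condition described above.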
With the rise of digital business and data-driven decision-management, data storytelling
has become a skill often associated with data science and business analytics. The idea is to
connect the dots between sophisticated data analyses and decision-makers who might not
have the skills to interpret the data.
Data storytelling can also be used to convey interesting usage metrics for customers.
The three components of data storytelling
Data storytelling comprises data, narrative and visualizations.
1. The data serves as the base of a data story. It's information from accurate data
gathering and analysis. Data can be gathered from such places as charts and dashboards
using data analysis tools.
2. The narrative is a verbal or written storyline that's used to effectively communicate
insights from the data. The narrative should be within the context of the data and aim to
show a clear reasoning for following actions or decisions. Narratives should be based on
data and present a clear explanation of what the data means and its importance.
3. Visualizations act as further representations of both the data and narrative and are used
to communicate the story more clearly. Visualizations include graphs, charts, diagrams
and photos.
Ineffectiveness of graphical representation of data/What are the disadvantages of the graphical method?
Disadvantages of graphical methods of estimation:
they are biased,
even with large samples, they are not minimum-variance (i.e., most precise) estimates, and
graphical methods do not give confidence intervals for the parameters (intervals generated by a regression program for this kind of data are incorrect).