Probability & Sample Space
Lesson 1
Probability Basics
• A probability is a quantitative measure of uncertainty - a number that
conveys the strength of our belief in the occurrence of an uncertain
event.
• Probability refers to the chance that an event occurs.
Probability: CONCEPTS
• BASIC CONCEPTS
• An experiment
• An experiment is a process that leads to one of several possible outcomes. An outcome
of an experiment is some observation or measurement.
• A sample space
• The sample space is the universal set S pertinent to a given experiment. It is the set of all
possible outcomes of an experiment.
• An event
• An event is a subset of a sample space. It is a set of basic outcomes.
Probability: APPROACHES
• APPROACHES TO PROBABILITY THEORY
• Situation I : THE CLASSICAL APPROACH
• We have a sample space whose basic outcomes are equally likely.
• P(A) = n(A)/N(S)
• n(A) = number of outcomes favorable to the event A.
• N(S) = total number of outcomes in the sample space S.
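A minimal sketch of the classical approach in Python (the fair-die event is an illustrative assumption, not from the slides):
S = {1, 2, 3, 4, 5, 6}            # sample space of a fair die
A = {x for x in S if x % 2 == 0}  # event A: an even number is rolled
p_A = len(A) / len(S)             # P(A) = n(A) / N(S)
print(p_A)                        # 0.5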
Probability: APPROACHES
• APPROACHES TO PROBABILITY THEORY
• Situation II : THE RELATIVE FREQUENCY APPROACH
• Outcomes are not equally likely.
• The total number of possible outcomes is not known in advance, so we rely on observed frequencies.
• The probability of occurrence of an event is the ratio of the number of times the event occurs to the total number of trials.
• P(A) = n/N
• n = the number of times the event occurs.
• N = the total number of trials.
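A small simulation sketch of the relative-frequency approach; the coin-flip event and trial count are assumptions for illustration:
import random

N = 10_000                                        # total number of trials
n = sum(random.random() < 0.5 for _ in range(N))  # times the event occurs
print(n / N)                                      # P(A) = n / N, near 0.5 for large N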
Probability Basics
• Events that are certain to occur have probability 1.00. The probability
of the entire sample space S is equal to 1.00: P(S) = 1.00.
Probability Basics
• PROBABILITY DISTRIBUTIONS
• Sample space: the outcomes of three births. Let X be the number of girls (assuming each birth is equally likely to be a girl or a boy).

X    P(X)
0    0.125
1    0.375
2    0.375
3    0.125
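The distribution above can be reproduced by enumerating the eight equally likely birth sequences; a sketch, assuming the girl/boy interpretation noted above:
from itertools import product
from collections import Counter

outcomes = list(product("GB", repeat=3))          # 8 equally likely sequences
counts = Counter(o.count("G") for o in outcomes)  # X = number of girls
for x in sorted(counts):
    print(x, counts[x] / len(outcomes))           # 0.125, 0.375, 0.375, 0.125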
• Suppose that 3 out of the 10 members of the board of directors of a large corporation are to be randomly selected to serve on a particular task committee. How many possible selections are there? There are 10C3 = 10!/(3!·7!) = 120 possible selections.
• If the committee is chosen in a truly random fashion, what is the probability that the three committee members chosen will be the three senior board members? This is 1 combination out of a total of 120, so the answer is 1/120 = 0.00833.
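Both numbers can be checked directly; a minimal sketch:
import math

print(math.comb(10, 3))      # 120 possible committees
print(1 / math.comb(10, 3))  # 0.00833... = chance of one specific committee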
Combinations
• In a region, a telecommunication service provider offers the following services to its subscriber base of a million:
• Voice call.
• Video call.
• MMS. (Multimedia Messaging Service)
• SMS. (Short Message Service)
• Text messaging.
The individual services: n = 5 and r = 1, so 5C1 = 5!/(1!·4!) = 120/24 = 5. Five individual services = { Voice call (VC), Video call (VD), MMS, SMS, Text Msg (TM) }.
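A sketch enumerating the service combinations with itertools (using the abbreviations from the slide):
from itertools import combinations

services = ["VC", "VD", "MMS", "SMS", "TM"]
print(list(combinations(services, 1)))       # 5C1 = 5 single-service combinations
print(len(list(combinations(services, 2))))  # 5C2 = 10 two-service combinations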
• To build a 100 × 100 wall from bricks measuring 10 cm × 20 cm (dimensions from the slide figure), how many bricks are required? Is the count O(n)?
Big-O Notation – Scaling Out
Max_value_combination(int N) { // {Voice call (VC), Video call (VD), MMS, SMS, Text Msg (TM)}
    ...
    finds_n_value_comb(N)
}
Big-O Notation – Scaling Out
finds_the_single_value_comb(int N) { // {Voice call (VC), Video call (VD), MMS, SMS, Text Msg (TM)}
    for (i = 0; i < N; i++) {
        QueryDataForEachSingleComb()  // O(n) run time
    }
}
More practically: doubling the size of the input roughly doubles the runtime, so input size and
runtime have a linear, O(n), relationship.
Big-O Notation – Scaling Out
finds_2_value_comb(int N) {
    for (i = 0; i < N; i++) {      // selectEachAttribute: O(n)
        for (j = 0; j < N; j++) {  // inner loop needs its own index, j
            QueryDataForEachComb() // O(n) per attribute; total run time: O(n^2)
        }
    }
}
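A runnable Python analogue of the two routines above; the constant-time counter standing in for the query calls is an assumption for illustration:
def single_value_comb(n):
    work = 0
    for _ in range(n):   # one pass over the attributes: O(n)
        work += 1        # stand-in for QueryDataForEachSingleComb()
    return work

def two_value_comb(n):
    work = 0
    for _ in range(n):      # select each attribute: O(n)
        for _ in range(n):  # query data for each pair: O(n) per attribute
            work += 1       # stand-in for QueryDataForEachComb()
    return work

print(single_value_comb(1000), two_value_comb(1000))  # 1000 vs 1000000

Doubling n doubles the first routine's work but quadruples the second's, matching the O(n) and O(n^2) claims.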
User-Driven and Data-Driven approaches
• User-Driven:
• Uses data as one of several inputs to decision making, as opposed to data being the only input.
• The other factors include preconceived objectives (ideas) and employee expertise.
• It is judgmental.
• Data-Driven:
• Data-driven means making decisions based solely on data.
• This approach potentially explores all possible solution states (the sample space).
• It may surface hidden truths, or solution states left unexplored by user expertise.
Machine learning
• Machine learning refers to the automated detection of meaningful patterns in data, in contrast to more traditional uses of computers.
• Due to the complexity of the patterns that need to be detected (exploring the whole sample space carries a high Big-O cost), a human programmer cannot provide an explicit, finely detailed specification of how such tasks should be executed.
• Machine learning tools are concerned with providing programs with
the ability to “learn” and adapt.
Machine learning
• Tasks beyond Human Capabilities:
• With more and more digitally recorded data available, it becomes obvious that there are treasures of meaningful information buried in data archives that are far too large and too complex for humans to make sense of.
• Learning to detect meaningful patterns in large and complex data sets is therefore a promising domain for combining programs that learn with the growing memory and processing power of computers.
Information Gain
• To define information gain precisely, a measure called "entropy" from information theory is used.
• Entropy measures the level of impurity/homogeneity of a set of examples:
• Entropy = - Σ p_i * log2(p_i), summed over the classes i.
• p_i = probability of class i.
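A minimal sketch of the entropy calculation:
import math

def entropy(probs):
    # Entropy = -sum(p_i * log2(p_i)); the p = 0 terms contribute 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0: maximally impure two-class node
print(entropy([1.0]))       # 0.0: a pure node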
Entropy
Subscriber   VC       VD       MMS      SMS      TM       Business Value
A            High     Medium   Low      Medium   High     Pos
B            Medium   High     Medium   Low      Medium   Pos
C            Medium   Low      Medium   Medium   Medium   Neg
D            High     Medium   Low      Low      High     Pos

• Attribute VC: P(High) = 2/4 = 0.5 (two of the four subscribers have VC = High).
• Split on VD (from the slide's branch diagram): the High branch contains only Pos (P = 1) and the Low branch contains only Neg (P = 1).
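Applying the entropy sketch above to this table (assuming the reconstruction shown) gives the information gain of splitting on VD:
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

parent = entropy([3/4, 1/4])  # parent node: 3 Pos, 1 Neg (about 0.811 bits)

# Split on VD: High -> {B}: Pos; Medium -> {A, D}: Pos; Low -> {C}: Neg
children = (1/4) * entropy([1.0]) + (2/4) * entropy([1.0]) + (1/4) * entropy([1.0])
print(round(parent - children, 3))  # gain = 0.811: every branch is pure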
Classification and Regression Trees (CART)
• CART: Ability to handle both categorical and continuous features.
• CART: Ability to capture non-linear relationships between features and the target variable.
• The CART algorithm produces only binary trees: non-leaf nodes always have two children.
• In contrast, ID3 can produce decision trees with nodes having more than two children.
• CART can be used to explain a continuous or categorical dependent variable in terms of
multiple independent variables. The independent variables can be continuous or categorical.
• The CART algorithm works to find the independent variable that creates the best
homogeneous group when splitting the data.
• The algorithm selects a feature (attribute) and a threshold that best separate the training data into two groups; Gini impurity is used as the splitting criterion.
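A hedged sketch using scikit-learn's DecisionTreeClassifier, which implements an optimized CART variant; the ordinal encoding Low=0, Medium=1, High=2 is my own assumption:
from sklearn.tree import DecisionTreeClassifier

# Subscriber table encoded ordinally (rows A-D; columns VC, VD, MMS, SMS, TM)
X = [[2, 1, 0, 1, 2],
     [1, 2, 1, 0, 1],
     [1, 0, 1, 1, 1],
     [2, 1, 0, 0, 2]]
y = ["Pos", "Pos", "Neg", "Pos"]

# criterion="gini" picks the feature/threshold pair that best separates the data
tree = DecisionTreeClassifier(criterion="gini").fit(X, y)
print(tree.predict([[1, 0, 1, 1, 1]]))  # ['Neg'] for subscriber C's profile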
Gini impurity
• A measure of the quality of a split in a decision tree algorithm.
• It measures the probability of misclassifying a randomly chosen sample: Gini = Σ p_i * (1 - p_i) = 1 - Σ p_i^2.
• For the subscriber table: P(Pos) = 3/4 = 0.75 and P(Neg) = 1/4 = 0.25.
• A branch with a 50/50 class mix has Gini = 0.5 * (1 - 0.5) + 0.5 * (1 - 0.5) = 0.5.
• If a split produces pure children, the weighted child impurity is 1/4 * 0 + 3/4 * 0 = 0, so the amount of impurity "removed" by the split equals the parent's impurity.
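A minimal sketch of the Gini numbers used above:
def gini(probs):
    # Gini impurity = sum(p_i * (1 - p_i)) = 1 - sum(p_i^2)
    return sum(p * (1 - p) for p in probs)

print(gini([0.5, 0.5]))    # 0.5: the 50/50 branch from the slide
print(gini([0.75, 0.25]))  # 0.375: parent node with 3 Pos and 1 Neg
print(gini([1.0]))         # 0.0: a pure branch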