Quantum Notes: Life Beyond Classical

QUANTUM NOTES:
life beyond classical probability theory
Ψ
Budapest Semesters in Mathematics / Aquincum Institute of Technology
Preface
The aim of this note is to explain why we cannot use classical probability theory when dealing
with quantum phenomena and to outline the framework of a more general probability
theory. However, the explicit details of quantum probability theory are not yet discussed
here since that requires knowledge of Hilbert spaces.
I believe that a good course cannot begin with technical details. So I decided that
the first 2 weeks we would discuss the general ideas explained here, and only afterwards
we would make the necessary linear algebra brush up, study properties of self-adjoint and
positive operators, etc.
Over the years, starting as a couple of pages long handout, the notes kept getting steadily
longer and longer. This particular note was mainly written up by BSM student Chu Yue
(Stella) Dong; so let me also use this preface to thank her.
Mihály Weiner, Budapest 2014.

Contents
1 The structure of events
1.1 The structure of events of a classical system
1.2 The need for a more general probability theory
1.3 Partially ordered sets and lattices
1.4 Not-functions and ortho-lattices
2 Probabilities
2.1 Probability functions
2.2 The convex structure of probability functions
2.3 Distributivity and determinism
Chapter 1
The structure of events
Topics:
• the power set as the set of events of a classical (i.e. non-quantum physical) system,
the distributivity regarding the operations “and” and “or”,
• motivations for studying the non-distributive case; the double-slit experiment,
• partial ordering and POSETs;
• “and” (∧) and “or” (∨) in a POSET,
• lattices,
• “not”-functions and ortho-lattices,
• compatible and exclusive events.
1.1 The structure of events of a classical system

One of the basic concepts of probability theory is that of events. In the classical mathematical
description they appear as subsets of a certain set. Here are some examples of
possible events regarding the casting of a die:
A: “the outcome is even”,
B: “the outcome is a prime number”,
C: “the outcome is greater than 3”,
D: “the outcome is exactly 5”.
In a “standard” probability course all these events would be described as subsets of the
set H := {1, 2, 3, 4, 5, 6}. In particular A ⊂ H would be the subset {2, 4, 6} and D ⊂ H
would be the one-element subset {5}, and in general in this case the set of events would
be identified with the power-set P(H).
In this description the operations “and” (∧) and “or” (∨) would correspond to taking
intersection and union, respectively. In particular, since the set-operations ∩, ∪ satisfy the
law of distributivity, in classical probability theory for any 3 events A, B, C one has that
A ∧ (B ∨ C) = (A ∧ B) ∨ (A ∧ C),
A ∨ (B ∧ C) = (A ∨ B) ∧ (A ∨ C).
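The two distributivity laws can be checked mechanically for the die. The following is a minimal Python sketch, not part of the original notes: events are modelled as frozensets, “and” as intersection and “or” as union, and the helper name power_set is an assumption introduced only for this illustration.

```python
from itertools import combinations

H = frozenset({1, 2, 3, 4, 5, 6})

def power_set(s):
    """Return all subsets of s as frozensets (hypothetical helper)."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# The four events from the text.
A = frozenset({2, 4, 6})   # the outcome is even
B = frozenset({2, 3, 5})   # the outcome is a prime number
C = frozenset({4, 5, 6})   # the outcome is greater than 3
D = frozenset({5})         # the outcome is exactly 5

# "and" is intersection (&), "or" is union (|).
print(sorted(A & (B | C)), sorted((A & B) | (A & C)))

# Check both distributive laws for every triple of events in P(H).
events = power_set(H)
distributive = all(
    X & (Y | Z) == (X & Y) | (X & Z) and X | (Y & Z) == (X | Y) & (X | Z)
    for X in events for Y in events for Z in events
)
print(distributive)
```

The exhaustive loop runs over all 64³ triples of subsets, which is small enough to finish quickly.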
1.2 The need for a more general probability theory

According to our understanding of quantum physics, certain things — like position and
momentum of a particle — cannot be simultaneously measured together with arbitrary
precision. So what should a statement like “the particle was here AND it had such and
such momentum” mean?
As we shall later discuss, the operations “and” and “or” can be introduced even in
quantum physics. However, unlike in the classical world, distributivity will not necessarily
hold. Let us see the kind of problems we may have with distributivity in an actual example.
In the double-slit experiment a coherent light source illuminates a thin plate with two
parallel slits cut in it. Passing through both slits, the light finally arrives at a screen where
it creates a sequence of bright and dark bands. This is a so-called interference pattern,
which can be well-explained by the wave nature of light.
Figure 1.1: The double-slit experiment
However, instead of using a strong light source, nowadays we may perform experiments with
“individual” photons shot one after the other. Now suppose that we cover slit A with
a small block and shoot a number of photons so that in total about 500 of them pass
through slit B. Of course, by symmetry, about another 500 would be blocked at A.
Nevertheless, the ones passing through slit B will actually arrive at the screen. Most of
them would hit the screen relatively close to slit B, while a few of them would hit much
further out, giving a distribution like this:
Figure 1.2: The double-slit experiment with slit A blocked
Moving the block from slit A to slit B and repeating the experiment would lead to a similar
distribution with center closer to A:
Figure 1.3: The double-slit experiment with slit B blocked
What kind of distribution should we see if we leave both slit A and B open? We are still
shooting the photons individually; one after the other. Our classical thinking suggests that
• some would pass at slit A, some would pass at slit B,
• the ones passing at A would be distributed on the screen like in figure 1.3,
• the ones passing at B would be distributed on the screen like in figure 1.2,
• so the obtained distribution should be the sum of the two previous distributions, as in
the next figure:
Figure 1.4: Double-slit experiment: what we would expect by “classical” reasoning
However, this is not what we get! The real experiment gives an interference pattern:
Figure 1.5: Double-slit experiment: the real-life distribution
Observe that in contrast to the experiment with slit B blocked, now almost
no photon arrives at the region of the screen indicated by the letter C. So consider the
statements
A: “the photon passes at slit A”,
B: “the photon passes at slit B”,
C: “the photon hits the screen at C”.
Summing the distributions obtained in the first two experiments corresponds to considering
the event that “the photon
(passes at A and hits the screen at C) or (passes at B and hits the screen at C)”.
On the other hand, from the last experiment we can get information regarding
the event that “the photon
(passes at A or B) and hits the screen at C”.
So the real-life results of the experiments seem to be in conflict with the naive,
classical reasoning that uses the distributivity law
(A ∧ C) ∨ (B ∧ C) = (A ∨ B) ∧ C.
1.3 Partially ordered sets and lattices

In a mathematical model, possible events regarding a certain physical system should be
elements of a certain set. What kind of structure should this set be endowed with? One
might be tempted to immediately endow it with some binary operations (that would stand
for the words “and” and “or”) satisfying certain properties. However, as we have seen, when
facing quantum phenomena, their meaning and their properties become rather unclear. It
is therefore better to begin with something which is more basic than “and” and “or”.
Consider the events A:=“the Sun is not shining” and B:=“it is 12 o’clock midnight” as
events taking place in Budapest. At 12 o’clock midnight (at least in Budapest) the
Sun is surely not shining. So whenever B happens, A must occur, too. We say that B
is a more restrictive event than A, or equivalently, that A is more general than B; in
notations
B ≤ A.
It is clear that we cannot expect that any two events can be compared. Suppose, for
example, that A and B are like above and C is the event “it is cloudy”. Then neither B is
more restrictive than C, nor vice versa. However, we may expect that
• X≤X for every event X,
• if both X≤Y and Y≤X for two events X and Y, then in fact X=Y,
• if for three events X, Y and Z we have X≤Y and Y≤Z, then X≤Z.
That is, in our mathematical description, events should be elements of a partially ordered
set, or in short: a POSET.
Definition 1.3.1. A relation ≤ on a set S satisfying for all a, b, c ∈ S
• reflexivity: a ≤ a,
• anti-symmetry: if a ≤ b and b ≤ a, then a = b,
• transitivity: if a ≤ b and b ≤ c, then a ≤ c,
is called a partial ordering. A set S together with a (fixed) partial ordering is called a
POSET.
Example 1.3.2. Consider divisibility between natural numbers. For two natural numbers
a, b ∈ N, we write a|b whenever a divides b. It is clear that the relation | is a partial
ordering and hence that any subset of the natural numbers H ⊂ N, together with the
relation |, is a POSET.
Graphical representation of POSETs. Sometimes it is helpful to “visualize” a POSET. A
common way of doing so is the following. Elements of our POSET are represented as
vertices of a graph. Edges are drawn in such a way that the relation a ≤ b will hold if and
only if there is a path from a to b such that it contains steps in the upward direction only.
Example 1.3.3. A graphical representation of the set H := {1, 2, 3, 4, 5, 6} considered as
a POSET with the divisibility relation | is the following:
Figure 1.6: Graphical representation of {1,2,3,4,5,6} with the divisibility relation
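The edges of such a diagram can be computed rather than drawn by hand. The sketch below (Python, an illustration only) first verifies the three axioms of a partial ordering for divisibility on H and then lists the pairs a < b with no element strictly in between; these are the upward edges one would expect to see in the graphical representation. The function name leq is an assumption made for this example.

```python
H = [1, 2, 3, 4, 5, 6]

def leq(a, b):
    """a <= b in the divisibility ordering means that a divides b."""
    return b % a == 0

# The three axioms of a partial ordering, checked by brute force.
reflexive = all(leq(a, a) for a in H)
antisymmetric = all(a == b for a in H for b in H if leq(a, b) and leq(b, a))
transitive = all(leq(a, c) for a in H for b in H for c in H
                 if leq(a, b) and leq(b, c))
print(reflexive, antisymmetric, transitive)

# Edges of the diagram: a < b with no third element strictly between them.
edges = [(a, b) for a in H for b in H
         if a != b and leq(a, b)
         and not any(c not in (a, b) and leq(a, c) and leq(c, b) for c in H)]
print(edges)
```

Every relation a ≤ b is then recovered as an upward path along these edges, for example 1 ≤ 4 via 1, 2, 4.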
Let us return now to the question of “and” and “or”; how should they appear in our
mathematical description? Since we have already introduced a partial ordering between
events, first we should consider the kind of relation we want to have between these binary
operations and the ordering.
A natural set of requirements is the following:
• “A and B” is more restrictive than both A and B; in notations (A∧B)≤A,B,
• both A and B are more restrictive than “A or B”; in notations A,B≤(A∨B),
• if C≤A and C≤B, then C≤(A∧B),
• if A≤C and B≤C, then (A∨B)≤C.

To put it in another way, “A and B” should be the infimum, while “A or B” should be the
supremum of A and B.
A collection of elements in a POSET does not necessarily have a supremum or infimum.
However, if it exists at all, the supremum (and likewise, the infimum) is unique. This is a
simple consequence of the anti-symmetry of the order relation.
Definition 1.3.4. Let S be a POSET with partial ordering ≤. By the supremum of a
set of elements H ⊂ S we mean an element c = sup(H) such that x ≤ c for all x ∈ H and
(∀x ∈ H : x ≤ y) ⇒ c ≤ y.
If c′ also satisfies the above two requirements then on the one hand x ≤ c′ for all x ∈ H,
implying that c ≤ c′; on the other hand, exchanging the roles of c and c′ we also have that
c′ ≤ c. Hence c = c′ by anti-symmetry, and sup(H), if it exists, is unique.
By the infimum of H we mean an element c = inf(H) such that c ≤ x for all x ∈ H and
(∀x ∈ H : y ≤ x) ⇒ y ≤ c.
Similarly to the supremum, inf(H), if it exists, is unique.
Definition 1.3.5. For the infimum and supremum of two elements in a POSET we use
the notations
inf{a, b} ≡ a ∧ b, sup{a, b} ≡ a ∨ b.
Example 1.3.6. Consider the POSET given by figure 1.6. Let us investigate what we can
say about 4 ∧ 6, 2 ∨ 4 and 2 ∨ 5. To determine the first one, we need to list all elements
that are smaller than both 4 and 6. They are 1 and 2, of which the greatest element is 2. So
4 ∧ 6 ≡ inf{4, 6} = 2.
A similar argument shows that 2 ∨ 4 = 4. For the last one, first we need to list all elements
that are greater than both 2 and 5. However, in this POSET there is no such element!
Thus there is no 2 ∨ 5 in this POSET. However, instead of the set {1, 2, 3, 4, 5, 6} consider
now, still with the partial ordering given by the divisibility relation, the larger set of
natural numbers N. In this POSET x ∨ y and x ∧ y always exist. In fact, it is not too
difficult to see that x ∧ y is simply the greatest common divisor, while x ∨ y is
the least common multiple of x and y.
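The computations of this example can be reproduced by a direct search for greatest lower and least upper bounds. The following Python sketch is only an illustration: the names inf and sup and the convention of returning None when no bound exists are assumptions, and the natural numbers are replaced by the finite initial segment 1, ..., 60, which is large enough to contain the least common multiple of any two numbers up to 7.

```python
from math import gcd

def leq(a, b):
    """Divisibility ordering: a <= b means that a divides b."""
    return b % a == 0

def inf(a, b, S):
    """Greatest lower bound of a and b inside S, or None if there is none."""
    lower = [x for x in S if leq(x, a) and leq(x, b)]
    best = [c for c in lower if all(leq(x, c) for x in lower)]
    return best[0] if best else None

def sup(a, b, S):
    """Least upper bound of a and b inside S, or None if there is none."""
    upper = [x for x in S if leq(a, x) and leq(b, x)]
    best = [c for c in upper if all(leq(c, x) for x in upper)]
    return best[0] if best else None

H = range(1, 7)
print(inf(4, 6, H), sup(2, 4, H), sup(2, 5, H))   # 2 4 None

# In a large enough segment of N the bounds agree with gcd and lcm.
segment = range(1, 61)
agrees = all(inf(a, b, segment) == gcd(a, b)
             and sup(a, b, segment) == a * b // gcd(a, b)
             for a in range(1, 8) for b in range(1, 8))
print(agrees)
```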
Example 1.3.7. In the last example, when we considered the POSET given by figure 1.6,
the supremum 2 ∨ 5 did not exist simply because there were no elements greater than both
2 and 5. This, however, is not the only thing that can “go wrong”. Indeed, consider the
four-element POSET given below. In this example there are elements that are greater than
both x and y. In fact, there are two of them. However, of these two, there is no smallest:
they simply cannot be compared. So x and y do not have a supremum.
Figure 1.7: Example for the non-existence of a supremum
Example 1.3.8. Let H be a set and consider its power set P(H) as a POSET with the
partial ordering given by the inclusion of subsets; that is, for A, B ∈ P(H) we have that
A ≤ B ⇔ A ⊂ B. It is easy to see then that A ∧ B and A ∨ B, as elements of P(H) (i.e.
as subsets of H) exist for all A, B ∈ P(H). Indeed, A ∩ B is contained in both A and B,
and
(X ⊂ A, and X ⊂ B) ⇒ X ⊂ (A ∩ B).
That is, the infimum is simply given by the intersection of sets: A∧B ≡ inf{A, B} = A∩B.
Similarly, one has that A ∨ B ≡ sup{A, B} = A ∪ B.
For a comparable pair of elements x ≤ y it is easy to show that both x ∧ y and x ∨ y exist
and in fact in this case x ∧ y = x and x ∨ y = y. However, as we have seen, in general x ∧ y
and x ∨ y do not always exist. Nevertheless, both in classical and quantum physics, the
operations “and” and “or” can, as it turns out, be introduced. This means that we
shall deal with POSETs having a certain (additional) property.
Definition 1.3.9. A POSET S such that for any a and b in S there exist both a ∧ b and
a ∨ b is called a lattice.
It is easy to see that in a lattice both ∧ and ∨ are commutative (in fact in the definition of
a ∧ b and a ∨ b the two elements a and b play exactly the same role). One can also easily
prove that both ∧ and ∨ are associative in any lattice, and hence expressions like
a∧b∧c
(without any parentheses) make sense; see exercise 1.3.

Let us move on now to the question of distributivity. Consider the POSET given below
in a graphical manner. It is easy to check that this POSET is a lattice.
Figure 1.8: Example for the non-distributivity of ∧ over ∨
Moreover, on the one hand we have
x ∧ (y ∨ z) = x ∧ a = x,
on the other hand
(x ∧ y) ∨ (x ∧ z) = b ∨ b = b.
That is,
x ∧ (y ∨ z) ≠ (x ∧ y) ∨ (x ∧ z);
so ∧ is not distributive over ∨.
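The failure of distributivity can be confirmed by brute force. The Python sketch below rests on an assumption, since the figure itself is not reproduced here: it takes the POSET to be the five-element lattice with smallest element b, greatest element a and three pairwise incomparable elements x, y, z in between, which is the shape consistent with the computations above. The helper names meet and join are likewise illustrative.

```python
# Assumed shape of the lattice: b at the bottom, a at the top,
# and x, y, z pairwise incomparable in the middle.
elements = ["b", "x", "y", "z", "a"]
strictly_below = {("b", e) for e in "xyza"} | {(e, "a") for e in "xyz"}

def leq(p, q):
    return p == q or (p, q) in strictly_below

def meet(p, q):
    """Infimum of p and q, or None if it does not exist."""
    lower = [e for e in elements if leq(e, p) and leq(e, q)]
    greatest = [c for c in lower if all(leq(e, c) for e in lower)]
    return greatest[0] if greatest else None

def join(p, q):
    """Supremum of p and q, or None if it does not exist."""
    upper = [e for e in elements if leq(p, e) and leq(q, e)]
    least = [c for c in upper if all(leq(c, e) for e in upper)]
    return least[0] if least else None

# Every pair has an infimum and a supremum, so this POSET is a lattice.
is_lattice = all(meet(p, q) is not None and join(p, q) is not None
                 for p in elements for q in elements)

lhs = meet("x", join("y", "z"))             # x ∧ (y ∨ z)
rhs = join(meet("x", "y"), meet("x", "z"))  # (x ∧ y) ∨ (x ∧ z)
print(is_lattice, lhs, rhs)                 # True x b
```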
Definition 1.3.10. A lattice in which
• ∧ is distributive over ∨, and
• ∨ is distributive over ∧,
is called a distributive lattice.
In the classical description (at least in the finite case¹) it is the power set of a set
which plays the role of the lattice of events. Now P(X) for any set X is a distributive
lattice. Indeed, in this case, as was seen, ∧ is simply the set intersection ∩ and ∨ is just
the set union ∪ and, as is well known, these set-theoretical operations are distributive over
each other.
As was motivated by the double slit experiment, in quantum physics we need to model
the structure of events by a non-distributive lattice. Hence instead of the power set, we
will need something else.
¹ When dealing with a system having infinitely many different events, instead of the power set, for
certain mathematical reasons we might be forced to use only a sub-lattice of the power set. E.g. instead
of casting a die, consider picking a random real number between 0 and 1. In this case, an event is of the
form “the chosen number is in A”, where A ⊂ [0, 1]. When talking about probabilities, if we want this
number to be picked in a “uniform” way, then to the above event we want to associate the probability λ(A)
where λ(A) is the Lebesgue-measure of A. But not every subset of [0, 1] is measurable, so we cannot allow
every subset of [0, 1] to stand for an event! Nevertheless, the union and intersection of two measurable
sets are measurable and so ∧ and ∨ in the lattice of measurable sets still correspond to the operations of
intersection and union. So the point stressed in the text, namely that in classical systems ∧ and ∨ are
distributive over each other, is still valid.
1.4 Not-functions and ortho-lattices

We still need several more concepts regarding events before we can talk about probabilities.
Most importantly, apart from the binary operations “and” and “or” we will also need to
take the “not” of an event.
Definition 1.4.1. A greatest element (i.e. an element that is greater than or equal to every
element) in a POSET S is denoted by 1. If 1 and 1′ are both greatest elements of the
same POSET, then both 1 ≤ 1′ and 1′ ≤ 1; thus by anti-symmetry, the greatest element,
if it exists, is unique. A smallest element of a POSET is denoted by O. Again, if it exists, it
is unique.
Definition 1.4.2. Let L be a lattice in which there exists a smallest element O ∈ L. A
function ¬ : L → L satisfying
• ¬(¬a) = a,
• a ≤ b ⇒ ¬b ≤ ¬a,
• a ∧ ¬a = O,
is called a not-function on L.
Proposition 1.4.3. Let L be a lattice and ¬ a not-function on L. Then
• ¬ : L → L is a bijection,
• ∃1 ∈ L,
• ¬O = 1 and ¬1 = O,
• a ∨ ¬a = 1 for all a ∈ L.
Proof. If ¬a = ¬b then a = ¬(¬a) = ¬(¬b) = b, implying that ¬ is injective. Moreover,
for every a ∈ L there exists a b ∈ L such that a = ¬b; namely b = ¬a. Thus ¬ is also
surjective, which together with injectivity means that ¬ is a bijection.
By definition, O ≤ a for every a ∈ L. Hence ¬a ≤ ¬O for every a ∈ L. As ¬
is a bijection, this is equivalent to saying that ¬O is greater than or equal to every element;
that is, there is a greatest element 1 and ¬O = 1. Then of course we also have that
¬1 = ¬(¬O) = O.
Finally, for any a ∈ L, by definition of the supremum we have that a ∨ ¬a is greater than
or equal to both a and ¬a. Hence ¬(a ∨ ¬a) is less than or equal to both ¬a and
¬(¬a) = a, so by definition it is also less than or equal to their infimum:
¬(a ∨ ¬a) ≤ (a ∧ ¬a) = O.
But as O is the smallest element, the above inequality actually means that ¬(a ∨ ¬a) = O
and hence that
a ∨ ¬a = ¬(¬(a ∨ ¬a)) = ¬O = 1,
which was exactly the last claim in our proposition.
Figure 1.9: Example of an ortho-lattice admitting several not-functions
Example 1.4.4. Let us consider the power set P(H) of a set H. If ¬ is a not-function on
P(H) and A ⊂ H then on one hand we must have that A ∧ ¬A = O, i.e. that A ∩ ¬A = ∅.
On the other hand, by what was just proved we must also have that A ∨ ¬A = 1, i.e. that
A ∪ ¬A is the full set H. Summing it up: the only possibility is that ¬A is the complement
of A:
¬A = Aᶜ ≡ H \ A.
It is easy to see that the complement indeed satisfies the requirements of a not-function.
So on P(H) there exists a not-function, and this not-function is actually unique.
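For a small set H the uniqueness claim can also be confirmed by exhaustive search. The Python sketch below is an illustration under stated assumptions: it takes H = {1, 2, 3}, uses the fact from Proposition 1.4.3 that a not-function is a bijection (so only permutations of P(H) need to be tried), and the name is_not_function is introduced here for convenience.

```python
from itertools import combinations, permutations

H = frozenset({1, 2, 3})
L = [frozenset(c) for r in range(len(H) + 1)
     for c in combinations(sorted(H), r)]
O = frozenset()

def is_not_function(neg):
    """Check the three defining properties; for sets, <= is inclusion."""
    return (all(neg[neg[a]] == a for a in L)
            and all(neg[b] <= neg[a] for a in L for b in L if a <= b)
            and all(a & neg[a] == O for a in L))

# A not-function is a bijection, so it suffices to search the permutations.
found = []
for image in permutations(L):
    neg = dict(zip(L, image))
    if is_not_function(neg):
        found.append(neg)

print(len(found))
print(all(found[0][a] == H - a for a in L))
```

The search runs over all 8! = 40320 permutations of the eight subsets, so it is feasible only for very small H.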
What we have seen in the above example is by far not the general situation. For example,
it may be that a lattice does not admit any not-function.
Indeed, consider that if a lattice L has more than just one element, then O cannot be
equal to 1 and ¬a ≠ a for every a ∈ L, since the latter equality would imply
O = a ∧ ¬a = a ∧ a = a,
and therefore 1 = ¬O = O. Thus ¬ is a bijection without fixed points satisfying ¬(¬a) = a,
so the elements of L come in pairs {a, ¬a}. Hence, unless L contains a single element only,
a finite L cannot have an odd number of elements. So a lattice with 3, 5, 7, . . . elements
cannot have a not-function.
However, examples show that an even number of elements is still not a guarantee for
the existence of a not-function. On the other hand, it may also happen that a lattice
admits several (different) not-functions.
Example 1.4.5. Consider the POSET given by the figure below. It is easy to check that
this POSET is a lattice and that for any permutation σ of the set {a, b, c, d} such that
• σ has no fixed points: σ(x) ≠ x,
• σ is an involution: σ² ≡ σ ◦ σ = id,
the function
¬x := 1,    if x = O,
      σ(x), if x ∈ {a, b, c, d},
      O,    if x = 1,
is a not-function.
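The not-functions of this example can be enumerated by brute force. The Python sketch below assumes, since the figure is not reproduced here, that the lattice consists of O, 1 and four pairwise incomparable elements a, b, c, d in between; the helper names are illustrative only.

```python
from itertools import permutations

L = ["O", "a", "b", "c", "d", "1"]

def leq(p, q):
    # O is below everything, 1 is above everything,
    # and a, b, c, d are assumed to be pairwise incomparable.
    return p == q or p == "O" or q == "1"

def meet(p, q):
    lower = [x for x in L if leq(x, p) and leq(x, q)]
    return next(c for c in lower if all(leq(x, c) for x in lower))

def is_not_function(neg):
    return (all(neg[neg[x]] == x for x in L)
            and all(leq(neg[y], neg[x]) for x in L for y in L if leq(x, y))
            and all(meet(x, neg[x]) == "O" for x in L))

# Try every bijection of L and keep those that are not-functions.
found = [dict(zip(L, image)) for image in permutations(L)
         if is_not_function(dict(zip(L, image)))]

for neg in found:
    print({x: neg[x] for x in "abcd"})
```

Under this assumption the search finds exactly the not-functions described in the example: each one swaps O with 1 and pairs up the four middle elements.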
Definition 1.4.6. A lattice with a given not-function is called an ortho-lattice.
We shall need to discuss two more concepts regarding the structure of events. First, we
need to introduce the concept of exclusive events; i.e. events that mutually exclude each
other.
By our classical experience, we might be tempted to define exclusivity by the equation
a ∧ b = O. However, in quantum physics sometimes for 2 events it is impossible to perform
a measurement that would decide for both events in a simultaneous manner whether they
have happened or not. (Remember that this was exactly the reason why the operations ∧
and ∨ were not given a primary meaning but instead were introduced via the ordering as
infimum and supremum.)
Definition 1.4.7. Let L be an ortho-lattice. Two elements a, b ∈ L such that a ≤ ¬b are
said to be exclusive. Note that it is a mutual concept: if a ≤ ¬b then by negating both
sides we get that b ≤ ¬a. (So a excludes b if and only if b excludes a.) Note also that the
pairs (O, a) and (a, ¬a) are always exclusive.
This definition makes perfect sense in real life, too. Indeed, consider A:=“it is 12 o’clock
midnight” and B:=“the Sun is shining”, as statements regarding Budapest. Now A excludes
B because “not B”, that is, “the Sun is not shining”, is a more general statement
than A; for example, apart from A, the Sun might not be shining because it could be
simply cloudy, etc.
How does this definition relate to our first idea that relied on the operation ∧? In
the classical case, if our lattice is a power set P(X), then A ⊂ Bᶜ if and only if A and B
are disjoint, that is, when A ∩ B = ∅. So in this case defining exclusivity by the property
a ≤ ¬b or by the property a ∧ b = O leads to the same concept. However, as we shall see,
in general a ≤ ¬b is a stronger requirement than a ∧ b = O.
Proposition 1.4.8. Let L be an ortho-lattice and a, b ∈ L. If a and b exclude each other
then a ∧ b = O. However, in general, the converse implication is false: a ∧ b = O does not
imply that a and b exclude each other.
Proof. We always have that a ∧ b ≤ a, b. However, if in addition a ≤ ¬b, then
a ∧ b ≤ a ≤ ¬b
so a ∧ b is less than or equal to both ¬b and b. Hence by the definition of infimum
a ∧ b ≤ b ∧ ¬b = O.
But O is the smallest element, so the above inequality implies that actually a ∧ b = O. On
the other hand, that the converse implication is false can be seen by examples:
it is left for the reader to check that it fails in the ortho-lattice of example 1.4.5.
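A reader who wants to double-check a candidate counterexample can do so mechanically. The Python sketch below is one possible check, under assumptions not fixed by the text: the lattice of example 1.4.5 is taken to be O, 1 and four pairwise incomparable elements a, b, c, d, and the not-function is chosen to swap a with b and c with d.

```python
L = ["O", "a", "b", "c", "d", "1"]
# One admissible not-function (an assumed choice): a <-> b, c <-> d, O <-> 1.
neg = {"O": "1", "1": "O", "a": "b", "b": "a", "c": "d", "d": "c"}

def leq(p, q):
    return p == q or p == "O" or q == "1"

def meet(p, q):
    lower = [x for x in L if leq(x, p) and leq(x, q)]
    return next(c for c in lower if all(leq(x, c) for x in lower))

def exclusive(p, q):
    """p and q are exclusive when p <= not q."""
    return leq(p, neg[q])

# Exclusive pairs always have meet O (the proposition) ...
forward = all(meet(p, q) == "O" for p in L for q in L if exclusive(p, q))
# ... but some pairs with meet O are not exclusive.
counterexamples = [(p, q) for p in L for q in L
                   if meet(p, q) == "O" and not exclusive(p, q)]
print(forward, meet("a", "c"), exclusive("a", "c"))   # True O False
print(counterexamples)
```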
We shall finish this section by discussing the concept of compatibility. Suppose we have
2 events A and B such that there exists a simultaneous way to check whether they have
happened or not. Then upon such a check we will find that either
• both of them have happened, or
• A has happened, but B has not, or
• B has happened, but A has not, or
• neither of them has happened.
This means that
(A and B) or (A and not B) or (not A and B) or (not A and not B)
is always true.
Definition 1.4.9. Let L be an ortho-lattice. If for two elements a, b ∈ L
(a ∧ b) ∨ (a ∧ ¬b) ∨ (¬a ∧ b) ∨ (¬a ∧ ¬b) ≠ 1,
then we say that a and b are incompatible.
In case of incompatibility of two events, there can be no simultaneous measurement checking
both of them. It is not merely that we cannot find such a simultaneous measurement:
there is a theoretical obstruction. Now of course it may happen even in case of a classical
physical system that we have some practical limitations regarding simultaneous measurements.
However, what can we say at the theoretical level?
Proposition 1.4.10. In a distributive ortho-lattice there are no pairs of incompatible elements.
Proof. Straightforward computation relying on distributivity shows that
(a ∧ b) ∨ (a ∧ ¬b) ∨ (¬a ∧ b) ∨ (¬a ∧ ¬b) =
(a ∧ (b ∨ ¬b)) ∨ (¬a ∧ (b ∨ ¬b)) =
(a ∧ 1) ∨ (¬a ∧ 1) =
a ∨ ¬a = 1.
However, if the lattice is not distributive, then we may have pairs that are incompatible,
which is exactly what happens in quantum physics. A reader who would like to find an
example for an ortho-lattice having incompatible pairs may look at the ortho-lattice of
example 1.4.5.
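The search for incompatible pairs can be automated. The Python sketch below makes the same assumptions as before, none of which are fixed by the text: the lattice of example 1.4.5 is taken to consist of O, 1 and four pairwise incomparable elements a, b, c, d, with the not-function swapping a with b and c with d. The function name four_case_join is hypothetical.

```python
L = ["O", "a", "b", "c", "d", "1"]
# Assumed not-function: a <-> b, c <-> d, O <-> 1.
neg = {"O": "1", "1": "O", "a": "b", "b": "a", "c": "d", "d": "c"}

def leq(p, q):
    return p == q or p == "O" or q == "1"

def meet(p, q):
    lower = [x for x in L if leq(x, p) and leq(x, q)]
    return next(c for c in lower if all(leq(x, c) for x in lower))

def join(p, q):
    upper = [x for x in L if leq(p, x) and leq(q, x)]
    return next(c for c in upper if all(leq(c, x) for x in upper))

def four_case_join(p, q):
    """(p and q) or (p and not q) or (not p and q) or (not p and not q)."""
    cases = [meet(p, q), meet(p, neg[q]), meet(neg[p], q), meet(neg[p], neg[q])]
    total = "O"
    for c in cases:
        total = join(total, c)
    return total

incompatible = [(p, q) for p in L for q in L if four_case_join(p, q) != "1"]
print(four_case_join("a", "b"), four_case_join("a", "c"))   # 1 O
print(incompatible)
```

With this choice, a and c are incompatible, while a and b = ¬a are compatible.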
Exercises
E 1.1. Decide if the POSET given below in a graphical manner is a lattice or not. If yes,
decide if it is a distributive one.
E 1.2. Let S be a POSET. Prove that there always exists a set X and an injective function
f : S → P(X) preserving the order relation;
x ≤ y ⇒ f (x) ⊂ f (y).
In other words, prove that every POSET can be embedded into the POSET given by the
power set of a set. Why is this not in contradiction with the fact that P(X), as a lattice,
is distributive, while in S distributivity may not hold?
E 1.3. Let L be a lattice. Prove that both inf(H) and sup(H) exist for any finite set of
elements H ⊂ L, |H| < ∞. (Does the same need to hold if H contains infinitely many
elements?) Use your proof to further conclude that in a lattice both ∧ and ∨ are associative
operations.
E 1.4. Let S be a POSET with finitely many elements such that
• S has both a biggest and a smallest element,
• there exists a graphical representation of S on the plane with non-crossing edges.

Prove that S is a lattice.
E 1.5. Consider N as a POSET with partial ordering given by the divisibility relation. As
was discussed in this chapter, it is a lattice. Is it however also a distributive one? Give a
counter-example or prove that it is a distributive lattice.
E 1.6. Find all possible not-functions on the lattice given below in a graphical manner.
E 1.7. Let L be a distributive lattice. Prove that there exists at most one not-function
¬ : L → L making L an ortho-lattice.
E 1.8. Let L be an ortho-lattice. Prove that for every pair of elements a, b ∈ L we have
¬(a ∧ b) = ¬a ∨ ¬b,
¬(a ∨ b) = ¬a ∧ ¬b;
i.e. that De Morgan’s identities hold in every ortho-lattice.

Chapter 2
Probabilities: uncertainty versus lack of information
Topics:
• probability functions,
• “lack of information” and convex combination of probability functions,
• pure and dispersion free probability functions,
• “intrinsic” uncertainty and distributivity.
2.1 Probability functions

After discussing the structure of events, we now want to talk about probabilities. In our
mathematical description probability should be a function assigning to each event a number
between 0 and 1. The following definition summarizes our natural requirements regarding
probabilities.
Definition 2.1.1. Let L be an ortho-lattice. A function p : L → [0, 1] satisfying
• p(1) = 1,
• p(a ∨ b) = p(a) + p(b) for every pair of exclusive elements a, b ∈ L, a ≤ ¬b,
is called a probability function.
One might feel that some further properties should also be included in the definition; for
example that p(a) ≤ p(b) whenever a ≤ b. However, as we shall see, we do not need to
require these properties, as they already follow from our definition.
Proposition 2.1.2. Let L be an ortho-lattice and p a probability function on L. Then
(i) p(O) = 0,
(ii) p(¬a) = 1 − p(a),
(iii) if a ≤ b then p(a) ≤ p(b).
Proof. We shall prove each point separately.
(i) Since O and O are always exclusive, we have
p(O) = p(O ∨ O) = p(O) + p(O) = 2p(O)
implying that p(O) = 0.
(ii) Since a and ¬a are always exclusive and a ∨ ¬a = 1, we have that
1 = p(1) = p(a ∨ ¬a) = p(a) + p(¬a)
implying that p(¬a) = 1 − p(a).
(iii) If a ≤ b then a ≤ ¬(¬b), so a and ¬b are exclusive and
p(a ∨ ¬b) = p(a) + p(¬b) = p(a) + (1 − p(b)) = p(a) − p(b) + 1
where we have also used (ii) which we have already proved. Now on the left-hand side
we have a probability, which must be less than or equal to 1. Thus 1 ≥ p(a) − p(b) + 1,
implying that p(a) ≤ p(b).
Figure 2.1: Example of an ortho-lattice
Suppose we are given an ortho-lattice L. How can we find all possible probability functions
on L? We know that to 1, O we need to assign the values 1 and 0, respectively. So let
p : L → [0, 1] be a candidate for a probability function. We already have that p(1) = 1
and p(O) = 0. How to proceed?
Following the definition, we will need to list all pairs of elements in L that are exclusive.
Each such pair, by the additivity requirement, gives a restriction on p. Of course (O, a) is
an exclusive pair for all a ∈ L. However, such pairs do not give further restrictions, since
O ∨ a = a and so the equation
p(O ∨ a) = p(O) + p(a)
is automatically satisfied as p(O) was already set to be zero. So in fact we only need to list
the nontrivial exclusive pairs (and take account of the resulting equations).
Example 2.1.3. It is quite easy to check that the following diagram indeed defines an
ortho-lattice. We have the following 3 nontrivial exclusive pairs: (a, ¬a), (b, ¬b) and (b, ¬a).
The resulting equations are:
p(a) + p(¬a) = p(a ∨ ¬a) = p(1) = 1,
p(b) + p(¬b) = p(b ∨ ¬b) = p(1) = 1,
p(b) + p(¬a) = p(b ∨ ¬a) = p(1) = 1.
The first two are equivalent to saying that p(¬x) = 1 − p(x) for x = a, b; something which
we have already noted in our previous proposition. The last one, on the other hand,
together with the second one, tells us that p(¬a) = p(¬b) and hence that p(a) = p(b). We
have no further restrictions, so probability functions on this lattice are in a one-to-one
correspondence with values t ∈ [0, 1] via the formula
p(x) = 1,     if x = 1,
       t,     if x = a, b,
       1 − t, if x = ¬a, ¬b,
       0,     if x = O.
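The bookkeeping of this example can be checked by a short program. The Python sketch below assumes a particular shape for the lattice, reconstructed from the text rather than from the figure: two chains O < b < a < 1 and O < ¬a < ¬b < 1 that meet only at O and 1. The strings "na" and "nb" stand for ¬a and ¬b, and all function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

L = ["O", "b", "a", "na", "nb", "1"]            # "na" = not a, "nb" = not b
chains = [["O", "b", "a", "1"], ["O", "na", "nb", "1"]]
neg = {"O": "1", "1": "O", "a": "na", "na": "a", "b": "nb", "nb": "b"}

def leq(p, q):
    return any(p in c and q in c and c.index(p) <= c.index(q) for c in chains)

def join(p, q):
    upper = [x for x in L if leq(p, x) and leq(q, x)]
    return next(c for c in upper if all(leq(c, x) for x in upper))

# Nontrivial exclusive pairs: x <= not y, with O not involved.
nontrivial = [(x, y) for x, y in combinations(L, 2)
              if "O" not in (x, y) and leq(x, neg[y])]
print(nontrivial)

def is_probability(p):
    return (p["1"] == 1 and all(0 <= v <= 1 for v in p.values())
            and all(p[join(x, y)] == p[x] + p[y]
                    for x in L for y in L if leq(x, neg[y])))

def p_t(t):
    """The one-parameter family of probability functions from the text."""
    return {"1": 1, "a": t, "b": t, "na": 1 - t, "nb": 1 - t, "O": 0}

print(all(is_probability(p_t(Fraction(k, 10))) for k in range(11)))

# A candidate with p(a) != p(b) violates additivity on the pair (b, not a).
bad = {"1": 1, "O": 0, "a": Fraction(1, 2), "na": Fraction(1, 2),
       "b": Fraction(1, 4), "nb": Fraction(3, 4)}
print(is_probability(bad))
```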
Remark. In the above example b is more restrictive than a, yet for every probability
function p we have p(a) = p(b). That is, in words, we have 2 events such that though one
of them is more restrictive, they always happen with the same frequency. Since when
the stricter event happens, so does the more general one (as this is exactly the meaning
of the relation ≤), the equality of the frequencies means that they actually always happen
together. Thus, though in our lattice a and b stand for two different elements, on physical
grounds we would identify them and consider them to stand for the same event. (Similarly
we should identify ¬a and ¬b.) That is, for modelling a system, instead of this lattice,
we would use another one having only 4 elements. In other words, for an ortho-lattice to
“make sense” as the lattice of events of a system, one may require it to have “sufficiently many”
probability functions.
A precise concept of “sufficiently many” could be the following: for every pair of distinct
elements a, b there is a probability function p such that p(a) ≠ p(b). When such a condition is
satisfied, in mathematics we would say that the probability functions separate the elements
of the ortho-lattice in question.
2.2 The convex structure of probability functions

Consider casting a die. As was discussed, in this case the ortho-lattice of events is simply
the power set P({1, 2, 3, 4, 5, 6}), where the operations ∧ and ∨ are the set intersection
and set union, respectively, and the not-function is the complement. It is easy to check
that both
p(A) := |A|/6,
(where |A| stands for the number of elements in A) and
q(A) := 1, if 6 ∈ A,
        0, otherwise,
define probability functions. The probability function p describes a fair die, whereas q
stands for a die which always ends up with the number 6 on its upside face. Now suppose
we have a hat with 3 dice in it, of which 2 are fair and one is biased giving always a 6.
Someone draws a die from the hat and casts it. What is the probability that the
outcome will be an even number, i.e. that the outcome will fall into the set A := {2, 4, 6}?
If the die drawn was a fair one, then it would give an even outcome with probability
p(A) = 1/2. If the die drawn was the biased one, then the outcome would be even with a
probability of q(A) = 1. By high school mathematics, we have a 2/3 chance that the die in
question is a fair one, and a 1/3 chance that it is the biased one, and thus the probability
of an even outcome is
(2/3) p(A) + (1/3) q(A) = (2/3)(1/2) + (1/3) · 1 = 2/3.
And in general, the probability of the outcome being in an arbitrary set X is
m(X) := (2/3) p(X) + (1/3) q(X).
Thus the probability function which we would use in this case is some kind of mixture —
or perhaps better to say: weighted combination — of the probability functions p and q.
Proposition 2.2.1. Let L be an ortho-lattice and suppose that p and q are probability
functions on L and t ∈ [0, 1]. Then the convex combination
m := tp + (1 − t)q
is again a probability function on L.
Proof. It is rather straightforward to check that m satisfies the defining properties of
probability functions.
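The check can be sketched as follows, assuming that the defining properties from Section 2.1 are that values lie in [0, 1], that p(1) = 1, and that p(x ∨ y) = p(x) + p(y) whenever x and y are exclusive. If the definition given there lists further conditions, they would have to be verified in the same manner.

```latex
% Assumed defining properties of a probability function p on L:
%   (i)   p(x) \in [0,1] for all x \in L,
%   (ii)  p(1) = 1,
%   (iii) p(x \vee y) = p(x) + p(y) whenever x \le \neg y.
\begin{align*}
m(x) &= t\,p(x) + (1-t)\,q(x) \in [0,1]
  && \text{since } p(x),\, q(x) \in [0,1] \text{ and } t \in [0,1],\\
m(1) &= t\,p(1) + (1-t)\,q(1) = t + (1-t) = 1,\\
m(x \vee y) &= t\,p(x \vee y) + (1-t)\,q(x \vee y)\\
  &= t\bigl(p(x) + p(y)\bigr) + (1-t)\bigl(q(x) + q(y)\bigr)
   = m(x) + m(y)
  && \text{for } x \le \neg y.
\end{align*}
```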
In a vector (or affine) space the set of points that can be obtained as convex combinations
of two fixed points p, q is the line segment [p, q] between the two points. A set of points
C such that for every p, q ∈ C the line segment [p, q] is contained in C is called a convex
set. So from a geometrical point of view, probability functions of an ortho-lattice form a
convex set. In what follows we shall be interested in the convex structure of this convex
set: how certain probability functions can be written as convex combinations of others.
Of course, every probability function p can be “decomposed” for example into the
convex combination (1/3)p + (2/3)p. However, this is a trivial decomposition. What we
are really interested in are the nontrivial decompositions.
Definition 2.2.2. A convex combination tp + (1 − t)q where p = q or where t ∈ {0, 1} is
called a trivial combination. The combination is nontrivial when t is strictly between 0
and 1, and p ≠ q.
Definition 2.2.3. A probability function that can be written as a nontrivial convex com-
bination of probability functions is said to be mixed. If it cannot be written as a nontrivial
combination of probability functions it is said to be pure or extremal.
In the example concerning the 3 dice in the hat, the probability function m was a convex
combination of the probability functions p and q. Why did we need to use such a convex
combination? In some sense the answer is that we lacked information: we
did not know whether the die drawn was a fair one or a biased one. The weights (or coefficients)
reflected our knowledge. Since all we knew was that the die was drawn from a collection
of 3 in which 2 were fair, we gave a 2/3 weight to the probability function p and a 1/3
weight to the probability function q.
Now what if a certain probability function cannot be written as a nontrivial convex
combination of probability functions? As was discussed, the use of a nontrivial convex
combination may be due to a lack of information. However, if a probability function
cannot be written as a nontrivial convex combination of probability functions, then the
probability values given by this function cannot be “explained” by a lack of information.
Suppose that this probability function gives a probability for a certain event x which is
neither 0 nor 1 — say it is 1/2. Since we cannot interpret this as a probability reflecting our
knowledge of the system, in this case we would be forced to accept that there are “intrinsic
uncertainties” in the system. It is not only that we do not know whether x will happen or
not: nature itself is “hesitating”. So if this situation occurs we must regard our system
to be non-deterministic. In the next section we shall investigate how determinism depends
on the choice of ortho-lattice modelling the structure of events.
2.3 Distributivity and determinism

Our framework of probabilities may well be used in a completely deterministic situation,
too. We can still talk about probabilities; it is only that, the situation being deterministic,
an event will either happen with certainty (i.e. with probability equal to 1), or will surely
not happen (i.e. will happen with a probability equal to 0).
Definition 2.3.1. A probability function p that takes values only in the two-element set
{1, 0} is said to be dispersion free. Such a probability function describes a situation
when the outcomes are known with certainty.
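For the die example of Section 2.2 the definition can be tested by brute force over all 64 events. This is just an illustrative sketch: the helper names `all_events`, `is_dispersion_free`, `fair` and `always_six` are made up for this purpose.

```python
from fractions import Fraction
from itertools import combinations

FACES = (1, 2, 3, 4, 5, 6)

def all_events(points):
    """Yield every subset of `points` as a frozenset."""
    for size in range(len(points) + 1):
        for subset in combinations(points, size):
            yield frozenset(subset)

def is_dispersion_free(prob, points):
    """True if prob takes only the values 0 and 1 on every event."""
    return all(prob(event) in (0, 1) for event in all_events(points))

def fair(event):
    """The fair die: each face has probability 1/6."""
    return Fraction(len(event), 6)

def always_six(event):
    """The biased die that always shows a 6."""
    return Fraction(1) if 6 in event else Fraction(0)

print(is_dispersion_free(fair, FACES))        # False
print(is_dispersion_free(always_six, FACES))  # True
```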
Proposition 2.3.2. A dispersion free probability function is always pure/extremal.
Proof. Let p, q be probability functions on the same ortho-lattice L and t ∈ [0, 1], and
assume that the convex combination m := tp + (1 − t)q is dispersion free. Our aim is to
show that this convex combination must be a trivial one.
If t = 0 or t = 1, we are finished. So assume 0 < t < 1, and consider an element x ∈ L.
If p(x) ≠ q(x), say p(x) < q(x), then
p(x) = tp(x) + (1 − t)p(x) < m(x) < tq(x) + (1 − t)q(x) = q(x),
with strict relations (i.e. with < and not ≤) since both t and 1 − t are strictly positive.
Similarly, if p(x) > q(x) we get that p(x) > m(x) > q(x). However, m was assumed
to be dispersion free, so m(x) is either zero or one, and both p(x) and q(x) are in the
interval [0, 1]. Thus m(x) cannot be strictly between p(x) and q(x) and hence p(x) can
be neither strictly smaller nor strictly bigger than q(x): it must be equal to it. As x was
an arbitrary element, this means that p = q and hence the combination tp + (1 − t)q is a
trivial one.
The above proposition, in some sense, is a triviality. It simply says that if we know the
exact outcome (that is, if our probability function is dispersion free), then we do not lack
any information.
Now how about the other way around? Is it true that if we know everything that it is
possible to know, then we can work out the outcome? In other words, is it true that a pure
probability function is automatically dispersion free? In our classical world, this is indeed so.
Proposition 2.3.3. On a distributive ortho-lattice a probability function is dispersion free
if and only if it is pure/extremal.
Proof. By the last proposition, one of the implications always holds. What we have to
show now — with the extra assumption of distributivity — is the other direction, which
we shall do by contradiction.
So suppose that L is a distributive ortho-lattice, p is a pure probability function on L
and yet there exists an x ∈ L such that p(x) ∉ {0, 1}. Then both p(x) and 1 − p(x) are
positive; in particular we can divide by them and so the following formulas are indeed
well-defined:
    p_x(y) := p(y ∧ x) / p(x),
    p_¬x(y) := p(y ∧ ¬x) / (1 − p(x)).
With a careful use of distributivity, one can prove (see exercise 2.1) that both p_x and p_¬x
are probability functions on L. On the other hand, y ∧ x and y ∧ ¬x are always exclusive.
Indeed,
(y ∧ ¬x) ≤ ¬x ≤ (¬y ∨ ¬x) = ¬(y ∧ x)
where we have also used DeMorgan’s identity (see exercise 1.8). Then using the additivity
of probability functions and the assumed distributivity of our lattice we have that
    p(y ∧ x) + p(y ∧ ¬x) = p((y ∧ x) ∨ (y ∧ ¬x)) = p(y ∧ (x ∨ ¬x)) = p(y ∧ 1) = p(y)
for all y ∈ L. It follows that p can be written as the convex combination
    p = t p_x + (1 − t) p_¬x
where t = p(x). However, by assumption 0 < p(x) < 1 and moreover p_x ≠ p_¬x since in
particular p_x(x) = p(x ∧ x)/p(x) = p(x)/p(x) = 1 whereas
    p_¬x(x) = p(x ∧ ¬x) / (1 − p(x)) = p(O) / (1 − p(x)) = 0.
So the convex combination in question cannot be trivial, in contradiction with the assump-
tion of purity of p.
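The decomposition used in the proof can be tried out on the fair die, where the lattice is the power set of {1, ..., 6} and hence distributive. The sketch below is an illustration under these assumptions, not part of the proof; the helper `conditional` and the choice of the event x are made up for this example.

```python
from fractions import Fraction
from itertools import combinations

FACES = frozenset({1, 2, 3, 4, 5, 6})

def p(event):
    """Fair die on the power set of FACES."""
    return Fraction(len(event), 6)

def conditional(prob, x):
    """Return the function y -> prob(y and x) / prob(x); needs prob(x) != 0."""
    weight = prob(x)
    return lambda y: prob(y & x) / weight

x = frozenset({2, 4, 6})   # the event "the outcome is even"
not_x = FACES - x          # the complement plays the role of the not-function
t = p(x)                   # here t = 1/2, strictly between 0 and 1
p_x = conditional(p, x)
p_not_x = conditional(p, not_x)

# Check p = t*p_x + (1 - t)*p_not_x on every one of the 64 events.
events = [frozenset(c) for r in range(7) for c in combinations(sorted(FACES), r)]
print(all(t * p_x(y) + (1 - t) * p_not_x(y) == p(y) for y in events))  # True
print(p_x(x), p_not_x(x))  # 1 0
```

Since p_x and p_not_x differ on x and t lies strictly between 0 and 1, the combination is nontrivial, so the fair die is a mixed probability function, in line with the proposition.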
The above proposition confirms that in a classical system, unless we have a lack of
information, outcomes can be predicted with certainty. However, in quantum physics, as was
mentioned, the lattice of events is not distributive. In fact, in quantum physics even if we
have no lack of information, we cannot predict everything with certainty.
In this chapter we shall not discuss the exact lattice which is used in quantum physics,
but we note that actually in that lattice there are no dispersion free probability functions
at all. That is, there is always some uncertainty in a quantum system.
Though as was said, we do not discuss the exact lattice used in quantum physics, the
reader might be interested to see an actual example of an ortho-lattice where the purity
of a probability function does not imply that it is dispersion free. For such an example we
refer to exercise 2.4.
Exercises
E 2.1. Let L be a distributive ortho-lattice and p a probability function on L. Prove that
if x ∈ L is such that p(x) ∉ {0, 1} then the functions defined by the formulas
    p_x(y) := p(y ∧ x) / p(x)   and   p_¬x(y) := p(y ∧ ¬x) / (1 − p(x))
are both probability functions. What are p_x and p_¬x called in classical probability theory?
E 2.2. Let X be a finite set and consider its power set P(X) as an ortho-lattice. Prove
that probability functions on P(X) are in a one-to-one correspondence with functions
f : X → [0, 1] satisfying Σ_{x∈X} f(x) = 1 via the formula
    p_f(A) = Σ_{x∈A} f(x).
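To get a feeling for the formula before attempting the proof, one may experiment with a small set X. The sketch below only illustrates the two directions of the correspondence and proves nothing; the function names `prob_from_weights` and `weights_from_prob` and the sample weights are invented for this illustration.

```python
from fractions import Fraction

def prob_from_weights(f):
    """Build p_f(A) = sum of f(x) over x in A from a dict f of weights."""
    if any(w < 0 for w in f.values()) or sum(f.values()) != 1:
        raise ValueError("weights must be non-negative and sum to 1")
    return lambda A: sum((f[x] for x in A), Fraction(0))

def weights_from_prob(prob, X):
    """Recover f(x) = prob({x}) from a probability function on P(X)."""
    return {x: prob(frozenset({x})) for x in X}

f = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
p_f = prob_from_weights(f)

print(p_f({"a", "c"}))                        # 2/3
print(weights_from_prob(p_f, f.keys()) == f)  # True
```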
E 2.3. Consider the POSET given in a graphical manner below. Verify that it is a lattice
and that it is in fact an ortho-lattice with the following not-function:
• ¬1 = O and ¬O = 1,
• for an element x on the “second level”, ¬x is the element sitting right above x at the
“third level”,
• for an element x on the “third level”, ¬x is the element sitting right below x at the
“second level”.
Find all probability functions on this ortho-lattice and decide which are pure and which
are dispersion free.
E 2.4. Consider the POSET given in a graphical manner below. Verify that it is a lattice
and that it is in fact an ortho-lattice with the following not-function:
• ¬1 = O and ¬O = 1,
• for an element x on the “second level”, ¬x is the element sitting right above x at the
“third level”,
• for an element x on the “third level”, ¬x is the element sitting right below x at the
“second level”.
Find all probability functions on this ortho-lattice and decide which are pure and which
are dispersion free.
E 2.5. Let L be an ortho-lattice with a finite number of elements: n := |L| < ∞. Give an
upper bound for the number of pure and an upper bound for the number of dispersion free
probability functions on L.
