Pro/Con List vs. Linear Regression
In order to make such a choice, I wish to construct a ranking function - a function which
takes as input the characteristics of a woman and returns as output a single number.
This ranking function is meant to approximate my utility function
(http://en.wikipedia.org/wiki/Utility) - a higher number means that by making this
choice I will be happier. If the ranking closely approximates utility, then I can use the
ranking function as an effective decision-making tool.
In concrete terms, I want to build a function f : Women → ℝ which approximately
predicts my happiness. If f(Svetlana) > f(Elise) I will choose Svetlana, and vice versa
if the reverse inequality holds.
One of the simplest procedures for building a ranking function dates back to 1772, and
was described by Benjamin Franklin (http://www.procon.org/view.backgroundresource.php?resourceID=1474):
...my Way is, to divide half a Sheet of Paper by a Line into two Columns,
writing over the one Pro, and over the other Con. Then...I put down under
the different Heads short Hints of the different Motives...I find at length
where the Ballance lies...I come to a Determination accordingly.
The mathematical name for this technique is unit-weighted regression
(http://en.wikipedia.org/wiki/Unit-weighted_regression), and the more commonplace
name is a pro/con list.
I present the method in a slightly different format - in each column a different choice is
listed. Each row represents a characteristic, all of which are pros. A con is transformed
into a pro by negation - rather than treating "Fat" as a con, I treat "Not Fat" as a pro. If
one of the choices possesses the characteristic under discussion, a +1 is assigned to the
relevant row/column, otherwise 0 is assigned:
[Table: one column per choice (Svetlana, Elise), one row per pro characteristic, with a +1 or 0 in each cell.]
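As a concrete illustration, here is a minimal sketch of this scoring rule in code. Elise's vector is the one used later in the post; Svetlana's vector and the idea of seven features are made up for illustration:

```python
# Unit-weighted regression: every pro counts +1, so the score is just a sum.
svetlana = [1, 0, 1, 1, 0, 1, 1]   # +1 where she has the characteristic (made up)
elise    = [0, 1, 1, 0, 1, 1, 0]

def f(x):
    # All coefficients are forced to be +1.
    return sum(x)

# Pick whichever choice has the higher total of pros.
print("Svetlana" if f(svetlana) > f(elise) else "Elise")
```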
At this point, anyone with a statistics background is probably asking why I would ever
use unit-weighted regression, as opposed to one of the many fancier techniques
available. Why not use linear regression, rather than forcing all the coefficients
to be +1?
I'll be concrete, and consider the case of linear regression in particular. Linear regression
is a lot like a pro/con list, except that the weight of each feature is allowed to vary. In
mathematical terms, we represent each possible choice as a binary vector - for
example:

Elise = [0, 1, 1, 0, 1, 1, 0]

Then the predictor function uses a set of weights which can take on values other than
+1:

$$f(x) = \sum_i h_i x_i$$

The individual weights h_i represent how important each variable is. For example, "Smart"
might receive a weight of +3.3, "Not fat" a weight of +3.1 and "Black" a weight of +0.9.
The weights can be determined with a reasonable degree of accuracy by taking past
data and choosing the weights which minimize the difference between the "true" value
and the approximate value - this is what least squares (http://en.wikipedia.org/wiki/Least_squares) does.
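To make that concrete, here is a minimal sketch of fitting such weights with ordinary least squares - my own illustration with made-up past data, not code from the original analysis:

```python
# A minimal sketch of least squares: choose h to minimize ||X h - happiness||^2.
import numpy as np

# Each row is the binary feature vector of a past choice; each entry of
# `happiness` is the outcome observed after making that choice (made up).
X = np.array([
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
])
happiness = np.array([7.1, 3.2, 6.8, 4.0, 8.3])

h, *_ = np.linalg.lstsq(X, happiness, rcond=None)
print(h)  # learned weights, one per feature
```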
The difficulty with using a fancier learning tool is that it only works when you have
sufficient data. To robustly fit a linear model, you'll need tens to hundreds of data points
per feature. If you have too few data points, you run into a real danger of overfitting -
building a model which accurately memorizes the past, but fails to predict the future.
You can even run into this problem if you have lots of data points, but those data points
don't represent all the features in question.
It also requires more programming sophistication to build, and more mathematical
sophistication to recognize when you are running into trouble.
For the rest of this post I'll be comparing a Pro/Con list to Linear Regression
(http://en.wikipedia.org/wiki/Linear_regression), since this will make the theoretical
comparison tractable and keep the explanation simple. Let me emphasize that I'm not
pushing a pro/con list as a solution to every ranking problem - I'm just pushing it as a
nice, simple starting point.
To compare the two methods, suppose that the true utility is given by linear regression
with some unknown weight vector h, normalized so that Σ_i h_i = 1, while the pro/con
list uses the equal weights u = (1/N, …, 1/N).

An error is made whenever the pro/con list and linear regression rank two vectors
differently - i.e., linear regression says "choose Elise" while the pro/con list says "choose
Svetlana". The error rate of the pro/con list is the probability of making an error given
two random feature vectors x and y, i.e.:

$$\textrm{error rate}(h) = P\left(\textrm{sign}\left([h \cdot (x-y)][u \cdot (x-y)]\right) < 0\right)$$

As I'll show below, the typical error rate works out to be about 1/4. There are of course
vectors h for which the error rate is higher, and others for which it is lower. But on
average, the error rate is bounded by 1/4.

In this sense, the pro/con list is 75% as good as linear regression.
We can confirm this fact by computer simulation - generating a random ensemble of
vectors h satisfying

$$\sum_i |h_i| = 1.0 \qquad \textrm{and} \qquad \forall i,\ h_i \ge 0,$$

and then measuring how accurately unit-weighted regression agrees with it. The result:

[Figure: simulated agreement between the pro/con list and linear regression across the random ensemble of weight vectors h.]
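Here is a minimal sketch of such a simulation - my own reconstruction, not the post's original code - assuming h is drawn from a uniform Dirichlet distribution (which satisfies the two constraints above) and the choices are random binary vectors:

```python
# Draw h from a uniform Dirichlet distribution, draw random binary feature
# vectors, and measure how often unit weights disagree with the "true"
# weighted ranking.
import numpy as np

rng = np.random.default_rng(0)
N = 10            # number of features
trials = 100_000

h = rng.dirichlet(np.ones(N))          # sum(h) == 1 and h >= 0
x = rng.integers(0, 2, (trials, N))    # random binary choices
y = rng.integers(0, 2, (trials, N))
d = x - y

true_sign = np.sign(d @ h)             # ranking by the "true" utility
unit_sign = np.sign(d.sum(axis=1))     # ranking by the pro/con list
errors = (true_sign * unit_sign) < 0   # strict disagreements

print("error rate:", errors.mean())    # typically well below 1/4
```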
Mathematical results
I don't know how to prove how closely a Pro/Con list approximates linear regression for
binary feature vectors. However, if we assume that the feature vectors x and y are
normally distributed (http://en.wikipedia.org/wiki/Normal_distribution) instead, I can
prove the following theorem:

Theorem: Suppose h is drawn from a uniform Dirichlet distribution and x, y have
components which are independent identical normally distributed variables. Then:

$$E[\textrm{error rate}(h)] < \frac{1}{\pi} \arctan\left( \sqrt{\frac{N-1}{N+1}} \right)$$

This means that averaged over all vectors h, the error rate is bounded by 1/4. There are
of course individual vectors h with a higher or lower error rate, but the typical error rate
is 1/4.

Unfortunately I don't know how to prove this is true for Bernoulli (binary) vectors x, y.
Any suggestions would be appreciated.

If we run a Monte Carlo simulation, we can see that this theorem appears roughly
correct:
[Figure: Monte Carlo estimates of the error rate for Gaussian and Bernoulli feature vectors, plotted against the theorem's bound.]
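A sketch of such a check - again my reconstruction, not the original code: under the Gaussian assumption the error rate has a closed form (derived below), which we can average over Dirichlet-distributed h and compare to the theorem's bound:

```python
# For Gaussian feature differences, error rate(h) = arctan(sqrt(N)*|h - u|)/pi,
# so average that closed form over Dirichlet h and compare to the bound.
import numpy as np

rng = np.random.default_rng(1)
N = 10
u = np.full(N, 1.0 / N)

hs = rng.dirichlet(np.ones(N), size=100_000)
rates = np.arctan(np.sqrt(N) * np.linalg.norm(hs - u, axis=1)) / np.pi

bound = np.arctan(np.sqrt((N - 1) / (N + 1))) / np.pi
print(rates.mean(), "<", bound)  # the average should fall below the bound
```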
To prove the theorem, first consider an example: h = [0.9, 0.05, 0.05] - a bit of an extreme case, but reasonable. In this
example, what sorts of vectors x, y will result in unit-weighted regression disagreeing
with the true ranking? Here is one example:

x = [1, 0, 0]
y = [0, 1, 1]

In this case h · x = 0.9 while h · y = 0.1, so the true ranking prefers x; but the pro/con
list counts one pro for x against two pros for y, and therefore prefers y.

To compute how often this happens, define d = x − y, and note that d_i ~ N(0, 1). This is an unrealistic assumption, but one which
is mathematically tractable. We want to compute the probability:

$$\textrm{error rate}(h) = P\left(\textrm{sign}\left([h \cdot d][u \cdot d]\right) < 0\right)$$

Define p = h − u, and note that u · d ~ N(0, N^{−1}) while p · d ~ N(0, Σ_i p_i²). Since
Σ_i h_i = 1, the components of p sum to zero, so p ⊥ u and the variables u · d and
p · d are statistically independent.

Note: Obtaining this statistical independence is why we needed to assume the feature
vectors were normal - showing statistical independence in the case of binary vectors is
harder. A potentially easier test case than binary vectors might be random vectors
chosen uniformly from the unit ball in l^∞, aka vectors for which max_i |x_i| < 1.
We've now reduced the problem to simple calculus. Let v = u · d and w = p · d, with
variances σ_u² = N^{−1} and σ_p² = Σ_i p_i². Then:

$$\textrm{error rate}(h) = \iint_E C \exp\left( -\left( \frac{v^2}{2\sigma_u^2} + \frac{w^2}{2\sigma_p^2} \right) \right) \, dw \, dv$$

where E is the error region - the set of pairs (v, w) for which sign(v) ≠ sign(v + w) -
and C = 1/(2π σ_u σ_p). Rescaling v and w by their standard deviations turns E into two
opposite wedges of angle θ_0 each, so in polar coordinates:

$$\textrm{error rate}(h) = \frac{2}{2\pi} \int_0^{\theta_0} \int_0^{\infty} e^{-r^2/2} \, r \, dr \, d\theta = \frac{\theta_0}{\pi}$$

Here θ_0 = arccot(σ_u/σ_p), so:

$$\textrm{error rate}(h) = \frac{\pi/2 - \operatorname{arccot}(\sigma_p/\sigma_u)}{\pi} = \frac{\arctan(\sigma_p/\sigma_u)}{\pi} = \frac{\arctan(\sqrt{N}\,\sigma_p)}{\pi}$$
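As a sanity check, here is a small sketch (mine, not from the post) comparing this closed form against a direct simulation for the extreme example above:

```python
# Numerical check of error rate(h) = arctan(sqrt(N) * |h - u|) / pi
# for Gaussian d.
import numpy as np

rng = np.random.default_rng(2)
h = np.array([0.9, 0.05, 0.05])   # the extreme example from above
N = len(h)
u = np.full(N, 1.0 / N)

d = rng.standard_normal((1_000_000, N))
empirical = ((d @ h) * (d @ u) < 0).mean()
closed_form = np.arctan(np.sqrt(N) * np.linalg.norm(h - u)) / np.pi
print(empirical, closed_form)     # the two should agree closely
```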
So σ_u² = N^{−1} always, and for the worst case we can take h = [1, 0, …, 0], which gives

$$\sigma_p^2 = |h - u|^2 = \frac{N-1}{N},$$

i.e. σ_p = √((N−1)/N) while σ_u = 1/√N, and arccot(σ_u/σ_p) = arccot(1/√(N−1)) → π/2
as N → ∞. This implies:

$$\textrm{error rate}(h) = \frac{\arctan(\sqrt{N-1})}{\pi} \to \frac{1}{2}$$

This means that in the worst case, unit-weighted regression is no better than chance.
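A quick numerical illustration of that worst case (again a sketch of my own):

```python
# With all weight on a single feature, unit weights approach chance.
import numpy as np

rng = np.random.default_rng(3)
for N in (3, 10, 100):
    h = np.zeros(N)
    h[0] = 1.0                         # all weight on one feature
    d = rng.standard_normal((100_000, N))
    err = ((d @ h) * d.sum(axis=1) < 0).mean()
    print(N, err)                      # creeps toward 1/2 as N grows
```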
Averaging over h drawn from the uniform Dirichlet distribution, and writing σ_p² = |h − u|²:

$$E[\textrm{error rate}(h)] = \int \frac{\arctan\left(\sqrt{N |h-u|^2}\right)}{\pi} \, dh \le \frac{1}{\pi} \arctan\left( \sqrt{N \int |h-u|^2 \, dh} \right) = \frac{1}{\pi} \arctan\left( \sqrt{\frac{N (N-1)}{N(N+1)}} \right) = \frac{1}{\pi} \arctan\left( \sqrt{\frac{N-1}{N+1}} \right)$$

The inequality is Jensen's inequality (http://en.wikipedia.org/wiki/Jensen's_inequality),
since z ↦ arctan(√(Nz)) is a concave function. For large N this quantity approaches
arctan(1)/π = (π/4)/π = 1/4.
Note: I believe the reason the Bernoulli feature vectors appear to have lower error
than the Gaussian feature vectors for small N is that for small N, there is a significant
possibility that a feature vector might be 0 in the relevant components. The net result
of this is that h · (x − y) = 0 fairly often, meaning that many vectors have equal rank.
This effect becomes improbable as more features are introduced.
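A small sketch (my own) that measures how often such ties occur for binary features:

```python
# Measure how often the two rankings tie exactly for binary features:
# a tie on either side makes the sign product exactly zero.
import numpy as np

rng = np.random.default_rng(4)
trials = 100_000
for N in (3, 5, 10, 20):
    h = rng.dirichlet(np.ones(N))
    d = rng.integers(0, 2, (trials, N)) - rng.integers(0, 2, (trials, N))
    ties = (((d @ h) * d.sum(axis=1)) == 0).mean()
    print(N, ties)   # tie probability shrinks as N grows
```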
Thus, we have shown that the average error rate of unit-weighted regression is bounded
above by 1/4. The simulation also suggests that treating feature vectors as Gaussian
rather than Boolean vectors is a reasonable approximation to the problem - if anything,
it introduces extra error.
All the pre-readers I shared this with had two major but tangential questions which are
worth answering once and for all.
First, Olga Kurylenko (https://www.google.com/search?q=olga+kurylenko&
oq=olga+kurylenko) and Oluchi Onweagba (https://www.google.com
/search?q=oluchi+onweagba).
Second, I didn't waste time with gimp. Imagemagick was more than sufficient:
# -resize x594 will shrink height to 594, preserve aspect ratio
$ convert olga-kurylenko-too-big.jpg -resize 'x594' olga-kurylenko.jpg
# -tile x1 means tile the images with 1 row, however many columns are needed
$ montage -mode concatenate -tile x1 olga-kurylenko.jpg oluchi-onweagba.jpg composite.jpg
Comments

Personally, I appreciate the example. I don't know if it was an intentional troll against how overly
sensitive the tech crowd is around race and gender, but it worked that way nonetheless. There's
nothing even remotely sexist or racist here. Thanks for the informative article, I learned so much
about so many things.

Very unfortunately chosen example... people will very easily dismiss it as sexist (plus look at the smart
and black criteria...). Oh well... BTW, a simple Pro/Con list to evaluate the pros and cons of a Pro/Con
list vs. Linear Regression would have been 75% as good as your detailed analysis :)

A developer · 2 years ago: Interesting article Chris. I think your example is a bit too overtly sexist though. While it may resonate
with a male audience, it would be a major put-off to a female audience.
That said, I found the article informative and fascinating.

NotSexist · 2 years ago (replying to A developer): An article isn't sexist just because it uses a subset of the sexes as an example. That people
jump to such a conclusion is more telling about those people and the state of our culture. The
existence of such people surely means this article is higher risk, but it's not an inherently "evil"