A Tutorial on MM Algorithms
David R. Hunter¹
Kenneth Lange²
¹ Department of Statistics, Penn State University, University Park, PA 16802-2111
Abstract
Most problems in frequentist statistics involve optimization of a function
such as a likelihood or a sum of squares. EM algorithms are among the most
effective algorithms for maximum likelihood estimation because they consis-
tently drive the likelihood uphill by maximizing a simple surrogate function for
the loglikelihood. Iterative optimization of a surrogate function as exemplified
by an EM algorithm does not necessarily require missing data. Indeed, every
EM algorithm is a special case of the more general class of MM optimization
algorithms, which typically exploit convexity rather than missing data in ma-
jorizing or minorizing an objective function. In our opinion, MM algorithms
deserve to be part of the standard toolkit of professional statisticians. The current
article explains the principle behind MM algorithms, suggests some methods
for constructing them, and discusses some of their attractive features. We
include numerous examples throughout the article to illustrate the concepts
described. In addition to surveying previous work on MM algorithms, this ar-
ticle introduces some new material on constrained optimization and standard
error estimation.
1 Introduction
Maximum likelihood and least squares are the dominant forms of estimation in frequentist statistics. Toy optimization problems designed for classroom presentation can be solved analytically, but most practical maximum likelihood and least squares problems must be solved iteratively. This article discusses an optimization method that relies heavily on convexity arguments and generalizes the well-known class of EM algorithms for maximum likelihood estimation (Dempster et al., 1977; McLachlan and Krishnan, 1997). We call any algorithm based on this iterative method an MM algorithm.
To our knowledge, the general principle behind MM algorithms was first enun-
ciated by the numerical analysts Ortega and Rheinboldt (1970) in the context of
line search methods. de Leeuw and Heiser (1977) present an MM algorithm for
multidimensional scaling contemporary with the classic Dempster et al. (1977) pa-
per on EM algorithms. Although the work of de Leeuw and Heiser did not spark
the same explosion of interest from the statistical community set off by the Demp-
ster et al. (1977) paper, steady development of MM algorithms has continued. The MM principle reappears, among other places, in the quadratic lower bound principle of Böhning and Lindsay (1988), in the psychometrics literature on least squares (Bijleveld and de Leeuw, 1991; Kiers and Ten Berge, 1992), and in medical imaging
(De Pierro, 1995; Lange and Fessler, 1995). The recent survey articles of de Leeuw
(1994), Heiser (1995), Becker et al. (1997), and Lange et al. (2000) deal with the
general principle, but it is not until the rejoinder of Hunter and Lange (2000a) that
the acronym MM first appears. This acronym pays homage to the earlier names
“majorization” and “iterative majorization” of the MM principle, emphasizes its
crucial link to the better-known EM principle, and diminishes the possibility of confusion with the distinct mathematical concept of majorization used in the theory of inequalities (Marshall and Olkin, 1979). Recent work has demonstrated the utility of MM algorithms in a
broad range of statistical contexts, including quantile regression (Hunter and Lange,
2000b), survival analysis (Hunter and Lange, 2002), paired and multiple compar-
isons (Hunter, 2004), variable selection (Hunter and Li, 2002), and DNA sequence analysis.
One of the virtues of the MM acronym is that it does double duty. In min-
imization problems, the first M of MM stands for majorize and the second M for
minimize. In maximization problems, the first M stands for minorize and the second
M for maximize. (We define the terms “majorize” and “minorize” in Section 2.) A successful MM algorithm substitutes a simple optimization problem for a difficult optimization problem. Simplicity can be attained by (a) avoiding large matrix inversions, (b) linearizing an optimization problem, (c) separating the parameters of an optimization problem, (d) dealing with equality and inequality constraints gracefully, or (e) turning a nondifferentiable problem into a smooth problem. Iteration is the price we pay for simplifying the original problem.

In the EM algorithm literature, the E-step of each iteration creates a surrogate function, namely a conditional expectation of the complete-data loglikelihood calculated with respect to the observed data. The surrogate function created by the E-step is then, in the M-step, maximized with respect to the parameters of the underlying model; thus every EM algorithm is an MM algorithm. In many MM algorithms, by contrast, convexity arguments rather than missing-data constructions supply the surrogate function, and one can often succeed both in constructing the surrogate and in maximizing it analytically.
Statistical education tends to emphasize probabilistic tools such as conditional expectation, with far less attention paid to convexity and inequalities. Thus, success with MM algorithms and success with EM algorithms hinge on somewhat different mathematical skills. However, the skills required by most MM algorithms are no harder to master than the skills required by most EM algorithms. Finally, note that the MM label, like the EM label, refers not to a single algorithm but to a class of algorithms. Thus, this article refers to specific EM and MM algorithms rather than to “the” MM algorithm.
2 The MM Philosophy
Let θ(m) represent a fixed value of the parameter θ, and let g(θ | θ(m) ) denote a
real-valued function of θ whose form depends on θ(m). The function g(θ | θ(m)) is said to majorize a real-valued function f(θ) at the point θ(m) provided

g(θ | θ(m)) ≥ f(θ) for all θ,
g(θ(m) | θ(m)) = f(θ(m)).                (1)

In other words, the surface θ ↦ g(θ | θ(m)) lies above the surface f(θ) and is tangent to it at the point θ = θ(m). The function g(θ | θ(m)) is said to minorize f(θ) at θ(m) if −g(θ | θ(m)) majorizes −f(θ) at θ(m).
Ordinarily, θ(m) represents the current iterate in a search of the surface f(θ). In a majorize-minimize MM algorithm, we minimize the majorizing function g(θ | θ(m)) rather than the actual function f(θ). If θ(m+1) denotes the minimizer of g(θ | θ(m)), then we can show that the MM procedure forces f(θ) downhill. Indeed, the inequality

f(θ(m+1)) ≤ g(θ(m+1) | θ(m)) ≤ g(θ(m) | θ(m)) = f(θ(m))                (2)

follows directly from the fact g(θ(m+1) | θ(m)) ≤ g(θ(m) | θ(m)) and definition (1). In a minorize-maximize MM algorithm, we instead minorize the objective function f(θ) by a surrogate function g(θ | θ(m)) and maximize g(θ | θ(m)) to produce the next iterate θ(m+1).
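To make the mechanics concrete, the following Python sketch (our illustration, not part of the original article) shows the skeleton of a majorize-minimize iteration; the objective `f`, the routine `majorizer_argmin` that minimizes the surrogate, and the starting value `theta0` are all assumed to be supplied by the user.

```python
import numpy as np

def mm_minimize(f, majorizer_argmin, theta0, max_iter=500, tol=1e-8):
    """Generic majorize-minimize loop (a schematic sketch).

    f                : objective function to be minimized
    majorizer_argmin : given theta_m, returns the minimizer of g(theta | theta_m),
                       where g majorizes f at theta_m
    theta0           : starting value
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        theta_new = majorizer_argmin(theta)
        # Descent property (2): f can never increase from one iterate to the next.
        assert f(theta_new) <= f(theta) + 1e-12
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta
```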
As a one-dimensional example, consider the problem of computing a sample quantile from a sample x1, . . . , xn of n real numbers. One can readily prove (Hunter and
Lange, 2000b) that for q ∈ (0, 1), a qth sample quantile of x1 , . . . , xn minimizes the
function
f(θ) = Σ_{i=1}^{n} ρq(xi − θ),                (3)
Figure 1: For q = 0.8, (a) depicts the “vee” function ρq (θ) and its quadratic ma-
jorizing function for θ(m) = −0.75; (b) shows the objective function f (θ) that is
minimized by the 0.8 quantile of the sample 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, along with
its quadratic majorizer, for θ(m) = 2.5.
where

ρq(θ) = qθ,            θ ≥ 0,
ρq(θ) = −(1 − q)θ,     θ < 0.
When q = 1/2, this function is proportional to the absolute value function; for
q ≠ 1/2, the “vee” is tilted to one side or the other. As seen in Figure 1(a), it is possible to majorize the “vee” function at any nonzero point θ(m) by a simple quadratic function ζq(θ | θ(m)) tangent to ρq(θ) at θ(m).

Fortunately, the majorization relation between functions is closed under the formation of sums, nonnegative products, limits, and composition with an increasing function. These rules allow us to work piecemeal in simplifying objective functions. Thus, the function f(θ) of equation (3) is majorized at the point
θ(m) by

g(θ | θ(m)) = Σ_{i=1}^{n} ζq(xi − θ | xi − θ(m)).                (4)
The function f(θ) and its majorizer g(θ | θ(m)) are shown in Figure 1(b) for a sample of size 12 with θ(m) = 2.5.
Setting the first derivative of g(θ | θ(m) ) equal to zero gives the minimum point
θ(m+1) = [ n(2q − 1) + Σ_{i=1}^{n} wi(m) xi ] / Σ_{i=1}^{n} wi(m),                (5)
where the weight wi(m) = |xi − θ(m)|−1 depends on θ(m). A flaw of algorithm (5) is that the weight wi(m) is undefined whenever θ(m) = xi. In mending this flaw, Hunter and Lange (2000b) also discuss the broader technique of quantile regression. From a computational perspective, the most fascinating thing about the quantile-finding algorithm is that it avoids sorting and relies entirely on arithmetic and iteration. For the case of the sample median (q = 1/2), the constant term in the numerator of (5) vanishes and each iterate is simply a weighted average of the observations.
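For instance, a minimal implementation of update (5) might look like the following sketch (our own illustration; the small constant `eps` is one simple way to guard against the undefined weight when θ(m) lands exactly on an observation).

```python
import numpy as np

def mm_quantile(x, q, theta0=None, max_iter=200, tol=1e-10, eps=1e-12):
    """Approximate the qth sample quantile by iterating the MM update (5)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta = np.mean(x) if theta0 is None else float(theta0)
    for _ in range(max_iter):
        w = 1.0 / np.maximum(np.abs(x - theta), eps)   # weights w_i^(m)
        theta_new = (n * (2 * q - 1) + np.sum(w * x)) / np.sum(w)
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

# Example: the sample from Figure 1(b); the 0.8 quantile should be near 3.
sample = [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5]
print(mm_quantile(sample, q=0.8))
```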
When the minimum of g(θ | θ(m)) cannot be found exactly, one may instead take a single Newton step on the surrogate,

θ(m+1) = θ(m) − [∇2g(θ(m) | θ(m))]−1 ∇g(θ(m) | θ(m)),                (6)

where ∇g(θ(m) | θ(m)) and ∇2g(θ(m) | θ(m)) denote the gradient vector and the Hessian matrix of g(θ | θ(m)) evaluated at θ(m). Since the descent property (2) depends only on decreasing g(θ | θ(m)) and not on minimizing it, the update (6) can serve in cases where g(θ | θ(m)) lacks a closed-form minimizer, provided this update decreases the value of g(θ | θ(m)). In the EM literature, Dempster et al. (1977) call an algorithm that reduces g(θ | θ(m)) without actually minimizing it a generalized EM (GEM) algorithm. Lange (1995a) points out that update (6) saves us from performing iterations within iterations and yet still displays the same local rate of convergence as a full MM algorithm that minimizes g(θ | θ(m)) at each iteration.
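Update (6) itself requires only a linear solve. The hedged sketch below (ours) performs a single such step, with user-supplied functions `grad_g` and `hess_g` returning the gradient and Hessian of the surrogate at θ(m).

```python
import numpy as np

def gradient_mm_step(grad_g, hess_g, theta_m):
    """One Newton step on the surrogate g(. | theta_m), as in update (6)."""
    theta_m = np.atleast_1d(np.asarray(theta_m, dtype=float))
    hess = np.atleast_2d(hess_g(theta_m))   # Hessian of g(. | theta_m) at theta_m
    grad = np.atleast_1d(grad_g(theta_m))   # gradient of g(. | theta_m) at theta_m
    return theta_m - np.linalg.solve(hess, grad)
```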
In the quantile example of Section 2.1, the convex “vee” function admits a quadratic majorizer, as depicted in Figure 1(a). In this section, we describe several inequalities that are commonly used to construct majorizing or minorizing functions.
Jensen’s inequality states for a convex function κ(x) and any random variable X that κ[E(X)] ≤ E[κ(X)]. Since − ln(x) is a convex function, we conclude for probability densities a(x) and b(x) that

− ln E[a(X)/b(X)] ≤ − E[ln{a(X)/b(X)}].

If X has the density b(x), then E[a(X)/b(X)] = 1, so the left hand side above vanishes and we obtain E[ln a(X)] ≤ E[ln b(X)]. This inequality is the key to constructing the minorizing function of the E-step of any EM algorithm (de Leeuw, 1994; Heiser, 1995), making every EM algorithm an MM algorithm.
A second important minorization follows from the supporting hyperplane property of a convex function: Any linear function tangent to the graph of a convex function is a minorizer at the point of tangency. Thus, if κ(θ) is convex and differentiable, then

κ(θ) ≥ κ(θ(m)) + ∇κ(θ(m))t (θ − θ(m)),                (7)

with equality when θ = θ(m). This inequality is illustrated by the constrained optimization example of Section 7.
If we wish to majorize a convex function instead of minorizing it, then we can use the definition of convexity itself; namely, κ(t) is convex if and only if

κ(Σi αi ti) ≤ Σi αi κ(ti)                (8)

for any weights αi ≥ 0 summing to one and any points ti in the domain of κ. Application of inequality (8) is particularly effective when κ(t) is composed with a linear function xt θ. For instance, suppose for vectors x, θ, and θ(m) that we make the substitution ti = xi(θi − θi(m))/αi + xt θ(m). Inequality (8) then becomes

κ(xt θ) ≤ Σi αi κ( xi(θi − θi(m))/αi + xt θ(m) ),                (9)

provided each αi > 0 and Σi αi = 1.
Alternatively, if all components of x, θ, and θ(m) are positive, then we may take ti = xt θ(m) θi / θi(m) and αi = xi θi(m) / xt θ(m). Now inequality (8) becomes

κ(xt θ) ≤ Σi [ xi θi(m) / xt θ(m) ] κ( xt θ(m) θi / θi(m) ).                (10)
Inequalities (9) and (10) have been used to construct MM algorithms in the contexts
of medical imaging (De Pierro, 1995; Lange and Fessler, 1995) and least-squares estimation without matrix inversion.
If a convex function κ(θ) is twice differentiable and has bounded curvature, then
we can majorize κ(θ) by a quadratic function with sufficiently high curvature and
tangent to κ(θ) at θ(m) (Böhning and Lindsay, 1988). In algebraic terms, if we can
find a positive definite matrix M such that M − ∇2 κ(θ) is nonnegative definite for
all θ, then
κ(θ) ≤ κ(θ(m)) + ∇κ(θ(m))t (θ − θ(m)) + (1/2) (θ − θ(m))t M (θ − θ(m))
provides a quadratic upper bound. For example, Heiser (1995) notes in the unidimensional case that such a bound is available whenever the second derivative κ″(θ) is bounded.
Taking κ(t) = e^t and equal weights αi = 1/m in inequality (8) gives, for positive numbers x1, . . . , xm,

(x1 x2 · · · xm)^{1/m} ≤ (1/m) Σ_{i=1}^{m} xi,                (11)

which is one version of the arithmetic-geometric mean inequality. Because the exponential function is strictly convex, equality holds if and only if all of the xi are equal. Inequality (11) with m = 2 yields

√(xy) ≤ x/2 + y/2,                (12)

which we use below in constructing a minorizing function for the sports model of Section 4.
The Cauchy-Schwartz inequality for the Euclidean norm is a special case of inequality (7). The function κ(θ) = ‖θ‖ is convex because it satisfies the triangle inequality and the homogeneity condition ‖αθ‖ = |α| · ‖θ‖. Since κ(θ) = √(Σi θi²) has gradient ∇κ(θ) = θ/‖θ‖ for θ ≠ 0, inequality (7) gives

‖θ‖ ≥ ‖θ(m)‖ + (θ(m))t (θ − θ(m)) / ‖θ(m)‖ = (θ(m))t θ / ‖θ(m)‖,

which is the Cauchy-Schwartz inequality. de Leeuw and Heiser (1977) and Groenen (1993) exploit this minorization of the Euclidean norm in their MM treatments of multidimensional scaling.
One of the key criteria in judging minorizing or majorizing functions is their ease of optimization. Successful MM algorithms in high-dimensional problems often rely on surrogate functions in which the individual parameter components are separated, so that g(θ | θ(m)) reduces to a sum of d univariate functions of the components θ1, . . . , θd. Since the d univariate functions may be optimized one by one, this makes the surrogate easy to optimize and avoids the inversion of large matrices.
4.1 Poisson Sports Model
Consider a sporting contest between two individuals or teams in which the number of points scored by team i against team j follows a Poisson process with intensity e^{oi − dj}, where oi is an “offensive strength” parameter for team i and dj is a “defensive strength” parameter for team j. If tij is the length of time that i plays j and pij is the number of points that i scores against j, then the corresponding Poisson loglikelihood function is

ℓij(θ) = pij(oi − dj) − tij e^{oi − dj} + pij ln tij − ln pij!,                (14)
where θ = (o, d) is the parameter vector. Note that the parameters should satisfy a linear constraint, such as Σi oi + Σj dj = 0, in order for the model to be identifiable; otherwise, it is clearly possible to add the same constant to each oi and dj without
altering the likelihood. We make two simplifying assumptions. First, different games
are independent of each other. Second, each team’s point total within a single game
is independent of its opponent’s point total. The second assumption is more suspect
than the first since it implies that a team’s offensive and defensive performances are
somehow unrelated to one another; nonetheless the model gives an interesting first approximation to reality. Under these assumptions, the full loglikelihood is obtained by summing ℓij(θ) over all pairs (i, j). Setting the partial derivatives of the loglikelihood equal to zero does not yield closed-form solutions for the maximum likelihood estimates, so we turn to an MM algorithm.
Because the task is to maximize the loglikelihood (14), we need a minorizing
function. Focusing on the −tij e^{oi−dj} term, we may use inequality (12), with x = e^{2oi − oi(m) − dj(m)} and y = e^{oi(m) + dj(m) − 2dj}, to show that

−tij e^{oi−dj} ≥ −(tij/2) [ e^{2oi} e^{−oi(m)−dj(m)} + e^{−2dj} e^{oi(m)+dj(m)} ],                (15)

with equality when oi = oi(m) and dj = dj(m).
Although the right side of the above inequality may appear more complicated than
the left side, it is actually simpler in one important respect — the parameter com-
ponents oi and dj are separated on the right side but not on the left. Summing the
loglikelihood (14) over all pairs (i, j) and invoking inequality (15) yields the function

g(θ | θ(m)) = Σi Σj [ pij(oi − dj) − (tij/2) e^{2oi} e^{−oi(m)−dj(m)} − (tij/2) e^{−2dj} e^{oi(m)+dj(m)} ]

minorizing the full loglikelihood at θ(m). The fact that the components of θ are separated by g(θ | θ(m)) permits us to update parameters one by one and substantially reduces computational costs. Setting the partial derivatives of g(θ | θ(m)) equal to zero and solving for the components of θ(m+1) gives the updates

oi(m+1) = (1/2) ln [ Σj pij / Σj tij e^{−oi(m)−dj(m)} ],
dj(m+1) = −(1/2) ln [ Σi pij / Σi tij e^{oi(m)+dj(m)} ].                (16)
The question now arises as to whether one should modify algorithm (16) so that
updated subsets of the parameters are used as soon as they become available. For
instance, if we update the o vector before the d vector in each iteration of algorithm
(16), we could replace the formula for dj(m+1) above by

dj(m+1) = −(1/2) ln [ Σi pij / Σi tij e^{oi(m+1)+dj(m)} ].                (17)
In practice, MM algorithms often take fewer iterations when we cycle through the parameters, updating one at a time, than when we update the whole vector simultaneously. We call such algorithms cyclic MM algorithms; they generalize the ECM algorithms of Meng and Rubin (1993). A cyclic MM algorithm always drives the objective function in the right direction; indeed, each parameter update is itself an MM iteration applied to a reduced parameter set.
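For concreteness, here is a short Python sketch (ours, not the implementation behind Table 1) of one iteration of algorithm (16), with an option for the cyclic variant (17). The arrays `p[i, j]` and `t[i, j]` are assumed to hold the points scored by team i against team j and the minutes the two teams played, and the final recentering is just one convenient way to impose the identifiability constraint.

```python
import numpy as np

def mm_sports_iteration(o, d, p, t, cyclic=False):
    """One iteration of the Poisson sports model MM updates (16)/(17).

    o, d : current offensive and defensive parameter vectors (length K)
    p    : K x K array, p[i, j] = points scored by team i against team j
    t    : K x K array, t[i, j] = minutes team i played against team j
    (Illustrative sketch; diagonal entries of p and t are assumed to be zero.)
    """
    o_new = 0.5 * np.log(p.sum(axis=1) /
                         (t * np.exp(-o[:, None] - d[None, :])).sum(axis=1))
    o_used = o_new if cyclic else o          # cyclic variant (17) uses o^(m+1)
    d_new = -0.5 * np.log(p.sum(axis=0) /
                          (t * np.exp(o_used[:, None] + d[None, :])).sum(axis=0))
    # One simple way to impose the constraint sum(o) + sum(d) = 0; the
    # loglikelihood is unchanged because o_i - d_j is shift-invariant.
    c = (o_new.sum() + d_new.sum()) / (2 * len(o_new))
    return o_new - c, d_new - c
```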
Table 1: Ranking of all 29 NBA teams on the basis of the 2002-2003 regular season
according to their estimated offensive strength plus defensive strength. Each team
played 82 games.
Table 1 summarizes our application of the Poisson sports model to the results of the
2002–2003 regular season of the National Basketball Association. In these data, tij
is measured in minutes. A regular game lasts 48 minutes, and each overtime period, if necessary, adds five minutes. Thus, team i is expected to score 48 e^{ôi − d̂j} points against team j when the two teams meet and do not tie. Team i is ranked higher than team j if ôi − d̂j > ôj − d̂i, which is equivalent to ôi + d̂i > ôj + d̂j.
It is worth emphasizing some of the virtues of the model. First, the ranking of
the 29 NBA teams on the basis of the estimated sums ôi +dˆi for the 2002-2003 regular
season is not perfectly consistent with their cumulative wins; strength of schedule
and margins of victory are reflected in the model. Second, the model gives the point-
spread function for a particular game as the difference of two independent Poisson
random variables. Third, one can easily amend the model to rank individual players
rather than teams by assigning to each player an offensive and defensive intensity parameter. If the data record how long each player is on the court and how many points are scored for and against his team during that time, then the MM algorithm can be adapted to estimate the assigned player intensities. This might provide a rational basis for salary negotiations that takes into account both offensive and defensive contributions.
Finally, the NBA data set sheds light on the comparative speeds of the original
MM algorithm (16) and its cyclic modification (17). The cyclic MM algorithm
converged in fewer iterations (25 instead of 28). However, because of the additional work required to recompute the denominators in equation (17), the cyclic version actually required more floating point operations in total than the 289,998 used by the original algorithm.
5 Speed of Convergence
Newton-Raphson and related algorithms tend to converge very quickly once they near a local optimum point θ∗. In other words, under certain general conditions,

lim_{m→∞} ‖θ(m+1) − θ∗‖ / ‖θ(m) − θ∗‖² = c

for some constant c. This quadratic rate of convergence is much faster than the linear rate of convergence

lim_{m→∞} ‖θ(m+1) − θ∗‖ / ‖θ(m) − θ∗‖ = c < 1                (18)

displayed by typical MM algorithms.
As a consequence, MM algorithms tend to require more iterations but simpler iterations than Newton-Raphson. For this reason it can be difficult to say in advance which approach will be faster on a given problem.
For example, the Poisson process scoring model for the NBA data set of Section
4 has 57 parameters (two for each of 29 teams minus one for the linear constraint).
Each Newton-Raphson iteration entails evaluating and inverting a 57 × 57 Hessian matrix, and over the course of several iterations these matrix operations alone require more computation in this example than the 300,000 floating point operations needed by the MM algorithm.
Moreover, numerical stability also enters the balance sheet. A Newton-Raphson algorithm can behave
poorly if started too far from an optimum point. By contrast, MM algorithms are
guaranteed to appropriately increase or decrease the value of the objective function
at every iteration.
Although Newton-Raphson, Fisher scoring, Nelder-Mead, and MM algorithms may differ substantially in the number and cost of iterations until convergence, each has its own merits. The expected information matrix used in Fisher scoring is sometimes easier to evaluate than the observed information matrix of Newton-Raphson. The parameter separation achieved by many MM algorithms can mitigate or even eliminate the need for matrix inversion. The Nelder-Mead approach requires no derivatives at all. It is impossible to declare any one algorithm best overall. In our experience, however, MM algorithms are often difficult to beat in terms of simplicity and numerical stability.
In most statistical models, the asymptotic covariance matrix of the maximum likelihood estimate θ̂ is equal to the inverse of the expected information matrix. In practice, the expected information matrix is often well approximated by the observed information matrix −∇2ℓ(θ̂) computed by differentiating the loglikelihood ℓ(θ) twice. Thus, after the MLE θ̂ has been found, a standard error of θ̂ can be obtained by taking square roots of the diagonal entries of the inverse of −∇2ℓ(θ̂). In some problems, however, direct calculation of ∇2ℓ(θ̂) is difficult. Here we describe two numerical approximations to this matrix that exploit quantities readily obtained by running an MM algorithm.
Let g(θ | θ(m)) denote a minorizing function of the loglikelihood ℓ(θ) at the point θ(m), and let M(θ) denote the MM algorithm map taking θ(m) to θ(m+1) = arg maxθ g(θ | θ(m)).
The two numerical approximations to −∇2ℓ(θ̂) are based on the formulas

∇2ℓ(θ̂) = ∇2g(θ̂ | θ̂) [ I − ∇M(θ̂) ]                (19)
        = ∇2g(θ̂ | θ̂) + ∂/∂ϑ ∇g(θ̂ | ϑ) |_{ϑ=θ̂},                (20)
where I denotes the identity matrix. These formulas are derived in Lange (1999)
using two simple facts: First, the tangency of `(θ) and its minorizer imply that their
gradient vectors are equal at the point of minorization; and second, the gradient of g(θ | θ(m)) vanishes at its maximizer M(θ(m)). Alternative derivations of formulas (19) and (20) are given by Meng and Rubin (1991) and Oakes (1999), respectively.
Although these formulas have been applied to standard error estimation in the EM
algorithm literature — Meng and Rubin (1991) base their SEM idea on formula
(19) — to our knowledge, neither has been applied in the broader context of MM
algorithms.
The matrix ∇M(θ̂) appearing in formula (19) can be approximated by numerically differentiating the components of the MM map, using

∂Mi(θ)/∂θj = lim_{δ→0} [ Mi(θ + δ ej) − Mi(θ) ] / δ,                (21)

where the vector ej is the jth standard basis vector having a one in its jth component and zeros elsewhere. Since M(θ̂) = θ̂, the jth column of ∇M(θ̂) may be
approximated using only output from the corresponding MM algorithm by (a) iterating until θ̂ is found, (b) altering the jth component of θ̂ by a small amount δj, (c) applying the MM algorithm to this altered θ, (d) subtracting θ̂ from the result, and (e) dividing by δj. The derivative appearing in formula (20) may be approximated in the same forward-difference fashion applied to the function h(ϑ) = ∇g(θ̂ | ϑ); in this case one may exploit the fact that h(θ̂) is zero.
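The following sketch (our illustration, not the authors' code) carries out steps (a) through (e) and then assembles the observed information matrix via formula (19); `mm_map` is assumed to perform one full MM update M(θ), and `hess_g` to return ∇²g(θ̂ | θ̂).

```python
import numpy as np

def mm_standard_errors(theta_hat, mm_map, hess_g, rel_delta=1e-3):
    """Approximate standard errors from MM output via formula (19).

    theta_hat : MLE found by iterating the MM algorithm to convergence
    mm_map    : function M(theta) performing one MM update
    hess_g    : function returning the Hessian of g(. | theta_hat) at theta_hat
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    p = len(theta_hat)
    dM = np.empty((p, p))
    for j in range(p):
        delta = rel_delta * theta_hat[j] if theta_hat[j] != 0 else rel_delta
        theta_pert = theta_hat.copy()
        theta_pert[j] += delta                               # (b) perturb component j
        dM[:, j] = (mm_map(theta_pert) - theta_hat) / delta  # (c)-(e), using M(theta_hat) = theta_hat
    info = -hess_g(theta_hat) @ (np.eye(p) - dM)  # observed information via formula (19)
    return np.sqrt(np.diag(np.linalg.inv(info)))
```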
To illustrate these ideas and facilitate comparison of the various numerical methods, we consider a logistic regression example.
Böhning and Lindsay (1988) apply the quadratic bound principle of Section 3.4 to
the case of logistic regression, in which we have an n×1 vector Y of binary responses and an n×p matrix X of predictors. The model stipulates that the probability πi(θ) that Yi = 1 equals exp(xit θ)/[1 + exp(xit θ)], where xit is the ith row of X. Differentiating the loglikelihood ℓ(θ) twice shows that −∇2ℓ(θ) = Σi πi(θ)[1 − πi(θ)] xi xit. Since πi(θ)[1 − πi(θ)] is bounded above by 1/4, the matrix M = (1/4) Xt X satisfies the requirement of the quadratic bound principle, and maximizing the resulting quadratic minorizer of ℓ(θ) gives the update

θ(m+1) = θ(m) + 4 (Xt X)−1 Xt [ Y − π(θ(m)) ],                (22)

where π(θ) denotes the vector with ith component πi(θ).
Since the MM algorithm of equation (22) needs to invert Xt X only once, it enjoys a computational advantage over alternatives such as Newton-Raphson, which must evaluate and invert a different weighted matrix at each iteration.
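A compact sketch of iteration (22) (ours, for illustration) makes this advantage visible: the matrix (XtX)−1 is formed once, outside the main loop.

```python
import numpy as np

def mm_logistic(X, Y, max_iter=5000, tol=1e-10):
    """Logistic regression fitted by the quadratic-bound MM update (22)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)           # inverted only once
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ theta))  # pi_i(theta)
        theta_new = theta + 4.0 * XtX_inv @ X.T @ (Y - pi)
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta
```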
Table 2: Estimated coefficients and standard errors for the low birth weight logistic
regression example.
We now test the standard error approximations based on equations (19) and (20) on the low birth weight dataset of Hosmer and Lemeshow (1989). This dataset involves 189 births, with a binary response indicating whether an infant is born underweight, defined as weighing less than 2.5 kilograms. The
predictors include mother’s age in years (AGE), weight at last menstrual period
(LWT), race (RACE2 and RACE3), smoking status during pregnancy (SMOKE),
presence of uterine irritability (UI), and number of physician visits during the first
trimester (FTV). Each of these predictors is quantitative except for race, which is
a 3-level factor with level 1 for whites, level 2 for blacks, and level 3 for other races.
Table 2 shows the maximum likelihood estimates and asymptotic standard errors for
the 10 parameters. The differentiation increment δj was θ̂j /1000 for each parameter
θj . The standard error approximations in the two rightmost columns turn out to be
the same in this example, but in other models they will differ. The close agreement
of the approximations with the “gold standard” based on the exact value of ∇2ℓ(θ̂) is reassuring.
7 Handling Constraints
In this section we discuss an MM technique that in a sense eliminates inequality constraints. For this adaptive barrier
method (Censor and Zenios, 1992; Lange, 1994) to work, an initial point θ(0) must
be selected with all inequality constraints strictly satisfied. The barrier method
confines subsequent iterates to the interior of the parameter space but allows strict inequalities to become equalities in the limit. Suppose the problem is to minimize f(θ) subject to the constraints vj(θ) ≥ 0 for 1 ≤ j ≤ q, where each vj(θ) is a concave, differentiable function. Since −vj(θ) is convex, the supporting hyperplane inequality (7) implies

−vj(θ) + vj(θ(m)) + ∇vj(θ(m))t (θ − θ(m)) ≥ 0.

Similarly, because − ln t is convex in t, inequality (7) applied to − ln t at the point vj(θ(m)) implies

vj(θ(m)) [ ln vj(θ(m)) − ln vj(θ) ] + vj(θ) − vj(θ(m)) ≥ 0.
Adding the last two inequalities, we see that
vj(θ(m)) [ − ln vj(θ) + ln vj(θ(m)) ] + ∇vj(θ(m))t (θ − θ(m)) ≥ 0,
with equality when θ = θ(m). Summing over j and multiplying by a positive tuning constant ω, we obtain the function

g(θ | θ(m)) = f(θ) + ω Σ_{j=1}^{q} { vj(θ(m)) [ ln vj(θ(m)) − ln vj(θ) ] + ∇vj(θ(m))t (θ − θ(m)) }                (23)

majorizing f(θ) at θ(m). The presence of the term ln vj(θ) in equation (23) prevents the minimizer θ(m+1) of g(θ | θ(m)) from leaving the region where every vj(θ) > 0, yet the factor vj(θ(m)) multiplying it adapts from one iteration to the next and allows vj(θ(m+1)) to tend to 0 if it is inclined to do so. When there are equality constraints in addition, these can be enforced directly during the minimization of g(θ | θ(m)), for instance by introducing Lagrange multipliers.
To gain a feel for how these ideas work in practice, consider the problem of maximizing a multinomial loglikelihood, or equivalently minimizing f(θ) = −Σ_{i=1}^{q} ni ln θi, subject to the constraints θi ≥ 0 and Σ_{i=1}^{q} θi = 1, where ni is the observed count for category i and n = Σi ni. Although the maximum likelihood estimates are given by θ̂i = ni/n, this example is instructive because the MM iterates can be followed explicitly.
Taking vi(θ) = θi in equation (23) and dropping irrelevant constants, we construct the majorizing function

g(θ | θ(m)) = f(θ) − ω Σ_{i=1}^{q} θi(m) ln θi + ω Σ_{i=1}^{q} θi.

Minimizing g(θ | θ(m)) subject to the equality constraint Σi θi = 1 with a Lagrange multiplier λ leads, after multiplication by θi, to the stationarity condition

−ni − ωθi(m) + ωθi + λθi = 0.

Solving for θi and imposing Σi θi = 1 shows that λ = n, so that

θi(m+1) = [ ni + ωθi(m) ] / (n + ω).
Hence, all iterates have positive components if they start with positive components.
A brief calculation shows that the iterates satisfy

θi(m+1) − ni/n = [ ω/(n + ω) ] [ θi(m) − ni/n ],

which demonstrates that θ(m) approaches the estimate θ̂ at the linear rate ω/(n + ω), regardless of whether θ̂ occurs on the boundary of the parameter space where one or more of its components θ̂i equal zero.
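The closed-form update is trivial to iterate. The brief sketch below (our own, with a hypothetical count vector) illustrates both the positivity of the iterates and the linear rate ω/(n + ω), even when the MLE lies on the boundary.

```python
import numpy as np

def barrier_multinomial(counts, omega=10.0, iters=50):
    """Adaptive-barrier MM iterates theta_i^(m+1) = (n_i + omega*theta_i^(m)) / (n + omega)."""
    n_i = np.asarray(counts, dtype=float)
    n = n_i.sum()
    theta = np.full(len(n_i), 1.0 / len(n_i))  # start in the interior of the simplex
    for _ in range(iters):
        theta = (n_i + omega * theta) / (n + omega)
    return theta

counts = [10, 0, 5]                 # hypothetical data; note the zero count
print(barrier_multinomial(counts))  # approaches (2/3, 0, 1/3) at rate omega/(n + omega)
```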
8 Discussion
This article is meant to whet readers’ appetites, not satiate them. We have omitted
much. For instance, there is a great deal known about the convergence properties
of MM algorithms that is too mathematically demanding to present here. Fortu-
nately, almost all results from the EM algorithm literature (Wu, 1983; Lange, 1995a;
McLachlan and Krishnan, 1997; Lange, 1999) carry over without change to MM algorithms. In addition, many techniques for accelerating EM algorithms are also applicable to accelerating MM algorithms (Heiser, 1995; Lange, 1995b).
Although this survey article necessarily reports much that is already known,
there are some new results here. Our MM treatment of constrained optimization
in Section 7 is more general than previous versions in the literature (Censor and
Zenios, 1992; Lange, 1994). The application of equation (20) to the estimation of standard errors in the MM context also appears to be new. Because MM algorithms appear under a variety of names in a variety of literatures, we are doubtless unaware of many relevant papers and unable to cite them all. Readers should be on the lookout for these and for known algorithms that can be profitably reinterpreted in the MM framework. More importantly, we hope this article will stimulate readers to discover new MM algorithms.
References
D. Böhning and B. G. Lindsay (1988), Monotonicity of quadratic approximation algorithms, Ann. Inst. Statist. Math., 40, 641–663.
J. de Leeuw (1994), Block relaxation algorithms in statistics, in Information Systems and Data Analysis (ed. H. H. Bock, W. Lenski, and M. M. Richter), pp. 308–325. Berlin: Springer-Verlag.
J. de Leeuw and W. J. Heiser (1977), Convergence of correction matrix algorithms for multidimensional scaling, in Geometric Representations of Relational Data (ed. J. C. Lingoes, E. Roskam, and I. Borg), pp. 735–752. Ann Arbor: Mathesis Press.
A. P. Dempster, N. M. Laird, and D. B. Rubin (1977), Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Stat. Soc. B, 39, 1–38.
A. R. De Pierro (1995), A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography, IEEE Trans. Medical Imaging, 14, 132–137.
W. J. Heiser (1995), Convergent computation by iterative majorization: theory and applications in multidimensional data analysis, in Recent Advances in Descriptive Multivariate Analysis (ed. W. J. Krzanowski), pp. 157–189. Oxford: Clarendon Press.
D. W. Hosmer and S. Lemeshow (1989), Applied Logistic Regression, Wiley, New York.
D. R. Hunter (2004), MM algorithms for generalized Bradley-Terry models, Ann. Stat., to appear.
D. R. Hunter and R. Li (2002), Variable selection via MM algorithms, Penn State Department of Statistics technical report 0201.
K. Lange (1994), An adaptive barrier method for convex programming, Methods and Applications of Analysis, 1, 392–402.
K. Lange (1995b), A quasi-Newton acceleration of the EM algorithm, Statistica Sinica, 5, 1–18.
K. Lange and J. A. Fessler (1995), Globally convergent algorithms for maximum a posteriori transmission tomography, IEEE Trans. Image Processing, 4, 1430–1438.
36: 109–118.
G. J. McLachlan and T. Krishnan (1997), The EM Algorithm and Extensions, Wiley,
New York.
X.-L. Meng and D. B. Rubin (1991), Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm, J. Amer. Stat. Assoc., 86, 899–909.
X.-L. Meng and D. B. Rubin (1993), Maximum likelihood estimation via the ECM algorithm: a general framework, Biometrika, 80, 267–278.
D. Oakes (1999), Direct calculation of the information matrix via the EM algorithm, J. Roy. Stat. Soc. B, 61, 479–482.