0% found this document useful (0 votes)
294 views14 pages

2023_hw3sol

Stanford Convex Optimization HW 3 Solution

Uploaded by

Kushagra Gupta
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
294 views14 pages

2023_hw3sol

Stanford Convex Optimization HW 3 Solution

Uploaded by

Kushagra Gupta
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

EE364a, Winter 2022-23 Prof. S.

Boyd

EE364a Homework 3 solutions


4.1, 4.8bcd 4.15, A3.53, A3.54, A3.57, A3.59, A4.2, A4.28, A20.9
4.1 Consider the optimization problem
minimize f0 (x1 , x2 )
subject to 2x1 + x2 ≥ 1
x1 + 3x2 ≥ 1
x1 ≥ 0, x2 ≥ 0.
Make a sketch of the feasible set. For each of the following objective functions, give
the optimal set and the optimal value.
(a) f0 (x1 , x2 ) = x1 + x2 .
(b) f0 (x1 , x2 ) = −x1 − x2 .
(c) f0 (x1 , x2 ) = x1 .
(d) f0 (x1 , x2 ) = max{x1 , x2 }.
(e) f0 (x1 , x2 ) = x21 + 9x22 .
Solution. The feasible set is shown in the figure.
x2

v3

v2

v1 x1

(a) x? = (2/5, 1/5).


(b) Unbounded below.
(c) Xopt = {(0, x2 ) | x2 ≥ 1}.
(d) x? = (1/3, 1/3).
(e) x? = (1/2, 1/6). This is optimal because it satisfies 2x1 +x2 = 7/6 > 1, x1 +3x2 =
1, and
∇f0 (x? ) = (1, 3)
is perpendicular to the line x1 + 3x2 = 1.

1
4.8 Some simple LPs. Give an explicit solution of each of the following LPs.
(b) Minimizing a linear function over a halfspace.
minimize cT x
subject to aT x ≤ b,
where a 6= 0.
Solution. This problem is always feasible. The vector c can be decomposed into
a component parallel to a and a component orthogonal to a:
c = aλ + ĉ,
with aT ĉ = 0.
• If λ > 0, the problem is unbounded below. Choose x = −ta, and let t go to
infinity:
cT x = −tcT a = −tλaT a → −∞
and
aT x − b = −taT a − b ≤ 0
for large t, so x is feasible for large t. Intuitively, by going very far in the
direction −a, we find feasible points with arbitrarily negative objective values.
• If ĉ 6= 0, the problem is unbounded below. Choose x = aTb a a − tĉ and let t go
to infinity.
• If c = aλ for some λ ≤ 0, the optimal value is λb; any point x with aT x = b
is optimal.
In summary, the optimal value is

? λb c = aλ for some λ ≤ 0
p =
−∞ otherwise.

(c) Minimizing a linear function over a rectangle.


minimize cT x
subject to l  x  u,
where l and u satisfy l  u.
Solution. The objective and the constraints are separable: The objective is a
sum of terms ci xi , each dependent on one variable only; each constraint depends
on only one variable. We can therefore solve the problem by minimizing over
each component of x independently. The optimal x?i minimizes ci xi subject to the
constraint li ≤ xi ≤ ui . If ci > 0, then x?i = li ; if ci < 0, then x?i = ui ; if ci = 0,
then any xi in the interval [li , ui ] is optimal. Therefore, the optimal value of the
problem is
p? = lT c+ − uT c− ,

where c+
i = max{ci , 0} and ci = max{−ci , 0}.

2
(d) Minimizing a linear function over the probability simplex.
minimize cT x
subject to 1T x = 1, x  0.
What happens if the equality constraint is replaced by an inequality 1T x ≤ 1?
We can interpret this LP as a simple portfolio optimization problem. The vector
x represents the allocation of our total budget over different assets, with xi the
fraction invested in asset i. The return of each investment is fixed and given by
−ci , so our total return (which we want to maximize) is −cT x. If we replace the
budget constraint 1T x = 1 with an inequality 1T x ≤ 1, we have the option of not
investing a portion of the total budget.
Solution. Suppose the components of c are sorted in increasing order with
c1 = c2 = · · · = ck < ck+1 ≤ · · · ≤ cn .
We have
cT x ≥ c1 (1T x) = cmin
for all feasible x, with equality if and only if
x1 + · · · + xk = 1, x1 ≥ 0, . . . , xk ≥ 0, xk+1 = · · · = xn = 0.
We conclude that the optimal value is p? = c1 = cmin . In the investment interpre-
tation this choice is quite obvious. If the returns are fixed and known, we invest
our total budget in the investment with the highest return.
If we replace the equality with an inequality, the optimal value is equal to
p? = min{0, cmin }.
(If cmin ≤ 0, we make the same choice for x as above. Otherwise, we choose
x = 0.)
4.15 Relaxation of Boolean LP. In a Boolean linear program, the variable x is constrained
to have components equal to zero or one:
minimize cT x
subject to Ax  b (1)
xi ∈ {0, 1}, i = 1, . . . , n.
In general, such problems are very difficult to solve, even though the feasible set is
finite (containing at most 2n points).
In a general method called relaxation, the constraint that xi be zero or one is replaced
with the linear inequalities 0 ≤ xi ≤ 1:
minimize cT x
subject to Ax  b (2)
0 ≤ xi ≤ 1, i = 1, . . . , n.

3
We refer to this problem as the LP relaxation of the Boolean LP (1). The LP relaxation
is far easier to solve than the original Boolean LP.

(a) Show that the optimal value of the LP relaxation (2) is a lower bound on the
optimal value of the Boolean LP (1). What can you say about the Boolean LP if
the LP relaxation is infeasible?
(b) It sometimes happens that the LP relaxation has a solution with xi ∈ {0, 1}.
What can you say in this case?

Solution.

(a) The feasible set of the relaxation includes the feasible set of the Boolean LP. It
follows that the Boolean LP is infeasible if the relaxation is infeasible, and that
the optimal value of the relaxation is less than or equal to the optimal value of
the Boolean LP.
(b) The optimal solution of the relaxation is also optimal for the Boolean LP.

A3.53 Curvature of functions. Are the following functions affine, convex, concave, quasi-
convex, quasi-linear, or none of these?

(a) f (x) = max(1/2, x, x2 ).


Solution. This function is a convex, increasing function composed of a constant,
affine, and convex functions of x, so it is convex (and quasiconvex) according to
the composition rules. It is neither concave nor quasiconcave.
(b) f (x) = min(1/2, x, x2 ).
Solution. This function is a concave, increasing function composed of a constant,
affine, and convex functions of x, so it is not possible to analyze with composition
rules.
To show this is not convex, we construct a counterexample with θ = 1/2, x = 2
and y = 1/2. f (θx + (1 − θ)y) = 1/2, while θf (x) + (1 − θ)f (y) = 3/8. This does
not satisfy Jensen’s inequality for convex functions.
To show this is not concave, we construct a counterexample with θ = 1/2, x = 1/4
and y = 1/2. f (θx + (1 − θ)y) = 9/64, while θf (x) + (1 − θ)f (y) = 5/32. This
does not satisfy Jensen’s inequality for concave functions.
This function is equivalent to the following monotonically increasing function:

 x x≤0 p
2
f (x) = x 0 < xp≤ 1/2
1/2 x > 1/2.

We can see the sublevel and superlevel sets are convex sets of x, since they are
intervals. Therefore, this function is quasilinear.

4
A3.54 Square and reciprocal of convex and concave functions. For each of the following,
determine if the function f is convex, concave, or neither.

(a) f (x) = g(x)2 , where g is convex and nonnegative.


Solution. Using composition rules, we can see that f (x) = h(g(x)) if h(x) = x2 .
h(x) is convex and increasing with nonnegative input, and since g is convex and
nonnegative, f (x) is convex by the composition rule.
(b) f (x) = 1/g(x), where g is concave and positive.
Solution. Using composition rules, we can see that f (x) = h(g(x)) if h(x) = 1/x.
h(x) is convex and decreasing with positive input, and since g is concave and
positive, f (x) is convex by the composition rule.

A3.57 Conjugate of the positive part function. Let f (x) = (x)+ = max{0, x} for x ∈ R. (This
function has various names, such as the positive part of x, or ReLU for Rectified Linear
Unit in the context of neural networks.) What is f ∗ ?
Solution. We have f ∗ (y) = maxx (xy −f (x)). As a function of x, xy −f (x) is piecewise
linear, with one kink at x = 0. Its slope for x < 0 is y and its slope for x > 0 is y − 1.
For the max to be finite, we need y ≥ 0 (left slope is nonnegative) and y − 1 ≤ 0 (right
slope is nonpositive). When this occurs, the max is 0. So we have

∗ 0 0≤y≤1
f (y) =
∞ otherwise.

This is the indicator function of the set [−1, 1].

A3.59 Distances between finite probability distributions. We describe a probability distribution


on n outcomes as a vector p ∈ Rn+ with 1T p = 1. Suppose p and q are two such
probability distributions. There are several ways to define a distance or deviation
d(p, q) between p and q. Show that each of the metrics below is a convex function of
(p, q).

(a) Max difference in probability. We take

dmp (p, q) = max{| prob(S; p) − prob(S; q)| | S ⊆ {1, . . . , n}},

where prob(S;Pp) is the probability of the event S under the distribution p, i.e.,
prob(S; p) = i∈S pi . In addition to showing that dmp is convex, express it in a
simple explicit form involving a norm of p − q.
P
Solution. The probability of the event S under distribution p, i∈S pi , is linear
in p, with the same holding for prob(S; q) and q, so the difference in probabilities,
prob(S; p)−prob(S; q) is a linear function of (p, q). Its absolute value is a convex
function of (p, q). The distance dmp is a maximum of 2n of these convex functions,
one for each of the 2n subsets S ⊆ {1, . . . , n}, and so is convex in (p, q).

5
Now we derive a simple analytical expression for dmp . Let’s fix p and q, and work
out what subset S ⊆ {1, . . . , n} maximizes prob(S; p) − prob(S; q). Let’s encode
S as a Boolean vector s ∈ {0, 1}n , with si = 1 meaning i ∈ S. With this notation
we have prob(S; p) = sT p, so

prob(S; p) − prob(S; q) = sT (p − q).

We can see that this is maximized by the choice si = 1 when pi ≥ qi and si = 0


when pi < qi , with maximum value

max (prob(S; p) − prob(S; q)) = 1T (p − q)+ .


S

Appling the same argument swapping p and q we obtain

max (prob(S; q) − prob(S; p)) = 1T (q − p)+ = 1T (p − q)− ,


S

where (a)− = max{−a, 0}. So we have

dmp = max{1T (p − q)+ , 1T (p − q)− }.

That’s a nice analytical expression, but we can simplify even a bit more.
We have 1T (p − q) = 0, since 1T p = 1T q = 1. From

1T (p − q) = 1T (p − q)+ − 1T (p − q)−

we see that 1T (p − q)+ = 1T (p − q)− so the two terms in the max above are
equal. Their sum is 1T (p − q)+ + 1T (p − q)− = kp − qk1 , so each of the terms is
(1/2)kp − qk1 , and we have
1
dmp = kp − qk1 .
2
(This expression makes it very clear that it is convex in (p, q).)
(b) Hellinger distance. The Hellinger distance is defined as
n
X √ √
dhe (p, q) = ( pi − q i )2 .
i=1

Solution. We expand the squares in the Hellinger distance to get


n n
X √ X√
dhe (p, q) = (pi − 2 pi qi + qi ) = 2 − 2 pi qi .
i=1 i=1

Each term pi qi is concave, so the sum is. Multiplying by −2 yields a convex
function, and adding 2 preserves convexity.

6
Remark. There are many others, for example the usual `2 -norm kp − qk2 or the
Kullback-Leibler divergence
n
X
kl
d (p, q) = pi log(pi /qi ),
i=1

which are also convex in (p, q). (See Convex Optimization, Example 3.19.)

A4.2 ‘Hello World’ in CVX*. Use CVX* to verify the optimal values you obtained (analyt-
ically) for exercise 4.1 in Convex Optimization.
Solution.

(a) p? = 0.6
(b) p? = −∞
(c) p? = 0
1
(d) p? = 3
1
(e) p? = 2

Here’s the code in Python/CVXPY:

import cvxpy as cvx


import numpy as np

x1 = cvx.Variable()
x2 = cvx.Variable()
constraints = [2*x1 + x2 >= 1, x1 + 3*x2 >= 1, x1 >= 0, x2 >= 0]

# (a)
objective = cvx.Minimize(x1+x2)
prob = cvx.Problem(objective, constraints)
prob.solve()
print(’With obj. x1+x2, p* is %.2f, optimal x1 is %.2f, optimal x2 is %.2f’ \
% (prob.value, x1.value, x2.value))

# (b)
objective = cvx.Minimize(-x1-x2)
prob = cvx.Problem(objective, constraints)
prob.solve()
print(’With obj. -x1-x2, status is ’ + prob.status)

# (c)
objective = cvx.Minimize(x1)

7
prob = cvx.Problem(objective, constraints)
prob.solve()
print(’With obj. x1, p* is %.2f, x1* is %.2f, x2* is %.2f’ \
% (prob.value, x1.value, x2.value))

# (d)
objective = cvx.Minimize(cvx.maximum(x1, x2))
prob = cvx.Problem(objective, constraints)
prob.solve()
print(’With obj. max(x1,x2), p* is %.2f, x1* is %.2f, x2* is %.2f’ \
% (prob.value, x1.value, x2.value))

# (e)
objective = cvx.Minimize(cvx.square(x1) + 9*cvx.square(x2))
prob = cvx.Problem(objective, constraints)
prob.solve()
print(’With obj. x1^2+9x2^2, p* is %.2f, x1* is %.2f, x2* is %.2f’ \
% (prob.value, x1.value, x2.value))

A4.28 Probability bounds. Consider random variables X1 , X2 , X3 , X4 that take values in {0, 1}.
We are given the following marginal and conditional probabilities:

prob(X1 = 1) = 0.9,
prob(X2 = 1) = 0.9,
prob(X3 = 1) = 0.1,
prob(X1 = 1, X4 = 0 | X3 = 1) = 0.7,
prob(X4 = 1 | X2 = 1, X3 = 0) = 0.6.

Explain how to find the minimum and maximum possible values of prob(X4 = 1), over
all (joint) probability distributions consistent with the given data. Find these values
and report them.
Hints. (You should feel free to ignore these hints.)

• Matlab:
– CVX supports multidimensional arrays; for example, variable p(2,2,2,2)
declares a 4-dimensional array of variables, with each of the four indices taking
the values 1 or 2.
– The function sum(p,i) sums a multidimensional array p along the ith index.
– The expression sum(a(:)) gives the sum of all entries of a multidimensional
array a. You might want to use the function definition sum_all = @(A) sum( A(:));,
so sum_all(a) gives the sum of all entries in the multidimensional array a.

8
• Python:
– Create a 1-dimensional Variable and manually index the entries. You should
come up with a reasonable scheme to avoid confusion.
• Julia:
– You can create a multidimensional array of variables in Convex.jl. For exam-
ple, the following creates a 4-dimensional array of variables, with each of the
four indices taking the values 1 or 2.
p = [Variable() for i in 1:16];
p = reshape(p, 2, 2, 2, 2)
– You can use the function sum to sum over various indices in the multidime-
sional array.
sum(p[:,:,:,:]) # sum all entries
sum(p[1,:,2,:]) # fix first and third indices
– To create constraints with the variables in the array, you need to access each
variable independently. Something like p >= 0 will not work.

Solution. The outcome space for the random variables has 24 = 16 elements, corre-
sponding to all possible 0/1 assignments to the variables. The variable is a probability
disitribution on this set, which we can represent as a vector p ∈ R16 . There are sev-
eral reasonable choices for how we index the vector. One is to index the vector p by
i = 1, . . . , 16, with some encoding of the index, such as

i = 1 + X1 + 2X2 + 4X3 + 8X4 .

Another is to encode the index as 4-tuple pijkl , where i, j, k, l ∈ {1, 2}. (These are
for Matlab encodings; in a real language, we’d start the indices at 0, or represent the
probability distribution as a dictionary, i.e., a set of key-value pairs.)
Now for any event A ⊆ {0, 1}4 , prob(A) is a linear function of p. Indeed, the coefficient
of each index is 0 if the index is not in A and 1 if the index is in A. So for any A,
prob(A) is a linear function of p.
It follows that the first three constraints, on marginal probabilities, are linear equalities
on p: we simply sum the entries of p corresponding to the event (such as X1 = 1) and
constrain the sum to equal the given righthand side value. Conditional probabilities
are ratios of probabilities of events, and so are linear-fractional functions of p. We can
write the constraint
prob(A ∩ B)
prob(A | B) = =α
prob(B)
as
prob(A ∩ B) = α prob(B),
which is a linear equaility constraint on p. Since prob(X4 = 1) is a linear function of
p, maximizing and minimizing it subject to the constraints is just an LP.

9
We find that the range of possible values of prob(X4 = 1) is between 0.48 and 0.61.
The following code solves the problem in Python.

import cvxpy as cvx

for direction in [-1, 1]:


# p(k) corresponds to the arrangement x_1 x_2 x_3 x_4
# where k = x_1 + 2 x_2 + 4 x_3+ 8 x_4
p = cvx.Variable(16)
obj = direction*cvx.sum(p[8:])
cons = [cvx.sum(p) == 1, p >= 0]
cons += [cvx.sum(p[1::2]) == .9]
cons += [cvx.sum(p[2::4]) + cvx.sum(p[3::4]) == .9]
cons += [cvx.sum(p[4:8]) + cvx.sum(p[12:16]) == .1]

cons += [p[5]+p[7] == .7 * (cvx.sum(p[4:8]) + cvx.sum(p[12:16]))]


cons += [p[10]+p[11] == .6 * (p[2]+p[3]+p[10]+p[11])]

cvx.Problem(cvx.Maximize(obj), cons).solve()
print(cvx.sum(p[8:]).value)

A20.9 Energy storage trade-offs. We consider the use of a storage device (say, a battery) to
reduce the total cost of electricity consumed over one day. We divide the day into T
time periods, and let pt denote the (positive, time-varying) electricity price, and ut
denote the (nonnegative) usage or consumption, in period t, for t = 1, . . . , T . Without
the use of a battery, the total cost is pT u.
Let qt denote the (nonnegative) energy stored in the battery in period t. For simplicity,
we neglect energy loss (although this is easily handled as well), so we have qt+1 = qt +ct ,
t = 1, . . . , T − 1, where ct is the charging of the battery in period t; ct < 0 means the
battery is discharged. We will require that q1 = qT + cT , i.e., we finish with the same
battery charge that we start with. With the battery operating, the net consumption
in period t is ut + ct ; we require this to be nonnegative (i.e., we do not pump power
back into the grid). The total cost is then pT (u + c).
The battery is characterized by three parameters: The capacity Q, where qt ≤ Q; the
maximum charge rate C, where ct ≤ C; and the maximum discharge rate D, where
ct ≥ −D. (The parameters Q, C, and D are nonnegative.)

(a) Explain how to find the charging profile c ∈ RT (and associated stored energy
profile q ∈ RT ) that minimizes the total cost, subject to the constraints.
(b) Solve the problem instance with data p and u given in storage_tradeoff_data.*,
Q = 35, and C = D = 3. Plot ut , pt , ct , and qt versus t.

10
(c) Storage trade-offs. Plot the minimum total cost versus the storage capacity Q,
using p and u from storage_tradeoff_data.*, and charge/discharge limits C =
D = 3. Repeat for charge/discharge limits C = D = 1. (Put these two trade-off
curves on the same plot.) Give an interpretation of the endpoints of the trade-off
curves.

Solution.
(a) The problem is an LP,
minimize pT (u + c)
subject to −D1  c  C1, u + c  0, 0  q  Q1
qt+1 = qt + ct , t = 1, . . . , T
q 1 = q T + cT
with variables c, q ∈ RT .
(b) The code for the problem is given in part (c). The results are shown below.

10
u
5 c
uc

−5
0 5 10 15 20 25
t
3

2
pt

0
0 5 10 15 20 25
t
40

30
qt

20

10

0
0 5 10 15 20 25
t

(c) We vary the range of Q from 1 to 150 and solve the LP for the two cases of high
charge/discharge limit and low charge/discharge limit.
The following Python code solves the problem.

11
import cvxpy as cvx
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1)

T = 96
t = np.linspace(1, T, num=T).reshape(T,1)
p = np.exp(-np.cos((t-15)*2*np.pi/T)+0.01*np.random.randn(T,1))
u = 2*np.exp(-0.6*np.cos((t+40)*np.pi/T) - 0.7*np.cos(t*4*np.pi/T)+0.01*np.rand

plt.figure(1)
plt.plot(t/4, p);
plt.plot(t/4, u, ’r’);
plt.legend([’p’, ’u’])
plt.show()

# Solver will solve this convex problem given different constants


def solver(Q, C, D):
q = cvx.Variable(shape=(T, 1))
c = cvx.Variable(shape=(T, 1))
obj = p.T@(u+c)
cons = [c >= -D,
c <= C,
q >= 0,
q <= Q,
q[1:] == q[:T-1] + c[:T-1],
q[0] == q[T-1] + c[T-1],
u+c >= 0]
pstar = cvx.Problem(cvx.Minimize(obj), cons).solve()
return c.value, q.value, pstar

# b) Solve the convex problem and plot ut, ct, pt, qt


Q = 35
C, D = 3, 3
c_value, q_value, pstar = solver(Q, C, D)

plt.figure(2)
ts = np.linspace(1, T, num=T).reshape(T,1)/4
plot_list = [c_value, p, q_value]
plot_legend = [’uc’, ’pt’, ’qt’]
for i,ys in enumerate(plot_list):

12
plt.subplot(3, 1,i+1)
plt.plot(ts, ys)
plt.ylabel(plot_legend[i])
if i == 0:
plt.plot(ts, u)
plt.legend([’c’, ’u’])
plt.xlabel(’t’)

plt.savefig(’storage_tradeoff_time_trace.eps’)

# c) Plot the tradeoff curves


N = 31
Qs = np.linspace(0, 150, num=N).reshape(N,1)
Cs = [1,3]
Ds = [1,3]
costs = [np.zeros((N,1)), np.zeros((N,1))]

for j in range(len(Cs)):
C = Cs[j]
D = Ds[j]
for i in range(N):
Q = Qs[i]
_,_, pstar = solver(Q,C,D)
costs[j][i] = pstar

plt.figure(3)
plt.plot(Qs, costs[0], ’g--’)
plt.plot(Qs, costs[1], ’b.-’)
plt.legend([’cost 1’, ’cost 2’])
plt.xlabel(’Q’)
plt.ylabel(’cost’)
plt.savefig(’storage_tradeoff_curve.eps’)

plt.show()
The trade-off curves are shown below, where the blue solid curve corresponds to
C = D = 1 and the green dashed curve corresponds to C = D = 3. The inter-
section of the trade-off curves with the y axis (corresponding to Q = 0) gives the
total cost if there were no battery, pT u. On the right end of the trade-off curves,
the battery capacity constraint is no longer active, so no further reduction in total
cost is obtained. The total cost reduction here is limited by the charge/discharge
limits.

13
460

440

420

400

cost 380

360

340

320

300

280

260
0 50 100 150
Q

14

You might also like