0% found this document useful (0 votes)

101 views33 pages

Sparse Optimization Lecture: Basic Sparse Optimization Models

This document provides an overview of basic sparse optimization models including l1, l21, and nuclear norm models. It discusses applications of these models such as basis pursuit, basis pursuit denoising, and LASSO. It explains how to reformulate sparse optimization problems into standard conic programs that can be solved using conic programming solvers. Examples are provided of how l1 minimization, joint sparsity models, and nuclear norm minimization can be modeled as linear programs, second-order cone programs, or semidefinite programs. Key aspects of interior point methods for solving the resulting conic programs are also summarized.

Uploaded by

Catalin Toma

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

101 views33 pages

Sparse Optimization Lecture: Basic Sparse Optimization Models

Uploaded by

Catalin Toma

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 33

Sparse Optimization

Lecture: Basic Sparse Optimization Models

Instructor: Wotao Yin

July 2013

online discussions on piazza.com

Those who complete this lecture will know

basic `1 , `2,1 , and nuclear-norm models

some applications of these models
how to reformulate them into standard conic programs
which conic programming solvers to use

1 / 33

Examples of Sparse Optimization Applications

See online seminar at piazza.com

2 / 33

Basis pursuit
min{kxk1 : Ax = b}

find least `1 -norm point on the affine plane {x : Ax = b}

tends to return a sparse point (sometimes, the sparsest)

`1 ball
3 / 33

Basis pursuit
min{kxk1 : Ax = b}
find least `1 -norm point on the affine plane {x : Ax = b}
tends to return a sparse point (sometimes, the sparsest)

`1 ball touches the affine plane

4 / 33

Basis pursuit denoising, LASSO

min{kAx bk2 : kxk1 },

(1a)

min kxk1 + kAx bk22 ,

x
2

(1b)

min{kxk1 : kAx bk2 }.

(1c)

all models allow Ax 6= b

5 / 33

Basis pursuit denoising, LASSO

min{kAx bk2 : kxk1 },

min kxk1 +
x

kAx bk22 ,
2

min{kxk1 : kAx bk2 }.

(2a)
(2b)
(2c)

k k2 is most common for error but can be generalized to loss function L

(2a) seeks for a least-squares solution with bounded sparsity
(2b) is known as LASSO (least absolute shrinkage and selection operator).

it seeks for a balance between sparsity and fitting

(2c) is referred to as BPDN (basis pursuit denoising), seeking for a sparse

solution from tube-like set {x : kAx bk2 }

they are equivalent (see later slides)
in terms of regression, they select a (sparse) set of features (i.e., columns

of A) to linearly express the observation b

6 / 33

Sparse under basis / `1 -synthesis model

min{ksk1 : As = b}
s

(3)

signal x is sparsely synthesized by atoms from , so vector s is sparse

is referred to as the dictionary
commonly used dictionaries include both analytic and trained ones
analytic examples: Id, DCT, wavelets, curvelets, gabor, etc., also their

combinations; they have analytic properties, often easy to compute (for

example, multiplying a vector takes O(n log n) instead of O(n2 ))
can also be numerically learned from training data or partial signal
they can be orthogonal, frame, or general
7 / 33

Sparse under basis / `1 -synthesis model

If is orthogonal, problem (3) is equivalent to

min{k xk1 : Ax = b}
x

(4)

by change of variable x = s, equivalently s = x.

Related models for noise and approximate sparsity:
min{kAx bk2 : k xk1 },
x

min k xk1 +
x

kAx bk22 ,
2

min{k xk1 : kAx bk2 }.

8 / 33

Sparse after transform / `1 -analysis model

min{k xk1 : Ax = b}
x

(5)

Signal x becomes sparse under the transform (may not be orthogonal)

Examples of :
DCT, wavelets, curvelets, ridgelets, ....
tight frames, Gabor, ...
(weighted) total variation

When is not orthogonal, the analysis is more difficult

9 / 33

Example: sparsify an image

10 / 33

(a) DCT coefficients

(b) Haar wavelets

1000

300

800

250

(c) Local variation

200

600

150

400

100

200
10

0.5
1
1.5
2
2.5
sorted coefficient magnitudesx 105

(d) DCT coeffs decay

0
0

0.5
1
1.5
2
2.5
sorted coefficient magnitudesx 105

(e) Haar wavelets

0
0

0.5
1
1.5
2
2.5
sorted coefficient magnitudes x 105

(f) Local variation

Figure: the DCT and wavelet coefficients are scaled for better visibility.

11 / 33

Questions

1. Can we trust these models to return intended sparse solutions?

2. When will the solution be unique?
3. Will the solution be robust to noise in b?
4. Are constrained and unconstrained models equivalent? in what sense?
Questions 14 will be addressed in next lecture.
5. How to choose parameters?
(sparsity), (weight), and (noise level) have different meanings
applications determine which one is easier to set

generality: use a test data set, then scale parameters for real data
cross validation: reserve a subset of data to test the solution

12 / 33

Joint/group sparsity

Joint sparse recovery model:

min{kXk2,1 : A(X) = b}
X

where
kXk2,1 :=

m
X

(6)

k[xi1 xi,2 xin ]k2 .

i=1

`2 -norm is applied to each row of X

`2,1 -norm ball has sharp boundaries across different rows, which tend to

be touched by {X : A(X) = b}, so the solution tends to be row-sparse

also kXkp,q for 1 < p , affects magnitudes of entries on the same row
complex-valued signals are a special case

13 / 33

Joint/group sparsity

Decompose {1, . . . , n} = G1 G2 GS .
non-overlapping groups: Gi Gj = , i 6= j.
otherwise, groups may overlap (modeling many interesting structures).

Group-sparse recovery model:

min{kxkG,2,1 : Ax = b}
x

(7)

where
kxkG,2,1 =

S
X

ws kxGs k2 .

s=1

14 / 33

Auxiliary constraints

Auxiliary constraints introduce additional structures of the underlying signal

into its recovery, which sometimes significantly improve recovery quality
nonnegativity: x 0
bound (box) constraints: l x u
general inequalities: Qx q

They can be very effective in practice. They also generate corners.

15 / 33

Reduce to conic programs

Sparse optimization often has nonsmooth objectives.
Classic conic programming solvers do not handle nonsmooth functions.
Basic idea: model nonsmoothness by inequality constraints.
Example: for given x, we have
kxk1 = min {1T (x1 + x2 ) : x1 x2 = x, x1 0, x2 0}.
x1 ,x2

(8)

Therefore,
min{kxk1 : Ax = b} reduces to a linear program (LP)
minx kxk1 +

kAx
2

bk22 reduces to a bound constrained quadratic

program (QP)
minx {kAx bk2 : kxk1 } reduces to a bound constrained QP
minx {kxk1 : kAx bk2 } reduces to a second-order cone program

(SOCP)
16 / 33

Conic programming
Basic form:
min{cT x : Fx + g K 0, Ax = b.}
x

a K b stands for a b K, which is a convex, closed, pointed cone.

Examples:
n
first orthant (cone): Rn
+ = {x R : x 0}.

norm cone (2nd order cone): Q = {(x, t) : kxk t}

polyhedral cone: P = {x : Ax 0}
positive semidefinite cone: S+ = {X : X 0, XT = X}

Example:
1

"
(x, y, z) :

x
y

y
S+
z

0.5

0
1
1
0

0.5

1 0

x
17 / 33

Linear program
Model
min{cT x : Ax = b, x K 0}
where K is the nonnegative cone (first orthant).
x K 0 x 0.
Algorithms
the Simplex method (move between vertices)
interior-point methods (IPMs) (move inside the polyhedron)
decomposition approaches (divide and conquer)

In primal IPM, x 0 is replaced by its logarithmic barrier:

X
(y) =
log(yi )
i

log-barrier formulation:
min{cT x (1/t)

log(xi ) : Ax = b}

i
18 / 33

Second-order cone program

Model
min{cT x : Ax = b, x K 0}
where K = K1 KK ; each Kk is the second-order cone
q
o
n
Kk = y Rnk : ynk y12 + + yn2 k 1 .
IPM is the standard solver (though other options also exist)
Log-barrier of Kk :
(y) = log yn2 k (y12 + + ynk 1 )

19 / 33

Semi-definite program

Model
min{C X : A(X) = b, X K 0}
n

where K = K1 KK ; each Kk = S+k .

IPM is the standard solver (though other options also exist)
n

Log-barrier of S+k (still a concave function):

(Y) = log det(Y).

20 / 33

(from Boyd & Vandenberghe, Convex Optimization)

properties (without proof): for y K 0,
y T (y) =

(y) K 0,
nonnegative orthant Rn+: (y) =

i=1 log yi

(y) = (1/y1, . . . , 1/yn),

y T (y) = n

positive semidefinite cone Sn+: (Y ) = log det Y

(Y ) = Y 1,

tr(Y (Y )) = n

second-order cone K = {y Rn+1 | (y12 + + yn2 )1/2 yn+1}:

y1
..
2
,

y T (y) = 2
(y) = 2
yn+1 y12 yn2 yn
yn+1

21 / 33

(from Boyd & Vandenberghe, Convex Optimization)

Central path
for t > 0, define x?(t) as the solution of
minimize tf0(x) + (x)
subject to Ax = b
(for now, assume x?(t) exists and is unique for each t > 0)
central path is {x?(t) | t > 0}
example: central path for an LP
minimize cT x
subject to aTi x bi,

i = 1, . . . , 6

hyperplane cT x = cT x?(t) is tangent to

level curve of through x?(t)

x?(10)

22 / 33

Log-barrier formulation:
min{tf0 (x) + (x) : Ax = b}
Complexity of log-barrier interior-point method:
'
&
P
log(( i i )/(t(0) ))
k
log

23 / 33

Primal-dual interior-point methods

more efficient than barrier method when high accuracy is needed
update primal and dual variables at each iteration; no distinction
between inner and outer iterations
often exhibit superlinear asymptotic convergence
search directions can be interpreted as Newton directions for modified
KKT conditions
can start at infeasible points
cost per iteration same as barrier method

24 / 33

`1 minimization by interior-point method

Model
min{kxk1 : kAx bk2 }
x

min min {1T (x1 + x2 ) : x1 0, x2 0, x1 x2 = x, kAx bk2 }

x1 ,x2

min {1T (x1 + x2 ) : x1 0, x2 0, kA(x1 x2 ) bk2 }

x1 ,x2

min {1T x1 + 1T x2 : Ax1 Ax2 + y = b, z = , (x1 , x2 , z, y) K 0}

x1 ,x2

where (x1 , x2 , z, y) K 0 means

x1 , x2 Rn
+,
(t, y) Qm+1 .

Solver: Mosek, SDPT3, Gurobi.

Also, modeling language CVX and YALMIP.
25 / 33

Nuclear-norm minimization by interior-point method

If we can model
min{kXk : A(X) = b}
X

(9)

as an SDP ... (how? see next slide) ...

then, we can also model
minX {kXk : kA(X) bkF }
minX {kA(X) bkF : kXk }
minX kXk +

1
kA(X)
2

bk2F

as well as problems involving tr(X) and spectral norm kXk.

kXk I X 0.

26 / 33

Sparse calculus for `1

inspect |x| to get some ideas:

y, z 0 and yz |x| = 12 (y + z) yz |x|.

moreover, 12 (y + z) = yz = |x| if y = z = |x|.

observe

y, z 0 and

y
yz |x|
x

#
x
0.
z

So,
"

we attain

1
(y
2

y
x

#
x
1
0 = (y + z) |x|.
2
z

+ z) = |x| if y = z = |x|.

Therefore, given x, we have

"
(
0
1
|x| = min
tr(M) :
M
2
0

#
)
1
T
M = x, M = M , M 0 .
0

27 / 33

Generalization to nuclear norm

Consider X Rmn (w.o.l.g., assume m n) and lets try imposing

Y
XT

#
X
0
Z

Diagonalize X = UVT , = diag(1 , . . . , m ), kXk =

"
[UT , VT ]

Y
XT

i .

#"
#
X
U
= UT YU + VT ZV UT XV VT XT U
Z V
= UT YU + VT ZV 2 0.

So, tr(UYUT + VZVT 2) = tr(Y) + tr(Z) 2kXk 0.

To attain =, we can let Y = UUT and Z = Vnn VT .

28 / 33

Therefore,
"
#
)
Y
X
1
(tr(Y) + tr(Z)) :

0
Y,Z
2
XT Z
(
"
#
)
0
I
1
T
= min
tr(M) :
M = X, M = M , M 0 .
M
2
0 0
(

kXk = min

(10)

(11)

Exercise: express the following problems as SDPs

minX {kXk : A(X) = b}
minX kXk +

1
kA(X)
2

bkF

minL,S {kLk + kSk1 : A(L + S) = b}

29 / 33

Practice of interior-point methods (IPMs)

LP, SOCP, SDP are well known and have reliable (commercial,

off-the-shelf) solvers
Yet, the most reliable solvers cannot handle large-scale problems (e.g.,

images, video, manifold learning, distributed stuff, ...)

Example: to recover a still image, there can be 10M variables and 1M

constraints. Even worse, the constraint coefficients are dense. Result: Out
of memory.

Simplex and active-set methods: matrix containing A must be inverted or

factorized to compute the next point (unless A is very sparse).

IPMs approximately solve a Newton system and thus also factorize a

matrix involving A.
Even large and dense matrices can be handled, for sparse optimization,

one should take advantages of the solution sparsity.

Some compressive sensing problems have A with structures friendly for

operations like Ax and AT y.

30 / 33

Practice of interior-point methods (IPMs)

The Simplex, active-set, and IPMs have reliable solvers; good to be the

benchmark.
They have nice interfaces (including CVX and YALMIP, which save you

time.)
CVX and YALMIP are not solvers; they translate problems and then call
solvers; see http://goo.gl/zUlMK and http://goo.gl/1u0xP.
They can return highly accurate solutions; some first-order algorithms

(coming later in this course) do not always.

There are other remedies; see next slide.

31 / 33

Papers of large-scale SDPs

Low-rank factorizations:
S. Burer and R. D. C. Monteiro, A nonlinear programming algorithm for solving
semidefinite programs via low-rank factorization, Math. Program., 95:329357, 2003.
LMaFit, http://lmafit.blogs.rice.edu/

First-order methods for conic programming:

Z. Wen, D. Goldfarb, and W. Yin. Alternating direction augmented Lagrangian methods
for semidefinite programming. Math. Program. Comput., 2(3-4):203230, 2010.

Matrix-free IPMs:
K. Fountoulakis, J. Gondzio, P. Zhlobich. Matrix-free interior point method for compressed
sensing problems, 2012. http://www.maths.ed.ac.uk/~gondzio/reports/mfCS.html

32 / 33

Subgradient methods

Sparse optimization is typically nonsmooth, so it is natural to consider

subgradient methods.
apply subgradient descent to, say, minx kxk1 +

kAx
2

bk22 .

apply projected subgradient descent to, say, minx {kxk1 : Ax = b}.

Good: subgradients are easy to derive, methods are simple to implement.

Bad: convergence requires carefully chosen step sizes (classical ones require
diminishing step sizes). Convergence rate is weak on paper (and in practice,
too?)
Further readings: http://arxiv.org/pdf/1001.4387.pdf,
http://goo.gl/qFVA6, http://goo.gl/vC21a.

33 / 33

Advanced Machine Learning: (COMP 5328)
No ratings yet
Advanced Machine Learning: (COMP 5328)
46 pages
Turbo Codes Principles and Applications
No ratings yet
Turbo Codes Principles and Applications
329 pages
Book of Programming Challenges
No ratings yet
Book of Programming Challenges
12 pages
Tối Ưu Hóa Cho Khoa Học Dữ Liệu
No ratings yet
Tối Ưu Hóa Cho Khoa Học Dữ Liệu
64 pages
2022lectures1-8 Optimization For DataScience
No ratings yet
2022lectures1-8 Optimization For DataScience
35 pages
' - Magic: Recovery of Sparse Signals Via Convex Programming
No ratings yet
' - Magic: Recovery of Sparse Signals Via Convex Programming
19 pages
Sparse Regression and Dictionary Learning
No ratings yet
Sparse Regression and Dictionary Learning
14 pages
Convex Cardinality Optimization
No ratings yet
Convex Cardinality Optimization
26 pages
Gradient Descent With Sparsification: An Iterative Algorithm For Sparse Recovery With Restricted Isometry Property
No ratings yet
Gradient Descent With Sparsification: An Iterative Algorithm For Sparse Recovery With Restricted Isometry Property
8 pages
SSP 4 2 - Modelling 2
No ratings yet
SSP 4 2 - Modelling 2
34 pages
M2 Exam 2022-23 Solutions
No ratings yet
M2 Exam 2022-23 Solutions
12 pages
Chapter 1 Introduction
No ratings yet
Chapter 1 Introduction
18 pages
A Proximal-Gradient Homotopy Method For The Sparse Least-Squares Problem
No ratings yet
A Proximal-Gradient Homotopy Method For The Sparse Least-Squares Problem
37 pages
Introduction
No ratings yet
Introduction
64 pages
Penalty Decomposition Methods For L - Norm Minimization
No ratings yet
Penalty Decomposition Methods For L - Norm Minimization
26 pages
SESO2018 Wednesday Sagastizabal
No ratings yet
SESO2018 Wednesday Sagastizabal
181 pages
Dynamic ' Reconstruction: Justin Romberg Georgia Tech, ECE Collaborators: M. Salman Asif Aur' Ele Balavoine Chris Rozell
No ratings yet
Dynamic ' Reconstruction: Justin Romberg Georgia Tech, ECE Collaborators: M. Salman Asif Aur' Ele Balavoine Chris Rozell
53 pages
C2 M2 Exam Withsol
No ratings yet
C2 M2 Exam Withsol
12 pages
Sparse Coding and Dictionary Learning For Image Analysis: Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro
No ratings yet
Sparse Coding and Dictionary Learning For Image Analysis: Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro
41 pages
Enhancing Sparsity by Reweighted Minimization: Michael B. Wakin
No ratings yet
Enhancing Sparsity by Reweighted Minimization: Michael B. Wakin
29 pages
Enhancing Sparsity by Reweighted Minimization: Michael B. Wakin
No ratings yet
Enhancing Sparsity by Reweighted Minimization: Michael B. Wakin
29 pages
03 Sparse Approx Algs PDF
No ratings yet
03 Sparse Approx Algs PDF
12 pages
Compressive Sensing Lecture Notes
No ratings yet
Compressive Sensing Lecture Notes
4 pages
Wainwrightslides 2
No ratings yet
Wainwrightslides 2
77 pages
LAD Lasso
No ratings yet
LAD Lasso
15 pages
Figueiredo Et Al. - 2007 - Gradient Projection For Sparse Reconstruction Application To Compressed Sensing and Other Inverse Problems-Annotated
No ratings yet
Figueiredo Et Al. - 2007 - Gradient Projection For Sparse Reconstruction Application To Compressed Sensing and Other Inverse Problems-Annotated
12 pages
Sparse Coding and Dictionary Learning
No ratings yet
Sparse Coding and Dictionary Learning
40 pages
Lecture 24
No ratings yet
Lecture 24
8 pages
Compressive Sensing
No ratings yet
Compressive Sensing
6 pages
Justin Romberg
No ratings yet
Justin Romberg
69 pages
Materials 12 01227 PDF
No ratings yet
Materials 12 01227 PDF
10 pages
6 Approx
No ratings yet
6 Approx
26 pages
Compressive Sensing: Notes
No ratings yet
Compressive Sensing: Notes
4 pages
Fast Solution of ' - Norm Minimization Problems When The Solution May Be Sparse
No ratings yet
Fast Solution of ' - Norm Minimization Problems When The Solution May Be Sparse
45 pages
Representations Creuses Dans Le TduS BD
No ratings yet
Representations Creuses Dans Le TduS BD
46 pages
2023 02 Exam
No ratings yet
2023 02 Exam
5 pages
Cone Programming
No ratings yet
Cone Programming
27 pages
Research Article: Tree-Based Backtracking Orthogonal Matching Pursuit For Sparse Signal Reconstruction
No ratings yet
Research Article: Tree-Based Backtracking Orthogonal Matching Pursuit For Sparse Signal Reconstruction
9 pages
Stable Recovery
No ratings yet
Stable Recovery
15 pages
Lecture 7 (With Notes)
No ratings yet
Lecture 7 (With Notes)
39 pages
Norm Methods For Convex-Cardinality Problems
No ratings yet
Norm Methods For Convex-Cardinality Problems
31 pages
' Minimization: Resolution/ Bandwidth # Samples
No ratings yet
' Minimization: Resolution/ Bandwidth # Samples
11 pages
opt-ml-2025
No ratings yet
opt-ml-2025
46 pages
1.1 Sparsity: Occam's Razor of Modern Signal Processing?
No ratings yet
1.1 Sparsity: Occam's Razor of Modern Signal Processing?
14 pages
9 - Num Lin Alg
No ratings yet
9 - Num Lin Alg
24 pages
AMP Paper
No ratings yet
AMP Paper
16 pages
10-725/36-725 Optimization Midterm Exam: Name
No ratings yet
10-725/36-725 Optimization Midterm Exam: Name
10 pages
Sparse Modelling
No ratings yet
Sparse Modelling
13 pages
Lecture 17 Least Squares, State Estimation
No ratings yet
Lecture 17 Least Squares, State Estimation
29 pages
ECEN615 Fall2022 Lect16-1
No ratings yet
ECEN615 Fall2022 Lect16-1
47 pages
Provable Non-Convex Optimization For ML: Prateek Jain Microsoft Research India
No ratings yet
Provable Non-Convex Optimization For ML: Prateek Jain Microsoft Research India
86 pages
Quiz Sol 2324
No ratings yet
Quiz Sol 2324
7 pages
Face Recognition With Compressive Sensing: Mathias Lohne Spring, 2017
No ratings yet
Face Recognition With Compressive Sensing: Mathias Lohne Spring, 2017
14 pages
Adaptive Signal Recovery in Sparse Nonparametric Models: Natalia Stepanova and Marie Turcicova
No ratings yet
Adaptive Signal Recovery in Sparse Nonparametric Models: Natalia Stepanova and Marie Turcicova
15 pages
Week02 Convex Optimization
No ratings yet
Week02 Convex Optimization
48 pages
Convex Problems
No ratings yet
Convex Problems
48 pages
1.1 Background: September 28, 2015 16:10 Sparse Coding and Its Applications in Computer Vision - 9in X 6in b2310
No ratings yet
1.1 Background: September 28, 2015 16:10 Sparse Coding and Its Applications in Computer Vision - 9in X 6in b2310
6 pages
Compressed Sensing Simple Explanation
No ratings yet
Compressed Sensing Simple Explanation
5 pages
Assignment 5
No ratings yet
Assignment 5
3 pages
Nonlinear Optimization: Benny Yakir
No ratings yet
Nonlinear Optimization: Benny Yakir
38 pages
Worked Examples in Mathematics for Scientists and Engineers
From Everand
Worked Examples in Mathematics for Scientists and Engineers
G. Stephenson
No ratings yet
Shortcuts to College Calculus Refreshment Kit
From Everand
Shortcuts to College Calculus Refreshment Kit
Juan Acevedo
No ratings yet
Moreover For Any Open Set U Which Intersects M, We Have H-Dim ( M U) 2
No ratings yet
Moreover For Any Open Set U Which Intersects M, We Have H-Dim ( M U) 2
41 pages
Alkane Nomenclature
0% (1)
Alkane Nomenclature
1 page
Applied and Computational Harmonic Analysis: Emmanuel J. Candès, Mark A. Davenport
No ratings yet
Applied and Computational Harmonic Analysis: Emmanuel J. Candès, Mark A. Davenport
7 pages
Dyon - Monopole Bound States, Self-Dual Harmonic Forms On The Multi-Monopole Moduli Space, and SL (2, Z) Invariance in String Theory
No ratings yet
Dyon - Monopole Bound States, Self-Dual Harmonic Forms On The Multi-Monopole Moduli Space, and SL (2, Z) Invariance in String Theory
12 pages
Matrix-Lifting Semi-Definite Programming For Decoding in Multiple Antenna Systems
No ratings yet
Matrix-Lifting Semi-Definite Programming For Decoding in Multiple Antenna Systems
15 pages
Compressive Phase Retrieval
No ratings yet
Compressive Phase Retrieval
138 pages
Overview of C For Programming With Mpi and Openmp: Hinnerk ST Uben
No ratings yet
Overview of C For Programming With Mpi and Openmp: Hinnerk ST Uben
31 pages
Complex Hyperbolic Geometry: William M. Goldman September 25, 1998
No ratings yet
Complex Hyperbolic Geometry: William M. Goldman September 25, 1998
79 pages
Folland 2
No ratings yet
Folland 2
18 pages
Measurable Spaces and Measurable Maps
No ratings yet
Measurable Spaces and Measurable Maps
20 pages
Slovene English German Travel Dictionary Dictionary
No ratings yet
Slovene English German Travel Dictionary Dictionary
71 pages
Lie 1
No ratings yet
Lie 1
150 pages
Peran Ipp - Bu Costy
No ratings yet
Peran Ipp - Bu Costy
13 pages
Uninformed Search Strategies
No ratings yet
Uninformed Search Strategies
33 pages
ML Lab
No ratings yet
ML Lab
21 pages
AI Asgmnt5
No ratings yet
AI Asgmnt5
3 pages
Balancing Double Inverted Pendulum On A Cart by Linearization Technique
No ratings yet
Balancing Double Inverted Pendulum On A Cart by Linearization Technique
5 pages
Sridevi Women'S Engineering College: Mini Project Seminar On
No ratings yet
Sridevi Women'S Engineering College: Mini Project Seminar On
23 pages
TOC Solutions Adi
No ratings yet
TOC Solutions Adi
58 pages
Marine Structural Design-16-20
No ratings yet
Marine Structural Design-16-20
5 pages
Planning As Heuristic Search: Blai Bonet, Héctor Geffner
No ratings yet
Planning As Heuristic Search: Blai Bonet, Héctor Geffner
29 pages
Introduction of Asymptotic Notation: Dr. Munesh Singh
No ratings yet
Introduction of Asymptotic Notation: Dr. Munesh Singh
14 pages
Assignment3 Sol PDF
100% (1)
Assignment3 Sol PDF
5 pages
Leet Code
No ratings yet
Leet Code
12 pages
Automatic Liquid Level Controller Using Labview
No ratings yet
Automatic Liquid Level Controller Using Labview
15 pages
Frontmatter
No ratings yet
Frontmatter
9 pages
Stability PDF
No ratings yet
Stability PDF
64 pages
Ec0033-Asic Design-Model Qp-Vii Sem-Set1
No ratings yet
Ec0033-Asic Design-Model Qp-Vii Sem-Set1
1 page
Operation Research Chapter Three 3. Transportation and Assignment Model Chapter Objectives
No ratings yet
Operation Research Chapter Three 3. Transportation and Assignment Model Chapter Objectives
16 pages
Indian Stock Market Prediction Using Deep Learning
No ratings yet
Indian Stock Market Prediction Using Deep Learning
6 pages
CAT1 (Design and Analysis of Algorithms)
No ratings yet
CAT1 (Design and Analysis of Algorithms)
6 pages
Lab 3 PSD
No ratings yet
Lab 3 PSD
7 pages
Exercise 2 - Solution
No ratings yet
Exercise 2 - Solution
4 pages
Speech Emotion Recognition
No ratings yet
Speech Emotion Recognition
6 pages
Computer Paper Class 6
No ratings yet
Computer Paper Class 6
5 pages
CSS Unit-2
No ratings yet
CSS Unit-2
11 pages
Learning Multiple Layers of Features From Tiny Images. Alex Krizhevsky
No ratings yet
Learning Multiple Layers of Features From Tiny Images. Alex Krizhevsky
60 pages
(OR1) Midterm Practice
No ratings yet
(OR1) Midterm Practice
6 pages
10-MAT U8 QP-50M - EM - SivAmoorthy VPM - Kalviexpress - Cropped
No ratings yet
10-MAT U8 QP-50M - EM - SivAmoorthy VPM - Kalviexpress - Cropped
2 pages
DC Fault Detection and Pulsed Lo
No ratings yet
DC Fault Detection and Pulsed Lo
10 pages

Sparse Optimization Lecture: Basic Sparse Optimization Models

Uploaded by

Sparse Optimization Lecture: Basic Sparse Optimization Models

Uploaded by

Sparse Optimization

Lecture: Basic Sparse Optimization Models

Instructor: Wotao Yin

online discussions on piazza.com

Those who complete this lecture will know

basic `1 , `2,1 , and nuclear-norm models

Examples of Sparse Optimization Applications

See online seminar at piazza.com

find least `1 -norm point on the affine plane {x : Ax = b}

`1 ball touches the affine plane

Basis pursuit denoising, LASSO

min kxk1 + kAx bk22 ,

min{kxk1 : kAx bk2 }.

all models allow Ax 6= b

Basis pursuit denoising, LASSO

min{kAx bk2 : kxk1 },

min{kxk1 : kAx bk2 }.

k k2 is most common for error but can be generalized to loss function L

it seeks for a balance between sparsity and fitting

solution from tube-like set {x : kAx bk2 }

of A) to linearly express the observation b

Sparse under basis / `1 -synthesis model

signal x is sparsely synthesized by atoms from , so vector s is sparse

combinations; they have analytic properties, often easy to compute (for

Sparse under basis / `1 -synthesis model

If is orthogonal, problem (3) is equivalent to

by change of variable x = s, equivalently s = x.

min{k xk1 : kAx bk2 }.

Sparse after transform / `1 -analysis model

Signal x becomes sparse under the transform (may not be orthogonal)

When is not orthogonal, the analysis is more difficult

Example: sparsify an image

(a) DCT coefficients

(b) Haar wavelets

(c) Local variation

(d) DCT coeffs decay

(e) Haar wavelets

(f) Local variation

1. Can we trust these models to return intended sparse solutions?

Joint sparse recovery model:

k[xi1 xi,2 xin ]k2 .

`2 -norm is applied to each row of X

be touched by {X : A(X) = b}, so the solution tends to be row-sparse

Group-sparse recovery model:

Auxiliary constraints introduce additional structures of the underlying signal

They can be very effective in practice. They also generate corners.

Reduce to conic programs

bk22 reduces to a bound constrained quadratic

a K b stands for a b K, which is a convex, closed, pointed cone.

norm cone (2nd order cone): Q = {(x, t) : kxk t}

In primal IPM, x 0 is replaced by its logarithmic barrier:

Second-order cone program

where K = K1 KK ; each Kk = S+k .

Log-barrier of S+k (still a concave function):

(from Boyd & Vandenberghe, Convex Optimization)

(y) = (1/y1, . . . , 1/yn),

positive semidefinite cone Sn+: (Y ) = log det Y

second-order cone K = {y Rn+1 | (y12 + + yn2 )1/2 yn+1}:

(from Boyd & Vandenberghe, Convex Optimization)

hyperplane cT x = cT x?(t) is tangent to

Primal-dual interior-point methods

`1 minimization by interior-point method

min min {1T (x1 + x2 ) : x1 0, x2 0, x1 x2 = x, kAx bk2 }

min {1T (x1 + x2 ) : x1 0, x2 0, kA(x1 x2 ) bk2 }

min {1T x1 + 1T x2 : Ax1 Ax2 + y = b, z = , (x1 , x2 , z, y) K 0}

where (x1 , x2 , z, y) K 0 means

Solver: Mosek, SDPT3, Gurobi.

Nuclear-norm minimization by interior-point method

as an SDP ... (how? see next slide) ...

as well as problems involving tr(X) and spectral norm kXk.

Sparse calculus for `1

y, z 0 and yz |x| = 12 (y + z) yz |x|.

moreover, 12 (y + z) = yz = |x| if y = z = |x|.

Therefore, given x, we have

Generalization to nuclear norm

Consider X Rmn (w.o.l.g., assume m n) and lets try imposing

a K b stands for a b K, which is a convex, closed, pointed cone.

min {1T x1 + 1T x2 : Ax1 Ax2 + y = b, z = , (x1 , x2 , z, y) K 0}

where (x1 , x2 , z, y) K 0 means