A Partial Derandomization of Phaselift Using Spherical Designs

Download as pdf or txt
Download as pdf or txt
You are on page 1of 38

J Fourier Anal Appl

DOI 10.1007/s00041-014-9361-2

A Partial Derandomization of PhaseLift Using


Spherical Designs

D. Gross · F. Krahmer · R. Kueng

Received: 15 October 2013 / Revised: 20 August 2014


© Springer Science+Business Media New York 2014

Abstract The problem of retrieving phase information from amplitude measurements


alone has appeared in many scientific disciplines over the last century. PhaseLift is
a recently introduced algorithm for phase recovery that is computationally tractable,
numerically stable, and comes with rigorous performance guarantees. PhaseLift is
optimal in the sense that the number of amplitude measurements required for phase
reconstruction scales linearly with the dimension of the signal. However, it specif-
ically demands Gaussian random measurement vectors—a limitation that restricts
practical utility and obscures the specific properties of measurement ensembles that
enable phase retrieval. Here we present a partial derandomization of PhaseLift that only
requires sampling from certain polynomial size vector configurations, called t-designs.
Such configurations have been studied in algebraic combinatorics, coding theory, and
quantum information. We prove reconstruction guarantees for a number of measure-
ments that depends on the degree t of the design. If the degree is allowed to grow
logarithmically with the dimension, the bounds become tight up to polylog-factors.
Beyond the specific case of PhaseLift, this work highlights the utility of spherical
designs for the derandomization of data recovery schemes.

Communicated by Thomas Strohmer.

D. Gross · R. Kueng (B)


Institute for Physics, University of Freiburg, Rheinstraße 10, 79104 Freiburg, Germany
e-mail: richard.kueng@physik.uni-freiburg.de

D. Gross
Freiburg Center for Data Analysis Modeling, Eckerstr. 1, 79104 Freiburg, Germany

F. Krahmer
Institute for Numerical and Applied Mathematics, University of Göttingen, Lotzestraße 16-18, 37083
Göttingen, Germany
J Fourier Anal Appl

Keywords Phase retrieval · PhaseLift · Semidefinite relaxations of nonconvex


quadratic programs · Non-commutative large deviation estimates · Spherical designs ·
Quantum information

Mathematics Subject Classification 90C25 · 49N30 · 62H12 · 60F10

1 Introduction

In this work we are interested in the problem of recovering a complex signal (vector)
x ∈ Cd from an intensity measurement y0 = x22 and amplitude measurements

yi = |ai , x|2 i = 1, . . . , m,

where a1 , . . . , am ∈ Cd are sampling vectors. Problems of this type are abundant in


many different areas of science, where capturing phase information is hard or even
infeasible, but obtaining amplitudes is comparatively easy. Prominent examples for
this case occur in X-ray crystallography, astronomy and diffraction imaging—see
for example [57]. This inverse problem is called phase retrieval and has attracted
considerable interest over the last decades.
It is by no means clear how many such amplitude measurements are necessary
to allow for recovery. Thus from the very beginning, there have been a number of
works regarding injectivity conditions for this problem in the context of the specific
applications [16].
More recently this question has been studied in more abstract terms, asking for
the minimal number of amplitude measurements of the form (1)—without imposing
structural assumptions on the ai ’s—that are required to make the above map injective.
In [8], the authors showed that in the real case (x ∈ Rd ), at least 2d − 1 such
measurements are necessary and generically sufficient to guarantee injectivity, while
in the complex case a generic sample size of m ≥ 4d − 2 suffices. Here generic is to
be understood in the sense that the sets of measurements of such size which do not
allow for recovery form an algebraic variety in the space of all frames. Also, the latter
bound is close to optimal: as shown in [41], it follows from the results derived in [66]
that a sample size of m ≥ (4 + o(1)) d is necessary (cf. [58]). However, finding the
precise bound is still an open problem.
Balan et al. [9] consider the scenario of O(d 2 ) measurements, which form a com-
plex projective 2-design (cf. Definition 3 below). They derive an explicit reconstruction
formula for this setup based on the following observation well known in conic pro-
gramming. Namely, the quadratic constraints on x are linear in the outer product x x ∗ :
 
yi = |ai , x|2 = tr (ai ai∗ )(x x ∗ ) . (1)

This “lifts” the problem to matrix space of dimension d 2 , where it becomes linear and
can be explicitly solved to find the unique solution.
As we will show in Theorem 2, it is, without making additional assumptions on
the 2-design, not possible to use as measurements a random subset of this 2-design
J Fourier Anal Appl

which is of size o(d 2 ). In other words, for the measurement scenario described in [9],
the quadratic scaling in d is basically unavoidable.
To contrast these two extreme approaches, Ref. [8] works with a number of mea-
surements close to the absolute minimum, but there are no tractable reconstruction
schemes provided, the question of numerical stability is not considered, and it is
unclear whether non-generic measurements—i.e., vectors with additional structural
properties—can be employed. On the other hand, the number of measurements in [9]
is much larger, while the measurements are highly structured and there is an explicit
reconstruction method. A number of recent works including this paper aim to balance
between these two approaches, working with a number of measurements only slightly
larger while having at least some of the desired properties mentioned above.
Reference [3] introduces a reconstruction method called polarization that works
for O(d log d) measurements and can handle structured measurement vectors, includ-
ing the masked illumination setup that appears in diffraction imaging [10], where
the measurements are generated by the discrete Fourier transform preceded by a ran-
dom diagonal matrix. For Gaussian measurements, the polarization approach has also
shown to be stable with respect to measurement noise [3]. While simulations seem
to suggest stability also for the derandomized masked illumination setup, a proof of
stability is—to our knowledge—not available yet.
An alternative approach, which we will also follow in this paper, is the PhaseLift
algorithm, which is based on the lifted formulation (1). The algorithm was introduced
in [22] and reconstruction guarantees have been provided in [18,23]. The central obser-
vation is that the matrix x x ∗ , while unknown, is certainly of rank one. This connects the
phase retrievel problem with the young but already extensive field of low-rank matrix
recovery [21,35,54,63]. Over the past years, this research program has rigorously iden-
tified many instances in which low-rank matrices can be efficiently reconstructed from
few linear measurements. The existing results on low-rank matrix recovery were not
directly applicable to phase retrieval, because the measurement matrices ai ai∗ failed to
be sufficiently incoherent in the sense of [21,35] (the incoherence parameter captures
the well-posedness of a low-rank recovery problem). For the case of Gaussian mea-
surement vectors ai , Candès, Strohmer, Voroninski and Li were able to circumvent this
problem, providing problem-specific stable recovery guarantees [18,23] for a number
of measurements of optimal order O(d). For recovery, they use a convex relaxation
of the rank minimization problem, which makes the reconstruction algorithmically
tractable.
It should be noted, however, that because of the significantly increased problem
dimensions, PhaseLift is not as efficient as many phase retrieval algorithms developed
over the last decades in the physics literature (such as [30]) and the optimization
literature (for example [13]). Recently there have been attempts to provide recovery
guarantees for alternating minimization algorithms [61], which are somewhat closer to
the algorithms used in practice, but this direction of research is only at its beginnings.
While the above mentioned recovery guarantees for PhaseLift address the issues
of tractable reconstruction and stability with respect to noise, these results leave open
the question of whether measurement systems with additional structure and less ran-
domness still allow for guaranteed recovery. There are both practical and theoretical
motivations for pursuing such generalizations: A practitioner may be constrained in the
J Fourier Anal Appl

choice of measurements by the application at hand or reduce the amount of random-


ness required for implementation purposes. The most prominent example are again
masked Fourier measurements, which appear as a natural model in diffraction imag-
ing, but a lot of different scenarios imposing different structure are conceivable. From
a theoretical point of view, the use of Gaussian vectors obscures the specific properties
that make phase retrieval possible. As discussed in the following subsection, it is a
common thread in randomized signal processing that results are first established for
Gaussian measurements and later generalized to structured ensembles.
A different direction of research, which will not be pursued in this paper, is to ask
how additional structural assumptions on the signal to be recovered, such as sparsity,
can be incorporated into the theory. A general analysis based on the Gaussian width
of how many measurements are needed to allow for stable recovery of a signal known
to lie in a set T ⊂ Rd is provided in [28]. Notably the results allow for measurements
with arbitrary subgaussian rather than just Gaussian entries. Efficient algorithms for
recovery, however, are not provided. For the case of s-sparse signals, also tractable
recovery algorithms are available: It has been shown in [53] that PhaseLift can recover
x with high probability from Gaussian measurements for a number of measurements
m proportional to s 2 (up to logarithmic factors), which, for small s, can be consider-
ably less than the dimension. In [27], it is shown that only a number of subgaussian
measurements scaling linearly in the sparsity (up to logarithmic factors) is needed if
recovery proceeds using certain greedy algorithms.

1.1 Designs as a General-Purpose Tool for De-randomization


In this paper, we focus on the theoretical aspect: which properties of a measurements
are sufficient for PhaseLift to succeed? We prove recovery guarantees for ensembles of
measurement vectors drawn at random from a finite set whose first 2t moments agree
with those of Haar-random vectors (or, essentially, Gaussian vectors). A configuration
of finite vectors which gives rise to such an ensemble is known as a complex projective
t-design.1 Designs were introduced by Delsarte, Goethals and Seidel in a seminal paper
[25] and have been studied in algebraic combinatorics [70], coding theory [25,60],
and recently in quantum information theory [4,15,36,40,68]. Furthermore, complex
projective 2-designs were the key ingredient for the reconstruction formula for phase
retrieval proposed in [9].
One may see a more general philosophy behind this approach. In the field of sparse
and low-rank reconstruction, a number of recovery results had first been established
for Gaussian measurements. In subsequent works, it has then been proven that mea-
surements drawn at random from certain fixed orthonormal bases are actually suffi-
cient. Examples include uniform recovery guarantees for compressed sensing ([11,19]
vs. [20,65]) and low-rank matrix recovery ([63] vs. [54]), respectively. Typically, the
de-randomized proofs require much higher technical efforts and deliver slightly weaker
results. For a recent survey on structured random measurements in signal processing
see [48].

1 The definition of a t-design varies between authors. In particular, what is called a t-design here (and
in most of the physics literature), would sometimes be referred to as a 2t or even a (2t + 1)-design. See
Sect. 3.3 for our precise definition.
J Fourier Anal Appl

As the number of measurements needed for phase retrieval is larger than the sig-
nal space dimension, one cannot expect these results to exactly carry over to the
phase retrieval setting. Nevertheless, the question remains whether there is a larger,
but preferably not too large, set such that measurements drawn from it uniformly at
random allow for phase retrieval reconstruction guarantees. In some sense, the sam-
pling scenario we seek can be interpreted as an interpolation between the maximally
random setup of Gaussian measurements with an optimal order of measurements and
the construction in [9], which is completely deterministic, but suboptimal in terms of
the embedding dimension. While in this paper, we will focus on the phase retrieval
problem, we remark that such an interpolating approach between measurements drawn
from a basis and maximally random measurements may also be of interest in other sit-
uations where reconstructions from bases are known, but lead to somewhat suboptimal
embedding dimensions.
The concept of t-designs, as defined in Sect. 3.3, provides such an interpolation.
The intuition behind that definition is that with growing t, more and more moments of
the random vector corresponding to a random selection from the t-design agree with
the Haar measure on the unit sphere. In that sense, as t scales up further, t-designs
give better and better approximations to Haar-random vectors.
The utility of this concept as a general-purpose de-randomization tool for Hilbert-
space valued random construtions has been appreciated for example in quantum infor-
mation theory [4,55]. It has been compared [4] to the notion of t-wise independence,
which plays a role for example in the analysis of discrete randomized algorithms [56],
seems to have been long appreciated in coding theory. The smallest t-design in Cd
consists of O(d 2t ) elements. Thus, whenever that lower bound is met, drawing a sin-
gle element from a design requires 2t log d bits, as opposed to 2d bits for a complex
Bernoulli vector—an exponential gap.
From a practical point of view, the usefulness of these concepts hinges on the
availability of constructions for designs. Explicit constructions for any order t and any
dimension d are known [7,40,47,69]—however, they are typically “inefficient” in the
sense that they require a vector set of exponential size. For example, the construction
in [40] uses O(t)d vectors which is exponential in the dimension d.
Tighter analytic expressions for exact designs are notoriously difficult to find.
Designs of degree 2 are widely known [45,46,67,75]. A concrete example is used
for the converse bound in Sect. 7 (as well as for the converse bounds for low-rank
matrix recovery from Fourier-type bases in [35]). For degree 3, both real [70] and
complex [49] designs are known.2 For higher t, there are numerical methods based on
the notion of the frame potential [45,49,64] , non-constructive existence proofs [69],
and constructions in sporadic dimensions (c.f. [5] and references therein).
Importantly, almost-tight randomized constructions for approximate designs for
arbitrary degrees and dimensions are known [4,15,40]. The simplest results [40] show
that collections of Haar-random vectors form approximate t-designs. This indeed can
reduce randomness: One only needs to expend a considerable amount of randomness
once to generate a design—for subsequent applications it is sufficient to sample small

2 While stated only for dimensions that are a power of 2, the results can be used for construtions in arbitrary
dimensions [49].
J Fourier Anal Appl

subsets from it.3 Going further, there have been recent deep results on designs obtained
from certain structured ensembles [15]. We do not describe the details here, as they
are geared toward quantum problems and may have to be substantially modified to
be applicable to phase retrieval. The only connection to phase retrieval to date is the
estimation of pure quantum states [41,59].
Finally we point out that the notion of the frame potential above is no coincidence.
In [6] a frame-theoretic approach to designs is provided, underlining their close con-
nection.

1.2 Main Results

In this paper, we show that spherical designs can indeed be used to partially derandom-
ize recovery guarantees for underdetermined estimation problems; we generalize the
recovery guarantee in [23] to measurements drawn uniformly at random from complex
projective designs, at the cost of a slightly higher number of measurements.

Theorem 1 (Main Theorem) Let x ∈ Cd be the unknown signal. Suppose that x22 is
known and that m measurement vectors a1 , . . . , am have been sampled independently
and uniformly at random from a t-design Dt ⊂ Cd (t ≥ 3). Then, with probability at
least 1 − e−ω , PhaseLift (the convex optimization problem (24) below) recovers x up
to a global phase, provided that the sampling rate exceeds

m ≥ ω Ct d 1+2/t log2 d. (2)

Here ω ≥ 1 is an arbitrary parameter and C is a universal constant.

As the discussion of the previous subsection suggests, the bounds on the sampling rate
decrease as the order of the design increases. For fixed t, and up to poly-log factors,
it is proportional to O(d 1+2/t ). This is sub-quadratic for the regime t ≥ 3 where our
arguments apply. If the degree is allowed to grow logarithmialy with the dimension
(as t = 2 log d), we recover an optimal, linear scaling up to a polylog overhead,
m = O(d log3 d).
In light of the highly structured, analytical and exact designs known for degree 2
and 3, it is of great interest to ask whether a linear scaling can already be achieved for
some small, fixed t. As shown by the following theorem, however, for t = 2 not even
a subquadratic scaling is possible if no additional assumptions are made, irrespective
of the reconstruction algorithm used.

Theorem 2 (Converse bound) Let d be a prime power larger than 2. Then there exists
a 2-design D2 ⊂ Cd and orthogonal, normalized vectors x, z ∈ Cd which have the
following property:
Suppose that m measurement vectors y1 , . . . , ym are sampled independently and
uniformly at random from D2 . Then, for any ω ≥ 0, the number of measurements must
obey

3 The situation is comparable to the use of random graphs as randomness expanders [43].
J Fourier Anal Appl

ω
m≥ d(d + 1),
4
or the event

|ai , x|2 = |ai , z|2 ∀ i ∈ {1, . . . , m}

will occur with probability at least e−ω .

It is worthwhile to put this statement in perspective with other advances in the field.
Throughout our work, we have only demanded that the set of all possible measurement
vectors forms a t-design and have not made any further assumptions. Theorem 2 has
to be interpreted in this regard: The 2-design property alone does not allow for a sub-
quadratic scaling when a “reasonably small” probability of failure is required in the
recovery process.
Note that this does not exclude the possibility that certain realizations of 2-designs
can perform better, if additional structural properties can be exploited. A good exam-
ple for such a measurement process is the multi-illumination setup provided in [17].
In [37] the authors of this paper verified that the set of all measurement vectors
used in the framework of [17] does constitute a 2-design (Lemma 6). Additional
structural properties—most notably a certain correlated Fourier basis structure in the
individual measurements—allowed
 for establishing recovery guarantees already for
m = O d log4 d measurements [17] and m = O(d log2 d) [37], respectively—
which both clearly are sub-quadratic sampling rates.

1.3 Outlook

There are a number of problems left open by our analysis. First, recall that our results
achieve linear scaling up to logarithmic factors only when samples are drawn from
a set of superpolynomial size. Thus it would be very interesting to find out whether
there are polynomial size sets such that sampling from them already achieves such
a scaling, in particular, if t-designs for some fixed t can be used. The case of t = 3
seems particularly important in that regard, since the converse bound (Theorem 2)
shows that a design order of at least 3 is necessary. Also, highly structured 3-designs
are known to exist (see above).
Another important follow-up problem concerns approximate t-designs. While our
main result is phrased for exact t-designs, certain scenarios will only exhibit approxi-
mate design properties. We expect that our proofs can be generalized to such a setup,
but also leave this problem for future work. Lastly, the reconstruction quality for noisy
measurements is also an important issue yet to be investigated.

2 Numerical Experiments

In this section we complement our theoretical results with numerical experiments,


which we have implemented in Matlab using CVX [33,34]. As may have been
expected, these experiments suggest that PhaseLift from designs actually works much
J Fourier Anal Appl

200

180

Number of Measurements 160

140

120

100

80

60

40

20

5 10 15 20 25 30 35 40
Dimension
Fig. 1 Phase diagram for PhaseLift from (projected) stabilizer states, which form an exact 3-design in
dimensions 2n and a weighted one else. The x-axis indicates the problem’s dimension, while the y-axis
denotes the number of independent design measurements performed. The frequency of a successful recovery
over 30 independent runs of the experiment appears color-coded from black (zero) to white (one). To guide
the eye, we have furthermore included a red line indicating m = 4d − 4

better than our main theorem suggests. To be concrete, we use stabilizer states—a
highly structured vector set which is very prominent in quantum information theory
[31,32]. Stabilizer states exist in any dimension, though their properties are some-
what better-behaved in prime power dimensions. In this case, there exists O(d log d )
stabilizer state vectors. Due to their rich combinatorial structure, these vectors can be
constructed efficiently. For dimensions d = 2n that are a power of two, it is known
[49] that the set of stabilizer states forms a 3-design. This statement is false for other
prime power dimensions (d
= 2n for some n), where they only form an exact 2-design.
However, weighted 3-designs can be constructed for arbitrary dimensions d by pro-
jecting down stabilizer states from the next largest power-of-2-dimension 2n obeying
2n−1 < d < 2n [49]. For further clarification of the concept of exact and weighted
t-designs we defer the reader to [68] and references therein.
We have used these vectors in our numerical simulations, the results of which are
depicted in Fig. 1. For each dimension d between 1 and 40 (x-axis) and for each
number of measurements m ranging from 1 to 200 (y-axis), we ran a total of 30
independent experiments. Each such experiment consisted in choosing a Haar-random
(normalized Gaussian) complex vector x as test signal. Then, we drew m projected
stabilizer states uniformly at random and calculated their squared overlap with the
test signal. We then ran PhaseLift on this data and declared the recovery a “success”
if the Forbenius distance between the reconstructed matrix X̃ and the true projection
X = x x ∗ was smaller than 10−3 . Figure 1 depicts the empirical success probability:
Black corresponds to only failures, white to only successes.
J Fourier Anal Appl

We obtain the picture of a relatively sharp phase transition along a line that scales
linearly in the problem dimension. In fact, the transition seems to occur in the vicin-
ity of the line m = 4d − 4 – drawn in red in Fig. 1. This seems to agree with the
conjecture that 4d − 4 measurements are required for injectivity (see e.g. [41]). How-
ever, there are a few differences in the problem setup: Firstly, the conjecture only asks
whether there is a unique solution, while the numerical simulations study whether the
PhaseLift algorithm can find it. Secondly, the conjecture concerns unique solutions
for all possible inputs, while numerically, we estimate the success probability. And
thirdly, the conjecture states that generic measurements work, while our simulations
use a specific random procedure (drawn uniformly from a 3-design) to generate them.

3 Technical Background and Notation

3.1 Vectors, Matrices and Matrix Valued Operators

In this work we require three different objects of linear algebra: vectors, matrices and
operators acting on matrices.
We will work with vectors in a d-dimensional complex Hilbert space V d equipped
with an inner product ·, ·. We refer to the associated induced norm by

z2 = z, z ∀z ∈ V d .

We will denote such vectors by Latin characters. For z ∈ V d , we define the dual vector
z ∗ ∈ (V d )∗ via

z ∗ y = z, y ∀y ∈ V d .

On the level of matrices we will exclusively consider d × d dimensional hermitian


matrices, which we denote by capital Latin characters. Endowed with the Hilbert–
Schmitt (or Frobenius) scalar product

(Z , Y ) = tr(Z Y ), (3)

the space H d becomes a Hilbert space. In addition to that, we will require the 3
different Schatten-norms

Z 1 = tr(|Z |) (trace norm),



Z 2 = tr(Z 2 ) (Frobenius norm),
Z y2
Z ∞ = sup (operator norm),
y∈V d y2

where the second one is induced by the scalar product (3). These three norms are
related via the inequalities
√ √
Z 2 ≤ Z 1 ≤ dZ 2 and Z ∞ ≤ Z 2 ≤ dZ ∞ ∀Z ∈ H d .
J Fourier Anal Appl

We call a hermitian matrix Z positive-semidefinite (Z ≥ 0), if y, Z y ≥ 0 for all


y ∈ V d . Positive semidefinite matrices form a cone [12] (Chapter II,12), which induces
a partial ordering of matrices. Concretely, for Z , Y ∈ H d we write Y ≥ Z if Y − Z
is positive-semidefinite (Y − Z ≥ 0).
In this work, the identity matrix 1 and rank-1 projectors are of particular importance.
They are positive semidefinite and any matrix of the latter kind can be decomposed as
Z = zz ∗ for some z ∈ V d . Up to a global phase, they correspond to vectors z ∈ V d .
The most important cases are the projection onto the unknown signal x and onto the
ith measurement vector ai , respectively. They will be denoted by

X = x x ∗ and Ai = ai ai∗ .

Finally, we will frequently encouter matrix-valued operators acting on the space H d .


We label such objects with capital calligraphic letters and introduce the operator norm

MZ 2
Mop = sup
Z ∈H d Z 2

induced by the Frobenius norm on H d . It turns out that only very few matrix-valued
operators will appear below. These are: the identity map

I : Hd → Hd
Z → Z ∀Z ∈ H d

and (scalar multiples of) projectors onto some matrix Y ∈ H d . The latter corresponds
to

Y : H d → H d
Z → Y (Y, Z ) = Y tr(Y Z ) ∀Z ∈ H d .

The operator

1 : Z → 1 tr(1Z ) = 1 tr(Z ) ∀Z ∈ H d ,

is a very important example for this subclass of operators. Note that it is not a normal-
ized projection, but d1 1 is. Indeed, for Z ∈ H d arbitrary
 2
d −1 1 Z = d −2 1 tr(11 Z ) = d −2 tr(1)1 tr(Z ) = d −1 1 Z . (4)

The notion of positive-semidefiniteness directly translates to matrix valued operators.


Concretely, we call M positive-semidefinite (M ≥ 0) if (Z , MZ ) ≥ 0 for all Z ∈
H d . Again, this induces a partial ordering. Like in the matrix case, we write N ≥ M,
if N − M ≥ 0. It is easy to check that all the operators introduced so far are positive
semidefinite and in particular we obtain the ordering
J Fourier Anal Appl

0 ≤ 1 ≤ dI, (5)

by using (4).

3.2 Multilinear Algebra

The properties of t-designs are most naturally stated in the framework of (t-fold)
tensor product spaces. This motivates recapitulating some basic concepts of multi-
linear algebra that are going to greatly simplify our analysis later on. The concepts
presented here are standard and can be found in any textbook on multilinear algebra.
Our presentation has been influenced in particular by [51,74].
Let V1 , . . . , Vk be (finite dimensional, complex) vector spaces, and let V1∗ , . . . , Vk∗
be their dual spaces. A function

f : V1 × · · · × Vk → C

is multilinear, if it is linear in each Vi , i = 1, . . . , k. We denote the space of such


functions by V1∗ ⊗· · ·⊗Vk∗ and call it the tensor product of V1∗ , . . . , Vk∗ . Consequently,
 ⊗k k
the tensor product V d = i=1 V d is the space of all multilinear functions
 ∗  ∗
f : V d × · · · × V d → C, (6)


k times

and we call the elementary elements z 1 ⊗ · · · ⊗ z k the tensor product of the vectors
z 1 , . . . , z k ∈ V d . Such an element can alternatively be defined more concretely via the
Kronecker product of the individual vectors. However, such a construction requires
an explicit choice of basis in V d which is not the case in (6).
 d ∗maps V → V (d ×d-matrices)
With this notation, the space of linear d d corresponds

to the tensor product M := V ⊗ V
d d which is spanned by y ⊗ z ∗ : y, z ∈ V d —
the set of all rank-1 matrices. For this generating set of M d , we define the trace to be
the natural bilinear map
 ∗
tr : V d ⊗ V d → C
 
y ⊗ z ∗ → z ∗ y = z, y

for all y, z ∈ V d . The familiar notion of trace is obtained by extending this definition
linearly to M d .
 ∗  ⊗k
Using M d = V d ⊗ V d allows us to define the (matrix) tensor product M d
to be the space of all multilinear functions
 ∗   ∗ 
f : Vd × Vd × ··· × Vd × Vd → C


k times
J Fourier Anal Appl

in complete analogy to the above. We call the elements Z 1 ⊗ · · · ⊗ Z k the tensor


product of the matrices Z 1 , . . . , Z k ∈ M d .
On this tensor space, we define the partial trace (over the i-th system) to be
 ⊗k  ⊗(k−1)
tri : M d → Md
Z 1 ⊗ · · · ⊗ Z k → tr(Z i ) (Z 1 ⊗ · · · ⊗ Z i−1 ⊗ Z i+1 ⊗ · · · ⊗ Z k ) .
 ∗
Note that with the identification M d = V d ⊗ V d , tri corresponds to the natural
contraction at position i. The partial trace over more than one system can be obtained
by concatenating individual traces of this form, e.g. for 1 ≤ i < j ≤ k
 ⊗k  ⊗(k−2)
tri, j := tri ◦ tr j : M d → Md .

In particular, the full trace then corresponds to


 ⊗k
tr := tr 1,...,k : M d →C
(Z 1 ⊗ · · · ⊗ Z k ) → tr(Z 1 ) . . . tr(Z k ).
 ⊗k
Let us now return to the tensor space V d of vectors. We define the (symmetrizer)
 d ⊗k  d ⊗k
map PSymk : V → V via their action on elementary elements:

1 
PSymk (z 1 ⊗ · · · ⊗ z k ) := z π(1) ⊗ · · · ⊗ z π(k) , (7)
k!
π ∈Sk

 ⊗k
where Sk denotes the group of permutations of k elements. This map projects V d
 ⊗k
onto the totally symmetric subspace Symk of V d whose dimension [51] is
 
d +k−1
dim Sym = k
. (8)
k

3.3 Complex Projective Designs

The idea of (real) spherical designs originates in coding theory [25] and has been
extended to more general spaces in [42,52,62]. We refer the interested reader to Lev-
enshtein [52] for a unified treatment of designs in general metric spaces and from now
on focus on designs in the complex vector space V d .
Roughly speaking, a complex projective t-design is a finite subset of the complex
unit sphere in V d with the property that the discrete average of any polynomial of
degree t or less equals its uniform average. Many equivalent definitions—see e.g.
[42,46,62]—capture this essence. However, there is a more explicit definition of a
t-design that is much more suitable for our purpose:
J Fourier Anal Appl

Definition 3 (Definition 2 in [68]) A finite set {w1 , . . . , w N } ⊂ V d of normalized


vectors is called a t-design of dimension d if and only if

1 
N
(wi wi∗ )⊗t = dim(Symt )−1 PSymt , (9)
N
i=1

where PSymt denotes the projector onto the totally symmetric subspace (7) of (V d )⊗t
 
and consequently dim(Symt ) = d+t−1t .

Note that the defining property (9) is invariant under global phase changes wi →
eiφ wi , thus it matches the symmetry of the phase retrieval problem. The definition
above is equivalent to demanding


1 
N
∗ ⊗t
(wi wi ) = dw (ww ∗ )⊗t ,
N w
i=1

where the right hand side is integrated with respect to the Haar measure. This form
makes the statement that t-designs mimic the first 2t moments of Haar measure more
explicit.
P. Seymor and T. Zaslavsky proved in [69] that t-designs on V d exist for every
t, d ≥ 1, provided that N is large enough (N ≥ N (d, t)), but they do not give an
explicit construction. A necessary criterion—cf. [42,46]—for the t-design property is
that the number of vectors N obeys
  
d + t/2 − 1 d + t/2−1
N≥ = O(d 2t ). (10)
t/2 t/2

However, the proof in [69] is non-constructive and known constructions are “innefi-
cient” in the sense that the number of vectors required greatly exceeds (10). Hayashi
et al. [40] proposed a construction requiring O(t)d vectors. For real spherical designs
other “inefficient” constructions have been proposed [7,47] (N = t O(d ) ) which can
2

be used to obtain complex projective designs.


Addressing this apparent lack of efficient constructions, Ambainis and Emerson
[4] proposed the notion of approximate desings. These vector sets fulfill property (9)
only up to an -precision, but their great advantage is that they can be constructed
efficiently. Concretely, they show that for every d ≥ 2t, there exists an  = O(d −1/3 )
approximate t-design consisting of O(d 3t ) vectors only.
The great value of t-designs is due to the following fact: If we sample m vectors
ai , . . . , am iid from a t-design Dt = {w1 , . . . , w N }, the design property guarantees
(with Ai = ai ai∗ and Wi = wi wi∗ )
   
1  ⊗k
m   1  ⊗k
N
d + k − 1 −1
E Ai = E A⊗k
1 = Wi = PSymk
m N k
i=1 i=1
J Fourier Anal Appl

for all 1 ≤ k ≤ t. This knowledge about the first t moments of the sampling procedure
is the key ingredient for our partial derandomization of Gaussian PhaseLift [23].

3.4 Large Deviation Bounds

This approach makes heavy use of operator-valued large deviation bounds. They have
been established first in the field of quantum information by Ahlswede and Winter
[2]. Later the first author of this paper and his coworkers successfully applied these
methods to the problem of low rank matrix recovery [35,38]. By now these methods
are widely used and we borrow them in their most recent (and convenient) form from
Tropp [71,72].

Theorem 4 (Uniform Operator Bernstein inequality, [35,72]) Consider a finite


sequence {Mk } of independent, random self-adjoint operators. Assume that each ran-
dom variable satisfies E [Mk ] = 0 and Mk ∞ ≤ R (for some  finite
 constant
 R)
almost surely and define the norm of the total variance σ 2 :=  k E Mk2 ∞ . Then
the following chain of inequalities holds for all t ≥ 0.
    
 t 2 /2 d exp(−3t 2 /8σ 2 ) t ≤ σ 2 /R
Pr  Mk ∞ ≥ t ≤ d exp − ≤
k
σ 2 + Rt/3 d exp(−3t/8R) t ≥ σ 2 /R.


Theorem 5 (Smallest Eigenvalue Bernstein Inequality, [71]) Let S = k Mk be a
sum of iid random matrices Mk which obey E [M K ] = 0 and λmin (M
 k ) ≥ −R almost
surely for some fixed R. With the variance parameter σ 2 (S) =  k E Mk2 ∞ the
following chain of inequalities holds for all t ≥ 0.
  
t 2 /2 d exp(−3t 2 /8σ 2 ) t ≤ σ 2 /R
Pr [λmin (S) ≤ −t] ≤ d exp − ≤
σ 2 + Rt/3 d exp(−3t/8R) t ≥ σ 2 /R.

3.5 Wiring Diagrams

The defining property (9) of t-designs is phrased in terms of tensor spaces. To work with
these notions practically, we need tools for efficiently computing contractions between
high-order tensors. The concept of wiring diagrams provides such a method—see [51]
for an introduction and also [24,73] (however, they use a slightly different notation).
Here, we give a brief description that should suffice for our calculations.
Roughly, the calculus of wiring diagrams associates with every tensor a box, and
with every index of that tensor a line emanating from the box. Two connected lines
represent contracted indices. (More precisely, we place contravariant indices of a
tensor on top of the associated box and covariant ones at the bottom. However, one
should be able to digest our calculations without reference to this detail). A matrix
A : V d → V d can be seen as a two-indexed tensor Ai j . It will thus be represented

by a node A with the upper line corresponding to the index i and the lower one to j.
J Fourier Anal Appl

Two matrices A, B are multiplied by contracting B’s “contravariant” index with A’s
“covariant” one:

(AB)i j = Ai k B k j
k

Pictographically, we write

A
AB =
B

The trace operation



A → tr A = Ak k
k

corresponds to a contraction of the two indices of a matrix:

tr(A) = A .

Tensor products are arranged in parallel:

A⊗B = A B.

Hence, a partial trace takes the following form:

tr2 (A ⊗ B) = A B .

The last ingredient we need are the transpositions σ(i, j) on (V d )⊗t which act by
interchanging the ith and the jth tensor factor. For example

σ(1,2) (x ⊗ y ⊗ · · · ) = y ⊗ x ⊗ · · · ,

with x, y ∈ V d arbitrary. Transpositions suffice, because they generate the full group
 ⊗2
of permutations. For V d we only have

1= (trivial permutation) and σ(1,2) = ,

but for higher tensor systems more permutations can occur. Consequently, permuta-
tions act by interchanging different input and output lines and the wiring diagram
representation allows one to keep track of this pictorially. In fact, only the input and
J Fourier Anal Appl

output position of a line matters. We can use diagrams to simplify expressions by


 ⊗2
disentangling the corresponding lines. Take σ(1,2) on V d as an example. Using
wiring diagrams we can derive the standard result

2
σ(1,2) = = =1

pictorially. We are now ready to prove some important auxiliary results.

Lemma 6 Let A, B ∈ H d be arbitrary. Then it holds that


  1
tr 2 PSym2 A ⊗ B = (tr(B)A + B A) . (11)
2

We remark that in general,

1
PSym2 (X ⊗ Y )
= (X ⊗ Y + Y ⊗ X ) ,
2

which is, in our experience, a common misconception.


Proof of Lemma 6 The basic formula (7) for PSym2 is given by

1  1 
PSym2 = σπ(1),π(2) = 1 + σ(1,2) ,
2 2
π ∈S2

and the concepts from above allow us to translate this into the following wiring dia-
gram:

1
PSym2 = + .
2

 ⊗2
(Note that this operator acts on the full tensor space V d , hence in the wiring
diagram it is represented by a two-indexed box.) Applying the graphical calculus
yields

⎛ ⎞ ⎛ ⎞
A B A B A B A B A
1⎜

⎟ 1⎜
⎟= ⎜


tr2 PSym2 A ⊗ B = PSym2 = + +
2⎝ ⎠ 2⎝ B ⎠

1
= (tr(B)A + BA) ,
2

which is the desired result. 



J Fourier Anal Appl

Obviously, it is also possible to obtain (11) by direct calculation. We have included


such a calculation in the appendix (Sect. 1) to demonstrate the complexity of direct
calculations as compared to graphical ones.
We conclude this section with the following slightly more involved result.
Lemma 7 Let A, B, C ∈ H d be arbitrary. Then it holds that
  1
tr 2,3 PSym3 A ⊗ B ⊗ C = (A tr(B)tr(C) + B A tr(C) + C Atr(B)
6
+A tr(BC) + C B A + BC A) . (12)

The proof can in principle be obtained by evaluating all permutations of 3 tensor


systems algebraically and taking the partial trace afterwards. However, a pictorial
calculation using wiring diagrams is much faster and more elegant.
Proof For permutations of three elements, formula (7) implies

1  1 
PSym3 = σπ(1),π(2),π(3) = σ1,2,3 +σ2,1,3 +σ3,2,1 +σ1,3,2 +σ2,3,1 +σ3,1,2 ,
6 6
π ∈S3

where. σ2,1,3 (u ⊗ v ⊗ w) = (v ⊗ u ⊗ w), etc. This in turn allows us to write

A B C
PSym3

⎛ ⎞
A B C A B C A B C A B C A B C A B C
1⎜



= + + + + +
6⎝ ⎠

⎛ ⎞
A B C A C A B A C A A
1⎜


= + B + C + B + B + C⎟
6⎝ ⎠
C B

1
= (A tr(B)tr(C) + BA tr(C) + CAtr(B) + A tr(BC) + CBA + BCA)
6

and we are done. 




4 Problem Setup

4.1 Modelling the Sampling Process

In the sampling process, we start by measuring the intensity of the signal:

y0 = x22 = tr(1X ). (13)


J Fourier Anal Appl

This allows us to assume w.l.o.g. x2 = 1. Next, we choose m vectors a1 , . . . , am


iid at random from a t-design Dt ⊂ V d and evaluate

yi = tr(Ai X ) = |x, ai |2 for i = 1, . . . m, (14)

and consequently the vector y = (y1 , . . . , ym )T ∈ Rm


+ captures all the information we
obtain from the sampling process. This process can be represented by a measurement
operator

A : H d → Rm ,

m
Z → tr(Ai Z )ei , (15)
i=1

where e1 , . . . , em denotes the standard basis of Rm . Therefore A(X ) = y completely


encodes the measurement process. For technical reasons we also consider the mea-
surement operator

R : Hd → Hd,

m 
m
Z → m −1 (d + 1)d  Ai Z = m −1 (d + 1)d Ai tr(Ai Z ), (16)
i=1 i=1

which is a renormalized version of A∗ A : H d → H d . Concretely

(d + 1)d ∗
R= A A.
m

The scaling is going to greatly simplify our analysis, because it guarantees that R is
“near-isotropic”, as the following result shows.

Lemma 8 (R is near-isotropic) The operator R defined in (16) is near-isotropic in


the sense that

E[R] = I + 1 or E [R] Z = Z + tr(Z )1 ∀Z ∈ H d (17)

Proof Let us start with deriving (17). For Z ∈ H d arbitrary we have

(d + 1)d 
m
E[R]Z = E[Ai tr(Ai Z )]
m
i=1
 
= (d + 1)d tr 2 E[A⊗21 ]1 ⊗ Z (18)
 
= 2 tr 2 PSym2 1 ⊗ Z (19)
 
= Z + 1(tr Z ) = I + 1 Z .
J Fourier Anal Appl

Here, (18) follows from the fact that the ai ’s are chosen iid from a t-design, (19) uses
 −1
the fact that dim(Sym2 ) = d+12 together with Definition 3, and the final line is an
application of Lemma 6. 


Let now x ∈ V d be the signal we want to recover. As in [23] we consider the space
 
T := x z ∗ + zx ∗ : z ∈ V d ⊂ H d (20)

(which is the tangent space of the manifold of all hermitian matrices at the point
X = x x ∗ ). This space is of crucial importance for our analysis. The orthogonal
projection onto this space can be given explicitly:

PT : H d → T,
Z → X Z + Z X − X Z X (21)
= X Z + Z X − (X, Z )X. (22)

We denote the projection onto its orthogonal complement with respect to the Frobenius
inner product by PT⊥ . Then for any matrix Z ∈ H d the decomposition

Z = PT Z + PT⊥ Z =: Z T + Z T⊥

is valid. We point out that in particular

P T 1 P T =  X (23)

holds. We will frequently use this fact. For a proof, consider Z ∈ H d arbitrary and
insert the relevant definitions:

PT 1 PT Z = PT 1 tr(1PT Z ) = (X 1 + 1X − X 1X ) tr (X Z + Z X − X Z X )
= X tr(X Z ) =  X Z .

4.2 Convex Relaxation

Following [8,18,23] the measurements (13) and (14) can be translated into matrix
form by applying the following “lifts”:

X := x x ∗ , and Ai := ai ai∗ .

By doing so the measurements assume the a linear form:

y0 = x22 = (1, X ) = tr(X ),


yi = (Ai , X ) = Tr (Ai X ) i = 1, . . . , m.
J Fourier Anal Appl

Hence, the phase retrieval problem becomes a matrix recovery problem. The solution
to this is guaranteed to have rank 1 and encodes (up to a global phase) the unknown
vector x via X = x x ∗ . Relaxing the rank minimization problem (which would output
the correct solution) to a trace norm minimization yields the now-familiar convex
optimization problem

minarg X  X  1
 
subject to Ai , X  = yi i = 1, . . . m,
 †
X = X ,
tr(X  ) = 1,
X  ≥ 0. (24)

While this convex program is formally equivalent to the previously studied general-
purpose matrix recovery algorithms [21,35,63], there are two important differences:
• The measurement matrices Ai are rank-1 projectors: Ai = ai ai∗ .
• The unknown signal is known to be proportional to a rank-1 projector (X = x x ∗ )
as well.
While the second fact is clearly of advantage for us, the first one makes the problem
considerably harder: In the language of [35], it means that the “incoherence parameter”
μ = d maxi=1,...,m Ai ∞ = dai 22 = d is as large as it can get! Higher values of μ
correspond to more ill-posed problems and as a result, a direct application of previous
low-rank matrix recovery results fails. It is this problem that Refs. [18,23] first showed
how to circumvent for the case of Gaussian measurements. Below, we will adapt these
ideas to the case of measurements drawn from designs, which necessitates following
more closely the approach of [35].

4.3 Well-Posedness/Injectivity

In this section, we follow [23,35] to establish a certain injectivity property of the


measurement operator A. Compared to [23], our injectivity properties are somewhat
weaker. Their proof used the independence of the components of the Gaussian mea-
surement operator, which is not available in this setting, where individual vector com-
ponents might be strongly correlated. We will pay the price for these weaker bounds
in Sect. 6. There, we construct an “approximate dual certificate” that proves that the
sought-for signal indeed minimizes the nuclear norm. Owing to the weaker bounds
found here, the construction is more complicated than in [23]. In the language of [35],
we will have to carry out the full “golfing scheme”, as opposed to the “single leg” that
proved sufficient in [23].
3m
Proposition 9 With probability of failure smaller than d 2 exp(− 384d ) the inequality

0.25d −2 Z 22 < m −1 A(Z )22 (25)

is valid for all matrices Z ∈ T simultaneously.


J Fourier Anal Appl

Proof We aim to show the more general statement


   
−1 3mδ 2
Pr m A(Z )2 < 0.5(1 − δ)Z 2 ∀Z ∈ T ≤ d exp −
2 2 2
96d

for any δ ∈ (0, 1).


For Z ∈ T abritrary use near-isotropicity of R (E[R] = I + 1 ) and observe


m  1
m −1 A(Z )22 = m −1 (tr(Z Ai ))2 = tr(Z m −1 Ai tr(Ai Z )) = tr(Z RZ )
(d + 1)d
i=1 i
1 1
= tr(Z (R − E[R])Z ) + tr(Z (I + 1 )Z )
(d + 1)d (d + 1)d
1 1
= tr(Z PT (R − E[R])PT Z ) + (tr(Z 2 ) + (tr Z )2 )
(d + 1)d (d + 1)d
 
≥ 0.5d −2 tr(Z PT (R − E[R])PT Z ) + tr(Z 2 )

≥ 0.5d −2 (1 + λmin (PT (R − E[R])PT ) Z 22 , (26)

where we have used PT Z = Z as well as M ≥ λmin (M)I for any matrix-valued


operator M. Therefore everything boils down to bounding the smallest eigenvalue of
PT (R − E[R])PT . To this end we aim to apply Theorem 5 and decompose


m
(d + 1)d
PT (R − E[R])PT = (Mi − E[Mi ]) with Mi = P T  Ai P T .
m
i=1

Note that these summands have mean zero by construction. Furthermore observe that
the auxiliary result (23) implies

2 1 1 1 1
− I ≤ − I −  X ≤ − PT IPT − PT 1 PT
m m m m m
= −PT E[Mi ]PT ≤ PT (Mi − E[Mi ])PT

and the a priori bound

λmin (Mi − E[Mi ]) ≥ −2/m =: −R

follows. For the variance we use the standard identity

0 ≤ E[(Mi − E[Mi ])2 ] = E[Mi2 ] − E[Mi ]2 ≤ E[Mi2 ]

and focus on the last expression. Writing it out explicitly yields

(d + 1)2 d 2  
0 ≤ E[Mi2 ] = 2
P T E  Ai P T  Ai P T
m
(d + 1)2 d 2  
= 2
PT E tr(Ai PT Ai ) Ai PT .
m
J Fourier Anal Appl

The trace can be bounded from above by


 
tr(Ai PT Ai ) = tr Ai (X Ai + Ai X − tr(Ai X )X )
= 2 tr(Ai X ) − tr(Ai X )2 ≤ 2 tr(Ai X ),

where we have used the basic definition of PT and 0 ≤ tr(Ai X ) = |ai , x|2 ≤ 1.
Consequently, for Z ∈ T arbitrary

2(d + 1)2 d 2
PT E[Mi2 ]PT Z ≤ PT E [Ai tr(Ai X ) tr(Ai Z )]
m2
2(d + 1)2 d 2  
= 2
PT tr 2,3 E[Ai⊗3 ]1 ⊗ X ⊗ Z
m
12(d + 1)2 d 2  
= PT tr 2,3 PSym3 1 ⊗ X ⊗ Z
m (d + 2)(d + 1)d
2

2d
≤ PT (1 tr(Z ) + X tr(Z ) + Z + 1 tr(X Z ) + Z X + X Z )
m2
2d
= (X tr(X Z ) + X tr(X Z ) + Z + X tr(X Z ) + PT Z + X tr(X Z ))
m2
2d 12d
= (4 X + 2I ) Z ≤ 2 I Z .
m2 m
 −1
Here we have applied dim Sym3 = d+2 3 and Lemma 7 in lines 3 and 4, respectively.
Furthermore we used Z ∈ T —hence PT Z = Z and tr(Z ) = tr(X Z ) – as well as the
basic definition (22) of PT to simplify the terms occurring in the fourth line. Putting
everything together yields

12d
E[(Mi − E[Mi ])2 ] ≤ E[Mi2 ] ≤ I
m2

and we can safely set σ 2 := 12d


m . Now Theorem 5 tells us
 
3mδ 2
Pr [λmin (PT (R − E[R])PT ) ≤ −δ] ≤ d exp − 2
8 × 12d

for all 0 ≤ δ ≤ 1 ≤ 6d = σ 2 /R. This gives the desired bound on the event

{λmin (PT (R − E[R])PT ) ≤ −δ}

occurring. If this is not the case, (26) implies

m −1 A(Z )22 > 0.5d −2 (1 − δ)Z 22

for all matrices Z ∈ T simultaneously. This is the general statement at the beginning
of the proof and setting δ = 1/2 yields Proposition 9. 

J Fourier Anal Appl

Proposition 10 Let A be as above with vectors sampled from a t-design (t ≥ 1). Then
the statement
m −1 A(Z )22 ≤ Z 22 (27)

holds with probability one for all matrices Z ∈ H d simultaneously.

Proof Pick Z ∈ H d arbitrary and observe


   
1  1 
m m
A(Z )22 = (tr(Ai Z )) = tr Z
2
 Ai Z ≤ tr(Z I Z ) = Z 22 ,
m m
i=1 i=1

where we have used 0 ≤  Ai ≤ I. 




Note that equation (27) can be improved. Indeed, a standard application of the
Operator Bernstein inequality (Theorem 4) gives

m −1 A(Z )22 ≤ 2d −1 Z 22

for all matrices Z ∈ T with probability of failure smaller than d 2 exp (−Cm/d) for
some 0 < C ≤ 1. However, we actually do not require this tighter bound.

5 Proof of the Main Theorem/Convex Geometry

In this section, we will follow [21,35] to prove that the convex program (24) indeed
recovers the sought for signal x, provided that a certain geometric object—an approx-
imate dual certificate—exists.

Definition 11 (Approximate dual certificate) Assume that the sampling process cor-
responds to (13) and (14). Then we call Y ∈ H d an approximate dual certificate,
provided that Y ∈ span (1, A1 , . . . , Am ) and

1 1
YT − X 2 ≤ as well as YT⊥ ∞ ≤ . (28)
4d 2

Proposition 12 Suppose that the measurement gives us access to x22 and yi =


|ai , x|2 for i = 1, . . . , m. Then the convex optimization (24) recovers the unknown
x (up to a global phase) provided that (25) holds and an approximate dual certificate
Y exists.

Proof Let X̃ ∈ H d be an arbitrary feasible point of (24) and decompose it as X̃ =


X + . Feasibility then implies A( X̃ ) = A(X ) and A( ) = 0 must in turn hold
for any feasible displacement . Now the pinching inequality [14] (Problem II.5.4)
implies

 X̃ 1 = X + 1 ≥ X 1 + tr( T ) +  ⊥
T 1 .
J Fourier Anal Appl

Consequently X is guaranteed to be the unique minimum of (24), if

tr( T ) +  ⊥
T 1 > 0 (29)

is true for every feasible . In order to show this we combine feasibility of with
inequalities (25) and (27) to obtain

 T 2 < 2dm −1/2 A( T )2 = 2dm −1/2 A( ⊥ ⊥


T )2 ≤ 2d T 2 . (30)

Feasibility of also implies (Y, ) = 0, because by definition Y is in the range of


A∗ . Combining this insight with the defining property (28) of Y and (30) yields

0 = (Y, ) = (YT − X, T ) + (X, T ) + (YT⊥ , ⊥


T)
≤ YT − X 2  T 2 + tr( T ) + YT⊥ ∞  ⊥
T 1
< tr( T ) + YT − X 2 2d ⊥ ⊥ ⊥
T 2 + YT ∞  T 1
≤ tr( T ) + 1/2 ⊥ ⊥
T 2 + 1/2 T 1
≤ tr( T ) +  ⊥
T 1 ,

which is just the desired optimality criterion (29). 




6 Constructing the Dual Certificate

A straightforward approach to construct an approximate dual certificate would be to


set

(d + 1)d 
m
Y = RX − tr(X )1 = Ai tr(Ai X ) − tr(X )1 ∈ span (1, A1 , . . . , Am ) .
m
i=1
(31)
In expectation, E[Y ] = X , which is the “perfect dual certificate” in the sense that the
norm bounds in (28) vanish. The hope would be to use the Operator Bernstein inequal-
ity to show that with high probability, Y will be sufficiently close to its expectation.
It has been shown that a slight refinement of the ansatz (31) indeed achieves this goal
Ref. [35,50]. However, the Bernstein bounds depend on the worst-case operator norm
of the summands. In our case, they can be as large as d 2 |ai , x|2 , which can reach d 2 .
This is far larger than in previous low-rank matrix recovery problems. Reference [23]
relied on the fact that large overlaps |ai , x|2  O(d −1 ) are “rare” for Gaussian ai .
The key observation here is that the t-design property provides one with useful
information about the first t moments of the random variable |x, ai |2 . This knowl-
edge allows us to explicitly bound the probability of “dangerously large overlaps” or
“coherent measurement vectors” occurring.

Lemma 13 (Undesired events) Let x ∈ V d be an arbitrary vector of unit length. If a


is chosen uniformly at random from a t-design (t ≥ 1) Dt ⊂ V d , then the following
is true for every γ ≤ 1:
J Fourier Anal Appl

 
Pr |a, x|2 ≥ 5td −γ ≤ 4−t d −t (1−γ ) . (32)

Proof We aim to prove the slightly more general statement

 
Pr |a, x|2 ≥ (δ + 1)td −γ ≤ δ −t d −t (1−γ ) ,

which is valid for any δ ≥ 1. Setting δ = 4 then yields (32). The t-design property
provides us with useful information about the first t moments of the non-negative
random variable ξ = |a, x|2 . Indeed, with A = aa ∗ it holds for every k ≤ t that

   
E ξ k = E tr(AX )k
   
= tr E A⊗k X ⊗k
 
d + k − 1 −1  
= tr PSymk X ⊗k
k
 
d + k − 1 −1  ⊗k 
= tr X
k
≤ d −k k!,

because X ⊗k is invariant under PSymk . One way of seeing this4 is to note that
range(X ⊗k ) = span(x ⊗k ) and the latter is already contained in Symk . Therefore
the k-th moment τk of ξ is bounded by

 1/k
τk = E[ξ k ] ≤ (d −k k!)1/k ≤ k/d.

These inequalities are tight for the mean μ = τ1 of ξ and hence

μ = E[ξ ] = d −1 .

Now we aim to use the well-known t-th moment bound

Pr [|ξ − μ| ≥ sτt ] ≤ s −t ,

which is a straightforward generalization of Chebyshev’s inequality. Applying it, yields


the desired result. Indeed,

4 Alternatively one could also rearrange tensor systems: X ⊗k = (x x ∗ )⊗k  x ⊗k (x ∗ )⊗k and use
PSymk x ⊗k = x ⊗k .
J Fourier Anal Appl

   
Pr |a, x|2 ≥ (δ + 1)td −γ = Pr ξ − μ ≥ (δ + 1)td −γ − d −1
 
≤ Pr ξ − μ ≥ δtd −γ
 
≤ Pr |ξ − μ| ≥ δd 1−γ τt
≤ δ −t d −t (1−γ ) ,

and we are done. 




The previous lemma bounds the probability of the undesired events


 
E ic = |ai , x|2 ≥ 5td −γ , (33)

where 0 ≤ γ ≤ 1 is a fixed parameter which we refer to as the truncation rate. It turns


out that a single truncation of this kind does not quite suffice yet for our purpose. We
need to introduce a second truncation step.

Definition 14 Fix Z ∈ T arbitrary and decompose it as


 
Z = ζ x z ∗ + zx ∗ ,

for some unique ζ > 0 and z ∈ V d with z2 = 1. For this z we introduce the event
 
G ic := |z, ai |2 ≥ 5td −γ

and define the two-fold truncated operator

(d + 1)d 
m
R Z := Rx,y = 1 E i 1G i  Ai , (34)
m
i=1

where 1 Ei and 1G i denote the indicator functions associated with the events E i and
G i , respectively.

The following result shows that due to Lemma 13 this truncated operator is in
expectation close to the original R.

Proposition 15 Fix Z ∈ T arbitrary and let R Z be as in (34). Then

E[R Z − R]op ≤ 41−t d 2−t (1−γ ) (35)

Proof We start by introducing the auxiliar (singly truncated) operator

(d + 1)d 
m
Raux := 1 E i  Ai
m
i=1
J Fourier Anal Appl

and observe

E [R Z − R] op ≤ E [R − Raux ] op + E [R Z − Raux ] op . (36)

Now use Lemma 13 to bound the first term:


 
 (d + 1)d m
 
 
E[R − Raux ]op =  E (1 − 1 Ei ) Ai 
 m 
i=1 op

(d + 1)d 
m  
≤ E 1 Eic  Ai op
m
i=1

2d 2    2d 2 
m m
≤ E 1 Eic = Pr[E ic ]
m m
i=1 i=1
−t −t (1−γ ) 1−2t 2−t (1−γ )
≤ 2d × 4 d
2
=2 d .

Similarly,
 m 
(d + 1)d    2d 2 
m
 
E [Raux − R Z ]op ≤  E 1G ic  Ai  ≤ E[1G ic ]
m   m
i=1 op i=1

2d 2 
m
≤ Pr[G ic ] ≤ 21−2t d 2−t (1−γ )
m
i=1

and inserting these bounds into (36) yields the desired statement. 


We now establish a technical result which will allow us to find a suitable approximate
dual certificate using the “golfing scheme” construction [35,50].

Proposition 16 Fix Z ∈ T arbitrary, let R Z be as in (34). Assume that that the design
order t is at least 3 and the truncation rate γ satisfies

γ ≤ 1 − 2/t.

Then for 1/4 ≤ b ≤ 1 and c ≥ 2b with probability at least 1 − d exp(− 640td
9mb
2−γ )
one has

PT⊥ (R Z Z − tr(Z )1) ∞ ≤ bZ 2 and (37)


PT (R Z − Z − tr(Z )1) 2 ≤ cZ 2 . (38)

Proof The statement is invariant under rescaling of Z . Therefore it suffices to treat


the case Z 2 = 1. In this case we can decompose

Z = ζ (zx ∗ + x z ∗ )
J Fourier Anal Appl

with some fixed z ∈ V d obeying z2 = 1 and 0 < ζ ≤ 1. Near-Isotropicity (Lemma


8) of R guarantees PT⊥ E[R]Z = tr(Z )PT⊥ I as well as PT E[R]Z = Z + tr(Z )PT 1.
Let us now focus on (37) and use Proposition 15 in order to write

PT⊥ (R Z Z − tr(Z )1) ∞ = PT⊥ (R Z − E[R]) Z ∞


≤ PT⊥ (R Z − E[R Z ]) Z ∞ + PT⊥ E[R Z − R]Z ∞
≤ PT⊥ op (R Z − E[R Z ])Z ∞ + 41−t d 2−t (1−γ ) PT⊥ op Z 2
≤ (R Z − E[R Z ])Z ∞ + b/4.

Here we have used PT⊥ op ≤ 1 as well as

E [R Z − R] op ≤ 41−t d 2−t (1−γ ) ≤ 41−t ≤ 1/16 ≤ b/4, (39)

which follows from γ ≤ 1 − 2/t, t ≥ 3 and b ≥ 1/4. To obtain (38) we use a similar
reasoning:

PT (R Z Z − Z − tr(Z )1) 2 = PT (R Z − E[R]) Z 2



≤ 2PT (R Z − E[R Z ]) Z ∞ + PT E[R Z − R]Z 2

≤ 2PT op (R Z − E[R Z ])Z ∞ + b/4PT op Z 2

≤ 2 (R Z − E[R Z ]) Z ∞ + b/4,

where we have used the fact that PT projects onto a subspace of at most rank-2
matrices in the third line and (39) in the fourth. This motivates to define the event

E := { (R Z − E[R Z ]) Z ∞ ≤ 3b/4}

which guarantees both (37) and (38) due to the assumption on c and Z 2 = 1. So
everything boils down to bounding the probability of E c . We decompose


m
(d + 1)d
(R Z − E[R Z ]) Z = (Mi − E[Mi ]) with Mi = 1 Ei 1G i Ai tr(Ai Z ).
m
i=1

We will estimate this sum using the Operator Bernstein inequality (Theorem 4). Thus
we need an a priori bound for the summands

(d + 1)d 2d 2
Mi ∞ = 1 Ei 1G i Ai ∞ | tr(Ai Z )| ≤ 1 Ei 1G i 2|x, ai ||z, ai |
m m
4d 2 20 2−γ
≤ 5td −γ = td =: R,
m m

as well as a bound for the variance. First observe that


   
E[(Mi − E[Mi ])2 ] = E Mi2 − E[Mi ]2 ≤ E Mi2
J Fourier Anal Appl

and therefore

  (d + 1)2 d 2   (d + 1)2 d 2  
E Mi2 = 2
E 1 Ei 1G i tr(Ai Z )2 Ai2 ≤ 2
E tr(Ai Z )2 Ai2
m m
(d + 1)2 d 2   6(d + 1)d  
⊗3
= tr 2,3 E[Ai ]1 ⊗ Z ⊗ Z = 2 tr 2,3 PSym3 1 ⊗ Z ⊗ Z
m 2 m (d + 2)
(d + 1)d  
= 2 1 tr(Z )2 + Z tr(Z ) + Z + 1 tr(Z 2 ) + 2Z 2
m (d + 2)
8d 8d
≤ 2 Z 22 1 = 2 1.
m m

Here we have used tr(Z ) ≤ 2Z 2 , Z ≤ Z 2 1 and Z 2 = 1. From this we can
conclude
 
  8d
 E[(Mi − E[Mi ])2  ≤ m max E[Mi2 ]∞ ≤ =: σ 2 .
∞ i=1,...,m m
i

Observing that

σ2 8 γ −1 2 3
≤ d ≤ ≤ b,
R 20t 15 4

Theorem 4 yields
 
  3 × 3mb
Pr E c
= Pr [ (R Z − E[R Z ]) Z ∞ > 3b/4] ≤ d exp − ,
8 × 4 × 20td 2−γ

as desired. 


With this ingredient we can now construct a suitable approximate dual certificate
Y , closely following [50].

Proposition 17 Let x ∈ V d be an arbitrary normalized vector (x2 = 1), X = x x ∗


and let ω ≥ 1 be arbitrary. If the design order t (t ≥ 3) and the truncation rate γ are
chosen such that

γ ≤ 1 − 2/t

holds and the total number of measurements fulfills

m ≥ Cωtd 2−γ log2 (d), (40)

then with probability larger than 1 − 0.5e−ω , there exists an approximate dual cer-
tificate Y as in Definition 11. Here, C is a universal constant (which can in principle
be recovered explicitly from the proof).
J Fourier Anal Appl

Algorithm 1: Summary of the randomized “golfing scheme” [35] used in the


proof of Proposition 17 to show the existence of an approximate dual certificate6 .
Input:
X ∈ Hd # signal to be recovered
l∈N # maximum number of iterations
{m i }li=1 ⊂ N # number of measurement vectors used in ith iteration
r # require r “successful” iterations
# (i.e. iterations where we enter the inner if-block)
Initialize:
Y = [] # a list of matrices in H d , initially empty
Q = [X ] # a list of matrices in T , initialized to hold X as its only element
i =1 # number of current iteration
ξ = [0, . . . , 0] # array of l zeros; ξi will be set to 1 if ith iteration succeeds
Body:

while i ≤ l and ij=1 ξ j ≤ r do
set Q to be the last element of Q and Y to be the last element of Y,
sample m i vectors uniformly from the t-design; construct R Q according to Definition 14.
if (37), (38) hold for R Q and Q ∈ T with parameters b = 1/4, c = 1/2 then
ξi = 1
Y ← R Q Q − tr(Q)1 + Y , append Y to Y
Q ← X − PT Y , append Q to Q
end
i ←i +1
end

if li=1 ξi = r then
report success and output Y, Q, ξ
else
report failure
end

Proof The randomized construction of Y is summarized in Algorithm 1. If this algo-


rithm succeeds, it outputs three lists5
Y = [Y1 , . . . , Yr ] , Q = [X, Q 1 , . . . , Q r ] , and ξ = {ξ1 , . . . , ξl } .

The recursive construction yields the following expressions (c.f. [50, Lemma 14]):

Y := Yr = R Qr −1 Q r −1 − tr(Q r −1 )1 + Yr −1
r
 
= R Q i−1 Q i−1 − tr(Q i−1 )1 and
i=1
 
Q i = X − PT Yi = PT Q i−1 + tr(Q i−1 )1 − R Q i−1 Q i−1
  
i
 
= PT I + 1 − R Q i−1 Q i−1 = · · · = PT I + 1 − R Q j−1 X.
j=1

5 The use of pseudo-code allows for a compact presentation of this randomized procedure. However, the
reader should keep in mind that the construction is purely part of a proof and should not be confused with
the recovery algorithm (which is given in Eq. (24)).
J Fourier Anal Appl

We now set
r = log2 d + 2. (41)
Then, in case of success, the validity of properties (37) and (38) for c = 1/2 and
b = 1/4 in each step (Q i → Q i+1 and Yi → Yi+1 , respectively) guarantee


r
1 1
YT − X 2 = Q r 2 ≤ X 2 = 2−log2 d−2 X 2 ≤ ,
2 4d
j=1
r 
  ⊥ 

YT⊥ ∞ ≤ PT R Q i−1 Q i−1 − tr(Q i−1 )1 

i=1
r
1 1  1−i
r
≤ Q i−1 2 ≤ 2 Q 0 ∞
4 4
i=1 i=1


1 1
≤ X ∞ 2−i = .
4 2
i=0

Thus, Yr constitutes an approximate dual certificate in the sense of Definition 11.


What remains to be done is to choose the parameters l and {m i }li=1 such that the
probability of the algorithm failing is smaller than 0.5e−ω . Algorithm 1 fails precisely
if
l
ξi < r. (42)
i=1
Recall that the ξi ’s are Bernoulli random variables which indicate whether the ith
iteration of the algorithm has been successful (ξi = 1), or failed (ξi = 0). Our aim
is to bound the probability of the event in (42) by a similar expression involving
independent 6 Bernoulli variables ξi . To this end, write
 l   
  
l−1 
Pr ξi < r = E Pr ξl < r − ξi ξl−1 , . . . , ξ1 . (43)
i=1 i=1

Conditioned on an arbitrary instance of ξl−1 , . . . , ξ1 , the variable ξl follows a Bernoulli


distribution with some parameter p(ξl−1 , . . . , ξ1 ). Note that if ξ ∼ B( p) is a Bernoulli
variable with parameter p, then for every fixed t ∈ R, the probability Pr ξ ∼B( p) [ξ < t]
is non-increasing as a function of p. Consequently, the estimate
 l   
 
l−1
Pr ξi < r ≤ Pr ξl + ξi < r (44)
i=1 i=1

6 It was pointed out to us by A. Hansen that in some previous papers [35,50] which involve a similar
construction to the one presented here, it was tacitly assumed that the ξi are independent. This will of
course not be true in general. Fortunately, a more careful argument shows that all conclusions remain valid
[1]. Our treatment here is similar to the one presented in [1].
J Fourier Anal Appl

is valid if ξl is an independent p  -Bernoulli distributed with

p ≤ min p(ξl−1 , . . . , ξ1 ).
ξl−1 ,...,ξ1

Proposition 16 provides a uniform lower bound on the success probability p(ξl−1 , . . . ,


ξ1 ). Indeed, there is a universal constant C1 such that invoking Proposition 16 with

m  := C1 td 2−γ log d

and Z = Q gives a probability of success of at least 9/10 for any Q (in particular,
independently of the ξl−1 , . . . , ξ1 ). Thus, choosing p  = 9/10 and m i = m  for all i,
we can then iterate the estimate (44) to arrive at
     l 

l 
l−1 
Pr ξi < r ≤ Pr ξl + ξi < r ≤ · · · ≤ Pr ξi <r , (45)
i=1 i=1 i=1

where the ξi are independent Bernoulli variables with parameter 9/10. A standard
Chernoff bound (e.g. [39, Sect. Concentration: Theorem 2.1]) gives
 

l
ξi ≤ l(9/10 − t) ≤ e−2lt .
2
Pr
i=1

Choosing t = 9/10 − r/l we obtain


     l 

l 
l 
Pr ξi <r ≤ Pr ξi ≤ r = Pr ξi ≤ l (9/10 − t)
i=1 i=1 i=1
  2 
9 r
≤ exp −2l − . (46)
10 l

Setting the number of iterations generously to


 
l = 10ωr = 10ω log2 d + 2

implies
 2  2
9 r 8
2l − ≥ 20ωr ≥ 12ωr ≥ ω + log 2,
10 l 10

where we have used ω ≥ 1 ≥ log 2 in the last inequality. Together with (42), (45) and
(46) this gives the desired bound
 l   l 
    1
Pr algorithm fails = Pr ξi < r ≤ Pr ξi < r ≤ e−ω−log(2) = e−ω ,

2
i=1 i=1
J Fourier Anal Appl

on our construction of Y failing. The total number of measurement vectors sampled is


l
m i = lm  ≤ Cωtd 2−γ log2 d,
i=1

for some constant C. 




Finally we are ready to put all pieces together and show or main result—Theorem
1.
Proof of the Main Theorem In Sect. 5 (Proposition 12) we have shown that the
algorithm (24) recovers the sought for signal x, provided that (25) holds and a suitable
approximate dual certificate Y exists. Proposition 17—with a maximal truncation rate
of γ = (1 − 2/t)—implies that the probability that no such Y can be constructed is
smaller than 0.5e−ω , provided that the total sampling rate m obeys

m ≥ Cωtd 1+2/t log2 d, (47)

for a sufficiently large absolute constant C. Provided that this constant is large enough,
Proposition 9 implies that the probability of (25) failing is also bounded by 0.5e−ω .
Theorem 1 now follows from the union bound over these two probabilities of failure. 

7 Converse Bound

In this paper, we require designs of order at least three. Here we prove that this criterion
is fundamental in the sense that sampling from 2-designs in general cannot guarantee
a sub-quadratic sampling rate. In order to do so, we will use a particular sort of 2-
design, called a maximal set of mutually unbiased bases (MUBs) [45,46,67,75]. Two
orthonormal bases {u i }i=1
d
and {vi }i=1
d
are called mutually unbiased if their overlap is
uniformly minimal. Concretely, this means that

1
|u i , v j |2 = ∀i, j = 1, . . . , d
d

must hold for all i, j = 1, . . . , d. Note that this is just a generalization of the inco-
herence property between standard and Fourier basis. In prime power dimensions, a
maximal set of (d + 1) such MUBs is known to exist (and can be constructed) [44].
Such a set is maximal in the sense that it is not possible to find more than (d + 1)
MUBs in any Hilbert space. Among other interesting properties—cf. [26] for a detailed
survey—maximal sets of MUBs are known to form 2-designs [45,75].
The defining properties of a maximal set of MUBs allow us to derive the converse
bound—Theorem 2.

Theorem 18 (Converse bound) Let d ≥ 2 be a prime power and let D2 ⊂ Cd be


a maximal set of MUBs. Then there exist orthogonal, normalized vectors x, z ∈ Cd
which have the following property:
J Fourier Anal Appl

Suppose that m measurement vectors a1 , . . . , am are sampled independently and


uniformly at random from D2 . Then, for any ω ≥ 0, the number of measurements must
obey
ω
m ≥ d(d + 1), (48)
4
or the event

|ai , x|2 = |ai , z|2 ∀ i ∈ {1, . . . , m}

will occur with probability at least e−ω .

Consequently a scaling of O(d 2 ) in general cannot be avoided when demanding


only the property of being a 2-design and simultaneously requiring a “reasonably
small” probability of failure in the recovery process.
Proof of Theorem 18 Suppose that {u i }i=1
d
is one orthonormal basis contained in the
maximal set of MUBs D2 and set x := u 1 as well as z := u 2 . Note that by definition
these vectors are orthogonal and normalized. Due to the particular structure of MUBs,
x and z can only be distinguished if either u 1 or u 2 is contained in {a1 , . . . , am }. Since
each ai is chosen iid at random from D2 containing (d + 1)d elements, the probability
of obtaining either u 1 or u 2 is p = (d+1)d
2
. As a result, the problem reduces to the
following standard stopping time problem (cf. for example Example (2) in Chapt. 6.2
in [29]):
Suppose that the probability of success in a Bernoulli experiment is p. How many
trials m are required in order for the probability of at least one success to be 1 − eω or
larger?
To answer this question, we have to find the smallest integer m such that

1 − (1 − p)m ≥ 1 − e−ω , or equivalently − m log(1 − p) ≥ ω. (49)

The standard inequality

p
p ≤ − log(1 − p) ≤ ≤ 2p
1− p

for any p ∈ [0, 1/2] implies that (48) is a necessary criterion for (49) and we are done.



8 Conclusion

In this paper we have derived a partly derandomized version of Gaussian PhaseLift [18,
23]. Instead of Gaussian random measurements, our method guarantees recovery for
sampling iid from certain finite vector configurations, dubbed t-designs. The required
sampling rate depends on the design order t:
 
m = O td 1+2/t log2 d . (50)
J Fourier Anal Appl

For small t this rate is worse than the Gaussian analogue—but still non-trivial. How-
ever, as soon as t exceeds 2 log d, we obtain linear scaling up to a polylogarithmic
overhead.
In any case, we feel that the main purpose of this paper is not to present yet another
efficient solution heuristics, but to show that the phase retrieval problem can be deran-
domized using t-designs. These finite vector sets lie in the vast intermediate region
between random Fourier vectors and Gaussian random vectors (the Fourier basis is a
1-design, whereas normalized Gaussian random vectors correspond to an ∞-design).
Therefore the design order t allows us to gradually transcend between these two
extremal cases.

Acknowledgments DG and RK are grateful to the organizers and participants of the Workshop on Phase-
less Reconstruction, held as part of the 2013 February Fourier Talks at the University of Maryland, where
they were introduced to the details of the problem. This extends, in particular, to Thomas Strohmer. The work
of DG and RK is supported by the Excellence Initiative of the German Federal and State Governments (Grant
ZUK 43), by scholarship funds from the State Graduate Funding Program of Baden-Württemberg, and by
the US Army Research Office under contracts W911NF-14-1-0098 and W911NF-14-1-0133 (Quantum
Characterization, Verification, and Validation), and the DFG. FK acknowledges support from the German
Federal Ministry of Education and Reseach (BMBF) through the cooperative research project ZeMat.

Appendix

Here we briefly state an elementary proof of Lemma 6. In the main text we proved this
result using wiring diagrams. The purpose of this is to underline the relative simplicity
of wiring diagram calculations. Indeed, the elementary proof below is considerably
more cumbersome than its pictorial counterpart.

Elementary Proof of Lemma 6

Let us choose an arbitrary orthonormal basis b1 , . . . , bd of V d . In the induced basis


d
bi ⊗ b j i, j=1 of V d ⊗ V d the transpositions then correspond to


d 
d 
d
1=1⊗1= bi bi∗ ⊗ b j b∗j and σ(1,2) = bi b∗j ⊗ b j bi∗ .
i=1 j=1 i, j=1

This choice of basis furthermore allows us to write down tr 2 (A) for A ∈ M d ⊗ M d


explicitly:


d
 
tr 2 (A) = 1 ⊗ bi∗ A (1 ⊗ bi ) .
i=1

Consequently we get for A, B ∈ H d arbitrary


  1 1  
tr 2 PSym2 A ⊗ B = tr 2 (A ⊗ B) + tr 2 σ(1,2) A ⊗ B .
2 2
J Fourier Anal Appl

The latter term can be evaluated explicitly:

  d
 
d
tr 2 σ(1,2) A ⊗ B = 1 ⊗ bk∗ bi b∗j ⊗ b j bi∗ A ⊗ B (1 ⊗ bk )
k=1 i, j=1


d 
d
= bi b∗j Abk∗ b j bi∗ Bbk = bi , Bb j bi b∗j A
i, j,k=1 i, j=1
 d  ⎛ ⎞
 d
= bi bi∗ B⎝ b j b∗j ⎠ A = 1B1A = B A,
i=1 j=1

and the desired resultfollows. Here we have used the basis representation of the
identity, namely 1 = i=1d
bi bi∗ .

References

1. Adcock, B., Hansen, A.C.: Generalized Sampling and Infinite-Dimensional Compressed Sensing,
Technical Report NA2011/02, DAMTP. University of Cambridge, Cambridge (2011)
2. Ahlswede, R., Winter, A.: Strong converse for identification via quantum channels. IEEE Trans. Inform.
Theory 48, 569–579 (2002)
3. Alexeev, B., Bandeira, A.S., Fickus, M., Mixon, D.G.: Phase retrieval with polarization. SIAM J.
Imaging Sci. 7, 35–66 (2014)
4. Ambainis, A., Emerson, J.: Quantum t-designs: t-wise independence in the quantum world. In: Pro-
ceedings of the 22nd Annual IEEE Conference on Computational Complexity, pp. 129–140 ( 2007)
5. Bachoc, C., Venkov, B.: Modular forms, lattices and spherical designs. In: Euclidean Lattices, Spherical
Designs and Modular Forms, pp. 87–111. On the works of Boris Venkov. L’Enseignement Mathéma-
tique, Genève (2001)
6. Bachoc, C., Ehler, M.: Tight p-fusion frames. Appl. Comput. Harmon. Anal. 35, 1–15 (2013)
7. Bajnok, B.: Construction of spherical t-designs. Geom. Dedic. 43, 167–179 (1992)
8. Balan, R., Casazza, P., Edidin, D.: On signal reconstruction without phase. Appl. Comput. Harmon.
Anal. 20, 345–356 (2006)
9. Balan, R., Bodmann, B.G., Casazza, P.G., Edidin, D.: Painless reconstruction from magnitudes of
frame coefficients. J. Fourier Anal. Appl. 15, 488–501 (2009)
10. Bandeira, A.S., Chen, Y., Mixon, D.G.: Phase retrieval from power spectra of masked signals. Inf.
Inference (2014). doi:10.1093/imaiai/iau002
11. Baraniuk, R.G., Davenport, M., DeVore, R.A., Wakin, M.: A simple proof of the restricted isometry
property for random matrices. Constr. approx. 28, 253–263 (2008)
12. Barvinok, A.: A Course in Convexity. American Mathematical Society (AMS), Providence (2002)
13. Bauschke, H.H., Combettes, P.L., Luke, D.R.: Hybrid projection–reflection method for phase retrieval.
JOSA A 20, 1025–1034 (2003)
14. Bhatia, R.: Matrix Analysis. Springer, New York (1996)
15. Brandao, F.G., Harrow, A.W., Horodecki, M.: Local random quantum circuits are approximate
polynomial-designs. arXiv:1208.0692 (2012)
16. Bruck, Y.M., Sodin, L.: On the ambiguity of the image reconstruction problem. Opt. Commun. 30,
304–308 (1979)
17. Candes, E., Li, X., Soltanolkotabi, M.: Phase retrieval from masked fourier transforms. Appl. Comput.
Harmon. Anal. arXiv:1310.3240 (to appear)
18. Candès, E., Li, X.: Solving quadratic equations via PhaseLift when there are about as many equations
as unknowns. Found. Comput. Math., 1–10 (2013)
19. Candès, E., Tao, T.: Decoding by linear programming. IEEE Trans. Inform. Theory 51, 4203–4215
(2005)
J Fourier Anal Appl

20. Candès, E.J., Romberg, J., Tao, T.: Stable signal recovery from incomplete and inaccurate measure-
ments. Commun. Pure Appl. Math. 59, 1207–1223 (2006)
21. Candès, E.J., Tao, T.: The power of convex relaxation: near-optimal matrix completion. IEEE Trans.
Inform. Theory 56, 2053–2080 (2010)
22. Candes, E.J., Eldar, Y.C., Strohmer, T., Voroninski, V.: Phase retrieval via matrix completion. SIAM
J. Imaging Sci. 6, 199–225 (2013)
23. Candès, E.J., Strohmer, T., Voroninski, V.: Phaselift: exact and stable signal recovery from magnitude
measurements via convex programming. Commun. Pure Appl. Math. 66, 1241–1274 (2013)
24. Cvitanović, P.: Group Theory. Birdtracks, Lie’s, and Exceptional Groups. Princeton University Press,
Princeton (2008)
25. Delsarte, P., Goethals, J., Seidel, J.: Spherical codes and designs. Geom. Dedic. 6, 363–388 (1977)
26. Durt, T., Englert, B.-G., Bengtsson, I., Życzkowski, K.: On mutually unbiased bases. Int. J. Quantum
Inf. 8, 535–640 (2010)
27. Ehler, M., Fornasier, M., Sigl, J.: Quasi-linear compressed sensing. Multiscale Model. Simul. 12,
725–754 (2014)
28. Eldar, Y.C., Mendelson, S.: Phase retrieval: stability and recovery guarantees. Appl. Comput. Harmon.
Anal. 36, 473–494 (2014)
29. Feller, W.: An Introduction to Probability Theory and its Applications. I. Wiley, New York (1968)
30. Fienup, J.R.: Phase retrieval algorithms: a comparison. Appl. Opt. 21, 2758–2769 (1982)
31. Gottesman, D.: Stabilizer codes and quantum error correction. Ph.D. dissertation, California Institute
of Technology (1997)
32. Gottesman, D.: The Heisenberg representation of quantum computers. In: Proceedings of the XXII
International Colloquium on Group Theoretical Methods in Physics, pp. 32–43. International Press,
Cambridge (1999)
33. Grant, M., Boyd, S.: Graph implementations for nonsmooth convex programs. In: Blondel, V., Boyd, S.,
Kimura, H. (eds.) Recent Advances in Learning and Control, Ser. Lecture Notes in Control and Infor-
mation Sciences, Springer-Verlag Limited, London, pp. 95–110. http://stanford.edu/~boyd/graph_dcp.
html (2008). Accessed 13 Oct 2014
34. Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 2.1. http://
cvxr.com/cvx (2014). Accessed 13 Oct 2014
35. Gross, D.: Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform.
Theory 57, 1548–1566 (2011)
36. Gross, D., Audenaert, K., Eisert, J.: Evenly distributed unitaries: on the structure of unitary designs. J.
Math. Phys. 48, pp. 052 104, 22 (2007)
37. Gross, D., Krahmer, F., Kueng, R.: Improved recovery guarantees for phase retrieval from coded
diffraction patterns. arXiv:1402.6286 (2014)
38. Gross, D., Liu, Y.-K., Flammia, S.T., Becker, S., Eisert, J.: State tomography via compressed sensing.
Phys. Rev. Lett. 105, 150401 (2010)
39. Habib, M., McDiarmid, C., Ramírez Alfonsín, J., Reed, B. (eds.): Probabilistic Methods for Algorithmic
Discrete Mathematics. Springer, Berlin (1998)
40. Hayashi, A., Hashimoto, T., Horibe, M.: Reexamination of optimal quantum state estimation of pure
states. Phys. Rev. A 72, 032301 (2005)
41. Heinosaari, T., Mazzarella, L., Wolf, M.M.: Quantum tomography under prior information. Commun.
Math. Phys. 318, 355–374 (2013)
42. Hoggar, S.: t-Designs in projective spaces. Eur. J. Comb. 3, 233–254 (1982)
43. Hoory, S., Linial, N., Wigderson, A.: Expander graphs and their applications. Bull. Am. Math. Soc.
43, 439–561 (2006)
44. Klappenecker, A., Rötteler, M.: Constructions of mutually unbiased bases. In: Finite Fields and Appli-
cations. 7th International Conference, Fq 7 , pp. 137–144. Springer, Berlin (2004)
45. Klappenecker, A., Rotteler, M.: Mutually unbiased bases are complex projective 2-designs. In: 2005
IEEE International Symposium on Information Theory (ISIT), vols. 1, 2, pp. 1740–1744 (2005)
46. König, H.: Cubature formulas on spheres. In: Advances in Multivariate Approximation. Proceedings
of the 3rd International Conference on Multivariate Approximation Theory, pp. 201–211. Wiley-VCH,
Berlin (1999)
47. Korevaar, J., Meyers, J.: Chebyshev-type quadrature on multidimensional domains. J. Approx. Theory
79, 144–164 (1994)
J Fourier Anal Appl

48. Krahmer, F., Rauhut, H.: Structured random measurements in signal processing. arXiv:1401.1106
(2014)
49. Kueng, R., Gross, D.: Stabilizer states are complex projective 3-designs in qubit dimensions.
(in preparation)
50. Kueng, R., Gross, D.: RIPless compressed sensing from anisotropic measurements. Lin. Alg. Appl.
441, 110–123 (2014)
51. Landsberg, J.M.: Tensors: Geometry and Applications. American Mathematical Society (AMS),
Providence (2012)
52. Levenshtein, V.I.: Universal bounds for codes and designs. In: Handbook of Coding Theory. Part 1:
Algebraic Coding, vol. 1, pp. 499–648. Elsevier, Amsterdam (1998)
53. Li, X., Voroninski, V.: Sparse signal recovery from quadratic measurements via convex programming.
SIAM J. Math Anal. 45, 3019–3033 (2013)
54. Liu, Y.-K.: Universal low-rank matrix recovery from pauli measurements. Adv. Neural Inf. Process.
Syst. 24, 1638–1646 (2011)
55. Low, R.A.: Large deviation bounds for k-designs. Proc. R. Soc. Lond. Ser. A 465, 3289–3308 (2009)
56. Luby, M., Wigderson, A.: Pairwise independence and derandomization. Print version of Foundations
and Trends in Theoretical Computer Science, Vol. 1, No. 4 (2005), print version of foundations and
trends in theoretical computer science vol. 1, no. 4 (2005) ed. Boston, MA (2006)
57. Millane, R.: Phase retrieval in crystallography and optics. JOSA A 7, 394–411 (1990)
58. Mixon, D.: Short, Fat Matrices, Blog, 2013. http://dustingmixon.wordpress.com/. Accessed 13 Oct
2014
59. Mondragon, D., Voroninski, V.: Determination of all pure quantum states from a minimal number of
observables. arXiv:1306.1214 (2013)
60. Nebe, G., Rains, E., Sloane, N.: The invariants of the Clifford groups. Des. Codes Cryptogr. 24, 99–121
(2001)
61. Netrapalli, P., Jain, P., Sanghavi, S.: Phase retrieval using alternating minimization. In: Advances in
Neural Information Processing Systems, pp. 2796–2804 (2013)
62. Neumaier, A.: Combinatorial Configurations in Terms of Distances. Department of Mathematics Mem-
orandum, Eindhoven (1981)
63. Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via
nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)
64. Renes, J.M., Blume-Kohout, R., Scott, A., Caves, C.M.: Symmetric informationally complete quantum
measurements. J. Math. Phys. 45, 2171–2180 (2004)
65. Rudelson, M., Vershynin, R.: On sparse reconstruction from Fourier and Gaussian measurements.
Commun. Pure Appl. Math. 61, 1025–1045 (2008)
66. Sanderson, B.: Immersions and embeddings of projective spaces. Proc. Lond. Math. Soc. 3(14),
137–153 (1964)
67. Schwinger, J.: Unitary operator bases. Proc. Natl. Acad. Sci. USA 46, 570–579 (1960)
68. Scott, A.: Tight informationally complete quantum measurements. J. Phys. A 39, 13507–13530 (2006)
69. Seymour, P., Zaslavsky, T.: Averaging sets: a generalization of mean values and spherical designs. Adv.
Math. 52, 213–240 (1984)
70. Sidelnikov, V.: Spherical 7-designs in 2n -dimensional Euclidean space. J. Algebr. Comb. 10, 279–288
(1999)
71. Tropp, J. A.: User-friendly tools for random matrices: an introduction, notes. http://users.cms.caltech.
edu/~jtropp/notes/Tro12-User-Friendly-Tools-NIPS.pdf (2012). Accessed 13 Oct 2014
72. Tropp, J.A.: User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12, 389–434
(2012)
73. Turaev, V.: Quantum Invariants of Knots and 3-Manifolds. Walter de Gruyter, Berlin (1994)
74. Watrous, J.: Theory of quantum information, lecture notes. https://cs.uwaterloo.ca/~watrous/
LectureNotes.html (2011). Accessed 13 Oct 2014
75. Zauner, G.: Quantendesigns: Grundzüge einer nichtkommutativen Designtheorie. Ph.D. dissertation,
University of Vienna (1999)

You might also like