Matrix and Tensor Completion Using Tensor Ring Decomposition with Sparse Representation
1. Introduction
In various applications of multi-dimensional data processing, the underlying data tensors or matrices are corrupted by outliers and/or are incomplete due to an imprecise data acquisition process or corruption by artifacts [1, 2]. The problem of recovering an original data tensor from its partially observed entries is called tensor
completion and has found many applications such as in computer vision, data mining, recommender
systems, image inpainting and signal reconstruction. Due to its importance and many practical applications,
many algorithms have been developed to solve this problem. Most of the methods are based on low-rank
matrix and tensor decomposition. For a comprehensive overview of such algorithms, we refer to [3, 4]. The
applicability and performance of these algorithms highly depend on the probability distribution upon which
the data components are missing. Many of these algorithms work well only when the missing components are distributed uniformly, and they fail to recover the data tensor when a whole part of it is missing, e.g. when several consecutive rows or columns of a frame are missing.
Motivated by such limitations of existing tensor completion techniques, we exploit tensor ring (TR)
decomposition combined with Hankelization [5, 6]. The idea of Hankelization was first proposed in [7] and
recently generalized to tensors in [5] but so far has not been exploited extensively for the TR representation
with sparse constraints. A key pre-processing step is a prior Hankelization procedure (to be discussed in
section 3) which allows us to apply the TR representation and develop an efficient tensor completion
algorithm. The original data tensor is recovered through a so-called inverse Hankelization or
De-Hankelization [5]. In this paper, we propose a new approach for the tensor completion problem. Our
main idea is based on applying a dictionary learning technique to the block Hankelized high-order data tensor in the TR format to yield sparse representations of the core tensors. More precisely, in contrast to the existing Hankelized tensor completion methods [5, 8], we do not complete the Hankelized data tensor directly or by using the Tucker decomposition; instead, we use a constrained TR format. We first initialize the core tensors, which are then updated sequentially using an efficient dictionary learning procedure applied to each core tensor. The core tensors are smaller than the original data tensor, which leads to smaller-scale optimization problems.
The problem is formulated as a convex optimization problem which is solved efficiently by the alternating direction method of multipliers (ADMM) [9]. Our main contributions in this paper are as follows:
• Applying Hankelization to the incomplete data in order to obtain higher-order tensors and represent them in the TR format.
• Exploiting the dictionary learning technique for sparse representations of the TR core tensors.
• Developing an optimized algorithm, which allows the reconstruction of images corrupted by structured distortions, such as missing rows, columns or blocks.
The rest of this paper is organized as follows. In section 2, the basic notations and concepts used throughout the paper are introduced. In section 3, we describe the Hankelization technique and related concepts. The proposed method is presented in section 4, and its computational complexity is discussed in section 5. Extensive experimental results are illustrated in section 6, and finally, a conclusion is given in section 7.
2. Preliminaries

In this section, the basic notations used throughout the paper are introduced; they are adapted from [10]. A scalar is denoted by a standard lowercase letter, e.g. $a$; a vector (1st-order tensor) is denoted by a boldface lowercase letter, e.g. $\mathbf{x}$; a matrix (2nd-order tensor) is denoted by a boldface capital letter, e.g. $\mathbf{X}$; and a higher-order tensor is denoted by a bold underlined capital letter, e.g. $\underline{\mathbf{X}}$. A tensor can also be represented graphically; see figure 1(a). The $(i_1, i_2, \ldots, i_N)$th element of an $N$th-order tensor $\underline{\mathbf{X}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is denoted by $x_{i_1, i_2, \ldots, i_N} = \underline{\mathbf{X}}(i_1, i_2, \ldots, i_N)$. The trace and the Moore–Penrose pseudoinverse of a matrix are denoted by 'Tr' and $\dagger$, respectively. The multi-index is defined as
$$\overline{i_1 i_2 \cdots i_N} = i_N + (i_{N-1}-1) I_N + \cdots + (i_1-1) I_2 I_3 \cdots I_N.$$
Let $\underline{\mathbf{X}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ be an $N$th-order tensor; then the $n$-unfolding of the tensor $\underline{\mathbf{X}}$ is a matrix of size $\prod_{i=1}^{n} I_i \times \prod_{i=n+1}^{N} I_i$, denoted by $\mathbf{X}_{\langle n \rangle}$, whose elements are defined as (see figure 1)
$$\mathbf{X}_{\langle n \rangle}\!\left(\overline{i_1 \cdots i_n},\, \overline{i_{n+1} \cdots i_N}\right) = \underline{\mathbf{X}}(i_1, i_2, \ldots, i_N).$$
Similarly, the mode-$n$ unfolding matrix of size $I_n \times \prod_{i \neq n} I_i$ is denoted by $\mathbf{X}_{(n)}$, whose components are
$$\mathbf{X}_{(n)}\!\left(i_n,\, \overline{i_{n+1} \cdots i_N i_1 \cdots i_{n-1}}\right) = \underline{\mathbf{X}}(i_1, i_2, \ldots, i_N).$$
For a graphical illustration, see figure 1(b). Slices are sub-tensors generated by fixing all indices except two of them; as a result, they are matrices. For example, for a tensor $\underline{\mathbf{X}} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, the slices $\underline{\mathbf{X}}(:,:,i)$, $i = 1, 2, \ldots, I_3$, are called frontal slices. The notation 'Vec' denotes the vectorization operator, which stacks the columns of a matrix. The inner product of two tensors $\underline{\mathbf{X}}, \underline{\mathbf{Y}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is defined as
$$\langle \underline{\mathbf{X}}, \underline{\mathbf{Y}} \rangle = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_N} x_{i_1, \ldots, i_N}\, y_{i_1, \ldots, i_N},$$
and the induced Frobenius norm is $\|\underline{\mathbf{X}}\|_F = \sqrt{\langle \underline{\mathbf{X}}, \underline{\mathbf{X}} \rangle}$.
The $n$-mode (matrix) multiplication of a tensor $\underline{\mathbf{X}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and a matrix $\mathbf{Q} \in \mathbb{R}^{J \times I_n}$ is denoted by $\underline{\mathbf{X}} \times_n \mathbf{Q} \in \mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N}$, whose components are
$$\left(\underline{\mathbf{X}} \times_n \mathbf{Q}\right)_{i_1 \cdots i_{n-1}\, j\, i_{n+1} \cdots i_N} = \sum_{i_n = 1}^{I_n} x_{i_1 i_2 \cdots i_N}\, q_{j i_n}, \qquad j = 1, 2, \ldots, J. \qquad (1)$$
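To make the unfoldings and the n-mode product concrete, the following illustrative NumPy sketch (not taken from the original paper) implements a mode-n unfolding and the n-mode product of equation (1). Note that the column ordering of the unfolding follows NumPy's natural axis order, which may differ from the multi-index convention above; the n-mode product result is unaffected by this choice.

```python
import numpy as np

def mode_n_unfold(X, n):
    # Move mode n to the front and flatten the remaining modes.
    # (Column ordering follows NumPy's axis order, not the multi-index convention.)
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, Q, n):
    # n-mode (matrix) product X x_n Q for Q of shape (J, I_n), cf. equation (1).
    Y = Q @ mode_n_unfold(X, n)                      # shape (J, product of other dims)
    new_shape = (Q.shape[0],) + tuple(np.delete(X.shape, n))
    return np.moveaxis(Y.reshape(new_shape), 0, n)   # put the new mode back in place

# Small sanity check on a random 3rd-order tensor
X = np.random.randn(4, 5, 6)
Q = np.random.randn(3, 5)
print(mode_n_product(X, Q, n=1).shape)               # (4, 3, 6)
```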
Figure 1. Graphical illustration of some tensor operations (for details see [10]).
The tensor train (TT) and tensor ring (TR) decompositions have found applications in many fields such as machine learning [21, 28–30], compressing deep neural networks [31–33], tensor completion [34–37], and hyperspectral image super-resolution [38, 39]. The memory storage requirements of these two tensor networks (TNs) scale linearly with the order of the tensor, so they break the curse of dimensionality, which is a main bottleneck in handling high-order data tensors [40]. The TT decomposition can be considered as a special case of the TR decomposition and, because of this, we mainly focus on the TR decomposition throughout the paper. Let $\underline{\mathbf{X}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ be an $N$th-order data tensor. The TR decomposition represents the data tensor $\underline{\mathbf{X}}$ by a sequence of latent core tensors $\underline{\mathbf{G}}^{(n)} \in \mathbb{R}^{R_{n-1} \times I_n \times R_n}$, $n = 1, 2, \ldots, N$. The element-wise representation of the data tensor $\underline{\mathbf{X}}$ in the TR format can be expressed as
$$\underline{\mathbf{X}}(i_1, i_2, \ldots, i_N) \cong \mathrm{Tr}\!\left(\prod_{n=1}^{N} \mathbf{G}^{(n)}(i_n)\right), \qquad (2)$$
where $\mathbf{G}^{(n)}(i_n) \in \mathbb{R}^{R_{n-1} \times R_n}$ is the $i_n$th lateral slice matrix of the core tensor $\underline{\mathbf{G}}^{(n)} \in \mathbb{R}^{R_{n-1} \times I_n \times R_n}$. The expanded form of (2) is
$$\underline{\mathbf{X}}(i_1, i_2, \ldots, i_N) \cong \sum_{r_0=1}^{R_0} \cdots \sum_{r_{N-1}=1}^{R_{N-1}} \underline{\mathbf{G}}^{(1)}(r_0, i_1, r_1) \cdots \underline{\mathbf{G}}^{(N)}(r_{N-1}, i_N, r_0), \qquad (3)$$
and the $N$-tuple $(R_0, R_1, \ldots, R_{N-1})$ is called the TR-ranks. Note that in the TR decomposition we have $R_0 = R_N$, and it is also shown in [15] that the TR-ranks satisfy $R_0 R_n \leq \mathrm{rank}\!\left(\mathbf{X}_{\langle n \rangle}\right)$ for $n = 1, 2, \ldots, N$. A shorthand notation for the TR decomposition is as follows [10]:
$$\underline{\mathbf{X}} \cong \ll \underline{\mathbf{G}}^{(1)}, \underline{\mathbf{G}}^{(2)}, \ldots, \underline{\mathbf{G}}^{(N)} \gg.$$
As mentioned earlier, the TT/MPS decomposition is a particular case of the TR/MPS-PBC decomposition for $R_0 = R_N = 1$. This means that the first and last cores in the TT decomposition are matrices, while the rest are 3rd-order tensors. For a graphical illustration of the TT and the TR decompositions, see figures 2(a)–(c). For a comprehensive overview of fast algorithms for the TR decomposition, see [41].
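As a minimal illustration of equation (2) (not part of the original paper), the following NumPy sketch reconstructs a full tensor from a set of TR cores by contracting the cores sequentially and closing the ring with a trace; the tensor sizes and TR-ranks in the example are arbitrary.

```python
import numpy as np

def tr_to_full(cores):
    """Reconstruct a full tensor from TR cores G^(n) of shape (R_{n-1}, I_n, R_n),
    using X(i_1,...,i_N) = Tr( prod_n G^(n)(:, i_n, :) ), cf. equation (2)."""
    full = cores[0]                                   # running shape (R_0, I_1...I_n, R_n)
    for G in cores[1:]:
        full = np.einsum('aib,bjc->aijc', full, G)    # contract over the shared rank
        full = full.reshape(full.shape[0], -1, full.shape[-1])
    X = np.einsum('aia->i', full)                     # close the ring: trace over R_0 = R_N
    return X.reshape([G.shape[1] for G in cores])

# Example: a 3rd-order tensor of size 4 x 5 x 6 with TR-ranks (2, 3, 2)
cores = [np.random.randn(2, 4, 3), np.random.randn(3, 5, 2), np.random.randn(2, 6, 2)]
print(tr_to_full(cores).shape)                        # (4, 5, 6)
```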
Figure 2. Graphical illustration of the tensor train (TT) and tensor ring (TR) decompositions, reprinted from [41].
The operator $P_\Omega(\underline{\mathbf{X}})$, projecting the data tensor $\underline{\mathbf{X}}$ onto the observation index tensor $\Omega$, is defined as
$$\left(P_\Omega(\underline{\mathbf{X}})\right)_{i_1, i_2, \ldots, i_N} = \begin{cases} x_{i_1, i_2, \ldots, i_N} & (i_1, i_2, \ldots, i_N) \in \Omega, \\ 0 & \text{otherwise}. \end{cases}$$
The task of low-rank TR completion can be formulated as the following optimization problem:
$$\arg\min_{\hat{\underline{\mathbf{X}}}} \left\| P_\Omega(\underline{\mathbf{X}}) - P_\Omega\!\left(\hat{\underline{\mathbf{X}}}\right) \right\|_F, \qquad (4)$$
where
$$\hat{\underline{\mathbf{X}}} = \ll \underline{\mathbf{G}}^{(1)}, \underline{\mathbf{G}}^{(2)}, \ldots, \underline{\mathbf{G}}^{(N)} \gg. \qquad (5)$$
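A small, self-contained sketch of the projection operator $P_\Omega$ and the objective in (4) is given below; it assumes a boolean observation mask and a candidate reconstruction supplied by the caller, and is only meant to illustrate the formulation, not the authors' implementation.

```python
import numpy as np

def proj_omega(X, omega_mask):
    # P_Omega(X): keep observed entries (omega_mask == True), zero elsewhere.
    return np.where(omega_mask, X, 0.0)

def completion_residual(X_obs, X_hat, omega_mask):
    # Objective of (4): || P_Omega(X) - P_Omega(X_hat) ||_F for a candidate X_hat.
    return np.linalg.norm(proj_omega(X_obs, omega_mask) - proj_omega(X_hat, omega_mask))

# Toy example: a random tensor with 40% of its entries observed
X = np.random.randn(4, 5, 6)
mask = np.random.rand(4, 5, 6) < 0.4
print(completion_residual(X, np.zeros_like(X), mask))
```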
3. Block Hankelization
A matrix is called a Hankel matrix if all its elements along the skew-diagonals are constant. A Hankel tensor
is a generalization of Hankel matrices. A data tensor X is called a Hankel tensor if all components xi1 ,i2 ,...,iN
for which the quantity i1 + i2 + · · · + iN is fixed, are the same. The procedure of generating a Hankel
matrix/tensor from a given data vector/matrix/tensor is called Hankelization.
Figure 4. Illustration of block Hankelization of a matrix and construction of a 4th-order tensor using row scanning. Similar
tensors can be obtained by column scanning.
We first review the basic concepts of Hankelization, taken from [5]. From a vector $\mathbf{x} \in \mathbb{R}^{N}$, we can generate a Hankel matrix. For example, for a given vector $\mathbf{x} = (x_1, x_2, \ldots, x_N)^T \in \mathbb{R}^{N}$ and a window size $\tau$, the operator $\mathcal{H}_\tau : \mathbb{R}^{N} \to \mathbb{R}^{\tau \times (N-\tau+1)}$ generates a Hankel matrix as
$$\mathbf{X}_H = \mathcal{H}_\tau(\mathbf{x}) := \begin{bmatrix} x_1 & x_2 & \cdots & x_{N-\tau+1} \\ x_2 & x_3 & \cdots & x_{N-\tau+2} \\ \vdots & \vdots & \ddots & \vdots \\ x_\tau & x_{\tau+1} & \cdots & x_N \end{bmatrix} \in \mathbb{R}^{\tau \times (N-\tau+1)}.$$
The operator $\mathcal{H}_\tau(\mathbf{x})$ is also called the delay embedding transform of the vector $\mathbf{x}$ with window size $\tau$ [5, 6]. Equivalently, the Hankelization procedure can be expressed by a duplication matrix, which provides a relationship between a Hankel matrix $\mathbf{X}_H$ and the corresponding given vector $\mathbf{x}$. Let $\mathbf{X}_H$ be a Hankel matrix; then $\mathbf{T} \in \{0,1\}^{\tau (N-\tau+1) \times N}$ is called a duplication matrix [5, 42] if
$$\mathrm{Vec}(\mathbf{X}_H) = \mathbf{T}\,\mathbf{x}.$$
The operator $\mathrm{fold}_{(N,\tau)} : \mathbb{R}^{\tau (N-\tau+1)} \to \mathbb{R}^{\tau \times (N-\tau+1)}$, which returns the vectorized form of a Hankel matrix back to the original Hankel matrix, is called the folding operator, that is,
$$\mathbf{X}_H = \mathcal{H}_\tau(\mathbf{x}) = \mathrm{fold}_{(N,\tau)}(\mathbf{T}\,\mathbf{x}).$$
A sample illustration of this definition for a vector of size 8 and window size τ = 5 is presented in figure 3. It can be seen that the duplication matrix consists of blocks of shifted identity matrices.
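The following NumPy sketch (illustrative only, not the authors' code) builds such a duplication matrix for the column-wise vectorization $\mathrm{Vec}(\mathbf{X}_H) = \mathbf{T}\mathbf{x}$ and uses it to Hankelize a vector of size 8 with window size τ = 5, matching the setting described around figure 3.

```python
import numpy as np

def duplication_matrix(N, tau):
    """Duplication matrix T in {0,1}^{tau(N-tau+1) x N} such that
    Vec(H_tau(x)) = T @ x (column-wise vectorization of the Hankel matrix)."""
    L = N - tau + 1
    T = np.zeros((tau * L, N))
    for col in range(L):            # column 'col' of the Hankel matrix ...
        for row in range(tau):      # ... holds x[col + row] in position 'row'
            T[col * tau + row, col + row] = 1.0
    return T

def hankelize(x, tau):
    # H_tau(x): fold T @ x back into a tau x (N - tau + 1) Hankel matrix.
    N = len(x)
    T = duplication_matrix(N, tau)
    return (T @ x).reshape(N - tau + 1, tau).T   # undo the column-major vectorization

x = np.arange(1, 9, dtype=float)     # vector of size 8
print(hankelize(x, tau=5))           # 5 x 4 Hankel matrix with constant skew-diagonals
```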
A block Hankel matrix is defined similarly, where instead of the individual entries of a matrix, its blocks
are repeated along the block skew-diagonals of the matrix. More precisely, from a set of matrices, we can
generate a block Hankel matrix; see figure 4 for an illustration of the block Hankelization procedure.
Figure 5. Illustration of the Hankelization procedure for a 3rd-order tensor, which leads to the generation of a 5th-order tensor.
A vector from which a Hankel matrix was constructed can be obtained through the so-called inverse Hankelization [5] as follows:
$$\mathbf{x} = \mathbf{T}^{\dagger}\,\mathrm{Vec}(\mathbf{X}_H),$$
where $\mathbf{T}$ is a duplication matrix. The concept of delay embedding can be generalized to higher-order tensors via multi-way delay embedding [5, 6, 42]. The intuition behind this generalization is that the classical delay embedding can be considered as a tensor–vector multiplication, i.e. $\mathbf{x} \times_1 \mathbf{T} = \mathbf{T}\mathbf{x}$, where for a vector, as a first-order tensor, we have only one duplication matrix $\mathbf{T}$. For a matrix $\mathbf{X} \in \mathbb{R}^{I_1 \times I_2}$ (which is a second-order tensor), two duplication matrices $\mathbf{T}_1 \in \{0,1\}^{\tau_1 (I_1-\tau_1+1) \times I_1}$ and $\mathbf{T}_2 \in \{0,1\}^{\tau_2 (I_2-\tau_2+1) \times I_2}$ are used, and we have
$$\mathbf{Y} = \mathbf{X} \times_1 \mathbf{T}_1 \times_2 \mathbf{T}_2 \in \mathbb{R}^{\tau_1 (I_1-\tau_1+1) \times \tau_2 (I_2-\tau_2+1)}.$$
Then the folding operator reshapes the data matrix $\mathbf{Y}$ into a fourth-order Hankel tensor of size $\tau_1 \times (I_1 - \tau_1 + 1) \times \tau_2 \times (I_2 - \tau_2 + 1)$.
For an $N$th-order tensor, we should consider $N$ duplication matrices $\mathbf{T}_n \in \{0,1\}^{\tau_n (I_n-\tau_n+1) \times I_n}$, $n = 1, 2, \ldots, N$, and the multi-way embedded space is defined as [5]
$$\underline{\mathbf{X}}_H = \mathcal{H}_{\boldsymbol{\tau}}(\underline{\mathbf{X}}) = \mathrm{fold}_{(\mathbf{I},\boldsymbol{\tau})}\!\left(\underline{\mathbf{X}} \times_1 \mathbf{T}_1 \times_2 \mathbf{T}_2 \cdots \times_N \mathbf{T}_N\right),$$
where $\mathrm{fold}_{(\mathbf{I},\boldsymbol{\tau})} : \mathbb{R}^{\tau_1 (I_1-\tau_1+1) \times \cdots \times \tau_N (I_N-\tau_N+1)} \to \mathbb{R}^{\tau_1 \times (I_1-\tau_1+1) \times \cdots \times \tau_N \times (I_N-\tau_N+1)}$ transforms an $N$th-order tensor into a $2N$th-order tensor. The $N$-tuple $(\tau_1, \tau_2, \ldots, \tau_N)$ indicates the window sizes of the duplication matrices corresponding to the different modes. An illustration of the Hankelization of a matrix and its transformation into a 4th-order tensor is shown in figure 4. The reverse procedure, in which the task is to retrieve the original data tensor from the constructed tensor $\underline{\mathbf{X}}_H$, can be expressed as
$$\underline{\mathbf{X}} = \mathcal{H}_{\boldsymbol{\tau}}^{-1}(\underline{\mathbf{X}}_H) = \mathrm{fold}_{(\mathbf{I},\boldsymbol{\tau})}^{-1}(\underline{\mathbf{X}}_H) \times_1 \mathbf{T}_1^{\dagger} \times_2 \mathbf{T}_2^{\dagger} \cdots \times_N \mathbf{T}_N^{\dagger}.$$
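To illustrate the multi-way delay embedding, the sketch below Hankelizes a matrix (e.g. one channel of a 256 × 256 image) by applying a duplication matrix along each mode and then folding the result into a 4th-order tensor of size τ1 × (I1 − τ1 + 1) × τ2 × (I2 − τ2 + 1). It is a simple dense implementation, meant only for illustration.

```python
import numpy as np

def duplication_matrix(I, tau):
    # T in {0,1}^{tau(I-tau+1) x I} with Vec(H_tau(x)) = T @ x (column-major Vec).
    L = I - tau + 1
    T = np.zeros((tau * L, I))
    for col in range(L):
        for row in range(tau):
            T[col * tau + row, col + row] = 1.0
    return T

def mode_product(X, Q, n):
    # Multiply tensor X by matrix Q (shape (J, I_n)) along mode n.
    return np.moveaxis(np.tensordot(Q, X, axes=(1, n)), 0, n)

def mdt_hankelize(X, taus):
    # Multi-way delay embedding: X x_1 T_1 ... x_N T_N, then fold each embedded mode.
    Y = X
    for n, tau in enumerate(taus):
        Y = mode_product(Y, duplication_matrix(X.shape[n], tau), n)
    split = []
    for I, tau in zip(X.shape, taus):
        split += [I - tau + 1, tau]              # each embedded fiber is (col, row) ordered
    Y = Y.reshape(split)
    order = [i for n in range(len(taus)) for i in (2 * n + 1, 2 * n)]
    return np.transpose(Y, order)                # reorder to tau_n x (I_n - tau_n + 1) pairs

M = np.random.randn(256, 256)                    # e.g. one channel of a 256 x 256 image
print(mdt_hankelize(M, taus=(252, 252)).shape)   # (252, 5, 252, 5), cf. section 6
```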
The existing algorithms are based on applying a completion technique to the Hankelized data tensor. In
[5] Tucker decomposition was used, while in [8] the TT is exploited. Our work is quite different from them
in the sense that we learn the core tensors of the TR decomposition of the incomplete data tensor through a
dictionary learning technique and sparse representations.
4. Proposed method

In this section, we discuss our approach in detail. Our main idea is to first Hankelize the incomplete data tensor $\underline{\mathbf{X}}$ into $\underline{\mathbf{X}}_H$ using the row Hankelization approach, and then to recover the incomplete Hankelized data tensor while imposing a sparse representation constraint on the core tensors. We will show experimentally that such a formulation allows us to recover the structured missing patterns of an incomplete data tensor efficiently and to achieve better performance than many other existing algorithms for this task. Unlike the methods in [5, 8], where a completion approach employing the Tucker or the TT representation is applied to $\underline{\mathbf{X}}_H$ and the original data tensor is then reconstructed through inverse Hankelization, we consider the optimization problem (4) with the block Hankelized data tensor $\hat{\underline{\mathbf{X}}}_H$ (described in section 3 and figures 4 and 5) and exploit a TR decomposition as
$$\hat{\underline{\mathbf{X}}}_H = \ll \hat{\underline{\mathbf{G}}}^{(1)}, \hat{\underline{\mathbf{G}}}^{(2)}, \ldots, \hat{\underline{\mathbf{G}}}^{(N)} \gg,$$
where the core tensors $\hat{\underline{\mathbf{G}}}^{(n)}$, $n = 1, 2, \ldots, N$, are estimated using dictionary learning. To this end, we formulate an optimization problem in which the constrained core tensors $\hat{\underline{\mathbf{G}}}^{(n)}$, $n = 1, 2, \ldots, N$, are updated sequentially to minimize the cost function in (4). Moreover, inspired by [43, 44], we impose an implicit sparse representation constraint on the core tensors $\hat{\underline{\mathbf{G}}}^{(n)}$, $n = 1, 2, \ldots, N$. We thus formulate the following constrained optimization problem:
$$\min_{\hat{\underline{\mathbf{G}}}^{(1)}, \hat{\underline{\mathbf{G}}}^{(2)}, \ldots, \hat{\underline{\mathbf{G}}}^{(N)}} \left\| P_{\Omega_H}\!\left(\hat{\underline{\mathbf{X}}}_H - \ll \hat{\underline{\mathbf{G}}}^{(1)}, \hat{\underline{\mathbf{G}}}^{(2)}, \ldots, \hat{\underline{\mathbf{G}}}^{(N)} \gg\right)\right\|_F^2 + \eta \sum_{n=1}^{N} \left\|\mathbf{C}^{(n)}\right\|_1 \qquad (8)$$
$$\text{s.t.} \quad \hat{\mathbf{G}}^{(n)}_{(2)} = \mathbf{D}^{(n)}\mathbf{C}^{(n)}, \quad n = 1, 2, \ldots, N,$$
where $\hat{\mathbf{G}}^{(n)}_{(2)} \in \mathbb{R}^{I_n \times R_{n-1} R_n}$ is the mode-2 matricization of the core tensor $\hat{\underline{\mathbf{G}}}^{(n)} \in \mathbb{R}^{R_{n-1} \times I_n \times R_n}$, $\mathbf{D}^{(n)} \in \mathbb{R}^{I_n \times R_{n-1} R_n}$ is a pre-defined dictionary, $\mathbf{C}^{(n)} \in \mathbb{R}^{R_{n-1} R_n \times R_{n-1} R_n}$ is a sparse matrix and $\eta > 0$ is a regularization parameter.
The method works well for a fairly wide range of η, from 0.05 to 0.3; we chose η = 0.082 to achieve the best performance. It is worth mentioning that the optimization problem (8) is convex with respect to each core tensor $\hat{\underline{\mathbf{G}}}^{(n)}$, but it is not convex with respect to the whole set of core tensors. As a result, there is no guarantee of convergence to a global minimum; instead, we use a suboptimal approximation. Also, in contrast to [43], where sparse representations of the unfolding matrices are employed, we impose the sparse representation constraint on the core tensors of the TR network. This can reduce the computational complexity of the algorithm, because the TR core tensors are smaller than the unfolding matrices, and it can also improve performance. Various dictionary learning algorithms can be used for computing the dictionary $\mathbf{D}^{(n)}$, such as the over-complete discrete cosine transform (ODCT) dictionary [45, 46], online dictionary learning [47] and the group sparse dictionary learning technique [44]. We adopted the ODCT method [46, 48] for our purpose because it is quite versatile and works for a broad range of data.
In the algorithm proposed in [43], the sparse representation constraints are imposed on the mode-n unfolding matrices, while nuclear norm minimization is applied to the n-unfolding matrices. As a result, our proposed formulation (8) can be considered as a non-trivial extension and combination of the techniques considered in [43] and [35]: on the one hand, we exploit the $\ell_1$-norm instead of nuclear norm minimization [35]; on the other hand, similar to [43], a dictionary learning technique is employed. It is worth mentioning that if the sparsity constraints are ignored, we arrive at the algorithm proposed in [43].
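For reference, one common construction of an over-complete DCT dictionary, in the spirit of [45, 46], is sketched below; the number of rows corresponds to $I_n$ and the number of atoms to $R_{n-1}R_n$, and the particular sizes in the example are hypothetical choices, not values taken from the paper.

```python
import numpy as np

def odct_dictionary(n_rows, n_atoms):
    """Over-complete DCT (ODCT) dictionary of size n_rows x n_atoms: each atom is a
    sampled cosine, mean-removed (except the first) and normalized to unit l2 norm."""
    D = np.zeros((n_rows, n_atoms))
    t = np.arange(n_rows)
    for k in range(n_atoms):
        atom = np.cos(np.pi * k * t / n_atoms)
        if k > 0:
            atom = atom - atom.mean()
        D[:, k] = atom / np.linalg.norm(atom)
    return D

# e.g. a dictionary with I_n = 252 rows and R_{n-1} R_n = 400 atoms (hypothetical sizes)
D = odct_dictionary(252, 400)
print(D.shape)   # (252, 400)
```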
According to the works [15, 35, 49], the relation between the core tensor $\hat{\underline{\mathbf{G}}}^{(n)}$ and the matricized tensor can be represented by
$$\mathbf{X}_{H\langle n\rangle} = \hat{\mathbf{G}}^{(n)}_{(2)}\left(\hat{\mathbf{G}}^{(\neq n)}_{\langle 2\rangle}\right)^{T}, \qquad \forall n, \qquad (9)$$
where $\hat{\underline{\mathbf{G}}}^{(\neq n)}$ is a 3rd-order tensor of size $R_n \times \prod_{i=1, i \neq n}^{N} I_i \times R_{n-1}$, obtained by merging all the other core tensors, whose slice matrices are computed as
$$\hat{\mathbf{G}}^{(\neq n)}\!\left(\overline{i_{n+1} \cdots i_N i_1 \cdots i_{n-1}}\right) = \prod_{j=n+1}^{N} \hat{\mathbf{G}}^{(j)}(i_j)\, \prod_{j=1}^{n-1} \hat{\mathbf{G}}^{(j)}(i_j).$$
Using (9), the problem (8) can be decomposed into a sequence of sub-problems
$$\min_{\hat{\mathbf{G}}^{(n)}_{(2)},\, \mathbf{C}^{(n)}} H\!\left(\hat{\underline{\mathbf{G}}}^{(n)}\right) + \eta \left\|\mathbf{C}^{(n)}\right\|_1 \quad \text{s.t.} \quad \hat{\mathbf{G}}^{(n)}_{(2)} = \mathbf{D}^{(n)}\mathbf{C}^{(n)}, \qquad (10)$$
for $n = 1, 2, \ldots, N$, where
$$H\!\left(\hat{\underline{\mathbf{G}}}^{(n)}\right) = \left\| P_{\Omega_{H\langle n\rangle}}\!\left(\hat{\mathbf{X}}_{H\langle n\rangle} - \hat{\mathbf{G}}^{(n)}_{(2)}\left(\hat{\mathbf{G}}^{(\neq n)}_{\langle 2\rangle}\right)^{T}\right)\right\|_F^2.$$
The augmented Lagrangian function corresponding to the constrained optimization problem (10) can be constructed as
$$\mathcal{L}\!\left(\hat{\underline{\mathbf{G}}}^{(n)}, \mathbf{C}^{(n)}, \mathbf{B}\right) = H\!\left(\hat{\underline{\mathbf{G}}}^{(n)}\right) + \eta \left\|\mathbf{C}^{(n)}\right\|_1 + \left\langle \mathbf{B},\, \hat{\mathbf{G}}^{(n)}_{(2)} - \mathbf{D}^{(n)}\mathbf{C}^{(n)} \right\rangle + \frac{\lambda}{2}\left\| \hat{\mathbf{G}}^{(n)}_{(2)} - \mathbf{D}^{(n)}\mathbf{C}^{(n)} \right\|_F^2, \qquad (11)$$
where $\mathbf{B}$ is the matrix of Lagrange multipliers and $\lambda > 0$ is a penalty parameter. The method works well for $\lambda$ between 0.1 and 1.1; we set $\lambda = 0.5$ in our simulations. All hyper-parameters were optimized to provide the best performance.
Based on the Lagrangian function (11), the ADMM [9] update rules for solving (8) are straightforwardly converted into simpler optimization problems as follows:
$$\hat{\mathbf{G}}^{(n)}_{k+1} = \arg\min_{\hat{\mathbf{G}}^{(n)}_{k}} H\!\left(\hat{\underline{\mathbf{G}}}^{(n)}_{k}\right) + \frac{\lambda}{2}\left\| \hat{\mathbf{G}}^{(n)}_{(2),k} - \mathbf{D}^{(n)}\mathbf{C}^{(n)}_{k} + \mathbf{B}^{(k)} \right\|_F^2, \qquad (12)$$
$$\mathbf{C}^{(n)}_{k+1} = \arg\min_{\mathbf{C}^{(n)}_{k}} \left\|\mathbf{C}^{(n)}_{k}\right\|_1 + \frac{\lambda}{2}\left\| \hat{\mathbf{G}}^{(n)}_{(2),k} - \mathbf{D}^{(n)}\mathbf{C}^{(n)}_{k} + \mathbf{B}^{(k)} \right\|_F^2, \qquad (13)$$
$$\mathbf{B}^{(k+1)} = \mathbf{B}^{(k)} + \hat{\mathbf{G}}^{(n)}_{(2),k+1} - \mathbf{D}^{(n)}\mathbf{C}^{(n)}_{k+1}. \qquad (14)$$
It is worth noting that the ADMM algorithm is a workhorse approach for solving a variety of optimization problems, such as sparse inverse covariance selection and generalized and group lasso problems. For a comprehensive study of this technique and related problems, see [9]. Moreover, this strategy has been utilized for solving many tensor completion problems; see [43, 50–53], and for a comprehensive list of references see the review papers [3, 4] and the references therein.
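The sketch below illustrates the structure of the ADMM updates (12)–(14) for a single core, with the merged core matrix treated as a fixed matrix W, the G-step approximated by a few plain gradient steps and the C-step by a single proximal (soft-thresholding) step; the step sizes, iteration counts, shapes and variable names are illustrative choices, not the settings used in the paper.

```python
import numpy as np

def soft(x, tau):
    # Element-wise soft thresholding: sign(x) * max(|x| - tau, 0).
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def admm_core_update(M, mask, W, D, lam=0.5, n_iters=50, n_grad=5, step=1e-3):
    """Structural sketch of the ADMM updates (12)-(14) for one core, with the merged
    core matrix W (standing in for G^{(neq n)}_<2>) kept fixed:
        min_{G,C} ||P_Omega(M - G W^T)||_F^2 + ||C||_1   s.t.  G = D C."""
    G = np.zeros((M.shape[0], W.shape[1]))
    C = np.zeros((D.shape[1], W.shape[1]))
    B = np.zeros_like(G)
    c = lam * np.linalg.norm(D, ord=2) ** 2 + 1e-12   # step constant for the C-update
    for _ in range(n_iters):
        for _ in range(n_grad):                       # (12): gradient steps on G
            grad = 2.0 * (mask * (G @ W.T - M)) @ W + lam * (G - D @ C + B)
            G -= step * grad
        # (13): one proximal-gradient step (the paper iterates this with acceleration)
        C = soft(C - lam * D.T @ (D @ C - G - B) / c, 1.0 / c)
        B = B + G - D @ C                             # (14): multiplier update
    return G, C

# Toy usage with random data (shapes are illustrative only)
M = np.random.randn(30, 200)              # unfolded Hankelized data
mask = np.random.rand(30, 200) < 0.5      # observed entries
W = np.random.randn(200, 16)              # fixed merged-core factor
D = np.random.randn(30, 40)               # dictionary with 40 atoms
G, C = admm_core_update(M, mask, W, D)
print(G.shape, C.shape)                   # (30, 16) (40, 16)
```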
The sub-problem (12) can be solved by a gradient-based approach, where the gradient of its objective function is
$$\frac{\partial J}{\partial \hat{\mathbf{G}}^{(n)}_{(2),k}} = \frac{\partial H\!\left(\hat{\underline{\mathbf{G}}}^{(n)}_{k}\right)}{\partial \hat{\mathbf{G}}^{(n)}_{(2),k}} + \lambda\left( \hat{\mathbf{G}}^{(n)}_{(2),k} - \mathbf{D}^{(n)}\mathbf{C}^{(n)}_{k} + \mathbf{B}^{(k)} \right), \qquad (15)$$
where $J = \mathcal{L}\!\left(\hat{\underline{\mathbf{G}}}^{(n)}_{k}, \mathbf{C}^{(n)}, \mathbf{B}\right)$ and the first and second terms of (15) are the gradients of the first and second terms of the objective function of the optimization problem (12), respectively. The gradient of the first term of
(15) has been derived in [35], and the gradient of the second term can be derived by the following straightforward computation:
$$\frac{\lambda}{2}\left\| \hat{\mathbf{G}}^{(n)}_{(2),k} - \mathbf{D}^{(n)}\mathbf{C}^{(n)}_{k} + \mathbf{B}^{(k)} \right\|_F^2 = \frac{\lambda}{2}\left( \left\| \hat{\mathbf{G}}^{(n)}_{(2),k} \right\|_F^2 + \left\| \mathbf{B}^{(k)} - \mathbf{D}^{(n)}\mathbf{C}^{(n)}_{k} \right\|_F^2 + 2\,\mathrm{Tr}\!\left( \hat{\mathbf{G}}^{(n)\,T}_{(2),k}\left( \mathbf{B}^{(k)} - \mathbf{D}^{(n)}\mathbf{C}^{(n)}_{k} \right) \right) \right),$$
followed by taking the gradient with respect to $\hat{\mathbf{G}}^{(n)}_{(2),k}$.
To update $\mathbf{C}^{(n)}_{k}$ in (13), we employ an accelerated proximal gradient (APG) approach. Let $E\!\left(\mathbf{C}^{(n)}_{k}\right) = \left\|\mathbf{C}^{(n)}_{k}\right\|_1 + F\!\left(\mathbf{C}^{(n)}_{k}\right)$ denote the objective function of (13), where
$$F\!\left(\mathbf{C}^{(n)}_{k}\right) = \frac{\lambda}{2}\left\| \hat{\mathbf{G}}^{(n)}_{(2),k} + \mathbf{B}^{(k)} - \mathbf{D}^{(n)}\mathbf{C}^{(n)}_{k} \right\|_F^2.$$
The first step in the proximal gradient algorithm is to consider an auxiliary proximal variable $\mathbf{Z}_k$ and to compute a quadratic approximation of $E\!\left(\mathbf{C}^{(n)}_{k}\right)$ around $\mathbf{Z}_k$ using the following equation:
$$H\!\left(\mathbf{C}^{(n)}_{k}, \mathbf{Z}_{k}\right) = \left\|\mathbf{C}^{(n)}_{k}\right\|_1 + F(\mathbf{Z}_k) + \left\langle \nabla_{\mathbf{C}_k} F(\mathbf{Z}_k),\, \mathbf{C}^{(n)}_{k} - \mathbf{Z}_k \right\rangle + \frac{c}{2}\left\| \mathbf{C}^{(n)}_{k} - \mathbf{Z}_k \right\|_F^2, \qquad (16)$$
where the gradient $\nabla_{\mathbf{C}_k} F(\mathbf{Z}_k)$ is computed straightforwardly by expanding $F(\mathbf{Z}_k)$ and taking the gradient with respect to $\mathbf{C}_k$ as
$$\nabla_{\mathbf{C}_k} F(\mathbf{Z}_k) = \lambda\left( \mathbf{D}^{(n)T}\mathbf{D}^{(n)}\mathbf{C}^{(n)}_{k} - \mathbf{D}^{(n)T}\!\left( \hat{\mathbf{G}}^{(n)}_{(2),k} + \mathbf{B}^{(k)} \right) \right). \qquad (17)$$
The parameter $c > 0$ in (16) is a given parameter; in our simulations we set $c = \left\|\mathbf{D}^{(n)}\right\|_F^2$. Substituting $\mathbf{W}_k = \mathbf{Z}_k - \nabla_{\mathbf{C}_k} F(\mathbf{Z}_k)/c$ in (16), we obtain
$$H\!\left(\mathbf{C}^{(n)}_{k}, \mathbf{Z}_{k}\right) = T\!\left(\mathbf{C}^{(n)}_{k}, \mathbf{W}_{k}\right) - \frac{1}{2c}\left\| \nabla_{\mathbf{C}_k} F(\mathbf{Z}_k) \right\|_F^2, \qquad (18)$$
where
$$T\!\left(\mathbf{C}^{(n)}_{k}, \mathbf{W}_{k}\right) = \left\|\mathbf{C}^{(n)}_{k}\right\|_1 + \frac{c}{2}\left\| \mathbf{C}^{(n)}_{k} - \mathbf{W}_{k} \right\|_F^2. \qquad (19)$$
As a result, $T\!\left(\mathbf{C}^{(n)}_{k}, \mathbf{W}_{k}\right)$ is equivalent to $H\!\left(\mathbf{C}^{(n)}_{k}, \mathbf{Z}_{k}\right)$ up to a constant, and it can be minimized instead of $H\!\left(\mathbf{C}^{(n)}_{k}, \mathbf{Z}_{k}\right)$. In the proximal gradient algorithm, instead of minimizing $E\!\left(\mathbf{C}^{(n)}_{k}\right)$, which is a non-differentiable optimization problem, the objective function $T\!\left(\mathbf{C}^{(n)}_{k}, \mathbf{Z}^{j}_{k}\right)$ is minimized for a sequence of proximal variables $\mathbf{Z}^{j}_{k}$. This leads to the following minimization problem, which can be solved iteratively:
$$\mathbf{C}^{(n)}_{k,j+1} = \arg\min_{\mathbf{C}^{(n)}_{k}} \left\|\mathbf{C}^{(n)}_{k}\right\|_1 + \frac{c}{2}\left\| \mathbf{C}^{(n)}_{k} - \mathbf{W}^{j+1}_{k} \right\|_F^2, \qquad (20)$$
where $\mathbf{W}^{j+1}_{k} = \mathbf{Z}^{j}_{k} - \nabla_{\mathbf{C}_k} F\!\left(\mathbf{Z}^{j}_{k}\right)/c$. Finally, we obtain the following closed-form solution:
$$\mathbf{C}^{(n)}_{k,j+1} = \mathrm{soft}\!\left(\mathbf{W}^{j+1}_{k}, \frac{1}{c}\right), \qquad (21)$$
where the soft-thresholding operator is defined as $\mathrm{soft}(g, \tau) = \mathrm{sign}(g)\max(|g| - \tau, 0)$. Note that in (21), the soft-thresholding operator is applied in an element-wise manner to all components of the matrix $\mathbf{W}^{j+1}_{k}$.
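As a quick numerical illustration of (20)–(21) (not from the paper), the snippet below checks element-wise that soft(w, 1/c) minimizes |x| + (c/2)(x − w)², by comparing against a brute-force grid search.

```python
import numpy as np

def soft(g, tau):
    # soft(g, tau) = sign(g) * max(|g| - tau, 0), applied element-wise.
    return np.sign(g) * np.maximum(np.abs(g) - tau, 0.0)

c = 3.0
for w in (-1.2, -0.1, 0.05, 0.8):
    grid = np.linspace(-3, 3, 200001)
    objective = np.abs(grid) + 0.5 * c * (grid - w) ** 2   # scalar form of (20)
    print(w, soft(w, 1.0 / c), grid[np.argmin(objective)])
```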
Algorithm 1. Algorithm for tensor ring decomposition with sparse representation (TRDSR).
Input: An incomplete data tensor $\underline{\mathbf{X}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, the observation index tensor $\Omega$, TR-ranks $(R_0, R_1, \ldots, R_{N-1})$ (for simplicity, we assume that $R_n = R\ \forall n$), a window size $\tau$ and a regularization parameter $\lambda > 0$.
Output: Completed data tensor $\hat{\underline{\mathbf{X}}}$
1 Hankelize the data tensor $\underline{\mathbf{X}}$ and $\Omega$ as $\underline{\mathbf{X}}_H$ and $\Omega_H$
2 Initialize the TR core tensors of the Hankelized tensor $\underline{\mathbf{X}}_H$, i.e. $\hat{\underline{\mathbf{G}}}^{(1)}, \hat{\underline{\mathbf{G}}}^{(2)}, \ldots, \hat{\underline{\mathbf{G}}}^{(N)}$, with Gaussian random tensors
3 while a stopping criterion is not satisfied do
4   for n = 1, 2, …, N do
5     Compute the dictionary $\mathbf{D}^{(n)}$ for the mode-2 matricization of the core tensor $\hat{\underline{\mathbf{G}}}^{(n)}$
6     Solve the optimization problem (10) for $\hat{\underline{\mathbf{G}}}^{(n)}$ and $\mathbf{C}^{(n)}$ using the ADMM update rules (12)–(14)
7   end
8   Compute $\hat{\underline{\mathbf{X}}}_H = \ll \hat{\underline{\mathbf{G}}}^{(1)}, \hat{\underline{\mathbf{G}}}^{(2)}, \ldots, \hat{\underline{\mathbf{G}}}^{(N)} \gg$
9   Compute $\hat{\underline{\mathbf{X}}}_H = P_{\Omega_H}\!\left(\underline{\mathbf{X}}_H\right) + P_{\Omega_H^{\perp}}\!\left(\hat{\underline{\mathbf{X}}}_H\right)$
10  De-Hankelize $\hat{\underline{\mathbf{X}}}_H$
11 end
There are several possible options for updating the proximal variable $\mathbf{Z}^{j+1}_{k}$, but in this paper we used the accelerated one introduced in [54], as follows:
$$t_{j+1} = \frac{1 + \sqrt{1 + 4 t_j^2}}{2}, \qquad \mathbf{Z}^{j+1}_{k} = \mathbf{C}^{(n)}_{k,j+1} + \frac{t_j - 1}{t_{j+1}}\left( \mathbf{C}^{(n)}_{k,j+1} - \mathbf{C}^{(n)}_{k,j} \right),$$
where $t_1 = 1$. The whole procedure is summarized in the pseudo-code of algorithm 1. The stopping criterion which we used was that the relative error falls below a predefined tolerance or the maximum number of iterations is reached.
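Putting the pieces together, the following illustrative sketch implements an accelerated proximal gradient loop of the kind described above for a lasso-type sub-problem min_C ||C||_1 + (λ/2)||A − DC||_F², where A plays the role of $\hat{\mathbf{G}}^{(n)}_{(2),k} + \mathbf{B}^{(k)}$; the step constant is chosen here as the Lipschitz bound λ||D||_2², which differs slightly from the choice c = ||D^(n)||_F² used in the paper, and the shapes are arbitrary.

```python
import numpy as np

def soft(g, tau):
    # Element-wise soft thresholding, as in (21).
    return np.sign(g) * np.maximum(np.abs(g) - tau, 0.0)

def apg_sparse_code(A, D, lam=0.5, n_iters=100):
    """Accelerated proximal gradient sketch for min_C ||C||_1 + (lam/2) ||A - D C||_F^2,
    using the soft-thresholding step (21) and the accelerated proximal-variable update of [54]."""
    c = lam * np.linalg.norm(D, ord=2) ** 2                # Lipschitz-type step constant
    C_prev = np.zeros((D.shape[1], A.shape[1]))
    Z, t = C_prev.copy(), 1.0
    for _ in range(n_iters):
        grad = lam * D.T @ (D @ Z - A)                     # gradient of the smooth part at Z
        C_new = soft(Z - grad / c, 1.0 / c)                # proximal (soft-thresholding) step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0  # accelerated step-size sequence
        Z = C_new + ((t - 1.0) / t_new) * (C_new - C_prev) # momentum on the proximal variable
        C_prev, t = C_new, t_new
    return C_prev

# Toy usage: recover a sparse code from a random dictionary
D = np.random.randn(64, 128)
C_true = np.random.randn(128, 10) * (np.random.rand(128, 10) < 0.05)
A = D @ C_true
C_hat = apg_sparse_code(A, D)
print(np.count_nonzero(C_hat), C_hat.shape)
```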
5. Computational complexity

Most state-of-the-art tensor completion algorithms exploit a nuclear norm minimization formulation in which the singular value decomposition (SVD) has to be computed multiple times. This increases their computational complexity and also makes them relatively slow. In contrast, we avoid SVD computations because we do not exploit the nuclear norm minimization formulation. For the sake of simplicity, let us assume $I_1 = I_2 = \cdots = I_N = I$ and $R_1 = R_2 = \cdots = R_N = R$; then the computational complexities of TR-ALS [34] and TRLRF [49, 51] are $O(pNR^4 I^N + NR^6)$ and $O(NR^2 I^N + NR^6)$, respectively, where $p$ is a constant between 0 and 1. The main operation in our algorithm is the matrix–matrix multiplication in (15), where we need to multiply the matrices $\hat{\mathbf{G}}^{(n)}_{(2),k} \in \mathbb{R}^{I_n \times R^2}$ and $\left(\hat{\mathbf{G}}^{(\neq n)}_{\langle 2\rangle,k}\right)^{T} \in \mathbb{R}^{R^2 \times \prod_{i \neq n} I_i}$, which costs $O(R^2 I^N)$. This indicates the lower complexity of our algorithm compared with the others.
6. Experiments
In this section, we evaluate the proposed algorithm referred to as TRDSR using real datasets including
images, videos, and time series signals. All simulations were conducted on a laptop computer with 2.60 GHz
Intel(R) Core(TM) i7-5500U processor and 16GB memory. The relative squared error (RSE) and peak
signal-to-noise ratio (PSNR) are used to evaluate and compare the performance of various algorithms. The
RSE is defined as
$$\mathrm{RSE} = \frac{\left\| \hat{\underline{\mathbf{X}}} - \underline{\mathbf{X}} \right\|_F}{\left\| \underline{\mathbf{X}} \right\|_F},$$
where $\underline{\mathbf{X}}$ and $\hat{\underline{\mathbf{X}}}$ are the original data tensor with all observations and the reconstructed data tensor, respectively, while the PSNR is defined as
$$\mathrm{PSNR} = 10 \log_{10}\!\left( 255^2 / \mathrm{MSE} \right),$$
where
$$\mathrm{MSE} = \left\| \hat{\underline{\mathbf{X}}} - \underline{\mathbf{X}} \right\|_F^2 / \mathrm{num}(\underline{\mathbf{X}}),$$
and 'num(·)' denotes the number of components of the tensor $\underline{\mathbf{X}}$. We used the MATLAB Poblano Toolbox v1.1 [56] for solving the underlying non-linear optimization problems.
Figure 7. Visual and PSNR comparison of reconstructed images using different tensor completion algorithms. In most cases, our algorithm provides the best performance.
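The two metrics can be computed directly, as in the short sketch below (illustrative only; the array shapes and the peak value of 255 assume 8-bit image data).

```python
import numpy as np

def rse(X_hat, X):
    # Relative squared error between the reconstruction and the full ground truth.
    return np.linalg.norm(X_hat - X) / np.linalg.norm(X)

def psnr(X_hat, X, peak=255.0):
    # PSNR in dB: 10 * log10(peak^2 / MSE), with MSE averaged over all entries.
    mse = np.sum((X_hat - X) ** 2) / X.size
    return 10.0 * np.log10(peak ** 2 / mse)

X = np.random.rand(256, 256, 3) * 255
X_noisy = X + np.random.randn(*X.shape)
print(rse(X_noisy, X), psnr(X_noisy, X))
```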
Example 1. (Image recovery) In these experiments, we used the benchmark images 'Lena', 'Barbara', 'House', 'Facade', 'Peppers' and 'Giant', as shown in figure 6. All color images are of size 256 × 256 × 3. We tried different types of missing patterns. For example, in the case of the 'Lena' image, we randomly removed 20% of the pixels, grouped either as big white spots with some holes in the picture (first image) or as continuous scratches across the image (sixth image); both are considered structured missing patterns. For the 'Barbara' image (second image), we randomly removed more than 40% of whole columns and rows of pixels. The last two images, 'Peppers' and 'Giant', have 90% missing pixels. The missing patterns of the 'House' and 'Facade' images were also structured distortions. Using the window size τ = (252, 252, 3), the incomplete images were first Hankelized into a 5th-order tensor of size 252 × 5 × 252 × 5 × 3 and represented in the TR format before
applying the completion algorithms. We compared the performance of our algorithm with some recent
tensor completion algorithms under various conditions and for different missing rates. The algorithms
which we used were TRLRF [57], TRALS [34], TR-WOPT [35], TT-WOPT [55], CP-WOPT [58], SPC-TV
[59] and MDT [5]. In order to provide a fair comparison with other algorithms, we use the same TT/TR
Table 1. Performance comparison using PSNR and SSIM for benchmark images.
IMAGE Results TRLRF MDT TRALS TRWOPT TTWOPT CPWOPT SPC-TV Proposed (TRDSR)
Lena PSNR 29.63 27.65 30.19 28.39 23.80 21.45 27.11 31.49
SSIM 0.9156 0.8233 0.9024 0.9048 0.8130 0.6057 0.9049 0.9422
Barbara PSNR 18.76 19.34 19.59 10.63 17.53 10.71 16.96 24.12
Lines SSIM 0.5251 0.4700 0.7300 0.2525 0.4619 0.1242 0.5199 0.7702
House PSNR 32.67 32.41 31.66 30.49 27.78 22.73 30.73 33.95
SSIM 0.9186 0.8696 0.9050 0.8411 0.8601 0.6287 0.9075 0.9353
Barbara PSNR 29.98 26.77 29.67 28.70 28.64 21.05 30.29 31.84
White Circles SSIM 0.9200 0.7813 0.9177 0.8999 0.9126 0.5713 0.9279 0.9456
Occluded PSNR 22.56 19.91 16.18 20.699 15.22 22.14 20.06 23.24
Windows SSIM 0.7300 0.6258 0.5721 0.6560 0.5732 0.6704 0.7041 0.7425
Lena PSNR 32.89 32.03 33.95 32.18 30.00 21.73 34.34 36.71
Scratched SSIM 0.9159 0.9236 0.9390 0.9011 0.8932 0.6090 0.9513 0.9677
Peppers PSNR 17.85 20.11 21.33 20.52 13.88 16.50 20.06 22.19
90% Missing SSIM 0.2772 0.5619 0.5056 0.4689 0.1569 0.2617 0.6370 0.5506
Giant PSNR 16.71 19.58 18.35 18.37 13.48 16.38 19.09 20.32
90% Missing SSIM 0.3110 0.4631 0.3986 0.3756 0.1550 0.2531 0.4797 0.5272
Note: The bold values mean that the results of our proposed algorithm outperform other algorithms.
Figure 8. Comparison of RSE and CPU time with TR Ranks of some tensor completion algorithms for ‘Lena’ image with 90%
missing components.
ranks for the TRLRF, TRALS, TR-WOPT, TT-WOPT and CP-WOPT algorithms and tune the hyper-parameters of each algorithm to achieve the best possible performance. The MDT and SPC-TV algorithms were able to
tune their ranks automatically. The reconstructed images are displayed in figure 7, and a detailed PSNR and SSIM comparison is reported in table 1. From the results, it is seen that our proposed algorithm, in general, outperforms the other algorithms in most cases, for various scenarios and missing patterns. We also examined our proposed algorithm in terms of robustness to TR rank selection. To this end, we considered the 'Lena' image with 90% random missing pixels and compared the RSE. The results are reported in figure 8(a), which shows that our TRDSR algorithm achieves the lowest RSE compared to the other completion methods and is also relatively insensitive to the selected TR ranks. We should mention that one of the main challenges in the tensor completion problem is a good selection of the rank, and a larger rank does not necessarily provide better results. This is mainly because of the over-fitting problem, where the model is not selected appropriately. Because of this issue, it is of interest to have a tensor completion algorithm which is less sensitive to rank selection. In this experiment, we showed that, unlike the other algorithms, ours is less sensitive to the choice of TR-ranks and more stable across various TR-ranks. We also compared the computational time of our method with the other existing methods. The results are reported in figure 8(b). It is seen that our algorithm is the fastest while, at the same time, it provides better performance. Finally, we applied our proposed algorithm to MRI images.
Here, we mainly exploited four different kinds of structured missing patterns (see figure 9). In the first experiment, we scratched the image (first image in figure 9); in the second, we randomly removed 10% of the pixels and 10% of the columns and rows (second image in figure 9); in the third, we removed 40% of the pixels and put some spots in the image (third image in figure 9); and in the fourth, we removed 40% of the columns and rows (fourth image in figure 9). The recovered images, together with the corresponding PSNR values, are reported in figure 9. Our computer experiments indicate that the proposed algorithm also provides good results for recovering real-life medical images. Moreover, we compared the performance of our algorithm on the MRI image with some other tensor completion algorithms. These results are reported in figure 10. Here again, our algorithm provided better performance.
Figure 9. Performance of the proposed TRDSR algorithm in recovering an MRI image with different kinds of missing pixels.
Example 2. In this experiment, we consider the video recovery task. The dataset which we used was the GunShot video [34, 60] of size 100 × 260 × 3 × 85. We removed 80% of the voxels randomly. We Hankelized the video frames into an 8th-order tensor of size 252 × 9 × 96 × 5 × 3 × 1 × 83 × 3 for reconstruction, using the window sizes τ = (252, 96, 3, 83). The reconstructed results of TR-ALS, MDT, SPC and our proposed TRDSR algorithm are displayed in figure 11. From the results, the superiority of our technique is visible.
As a second experiment on videos, we considered the smoke video [5], which is composed of 64 frames of size 90 × 160 × 3, giving a tensor of size 90 × 160 × 3 × 64. We randomly removed 95% of the voxels in our simulations. The incomplete data tensor was Hankelized into an 8th-order tensor of size 85 × 6 × 155 × 6 × 3 × 1 × 60 × 5 using the window sizes τ = (85, 155, 3, 60). The reconstructed results of the MDT [5] and our proposed TRDSR algorithm are displayed in figure 12.
Example 3. In this simulation, we evaluate our proposed algorithm for the reconstruction of a time series generated by the Lorenz system, which is illustrated in figure 13. The signal was of size 1 × 4600, which we reshaped into a 3rd-order tensor of size 46 × 10 × 10. We considered two kinds of missing patterns, as follows:
• Removing some parts of the signal in the first and middle parts, along with additive Gaussian noise (see figure 13).
• Removing 20% of the samples and also removing 300 samples in the middle and last parts of the signal (see figure 14).
In both of the above cases, we used the window size τ = (5, 44, 5) and Hankelized the incomplete 3rd-order tensor into a 6th-order tensor of size 5 × 6 × 44 × 3 × 5 × 6. The reconstructed signals using our proposed algorithm
Figure 10. Visual and PSNR comparison of MRI image reconstructed using different tensor completion algorithms.
Figure 11. Comparison of RSE on the GunShot Video Reconstruction with 80% random missing pixels.
Figure 12. Comparison of RSE on the Smoke Video Reconstruction with 95% random missing pixels.
Figure 13. Signal reconstruction (generated by the Lorenz system) using different algorithms.
Figure 14. Comparison of performance of signal denoising and prediction (generated by the Lorenz system corrupted by additive noise, where middle and last samples are missing) using different algorithms.
compared to the MDT and the SPC-TV algorithms are displayed in figures 13 and 14, respectively. Our
algorithm achieved the lowest RSE and this confirms the effectiveness and applicability of our method.
7. Conclusion
In this paper, a new and efficient tensor completion algorithm was proposed for recovering data tensors with
random and/or structured missing entries. The main idea is a prior Hankelization of the incomplete data
tensor after which the data tensor is completed through learning the core tensors of the TR representation
with sparse constraints. The ADMM algorithm and APG approaches were adopted to solve the underlying
optimization problems. Extensive simulations show that our proposed method is capable of recovering
incomplete data tensors with different types of structured and random missing elements. Our algorithm exhibits better performance than many existing tensor-decomposition-based methods, while providing a lower computational cost than other tensor completion approaches.
Data availability statement

Data sharing is not applicable to this article as no new data were created or analysed in this study. The analyzed data are available online. The image and video data are available at https://youtu.be/7y9apnbI6GA, https://www.nhk.or.jp/archives/creative/material/category-list.html?i=22 and https://drive.google.com/file/d/15xk67wYZ9GI2Kn93g_aaA4CsmgnqGlHE/view.
Acknowledgment
The authors would like to thank two anonymous reviewers for their constructive comments which have
greatly improved the quality of the paper. This research was also partially supported by the Ministry of
Education and Science of the Russian Federation (Grant No. 14.756.31.0001).
References
[1] Liu J, Musialski P, Wonka P and Ye J 2012 Tensor completion for estimating missing values in visual data IEEE Trans. Pattern Anal.
Mach. Intell. 35 208–20
[2] Asif M T, Mitrovic N, Dauwels J and Jaillet P 2016 Matrix and tensor based methods for missing data estimation in large traffic
networks IEEE Trans. Intell. Transp. Syst. 17 1816–25
[3] Song Q, Ge H, Caverlee J and Hu X 2019 Tensor completion algorithms in big data analytics ACM Trans. Knowl. Discovery Data
13 1–48
[4] Long Z, Liu Y, Chen L and Zhu C 2019 Low rank tensor completion for multiway visual data Signal Process. 155 301–16
[5] Yokota T, Erem B, Guler S, Warfield S K and Hontani H 2018 Missing slice recovery for tensors using a low-rank model in
embedded space Proc. Conf. on Computer Vision and Pattern Recognition pp 8251–9
[6] Yokota T and Hontani H 2017 Simultaneous visual data completion and denoising based on tensor rank and total variation
minimization and its primal-dual splitting algorithm Proc. Conf. on Computer Vision and Pattern Recognition pp 3732–40
[7] Chen Y and Chi Y 2013 Spectral compressed sensing via structured matrix completion (arXiv:1304.4610)
[8] Sedighin F, Cichocki A, Yokota T and Shi Q 2020 Matrix and tensor completion in multiway delay embedded space using tensor
train, with application to signal reconstruction IEEE Signal Process. Lett. 27 810–14
[9] Boyd S et al 2011 Distributed optimization and statistical learning via the alternating direction method of multipliers Found.
Trends Mach. Learn. 3 1–122
[10] Cichocki A, Lee N, Oseledets I V, Phan A-H, Zhao Q and Mandic D P 2016 Tensor networks for dimensionality reduction and
large-scale optimization: part 1 perspectives and challenges Found. Trends Mach. Learn. 9 249–429
[11] Oseledets I V 2011 Tensor-train decomposition SIAM J. Sci. Comput. 33 2295–2317
[12] Khoromskij B N 2011 O(dlog N)-quantics approximation of n-d tensors in high-dimensional numerical modeling Constr. Approx.
34 257–80
[13] Espig M, Hackbusch W, Handschuh S and Schneider R 2011 Optimization problems in contracted tensor networks Comput. Vis.
Sci. 14 271–85
[14] Espig M, Naraparaju K K and Schneider J 2012 A note on tensor chain approximation Comput. Vis. Sci. 15 331–44
[15] Zhao Q, Zhou G, Xie S, Zhang L and Cichocki A 2016 Tensor ring decomposition (arXiv:1606.05535)
[16] White S R 1992 Density matrix formulation for quantum renormalization groups Phys. Rev. Lett. 69 2863
[17] White S R 1993 Density-matrix algorithms for quantum renormalization groups Phys. Rev. B 48 10345
[18] Schollwöck U 2011 The density-matrix renormalization group in the age of matrix product states Ann. Phys. 326 96–192
[19] Orús R 2014 A practical introduction to tensor networks: matrix product states and projected entangled pair states Ann. Phys.
349 117–58
[20] Bridgeman J C and Chubb C T 2017 Hand-waving and interpretive dance: an introductory course on tensor networks J. Phys.
A 50 223001
[21] Torlai G, Wood C J, Acharya A, Carleo G, Carrasquilla J and Aolita L 2020 Quantum process tomography with unsupervised
learning and tensor networks (arXiv:2006.02424)
[22] Biamonte J and Bergholm V 2017 Tensor networks in a nutshell (arXiv:1708.00006)
[23] Biamonte J, Kardashin A and Uvarov A 2018 Quantum machine learning tensor network states (arXiv:1804.02398)
[24] Cichocki A, Lee N, Oseledets I, Phan A-H, Zhao Q and Mandic D P 2017 Tensor networks for dimensionality reduction and
large-scale optimizations: Part 2 applications and future perspectives Found. Trends Mach. Learn. 9 431–673
[25] Cichocki A 2014 Era of big data processing: a new approach via tensor networks and tensor decompositions (arXiv:1403.2048)
[26] Reyes J and Stoudenmire M 2020 A multi-scale tensor network architecture for classification and regression (arXiv:2001.08286)
[27] Pippan P, White S R and Evertz H G 2010 Efficient matrix-product state method for periodic boundary conditions Phys. Rev.
B 81 081103
[28] Yang Y, Krompass D and Tresp V 2017 Tensor-train recurrent neural networks for video classification (arXiv:1707.01786)
[29] Kuznetsov M, Polykovskiy D, Vetrov D P and Zhebrak A 2019 A prior of a googol Gaussians: a tensor ring induced prior for
generative models Advances in Neural Information Processing Systems pp 4102–12
[30] Stoudenmire E and Schwab D J 2016 Supervised learning with tensor networks Advances in Neural Information Processing Systems
pp 4799–807
[31] Novikov A, Podoprikhin D, Osokin A and Vetrov D P 2015 Tensorizing neural networks Advances in Neural Information Processing
Systems pp 442–50
[32] Tjandra A, Sakti S and Nakamura S 2017 Compressing recurrent neural network with tensor train 2017 Int. Conf. on Neural
Networks (IJCNN) (IEEE) pp 4451–8
[33] Pan Y, Xu J, Wang M, Ye J, Wang F, Bai K and Xu Z 2019 Compressing recurrent neural networks with tensor ring for action
recognition Proc. Conf. on Artificial Intelligence vol 33 pp 4683–90
[34] Wang W, Aggarwal V and Aeron S 2017 Efficient low rank tensor ring completion Proc. IEEE Int. Conf. on Computer Vision
pp 5697–705
[35] Yuan L, Cao J, Zhao X, Wu Q and Zhao Q 2018 Higher-dimension tensor completion via low-rank tensor ring decomposition 2018
Asia-Pacific Signal and Information Processing Association Annual Summit Conf. (APSIPA ASC) (IEEE) pp 1071–6
[36] Bengua J A, Phien H N, Tuan H D and Do M N 2017 Efficient tensor completion for color image and video recovery: low-rank
tensor train IEEE Trans. Image Process. 26 2466–79
[37] Zhao Q, Sugiyama M, Yuan L and Cichocki A 2019 Learning efficient tensor representations with ring-structured networks ICASSP
2019-2019 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP) (IEEE) pp 8608–12
[38] He W, Chen Y, Yokoya N, Li C and Zhao Q 2020 Hyperspectral super-resolution via coupled tensor ring factorization
(arXiv:2001.01547)
[39] Dian R, Li S and Fang L 2019 Learning a low tensor-train rank representation for hyperspectral image super-resolution IEEE Trans.
Neural Netw. Learn. Syst. 30 2672–83
[40] Bellman R E 2015 Adaptive Control Processes: A Guided Tour vol 2045 (Princeton, NJ: Princeton University Press)
[41] Ahmadi-Asl S, Cichocki A, Phan A H, Asante-Mensah M G, Mousavi F, Oseledets I and Tanaka T 2020 Randomized algorithms for
fast computation of low-rank tensor ring model Mach. Learn.: Sci. Technol. 2 011001
[42] Yokota T, Hontani H, Zhao Q and Cichocki A 2020 Manifold modeling in embedded space: an interpretable alternative to deep
image prior IEEE Trans. Neural Networks Learning Systems (https://doi.org/10.1109/TNNLS.2020.3037923)
[43] Yang J, Zhu Y, Li K, Yang J and Hou C 2018 Tensor completion from structurally-missing entries by low-TT-rankness and
fiber-wise sparsity IEEE J. Sel. Top. Signal Process. 12 1420–34
[44] Zhang J, Zhao D and Gao W 2014 Group-based sparse representation for image restoration IEEE Trans. Image Process. 23 3336–51
[45] Elad M and Aharon M 2006 Image denoising via sparse and redundant representations over learned dictionaries IEEE Trans. Image
Process. 15 3736–45
[46] Aharon M, Elad M and Bruckstein A 2006 K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation
IEEE Trans. Signal Process. 54 4311–22
[47] Mairal J, Bach F, Ponce J and Sapiro G 2009 Online dictionary learning for sparse coding Proc. 26th Annual Int. Conf. on Machine
Learning pp 689–96
[48] Cai S, Kang Z, Yang M, Xiong X, Peng C and Xiao M 2018 Image denoising via improved dictionary learning with global structure
and local similarity preservations Symmetry 10 167
[49] Yuan L, Li C, Mandic D, Cao J and Zhao Q 2019 Tensor ring decomposition with rank minimization on latent space: an efficient
approach for tensor completion Proc. Conf. on Artificial Intelligence vol 33 pp 9151–8
[50] Huang H, Liu Y, Long Z and Zhu C 2020 Robust low-rank tensor ring completion IEEE Trans. Comput. Imaging 6 1117–26
[51] Yuan L, Li C, Cao J and Zhao Q 2020 Rank minimization on tensor ring: an efficient approach for tensor decomposition and
completion Mach. Learn. 109 603–22
[52] Zhao X-L, Nie X, Zheng Y-B, Ji T-Y and Huang T-Z 2019 Low-rank tensor completion via tensor nuclear norm with hybrid
smooth regularization IEEE Access 7 131888–901
[53] Ding M, Huang T-Z, Zhao X-L and Ma T-H 2020 Tensor completion via nonconvex tensor ring rank minimization with
guaranteed convergence (arXiv:2005.09674)
[54] Toh K-C and Yun S 2010 An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems
Pac. J. Optim. 6 15
[55] Yuan L, Zhao Q, Gui L and Cao J 2019 High-order tensor completion via gradient-based optimization under tensor train format
Signal Process.: Image Commun. 73 53–61
[56] Dunlavy D M, Kolda T G and Acar E 2010 Poblano v1.0: a MATLAB toolbox for gradient-based optimization Technical Report
SAND2010-1422 (Sandia National Laboratories)
[57] Yuan L, Li C, Mandic D, Cao J and Zhao Q 2018 Tensor ring decomposition with rank minimization on latent space: an efficient
approach for tensor completion (arXiv:1809.02288)
[58] Acar E, Dunlavy D M, Kolda T G and Mørup M 2011 Scalable tensor factorizations for incomplete data Chemometr. Intell. Lab.
Syst. 106 41–56
[59] Yokota T, Zhao Q and Cichocki A 2016 Smooth PARAFAC decomposition for tensor completion IEEE Trans. Signal Process.
64 5423–36
[60] Discovery 2015 Pistol shot recorded at 73 000 frames per second (https://youtu.be/7y9apnbI6GA)