Rank Metric Decoder Architectures for Random Linear Network Coding With Error Control
Abstract—While random linear network coding is a powerful tool for disseminating information in communication networks, it is highly susceptible to errors caused by various sources. Due to error propagation, errors greatly deteriorate the throughput of network coding and seriously undermine both reliability and security of data. Hence, error control for network coding is vital. Recently, constant-dimension codes (CDCs), especially Kötter–Kschischang (KK) codes, have been proposed for error control in random linear network coding. KK codes can also be constructed from Gabidulin codes, an important class of rank metric codes. Rank metric decoders have been recently proposed for both Gabidulin and KK codes, but they have high computational complexities. Furthermore, it is not clear whether such decoders are feasible and suitable for hardware implementations. In this paper, we reduce the complexities of rank metric decoders and propose novel decoder architectures for both codes. The synthesis results of our decoder architectures for Gabidulin and KK codes with limited error-correcting capabilities over small fields show that our architectures not only are affordable, but also achieve high throughput.

Index Terms—Constant-dimension codes (CDCs), decoding, error correction coding, Gabidulin codes, Galois fields, integrated circuits, Kötter–Kschischang (KK) codes, network coding, rank metric codes, subspace codes.

Manuscript received March 29, 2010; revised September 10, 2010; accepted October 17, 2010. Date of publication January 06, 2011; date of current version January 18, 2012. This work was supported in part by Thales Communications, Inc., a summer extension grant from Air Force Research Lab, and the National Science Foundation under Grant ECCS-0925890. This paper was presented in part at the IEEE Workshop on Signal Processing Systems, Tampere, Finland, October 2009.
N. Chen is with SandForce Inc., Saratoga, CA 95070 USA (e-mail: nchen@sandforce.com).
Z. Yan is with the Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18015 USA (e-mail: yan@lehigh.edu).
M. Gadouleau is with the Department of Computer Science, Queen Mary, University of London, E1 4NS U.K. (e-mail: mgadouleau@eecs.qmul.ac.uk).
Y. Wang is with the New Jersey Research Center, QUALCOMM, Inc., Bridgewater, NJ 08807 USA (e-mail: aywang11@gmail.com).
B. W. Suter is with the Air Force Research Laboratory, Rome, NY 13441 USA (e-mail: bruce.suter@rl.af.mil).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2010.2096239

I. INTRODUCTION

NETWORK coding [1] is a promising candidate for a new unifying design paradigm for communication networks, due to its advantages in throughput and robustness to network failures. Hence, network coding is already used or considered in gossip-based data dissemination, 802.11 wireless ad hoc networking, peer-to-peer networks, and mobile ad hoc networks (MANETs).

Random linear network coding (RLNC) [2] is arguably the most important class of network coding. RLNC treats all packets as vectors over some finite field and forms an outgoing packet by linearly combining incoming packets using random coefficients. Due to its random linear operations, RLNC not only achieves network capacity in a distributed manner, but also provides robustness to changing network conditions. Unfortunately, it is highly susceptible to errors caused by various reasons, such as noise, malicious or malfunctioning nodes, or insufficient min-cut [3]. Since linearly combining packets results in error propagation, errors greatly deteriorate the throughput of network coding and seriously undermine both reliability and security of data. Thus, error control for random linear network coding is critical.

Error control schemes proposed for RLNC assume two types of transmission models. The schemes of the first type (see, for example, [4]) depend on and take advantage of the underlying network topology or the particular linear network coding operations performed at various network nodes. The schemes of the second type [3], [5] assume that the transmitter and receiver have no knowledge of such channel transfer characteristics. The two transmission models are referred to as coherent and noncoherent network coding, respectively.

It has been recently shown [3] that an error control code for noncoherent network coding, called a subspace code, is a set of subspaces (of a vector space), and information is encoded in the choice of a subspace as a codeword; a set of packets that generate the chosen subspace is then transmitted [3]. A subspace code is called a constant-dimension code (CDC) if its subspaces are of the same dimension. CDCs are of particular interest since they lead to simplified network protocols due to the fixed dimension. A class of asymptotically optimal CDCs has been proposed in [3], and this class is referred to as the KK codes. A decoding algorithm based on interpolation for bivariate linearized polynomials is also proposed in [3] for the KK codes. It was shown that KK codes correspond to lifting [5] of Gabidulin codes [6], [7], which are a class of optimal rank metric codes. Gabidulin codes are also called maximum rank distance (MRD) codes, since they achieve the Singleton bound in the rank metric [6], as Reed–Solomon (RS) codes achieve the Singleton bound of the Hamming distance. Due to the connection between Gabidulin and KK codes, the decoding of KK codes can be viewed as generalized decoding of Gabidulin codes, which involves deviations as well as errors and erasures [5]. Gabidulin codes are significant in themselves: for coherent network coding, the error correction capability of error control schemes is succinctly described by the rank metric [8]; thus, error control codes for coherent network coding are essentially rank metric codes.
The benefits of network coding above come at the price of additional operations needed at the source nodes for encoding, at the intermediate nodes for linear combining, and at the destination node(s) for decoding. In practice, the decoding complexities at destination nodes are much greater than the encoding and combining complexities. The decoding complexities of RLNC are particularly high when large underlying fields are assumed and when additional mechanisms such as error control are accounted for. Clearly, the decoding complexities of RLNC are critical to both software and hardware implementations. Furthermore, the area and power overheads of their VLSI implementations are important factors in system design. Unfortunately, prior research efforts have mostly focused on theoretical aspects of network coding, and complexity reduction and efficient VLSI implementation of network coding decoders have not been sufficiently investigated so far. For example, although the decoding complexities of Gabidulin and KK codes were analyzed in [9], [10] and [3], [5], respectively, they do not reflect the impact of the size of the underlying finite fields. To ensure a high probability of success for RLNC, a field of size 2^8 or 2^16 is desired [11]. However, these large field sizes will increase decoding complexities and hence complicate hardware implementations. Finally, to the best of our knowledge, hardware architectures for these decoders have not been investigated in the open literature.

In this paper, we fill this significant gap by investigating complexity reduction and efficient hardware implementation for decoders in RLNC with error control. This effort is significant to the evaluation and design of network coding for several reasons. First, our results evaluate the complexities of decoders for RLNC as well as the area, power, and throughput of their hardware implementations, thereby helping to determine the feasibility and suitability of network coding for various applications. Second, our research results provide instrumental guidelines for the design of network coding from the perspective of complexity as well as hardware implementation. Third, our research results lead to efficient decoders and hence reduce the area and power overheads of network coding.

In this paper, we focus on the generalized Gabidulin decoding algorithm [5] for the KK codes and the modified Berlekamp–Massey decoding algorithm in [12] for Gabidulin codes, for two reasons. First, compared with the decoding algorithm in [3], the generalized Gabidulin decoding [5] has a smaller complexity, especially for high-rate KK codes [5]. Second, components in the errors-only Gabidulin decoding algorithm in [12] can be easily adapted to the generalized Gabidulin decoding of KK codes. Thus, among the decoding algorithms for Gabidulin codes, we focus on the decoding algorithm in [12].

Although we focus on RLNC with error control in this paper, our results can be easily applied to RLNC without error control. For RLNC without error control, the decoding complexity is primarily due to inverting the global coding matrix via Gauss–Jordan elimination, which is also considered in this paper.

Our main contributions include several algorithmic reformulations that reduce the computational complexities of decoders for both Gabidulin and KK codes. Our complexity-saving algorithmic reformulations are given here.
• We first adopt normal basis representations for all finite field elements and then significantly reduce the complexity of bit-parallel normal basis multipliers by using our common subexpression elimination (CSE) algorithm [13];
• The decoding algorithms of both Gabidulin and KK codes involve solving key equations. We adapt the inversionless Berlekamp–Massey algorithm (BMA) in [14], [15] to solving key equations for rank metric codes. Our inversionless BMA leads to reduced complexities as well as efficient architectures;
• The decoding algorithm of KK codes requires that the input be arranged in a row reduced echelon (RRE) form [16]. We define a more generalized form called n-RRE form, and show that it is sufficient if the input is in the n-RRE form. This change not only reduces the complexity of reformulating the input, but also enables parallel processing of decoding KK codes based on Cartesian products.

Another main contribution of this paper is efficient decoder architectures for both Gabidulin and KK codes. Aiming to reduce the area and to improve the regularity of our decoder architectures, we have also reformulated other steps in the decoding algorithm. To evaluate the performance of our decoder architectures for Gabidulin and KK codes, we implement our decoder architecture for two rate-1/2 Gabidulin codes and their corresponding KK codes. Our KK decoders can be used in network coding with various packet lengths by Cartesian product [5]. The synthesis results of our decoders show that our decoder architectures for Gabidulin and KK codes over small fields with limited error-correcting capabilities not only are affordable, but also achieve high throughput. Our decoder architectures and implementation results are novel to the best of our knowledge.

The decoders considered in this work are bounded distance decoders, and their decoding capability is characterized in [5, Theorem 11]. The thrust of our work is to reduce complexities and to devise efficient architectures for such decoders, while maintaining their decoding capability. To this end, our reformulations of the decoding algorithms do not affect the decoding capability of the bounded distance decoders of Gabidulin and KK codes. The error performance of the bounded distance decoders has been investigated in our previous works [17]–[19]. Hence, despite its significance, a detailed error performance analysis is beyond the scope of this paper, and we do not include it due to limited space.

The remainder of this paper is organized as follows. After briefly reviewing the background in Section II, we present our complexity-saving algorithmic reformulations and efficient decoder architectures in Sections III and IV, respectively. In Section V, the proposed architectures are implemented in Verilog and synthesized for area/performance evaluation. The conclusion is given in Section VI.

II. PRELIMINARIES

A. Notation

Let q denote a power of a prime and GF(q) denote a finite field of order q. We use I, GF(q)^n, and GF(q)^{n x m} to denote an identity matrix, an n-dimensional vector space over GF(q), and the set of all n x m matrices over GF(q), respectively. For a set U, U^c denotes the complement subset and X_U denotes the columns of X in U. In this paper, all vectors and matrices are in boldface.

The rank weight of a vector over GF(q^m) is defined as the maximal number of its coordinates that are linearly independent over GF(q).
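As a concrete illustration of the rank weight (our own example, not part of the original paper), the sketch below expands a vector over GF(2^m) into an m x n binary matrix, one column per coordinate, and computes its rank over GF(2); the field size, the example vector, and the helper names are illustrative choices.

```python
# Illustrative sketch (not from the paper): rank weight of a vector over GF(2^m).
# Each coordinate is an element of GF(2^m) stored as an m-bit integer; the rank
# weight is the GF(2) rank of the m x n matrix whose columns are these bits.

def rank_weight(vec, m):
    """GF(2) rank of the bit-expansion of `vec`, a list of m-bit integers."""
    rows = [[(x >> bit) & 1 for x in vec] for bit in range(m)]
    rank = 0
    n = len(vec)
    for col in range(n):
        # find a pivot row with a 1 in this column
        pivot = next((r for r in range(rank, m) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(m):                     # eliminate this column from all other rows
            if r != rank and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

# Example over GF(2^3): coordinates 0b011, 0b110, 0b101 (the third is the XOR of the
# first two), so only two coordinates are linearly independent over GF(2).
print(rank_weight([0b011, 0b110, 0b101], m=3))   # -> 2
```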
TABLE I
ANALOGY BETWEEN RS AND GABIDULIN CODES

Algorithm 2 (General Rank Decoding [5]).

TABLE II
COMPLEXITIES OF BIT-PARALLEL NORMAL BASIS MULTIPLIERS OVER FINITE FIELDS (FOR THESE TWO FIELDS, ALL THREE IMPLEMENTATIONS HAVE THE SAME CPD)
the computational complexity of Algorithm 1 is primarily due to the following updates in Step 1.1:

(5)

which require divisions and computing [-1]th powers. With normal basis representation, [-1]th powers are obtained by a single cyclic shift. When possible, the updates can also be computed in an inversionless form, which avoids finite field divisions or inversions. Thus, using normal basis representation also reduces the complexity of Gabidulin's algorithm.
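To make the cyclic-shift property concrete, here is a small self-contained check (our own illustration, not from the paper) over GF(2^3) with the irreducible polynomial x^3 + x^2 + 1, for which {b, b^2, b^4} with b = x is a normal basis: squaring an element cyclically shifts its normal basis coordinate vector, so q-th powers and [-1]th powers are simply shifts in opposite directions.

```python
# Illustration (not from the paper): in a normal basis of GF(2^m), the Frobenius map
# x -> x^2 is a cyclic shift of the coordinate vector, so q-th and [-1]th powers cost no gates.

M = 0b1101          # modulus x^3 + x^2 + 1, irreducible over GF(2)
m = 3

def gf_mul(a, b):
    """Carry-less multiply modulo x^3 + x^2 + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= M
    return r

beta = 0b010        # beta = x is a normal element for this modulus
basis = [beta, gf_mul(beta, beta), gf_mul(gf_mul(beta, beta), gf_mul(beta, beta))]  # beta, beta^2, beta^4

def from_coords(c):
    """Element whose normal basis coordinates are c = (c0, c1, c2)."""
    e = 0
    for ci, bi in zip(c, basis):
        if ci:
            e ^= bi
    return e

to_coords = {from_coords((c0, c1, c2)): (c0, c1, c2)
             for c0 in (0, 1) for c1 in (0, 1) for c2 in (0, 1)}

for elem, coords in to_coords.items():
    squared = gf_mul(elem, elem)
    shifted = (coords[-1],) + coords[:-1]     # cyclic shift of the coordinates
    assert to_coords[squared] == shifted
print("squaring = cyclic shift of normal-basis coordinates: verified")
```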
In addition to lower complexities of finite field arithmetic operations, normal basis representation leads to reduced complexities in the decoding of Gabidulin and KK codes for several reasons. First, it was shown that using a normal basis can facilitate the computation of symbolic products [9]. Second, it was also suggested [9] that solving (4) can be trivial using a normal basis: if a normal basis is used, the matrix whose rows are the vector representations of the basis elements with respect to that basis becomes an identity matrix with additional all-zero columns. Hence, solving (4) requires no computation. These two complexity reductions were also observed in [10]. Third, if a normal basis of GF(q^m) is used, the parity check matrix in (1) becomes a cyclic matrix. Thus, syndrome computation becomes part of a cyclic convolution, for which fast algorithms are available (see, for example, [27]). Using fast cyclic convolution algorithms is favorable when n is large.
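The remark that a cyclic parity check matrix turns syndrome computation into a cyclic convolution can be checked in a few lines (our own toy illustration over the integers; in the decoder the same identity holds coordinate-wise over the finite field):

```python
# Toy illustration (not from the paper): multiplying by a circulant matrix is a cyclic
# convolution, so fast cyclic-convolution algorithms can be reused for syndrome computation.

def circulant(first_col):
    n = len(first_col)
    return [[first_col[(i - j) % n] for j in range(n)] for i in range(n)]

def cyclic_convolution(a, b):
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

h = [3, 1, 4, 1, 5]          # first column of the circulant matrix
r = [2, 7, 1, 8, 2]          # "received" vector
H = circulant(h)
matvec = [sum(H[i][j] * r[j] for j in range(len(r))) for i in range(len(r))]
assert matvec == cyclic_convolution(h, r)
print(matvec)
```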
B. Normal Basis Arithmetic Operations

We also propose finite field arithmetic operations with reduced complexities when normal basis representation is used. When represented by vectors, the addition and subtraction of two elements are simply component-wise additions, which are straightforward to implement. For characteristic-2 fields GF(2^m), inverses can be obtained efficiently by a sequence of squaring and multiplying, since b^{-1} = b^{2^m - 2} = b^2 * b^4 * ... * b^{2^{m-1}} for any nonzero b in GF(2^m) [26]. Since the 2^i-th powers require no computation, the complexity of inversion in turn depends on that of multiplication. Division can be implemented by a concatenation of inversion and multiplication, a/b = a * b^{-1}, and hence the complexity of division also depends on that of multiplication in the end.
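A direct sketch of this inversion-by-squaring idea (our own illustration over GF(2^3); the modulus, field size, and function names are illustrative):

```python
# Illustration (not from the paper): inversion in GF(2^m) by squaring and multiplying,
#     b^{-1} = b^(2^m - 2) = b^2 * b^4 * ... * b^(2^(m-1)),
# and division as a concatenation of inversion and multiplication.

M, m = 0b1011, 3            # GF(2^3) with modulus x^3 + x + 1 (any irreducible cubic works)

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= M
    return r

def gf_inv(beta):
    assert beta != 0
    result, power = 1, beta
    for _ in range(m - 1):            # accumulate beta^2 * beta^4 * ... * beta^(2^(m-1))
        power = gf_mul(power, power)  # repeated squaring gives beta^(2^i)
        result = gf_mul(result, power)
    return result

def gf_div(a, b):
    return gf_mul(a, gf_inv(b))       # division = inversion followed by multiplication

for x in range(1, 1 << m):
    assert gf_mul(x, gf_inv(x)) == 1
print("all inverses in GF(8) verified")
```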
There are serial and parallel architectures for normal basis finite field multipliers. To achieve high throughput in our decoder, we consider only parallel architectures. Most normal basis multipliers are based on the Massey–Omura (MO) architecture [26], [28]. The complexity of a serial MO normal basis multiplier over GF(2^m) is defined as the number of terms in computing a bit of the product of two elements represented over a normal basis. It has been shown [29] that a parallel MO multiplier over GF(2^m) needs m^2 AND gates and a number of XOR gates determined by the complexity of the normal basis. For instance, for the fields GF(2^8) and GF(2^16), the minimized complexities are 21 and 85, respectively [26]. Using a common subexpression elimination algorithm [13], we significantly reduce the number of XOR gates while maintaining the same critical path delays (CPDs) of one AND plus five XOR gates and one AND plus seven XOR gates as direct implementations, respectively. Our results are compared with those in [26] and [29] in Table II, where we also provide the prime polynomial for each field.

The reduced gate count for normal basis multiplication is particularly important for hardware implementations of RLNC. This improvement is transparent to the complexity of decoders, in terms of finite field operations. When decoders for RLNC are realized in hardware, the reduced gate count for normal basis multiplication will be reflected in reduced area and power consumption.
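The defining property of the MO multiplier, namely that every bit of the product is obtained by applying one fixed bilinear Boolean function to cyclically shifted copies of the operands' normal basis coordinates, can be checked in software. The sketch below (our own, over GF(2^3) with modulus x^3 + x^2 + 1 and normal basis {b, b^2, b^4}) derives the bilinear form for bit 0 and reuses it for every output bit.

```python
# Illustration (not from the paper): Massey-Omura property in a normal basis of GF(2^3).
# One bilinear form F gives bit 0 of the product; bit k is F applied to the operands'
# coordinate vectors cyclically shifted by k.

import itertools

M, m = 0b1101, 3                      # modulus x^3 + x^2 + 1; beta = x is a normal element

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= M
    return r

beta = 0b010
basis = [beta, gf_mul(beta, beta), gf_mul(gf_mul(beta, beta), gf_mul(beta, beta))]

def from_coords(c):
    e = 0
    for ci, bi in zip(c, basis):
        if ci:
            e ^= bi
    return e

to_coords = {from_coords(c): c for c in itertools.product((0, 1), repeat=m)}

# Bilinear form for output bit 0: F[i][j] = bit 0 of basis[i] * basis[j].
F = [[to_coords[gf_mul(basis[i], basis[j])][0] for j in range(m)] for i in range(m)]

def mo_product_bit(a, b, k):
    """Bit k of a*b from the single form F and cyclic shifts of the coordinates by k."""
    return sum(F[i][j] * a[(i + k) % m] * b[(j + k) % m]
               for i in range(m) for j in range(m)) % 2

for a in itertools.product((0, 1), repeat=m):
    for b in itertools.product((0, 1), repeat=m):
        prod = to_coords[gf_mul(from_coords(a), from_coords(b))]
        assert all(mo_product_bit(a, b, k) == prod[k] for k in range(m))
print("Massey-Omura bilinear-form property verified for all pairs")
```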
C. Inversionless BMA

The modified BMA for rank metric codes [12] is similar to the BMA for RS codes except that polynomial multiplications are replaced by symbolic products. The modified BMA [12] requires finite field divisions, which are more complex than other arithmetic operations. Following the idea of the inversionless RS decoder [14], we propose an inversionless variant in Algorithm 3.

Algorithm 3 (iBMA).
Input: The syndromes.
Output: The error span (connection) polynomial.
3.1 Initialize the connection polynomial, the auxiliary polynomial, and the auxiliary scalar.
3.2 For each iteration:
  a) Compute the discrepancy;
  b) If the discrepancy is zero, go to (e);
  c) Modify the connection polynomial;
  d) If no length change is required, go to (e); otherwise, update the auxiliary polynomial, the auxiliary scalar, and the length;
  e) Update the polynomials for the next iteration.
3.3 Set the output.
Using a similar approach as in [14], we prove that the output of Algorithm 3 is the same as that produced by the modified BMA, except that it is scaled by a constant. However, this scaling is inconsequential since the two polynomials have the same root space.

Using normal basis representation, the modified BMA in [12] requires finite field inversions in addition to multiplications and additions over GF(q^m) [9]. Our inversionless version, Algorithm 3, requires only multiplications and additions. Since a normal basis inversion is obtained through a sequence of normal basis multiplications, its complexity is several times that of a normal basis multiplication. Hence, Algorithm 3 reduces the complexity considerably. Algorithm 3 is also more suitable for hardware implementation, as shown in Section IV.
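The inversionless idea itself is easy to prototype. The sketch below (our own illustration, not the paper's Algorithm 3) runs a division-free Berlekamp–Massey recursion on an ordinary linear recurrence over a prime field; the rank metric version replaces the polynomial product in the update with a symbolic product and the syndromes with rank metric syndromes, but the division-free update and the constant scaling of the output are the same.

```python
# Illustration (not from the paper): division-free (inversionless) Berlekamp-Massey
# over GF(p). The connection polynomial is updated as
#     Lambda_new = gamma * Lambda - delta * (x * B),
# so no finite field division is ever needed; the result is a constant multiple of the
# monic connection polynomial, which is harmless because the roots are unchanged.

P = 101                                        # a small prime field for the demo

def inversionless_bm(s):
    lam = [1]                                  # connection polynomial Lambda(x)
    b = [1]                                    # auxiliary polynomial B(x)
    gamma, L = 1, 0
    for r in range(len(s)):
        delta = sum(lam[i] * s[r - i] for i in range(min(len(lam), r + 1))) % P
        xb = [0] + b                           # x * B(x)
        width = max(len(lam), len(xb))
        new_lam = [(gamma * (lam[i] if i < len(lam) else 0)
                    - delta * (xb[i] if i < len(xb) else 0)) % P
                   for i in range(width)]
        if delta != 0 and 2 * L <= r:
            b, gamma, L = lam, delta, r + 1 - L
        else:
            b = xb
        lam = new_lam
    return lam, L

# Sequence satisfying s_n = 3*s_{n-1} + 5*s_{n-2} (mod 101); the true connection
# polynomial is 1 - 3x - 5x^2 up to a scalar factor.
s = [1, 2]
for _ in range(8):
    s.append((3 * s[-1] + 5 * s[-2]) % P)
lam, L = inversionless_bm(s)
print(L, lam[:L + 1])
```

The returned polynomial is a scalar multiple of the monic connection polynomial, mirroring the remark above that the iBMA output differs from the modified BMA output only by a constant factor.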
D. Finding the Root Space

Instead of finding the roots of polynomials as in RS decoding, we need to find the root spaces of linearized polynomials in rank metric decoding. Hence, the Chien search [30] used in RS decoding would have a high complexity here, for two reasons. First, it requires polynomial evaluations over the whole field, whose complexity is very high. Second, it cannot find a set of linearly independent roots.

A probabilistic algorithm to find the root space was proposed in [25]. For Gabidulin codes, it can be further simplified as suggested in [9]. However, hardware implementations of probabilistic algorithms require random number generators. Furthermore, the algorithm in [25] requires symbolic long division, which is also not suitable for hardware implementations. According to [5], the average complexity of the probabilistic algorithm in [25] is comparable to that of Berlekamp's deterministic method [24], which is counted as operations in GF(q^m) plus operations in GF(q). Since their complexity difference is small, we focus on the deterministic method, which is much easier to implement.

Suppose we need to find the root space of a linearized polynomial. Berlekamp's deterministic method first evaluates the polynomial on a basis of the field. Then it expands the evaluation results in the base field as columns of a matrix and finds linearly independent vectors in the null space of this matrix. Using the representation based on the chosen basis, these vectors are also roots of the given polynomial. Finding them amounts to obtaining the linearly dependent combinations of the columns of the matrix, which can be done by Gaussian elimination.
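A software sketch of this deterministic method (our own illustration over GF(2^3); the polynomial, basis, and helper names are our choices): evaluate the linearized polynomial on a basis, expand the images as a binary matrix, and read the root space off its null space.

```python
# Illustration (not from the paper): Berlekamp's deterministic root-space method for a
# linearized polynomial over GF(2^3). Evaluate on a basis, expand over GF(2), and the
# null space of the resulting binary matrix gives linearly independent roots.

M, m = 0b1011, 3                      # GF(2^3), modulus x^3 + x + 1

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= M
    return r

def lin_eval(coeffs, x):
    """Evaluate L(x) = sum_i coeffs[i] * x^(2^i)."""
    acc, power = 0, x
    for c in coeffs:
        acc ^= gf_mul(c, power)
        power = gf_mul(power, power)  # next Frobenius power x^(2^(i+1))
    return acc

def null_space_gf2(cols, m):
    """Basis of {z : V z = 0}, where V's columns are the m-bit integers in `cols`."""
    n = len(cols)
    rows = [[(c >> bit) & 1 for c in cols] for bit in range(m)]   # m x n over GF(2)
    pivots = []
    rank = 0
    for col in range(n):
        p = next((r for r in range(rank, m) if rows[r][col]), None)
        if p is None:
            continue
        rows[rank], rows[p] = rows[p], rows[rank]
        for r in range(m):
            if r != rank and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[rank])]
        pivots.append(col)
        rank += 1
    basis = []
    for f in [c for c in range(n) if c not in pivots]:            # one vector per free column
        z = [0] * n
        z[f] = 1
        for r, c in enumerate(pivots):
            z[c] = rows[r][f]
        basis.append(z)
    return basis

basis = [0b001, 0b010, 0b100]                  # the polynomial basis {1, x, x^2} of GF(8)
trace = [1, 1, 1]                              # Tr(x) = x + x^2 + x^4, a linearized polynomial
cols = [lin_eval(trace, b) for b in basis]     # images of the basis elements
for z in null_space_gf2(cols, m):
    root = 0
    for zj, bj in zip(z, basis):
        if zj:
            root ^= bj
    assert lin_eval(trace, root) == 0
    print(f"root {root:03b}")
```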
E. n-RRE Form

Given a received subspace spanned by a set of received packets, the input of Algorithm 2 is a three-tuple, called a reduction of the received space, which is represented by its generator matrix; the three-tuple is obtained based on the received matrix when it is in its RRE form [5]. Thus, before the decoding starts, preprocessing is performed on the received packets so as to obtain the RRE form of the received matrix. We show that the processed matrix needs to satisfy only a relaxed constraint, which does not affect the decoding outcome, while leading to two advantages. First, the relaxed constraint results in reduced complexities in the preprocessing step. Second, and more importantly, the relaxed constraint enables parallel processing of decoding KK codes based on Cartesian products.

We first define an n-RRE form for received matrices. A received matrix is in its n-RRE form as long as its leftmost n columns are in their RRE form. Compared with the RRE form, the n-RRE form is more relaxed as it puts no constraints on the right part. We note that an n-RRE form of a matrix is not unique.

We now show that the relaxed constraint does not affect the decoding. Similar to [5, Proposition 7], we first show that a reduction based on an n-RRE form of the received matrix always exists: applying the reducing row operations to the received matrix produces a matrix in its n-RRE form, whose column and row rank deficiencies are defined as in [5]. We have the following result about the reduction based on the n-RRE form.

Lemma 1: Let the received matrix and its n-RRE form be defined as above. There exists a tuple and a set satisfying the conditions of a reduction, so that they form a reduction of the received space.

See Appendix A for the proof of Lemma 1. Lemma 1 shows that we can find an alternative reduction based on an n-RRE form of the received matrix, instead of an RRE form. The key of our alternative reduction is that the reduction is mostly determined by the first n columns of the received matrix. Also, this alternative reduction does not come as a surprise. As shown in [5, Proposition 8], row operations on the received matrix can produce alternative reductions. Next, we show that decoding based on our alternative reduction is the same as in [5]. Similar to [5, Theorem 9], we have the following result.

Lemma 2: For a reduction determined by an n-RRE form of the received matrix, the subspace decoding problem is equivalent to the corresponding generalized Gabidulin decoding problem.

See Appendix B for the proof. Lemma 2 shows that the subspace decoding problem is equivalent to the generalized Gabidulin decoding problem with the alternative reduction, which is obtained from an n-RRE form of the received matrix.

Our alternative reduction leads to two advantages. First, it results in reduced complexity in preprocessing. Given a received matrix, the preprocessing needed to transform it into its n-RRE form is only part of the preprocessing needed to transform it into its RRE form. We can show that the maximal number of arithmetic operations in the former preprocessing is smaller than that of the latter, so the relaxed constraint leads to a lower complexity; the savings depend on the matrix dimensions. Second, the reduction for n-RRE forms is completely determined by the leftmost n columns of the received matrix instead of the whole matrix, which greatly simplifies hardware implementations. This advantage is particularly important for the decoding of CDCs that are lifted from Cartesian products of Gabidulin codes. Since the row operations needed to obtain an n-RRE form depend on the leftmost n columns only, decoding can be divided into parallel and smaller decoding problems whose inputs are the corresponding blocks of the received matrix.
Thus, for these CDCs, we can decode in a serial manner with only one small decoder, or in a partly parallel fashion with more decoders, or even in a fully parallel fashion. This flexibility allows tradeoffs between cost/area/power and throughput. Furthermore, since the erasures are determined by the leftmost n columns and are the same for all of the parallel decoding problems, the computation of the erasure-related quantities in Algorithm 2 can be shared among these parallel decoding problems, thereby reducing overall complexity.
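To illustrate why stopping at an n-RRE form is cheaper than computing a full RRE form (our own toy experiment over GF(2); the matrix sizes and the operation counting are illustrative only), the sketch below row-reduces only the leftmost n columns and counts row-XOR operations, then compares with reducing all columns.

```python
# Illustration (not from the paper): an n-RRE form only requires the leftmost n columns
# to be in RRE form, so preprocessing can stop early and performs no more row operations
# than a full RRE reduction of the whole received matrix.

import random

def rre(matrix, limit_cols):
    """Row-reduce over GF(2) using pivots only in the first `limit_cols` columns.
    Returns the reduced matrix and the number of row-XOR operations performed."""
    m = [row[:] for row in matrix]
    n_rows = len(m)
    ops = 0
    rank = 0
    for c in range(limit_cols):
        p = next((r for r in range(rank, n_rows) if m[r][c]), None)
        if p is None:
            continue
        m[rank], m[p] = m[p], m[rank]
        for r in range(n_rows):
            if r != rank and m[r][c]:
                m[r] = [a ^ b for a, b in zip(m[r], m[rank])]
                ops += 1
        rank += 1
    return m, ops

random.seed(1)
n, n_rows, packet_len = 6, 10, 26            # 6 "pivot" columns, long packets to the right
Y = [[random.randint(0, 1) for _ in range(packet_len)] for _ in range(n_rows)]

_, ops_nrre = rre(Y, limit_cols=n)           # n-RRE form: pivots restricted to the leftmost n columns
_, ops_rre = rre(Y, limit_cols=packet_len)   # full RRE form of the whole matrix
assert ops_rre >= ops_nrre
print("row operations, n-RRE vs full RRE:", ops_nrre, ops_rre)
```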
F. Finding Minimal Linearized Polynomials

Minimal linearized polynomials can be computed by solving systems of linear equations. Given roots $w_0, w_1, \ldots, w_{r-1}$, the minimal linearized polynomial $f(x) = x^{[r]} + \sum_{i=0}^{r-1} f_i x^{[i]}$ satisfies

$$\begin{bmatrix} w_0^{[0]} & w_0^{[1]} & \cdots & w_0^{[r-1]} \\ w_1^{[0]} & w_1^{[1]} & \cdots & w_1^{[r-1]} \\ \vdots & \vdots & \ddots & \vdots \\ w_{r-1}^{[0]} & w_{r-1}^{[1]} & \cdots & w_{r-1}^{[r-1]} \end{bmatrix} \begin{bmatrix} f_0 \\ f_1 \\ \vdots \\ f_{r-1} \end{bmatrix} = - \begin{bmatrix} w_0^{[r]} \\ w_1^{[r]} \\ \vdots \\ w_{r-1}^{[r]} \end{bmatrix} \quad (6)$$

If the roots processed so far are linearly dependent, the corresponding evaluation is zero and hence the new root is ignored. So Algorithm 4 integrates detection of linear dependency at no extra computational cost. Essentially, Algorithm 4 breaks down evaluations of high q-degree polynomials into evaluations of polynomials with q-degree of one. It avoids operations with very high complexity while maintaining the same total complexity of the algorithm.
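One standard incremental construction consistent with the description of Algorithm 4 above is sketched below (our own illustration over GF(2^3); it is not necessarily identical to the paper's Algorithm 4): each new root w is absorbed through a q-degree-one factor, and a root that is linearly dependent on the previous ones evaluates to zero and is skipped, which is exactly the free dependency check mentioned above.

```python
# Illustration (not from the paper): incremental construction of a minimal linearized
# polynomial over GF(2^3). The coefficient list f = [f0, f1, ...] represents
# f(x) = sum_i f[i] * x^(2^i). Absorbing a root w with f(w) != 0 uses
#     f_new(x) = f(x)^2 + f(w) * f(x),
# a symbolic product with a q-degree-one polynomial; if f(w) == 0, w is dependent and skipped.

M, m = 0b1011, 3

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= M
    return r

def lin_eval(f, x):
    acc, power = 0, x
    for c in f:
        acc ^= gf_mul(c, power)
        power = gf_mul(power, power)
    return acc

def minimal_linearized_poly(roots):
    f = [1]                                    # f(x) = x, the identity linearized polynomial
    for w in roots:
        fw = lin_eval(f, w)
        if fw == 0:                            # w already in the root space: dependency detected for free
            continue
        g = [0] * (len(f) + 1)
        for i, fi in enumerate(f):             # f(x)^2 contributes fi^2 at q-degree i+1
            g[i + 1] ^= gf_mul(fi, fi)
            g[i] ^= gf_mul(fw, fi)             # f(w) * f(x) contributes at q-degree i
        f = g
    return f

roots = [0b010, 0b100, 0b110]                  # third root = XOR of the first two (dependent)
f = minimal_linearized_poly(roots)
print("q-degree:", len(f) - 1)                 # -> 2, not 3, because of the dependent root
for c0 in (0, 1):                              # f vanishes on the whole GF(2)-span of the roots
    for c1 in (0, 1):
        v = (0b010 if c0 else 0) ^ (0b100 if c1 else 0)
        assert lin_eval(f, v) == 0
```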
IV. ARCHITECTURE DESIGN

Aiming to reduce the storage requirement and total area as well as to improve the regularity of our decoder architectures, we further reformulate the steps in the decoding algorithms of both Gabidulin and KK codes. Again, we assume the decoder architectures are suitable for RLNC over GF(q), where q is a power of two.
B. Generalized BMA

The key equation of KK decoding is essentially the same as (2), but the q-degree bound on its solution is different. Actually, in KK decoding, we do not know the exact value of this q-degree before solving the key equation. All we need is to determine the maximum number of correctable errors given the numbers of erasures and deviations. Hence, we adapt our BMA in Section III-C to KK decoding, as in Algorithm 6. To apply Algorithm 6 to Gabidulin decoding, we can simply use zero erasures and deviations.

Algorithm 6 (Generalized RiBMA).
Input: The syndromes and the total number of erasures and deviations.
Output: The error span polynomial.
6.1 Initialize the combined polynomial, the auxiliary polynomial, the length, and the iteration bound.
6.2 For each iteration:
  a) Modify the combined polynomial;
  b) Update the intermediate coefficients;
  c) If the discrepancy is nonzero and the length condition is satisfied, update the length and swap in the auxiliary quantities;
  d) Shift the intermediate coefficients;
  e) Update the combined and auxiliary polynomials for the next iteration.
6.3 Read the output from the fixed position of the combined polynomial.

Compared with Algorithm 5, we replace the fixed error-correcting bound with a quantity that depends on the numbers of erasures and deviations. This variable quantity makes it difficult to design regular architectures. By carefully initializing the polynomials, we ensure that the desired output is always at a fixed position of the combined polynomial, regardless of the numbers of erasures and deviations. Hence, the only irregular part is the initialization. The initialization of Algorithm 6 can be done by shifting in a bounded number of cycles. Hence, the RiBMA architecture in Fig. 3 can be adapted to the KK decoder and keeps the same worst case latency in cycles.

C. Gaussian Elimination

We need Gaussian elimination to obtain n-RRE forms as well as to find root spaces. Furthermore, Gabidulin's algorithm in Algorithm 1 is essentially a smart way of performing Gaussian elimination, which takes advantage of the properties of the matrix. The reduction (to obtain n-RRE forms) and finding the root space are Gaussian eliminations on matrices over GF(q), while Gabidulin's algorithm operates on matrices over GF(q^m). Here, we focus on Gaussian eliminations over GF(q); Gabidulin's algorithm will be discussed in Section IV-D.

Fig. 4. Processing element BE (x is a cyclic shift and requires no hardware but wiring).

For high-throughput implementations, we adapt the pivoting architecture in [31], which was developed for non-singular matrices over GF(2). It always keeps the pivot element on the top-left location of the matrix, by cyclically shifting the rows and columns. Our Gaussian elimination algorithm, shown in Algorithm 7, has three key differences from the pivoting architecture in [31]. First, Algorithm 7 is applicable to matrices over any field. Second, and more importantly, Algorithm 7 can be used for singular matrices. This feature is necessary since singular matrices occur in the reduction for the RRE form and in finding the root space. Third, Algorithm 7 is also flexible about matrix sizes, which are determined by the variable numbers of errors, erasures, and deviations.

Algorithm 7 (Gaussian Elimination for Root Space).
Input: A matrix whose rows are the evaluations of the polynomial over the normal basis, together with an identity matrix that records the row operations.
Output: Linearly independent roots of the polynomial.
7.1 Initialize the zero-row counter.
7.2 For each column:
  a) Perform the eliminate operation;
  b) While the pivot is zero and untried rows remain, perform the shiftup operation;
  c) If the pivot is not zero, complete the elimination for this column; otherwise, increment the zero-row counter.
7.3 The first rows of the reduced evaluation matrix (as many as counted) are all zeros, and the corresponding rows of the companion matrix are roots.

The eliminate and shiftup operations are quite similar to those in [31, Algorithm 2]. Note that a cyclic row shift and a cyclic column shift are already embedded in the eliminate operation.
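A behavioral sketch of the pivoting idea (our own Python model, not the hardware architecture or the exact Algorithm 7): the pivot is always taken at the top-left position, rows and columns are cyclically rotated instead of being indexed, and singular matrices are handled by skipping columns that fail to produce a pivot. The "used" flags below stand in for the row bookkeeping that the hardware performs with its shift operations.

```python
# Illustration (not from the paper): Gaussian elimination over GF(2) that always works on
# the top-left entry and advances by cyclic shifts, tolerating singular matrices.
# Returns the rank of the matrix.

def rank_by_cyclic_pivoting(matrix):
    m = [row[:] for row in matrix]
    used = [False] * len(m)                    # rows already retired as pivots
    rows, cols = len(m), len(m[0])
    rank = 0
    for _ in range(cols):                      # one outer step per column position
        tries = 0
        # bring an unused row with a nonzero leading entry to the top by cyclic row shifts
        while (used[0] or m[0][0] == 0) and tries < rows:
            m = m[1:] + m[:1]
            used = used[1:] + used[:1]
            tries += 1
        if not used[0] and m[0][0] == 1:
            for r in range(1, rows):           # eliminate the leading column from the other rows
                if m[r][0]:
                    m[r] = [a ^ b for a, b in zip(m[r], m[0])]
            used[0] = True
            rank += 1
        m = [row[1:] + row[:1] for row in m]   # advance to the next column position
    return rank

A = [[1, 0, 1, 1],
     [0, 1, 1, 0],
     [1, 1, 1, 0],
     [1, 0, 1, 1]]                             # last row equals the first: rank 3
print(rank_by_cyclic_pivoting(A))              # -> 3
```

In the decoder, the same elimination runs on the matrix of polynomial evaluations with an identity matrix alongside, and the rows of the identity part aligned with zero rows give the linearly independent roots.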
TABLE III
WORST CASE DECODING LATENCY (IN TERMS OF CLOCK CYCLES). GAUSSIAN ELIMINATION OVER GF(q) (ROOT SPACE IN GABIDULIN AND KK DECODERS) HAS THE LONGEST CRITICAL PATH OF ONE MULTIPLIER, ONE ADDER, ONE TWO-INPUT MUX, AND ONE FIVE-INPUT MUX

Fig. 10. Architecture of linearized polynomial interpolation.

TABLE V
PERFORMANCE OF KK DECODERS FOR 512-BYTE PACKETS
and in turn larger fields are required. A larger field size implies a higher complexity for finite field arithmetic, and longer codes with greater error correction capability also lead to higher complexity. It remains to be seen whether the decoder architectures continue to be affordable for longer codes over larger fields, and this will be the subject of our future work.

VI. CONCLUSION

This paper presents novel hardware architectures for Gabidulin and KK decoders. Our work not only reduces the computational complexity for the decoder but also devises regular architectures suitable for hardware implementations. Synthesis results using a standard cell library confirm that our designs achieve high speed and high throughput.

APPENDIX A
PROOF OF LEMMA 1

Proof: This follows the proof of [5, Proposition 7] closely. Suppose the RRE form of the received matrix and an n-RRE form of it are given. Since the RRE form is unique, the corresponding quantities in the two reductions coincide. In the proof of [5, Proposition 7], the choice is based on the RRE form; thus, we make the analogous choice based on the n-RRE form. Since the remaining quantities are uniquely determined by this choice, they also coincide. Finally, the rest follows the same steps as in the proof of [5, Proposition 7].

APPENDIX B
PROOF OF LEMMA 2

Proof: This follows a similar approach as in [5, Appendix C]. We have the following.

(8)

(9)

where (8) and (9) follow from the properties of the alternative reduction established above. From these, the subspace distance follows, which completes the proof.

ACKNOWLEDGMENT

The authors would like to thank Dr. D. Silva and Prof. F. R. Kschischang for valuable discussions, and the reviewers for their constructive comments.

REFERENCES

[1] R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, "Network information flow," IEEE Trans. Inf. Theory, vol. 46, no. 4, pp. 1204–1216, Jul. 2000.
[2] T. Ho, M. Médard, R. Koetter, D. R. Karger, M. Effros, J. Shi, and B. Leong, "A random linear network coding approach to multicast," IEEE Trans. Inf. Theory, vol. 52, no. 10, pp. 4413–4430, Oct. 2006.
[3] R. Kötter and F. R. Kschischang, "Coding for errors and erasures in random network coding," IEEE Trans. Inf. Theory, vol. 54, no. 8, pp. 3579–3591, Aug. 2008.
[4] N. Cai and R. W. Yeung, "Network coding and error correction," in Proc. IEEE Inf. Theory Workshop (ITW'02), Oct. 20–25, 2002, pp. 119–122.
[5] D. Silva, F. R. Kschischang, and R. Kötter, "A rank-metric approach to error control in random network coding," IEEE Trans. Inf. Theory, vol. 54, no. 9, pp. 3951–3967, Sep. 2008.
[6] E. M. Gabidulin, "Theory of codes with maximum rank distance," Probl. Inf. Transm., vol. 21, no. 1, pp. 1–12, Jan.–Mar. 1985.
[7] R. M. Roth, "Maximum-rank array codes and their application to crisscross error correction," IEEE Trans. Inf. Theory, vol. 37, no. 2, pp. 328–336, Mar. 1991.
[8] D. Silva and F. R. Kschischang, "On metrics for error correction in network coding," IEEE Trans. Inf. Theory, vol. 55, no. 12, pp. 5479–5490, Dec. 2009.
[9] M. Gadouleau and Z. Yan, "Complexity of decoding Gabidulin codes," in Proc. 42nd Ann. Conf. Inf. Sci. Syst. (CISS'08), Princeton, NJ, Mar. 19–21, 2008, pp. 1081–1085.
[10] F. R. Kschischang and D. Silva, "Fast encoding and decoding of Gabidulin codes," in Proc. IEEE Int. Symp. Inf. Theory (ISIT'09), Seoul, Korea, Jun. 28–Jul. 3, 2009, pp. 2858–2862.
[11] P. A. Chou, Y. Wu, and K. Jain, "Practical network coding," in Proc. 41st Ann. Allerton Conf. Commun., Control, Computing, Monticello, IL, Oct. 2003.
[12] G. Richter and S. Plass, "Error and erasure decoding of rank-codes with a modified Berlekamp–Massey algorithm," in Proc. 5th Int. ITG Conf. Source and Channel Coding (SCC'04), Erlangen, Germany, Jan. 2004, pp. 249–256.
[13] N. Chen and Z. Yan, "Cyclotomic FFTs with reduced additive complexities based on a novel common subexpression elimination algorithm," IEEE Trans. Signal Process., vol. 57, no. 3, pp. 1010–1020, Mar. 2009.
[14] H. Burton, "Inversionless decoding of binary BCH codes," IEEE Trans. Inf. Theory, vol. IT-17, no. 4, pp. 464–466, Jul. 1971.
[15] D. V. Sarwate and N. R. Shanbhag, "High-speed architectures for Reed–Solomon decoders," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 5, pp. 641–655, Oct. 2001.
[16] P. Lancaster and M. Tismenetsky, The Theory of Matrices, ser. Comput. Sci. Appl. Math., 2nd ed. Orlando, FL: Academic, 1985.
[17] M. Gadouleau and Z. Yan, "On the decoder error probability of bounded rank-distance decoders for maximum rank distance codes," IEEE Trans. Inf. Theory, vol. 54, no. 7, pp. 3202–3206, Jul. 2008.
[18] M. Gadouleau and Z. Yan, "Decoder error probability of bounded distance decoders for constant-dimension codes," in Proc. IEEE Int. Symp. Inf. Theory (ISIT'09), Seoul, Korea, Jun. 28–Jul. 3, 2009, pp. 2226–2230.
[19] M. Gadouleau and Z. Yan, "On the decoder error probability of bounded rank distance decoders for rank metric codes," in Proc. IEEE Inf. Theory Workshop (ITW'09), Sicily, Italy, Oct. 11–16, 2009, pp. 485–489.
[20] P. Delsarte, "Bilinear forms over a finite field, with applications to coding theory," J. Comb. Theory, Ser. A, vol. 25, pp. 226–241, 1978.
[21] O. Ore, "On a special class of polynomials," Trans. Amer. Math. Soc., vol. 35, no. 3, pp. 559–584, 1933.
[22] O. Ore, "Contributions to the theory of finite fields," Trans. Amer. Math. Soc., vol. 36, no. 2, pp. 243–274, 1934.
[23] P. Loidreau, "A Welch–Berlekamp like algorithm for decoding Gabidulin codes," in Proc. 4th Int. Workshop Coding and Cryptography (WCC'05), Bergen, Norway, Mar. 14–18, 2005, vol. 3969, Lecture Notes in Computer Science, pp. 36–45.
[24] E. R. Berlekamp, Algebraic Coding Theory. New York: McGraw-Hill, 1968.
[25] V. Skachek and R. M. Roth, "Probabilistic algorithm for finding roots of linearized polynomials," Des. Codes Cryptogr., vol. 46, no. 1, pp. 17–23, Jan. 2008.
[26] E. D. Mastrovito, "VLSI architectures for computations in Galois fields," Ph.D. dissertation, Dept. Electr. Eng., Linköping Univ., Linköping, Sweden, 1991.
[27] M. Wagh and S. Morgera, "A new structured design method for convolutions over finite fields, Part I," IEEE Trans. Inf. Theory, vol. IT-29, no. 4, pp. 583–595, Jul. 1983.
[28] J. K. Omura and J. L. Massey, "Computational method and apparatus for finite field arithmetic," U.S. Patent 4 587 627, May 6, 1986.
[29] A. Reyhani-Masoleh and M. A. Hasan, "A new construction of Massey–Omura parallel multiplier over GF(2^m)," IEEE Trans. Comput., vol. 51, no. 5, pp. 511–520, May 2002.
[30] E. R. Berlekamp, Algebraic Coding Theory, revised ed. Laguna Hills, CA: Aegean Park Press, 1984.
[31] A. Bogdanov, M. C. Mertens, C. Paar, J. Pelzl, and A. Rupp, "A parallel hardware architecture for fast Gaussian elimination over GF(2)," in Proc. 14th Ann. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM'06), Napa Valley, CA, Apr. 24–26, 2006, pp. 237–248.
[32] J. E. Stine, I. Castellanos, M. Wood, J. Henson, F. Love, W. R. Davis, P. D. Franzon, M. Bucher, S. Basavarajaiah, J. Oh, and R. Jenkal, "FreePDK: An open-source variation-aware design kit," in Proc. IEEE Int. Conf. Microelectron. Syst. Education (MSE'07), San Diego, CA, Jun. 3–4, 2007, pp. 173–174.
[33] H. Lee, "High-speed VLSI architecture for parallel Reed–Solomon decoder," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 2, pp. 288–294, Apr. 2003.

Ning Chen (S'06–M'10) received the B.E. and M.E. degrees from Tsinghua University, Beijing, China, in 2001 and 2004, respectively, and the Ph.D. degree from Lehigh University, Bethlehem, PA, in 2010, all in electrical engineering.
Currently, he is with the Enterprise Storage Division, PMC-Sierra, Allentown, PA. His research interests are in the VLSI design and implementation of digital signal processing and communication systems.

Zhiyuan Yan (S'00–M'03–SM'08) received the B.E. degree in electronic engineering from Tsinghua University, Beijing, China, in 1995, and the M.S. and Ph.D. degrees in electrical engineering from the University of Illinois, Urbana, in 1999 and 2003, respectively.
During the summers of 2000 and 2002, he was a Research Intern with Nokia Research Center, Irving, TX. He joined the Electrical and Computer Engineering Department, Lehigh University, Bethlehem, PA, as an Assistant Professor in August 2003. He has authored or coauthored over 70 technical papers in refereed journals and conference proceedings. He is an associate editor for the Journal of Signal Processing Systems. His current research interests are in coding theory, cryptography, wireless communications, and VLSI implementations of communication and signal processing systems.
Dr. Yan is a member of Tau Beta Pi, Sigma Xi, and Phi Kappa Phi. He served as technical program committee co-chair and general co-chair for the ACM Great Lakes Symposium on VLSI in 2007 and 2008, respectively. He has served as an associate editor for the IEEE COMMUNICATIONS LETTERS since 2008. He is a member of the IEEE Information Theory, Communications, Signal Processing, and Computer Societies.

Maximilien Gadouleau (S'06–M'10) received the M.S. degree in electrical and computer engineering from Esigelec, Saint-Etienne du Rouvray, France, in 2004, and the M.S. and Ph.D. degrees in computer engineering from Lehigh University, Bethlehem, PA, in 2005 and 2009, respectively.
From 2009 to 2010, he was a Postdoctoral Researcher with the Université de Reims Champagne-Ardenne, Reims, France. In 2010, he joined the School of Electronic Engineering and Computer Science at Queen Mary, University of London, London, U.K., as a Postdoctoral Research Assistant. His current research interests are coding theory, network coding, and cryptography, and their relations to combinatorics and graph theory.
Dr. Gadouleau is a member of the IEEE Information Theory Society.

Ying Wang (S'00–M'06) received the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign, Urbana, IL, in 2006.
Currently, she is a Staff Engineer with the New Jersey Research Center of QUALCOMM Corporate Research and Development, Bridgewater, NJ. Her research interests include wireless communications, statistical signal processing, multimedia security and forensics, and detection and estimation theory.

Bruce W. Suter (S'80–M'85–SM'92) received the B.S. and M.S. degrees in electrical engineering and the Ph.D. degree in computer science from the University of South Florida, Tampa, in 1972 and 1988, respectively.
In 1998, he joined the Technical Staff of the U.S. Air Force Research Laboratory, Rome, NY, where he was the founding Director of the Center for Integrated Transmission and Exploitation (CITE). He has held visiting appointments at Harvard University and the Massachusetts Institute of Technology. His previous positions include academia at the U.S. Air Force Institute of Technology and the University of Alabama at Birmingham, together with industrial positions at Honeywell Inc. and Litton Industries. He is the author or coauthor of over 100 technical publications and the author of a widely accepted monograph, Multirate and Wavelet Signal Processing (Academic Press, 1998). His current research interests are focused on wireless computing networks and their applications to signal and image processing.
Dr. Suter is a member of Tau Beta Pi and Eta Kappa Nu. He was the recipient of the Air Force Research Laboratory (AFRL) Fellow, an AFRL-wide award for his accomplishments in the theory, application, and implementation of signal processing algorithms; the Arthur S. Flemming Award, a government-wide award for his pioneering Hankel transform research; the General Ronald W. Yates Award for Excellence in Technology Transfer for his patented Fourier transform processor; and the Fred I. Diamond Award for best laboratory research publication. He is a former associate editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING.