
A Brief Review of Logarithmic Multiplier Designs

Tingting Zhang, Zijing Niu, and Jie Han
Department of Electrical & Computer Engineering, University of Alberta, Edmonton, Canada
ttzhang@ualberta.ca, zijing2@ualberta.ca, jhan8@ualberta.ca

Abstract—The base-2 logarithmic arithmetic converts multiplication to hardware-efficient shift and addition operations. Aiming for a good trade-off between circuit complexity and accuracy, a logarithmic multiplier inherently introduces approximation into the computation. This paper briefly reviews logarithmic multiplier designs from different perspectives on improving performance and accuracy, followed by a summary of their applications in error-tolerant systems.

Index Terms—Approximate computing, logarithmic multiplier, neural network

I. INTRODUCTION

The logarithmic number system (LNS) is an alternative to conventional fixed-point (FxP) and floating-point (FP) number representations in a digital system. It benefits from a wider numerical range than FxP [1], a lower round-off noise than FP [2] and milder switching activity [3]. Recent research shows that the LNS with a reduced bit width achieves a high output quality when implementing impulse response filters [4], [5] and training neural networks (NNs) [6]–[8].

Approximate computing has emerged as a low-power technique that exploits the inherent error resiliency in applications such as digital signal processing and NNs [9], [10]. As a fundamental arithmetic operation, multiplication often dominates the performance of circuits and systems. Therefore, designs of approximate multipliers have been extensively investigated, including logarithmic multipliers (LMs) [11].

Without using a complex procedure, the base-2 logarithmic arithmetic converts multiplication to simple additions and bit-shifting, thereby achieving a significant improvement in hardware efficiency [12], [13]. In this paper, we review both FxP and FP LMs from different design perspectives, as well as their error-tolerant applications.

The remainder of this paper is organized as follows. Section II presents the related work. LMs are reviewed in Section III. Section IV summarizes the applications. Section V concludes this paper and discusses future prospects.

(This work was partly supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada (RES0048688 and RES0054326), MITACS (RES0058253), and Alberta Innovates (RES0053965). T. Zhang is supported by a Ph.D. scholarship from the China Scholarship Council (CSC).)

II. RELATED WORK

Approximate multipliers have been extensively studied to achieve a good trade-off between accuracy and hardware efficiency for error-tolerant applications [11]. The accuracy is evaluated by using a number of error metrics [14], such as the error rate, the error distance, and the root-mean-square error.

Approximate non-logarithmic multiplier designs mainly focus on simplifying the partial product (PP) generation and accumulation for signed and unsigned multiplications. Approximation in the PP generation, for example, includes the use of an approximate 2 × 2 multiplier to construct larger unsigned multipliers [15], [16], approximate encoding to reduce the delay of calculating the triple multiplicands for the radix-8 Booth encoding algorithm [17], [18], and approximate encoders for the radix-4 Booth encoding algorithm [19]–[21]. Approximation in the PP accumulation includes the use of truncation [22], approximate PP trees [23], [24], and approximate compression using approximate adders [25], [26] or compressors [27]–[30].

LMs deliver a mathematically elegant solution by replacing multiplication with addition and shifting. Mitchell first proposed a simple approximation method for binary logarithmic conversion [31]. However, its error characteristic of always underestimating the product leads to a dramatic error accumulation in circuits and systems [11]. Thus, more hardware-efficient LM designs have recently been pursued for a higher accuracy in intensive computations [32]–[34].

III. LOGARITHMIC MULTIPLIER DESIGNS

A. Preliminaries

Let N be an m-bit number in the binary representation, N = \sum_{i=0}^{m-1} 2^i n_i, where n_i denotes the bit value at the ith position. Let k denote the position of the leading '1' bit; we then obtain N = 2^k + \sum_{i=0}^{k-1} 2^i n_i. The binary (base-2) logarithm of N is given by:

\log_2 N = \log_2\left(2^k (1 + x)\right) = k + \log_2(1 + x),   (1)

where x (= \sum_{i=0}^{k-1} 2^{i-k} n_i, with 0 ≤ x < 1) represents the fractional part and k indicates the exponent. If P = AB, \log_2 P is computed as:

\log_2 P = k_A + \log_2(1 + x_A) + k_B + \log_2(1 + x_B),   (2)

where the exponents and fractional parts of A and B are respectively denoted with the corresponding subscripts.
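The decomposition in (1) is easy to model in software. The following Python sketch (ours, for illustration only; the paper specifies no code and the function name is hypothetical) extracts k from the leading-one position and x as the normalized remainder.

```python
import math

def log2_decompose(n: int):
    """Decompose a positive integer as n = 2**k * (1 + x), as in Eq. (1)."""
    k = n.bit_length() - 1           # position of the leading '1' (the exponent)
    x = (n - (1 << k)) / (1 << k)    # fractional part, 0 <= x < 1
    return k, x

k, x = log2_decompose(10)            # k = 3, x = 0.25
print(k + math.log2(1 + x))          # exact log2(10), about 3.3219
```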
Mitchell's method approximates \log_2(1 + x) by x, leading to [31]:

\log_2 P \approx k_A + k_B + x_A + x_B.   (3)

The final product P is obtained by using an antilogarithm function to compute 2^{\log_2 P}. In Mitchell's method, P is approximately given by [31]:

P \approx \begin{cases} 2^{k_A + k_B} (1 + x_A + x_B), & x_A + x_B < 1, \\ 2^{k_A + k_B + 1} (x_A + x_B), & x_A + x_B \geq 1. \end{cases}   (4)
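A behavioral Python model of (4) is sketched below, assuming positive integer operands (the hardware designs operate on fixed-point words); it also shows the underestimation noted in Section II.

```python
def mitchell_multiply(a: int, b: int) -> float:
    """Approximate a*b for positive integers, following Eq. (4) [31]."""
    ka, kb = a.bit_length() - 1, b.bit_length() - 1
    xa = (a - (1 << ka)) / (1 << ka)    # fractional part of a
    xb = (b - (1 << kb)) / (1 << kb)    # fractional part of b
    if xa + xb < 1:
        return (1 << (ka + kb)) * (1 + xa + xb)
    return (1 << (ka + kb + 1)) * (xa + xb)

print(mitchell_multiply(10, 12))   # 112.0; the exact product is 120
```

Since \log_2(1 + x) ≥ x, the result never exceeds the exact product, which is the single-sided error behavior discussed throughout this paper.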
The basic architecture of an FxP/FP LM is presented in Fig. 1. It is implemented in three stages: (1) the logarithmic conversion of the input operands, (2) the addition of the logarithms, and (3) computing the antilogarithm and converting it to the standard number representation. Consider signed FxP multiplication in two's complement; a signed-to-unsigned (S2U) converter is used to convert the inputs at Stage 1. Different from an FP LM, an FxP LM requires a leading-one detector (LOD) and a priority encoder (PE) to obtain the mantissa and the exponent for the logarithm conversion. The LOD finds the most significant '1' in each input operand, whose bit position is then obtained by the PE. The log converter implements a logarithm approximation. At Stage 2, the multiplication is performed in the logarithm domain by using two adders. At Stage 3, the sum of the two logarithms is approximately converted to its antilogarithmic value. The result is then decoded to obtain the final product. Lastly, the signs of the inputs, s_A and s_B, are XOR-ed to obtain the sign of the product P.

[Fig. 1: A Logarithmic Multiplier Architecture. For inputs A and B: Stage 1 applies S2U converters and, for FxP, LODs, PEs, and log converters to produce k_A, x_A, k_B, and x_B; Stage 2 uses two adders to compute k_A + k_B and x_A + x_B; Stage 3 applies an antilog converter and conversion to FxP/FP, with s_A XOR s_B setting the sign of P.]
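To make the signed data flow concrete, the sketch below (ours; it reuses mitchell_multiply as the unsigned core and adds a zero bypass that the figure does not show, since the logarithm of zero is undefined) models the S2U conversion and the sign XOR.

```python
def signed_lm_multiply(a: int, b: int) -> float:
    """Behavioral model of the signed FxP flow in Fig. 1."""
    sa, sb = a < 0, b < 0             # input signs
    ua, ub = abs(a), abs(b)           # S2U: multiply magnitudes
    if ua == 0 or ub == 0:
        return 0.0                    # bypass: log of zero is undefined
    p = mitchell_multiply(ua, ub)     # Stages 1-3 on unsigned operands
    return -p if sa ^ sb else p       # sign of P = s_A XOR s_B
```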

B. A Review

In this section, LMs are reviewed from different perspectives in the design process.

1) Computing Architecture: The iterative LM (ILM) improves the accuracy of Mitchell's LM by compensating errors through an iterative procedure [32]. The general architecture for an ILM is shown in Fig. 2. The basic block computes the approximate product of two inputs using the LM, followed by an error term calculator (ETC) to generate two error terms. Then, an LM serves as the error compensation block (ECB) by multiplying these two error terms to obtain the error compensation term, which is added to the approximate product to form the final product. The use of additional ECBs leads to a higher accuracy at the cost of a larger circuit.

[Fig. 2: An Iterative Logarithmic Multiplier Architecture. A basic LM block with an ETC processes inputs A and B; a chain of ECBs (the 1st to the mth, each an LM) approximates the error terms, and adders accumulate the compensation terms into the final product P.]
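A behavioral sketch of this scheme is given below, assuming (as our reading of [32]) that the two error terms are the residues a − 2^{k_a} and b − 2^{k_b}, whose exact product equals the error of the basic block; the pipelined hardware is not modeled, and the operands are taken as positive integers.

```python
def basic_block(a: int, b: int) -> int:
    """One ILM approximation step [32]:
    a*b ~= 2**(ka+kb) + (a - 2**ka)*2**kb + (b - 2**kb)*2**ka."""
    ka, kb = a.bit_length() - 1, b.bit_length() - 1
    ea, eb = a - (1 << ka), b - (1 << kb)       # error terms from the ETC
    return (1 << (ka + kb)) + (ea << kb) + (eb << ka)

def ilm_multiply(a: int, b: int, n_ecb: int = 1) -> int:
    """Basic block plus n_ecb error compensation blocks (ECBs)."""
    p = basic_block(a, b)
    ea = a - (1 << (a.bit_length() - 1))
    eb = b - (1 << (b.bit_length() - 1))
    for _ in range(n_ecb):
        if ea == 0 or eb == 0:                  # remaining error is exactly zero
            break
        p += basic_block(ea, eb)                # ECB: approximate the error product
        ea -= 1 << (ea.bit_length() - 1)        # residues for the next ECB
        eb -= 1 << (eb.bit_length() - 1)
    return p

print(ilm_multiply(10, 12))   # 120; one ECB happens to recover the exact product here
```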

To reduce the critical path delay in the basic block, pipelining is used to implement the ILM in [32] for a higher level of parallelism. However, for higher performance, this ILM neglects the comparison between x_A + x_B and 1 in (4), at a cost in accuracy. The truncated ILM in [35] compensates this error by performing the comparison prior to the completion of the current iteration, and utilizes truncation to reduce hardware. A low-cost two-stage ILM further compensates errors in the addition and uses a truncated LM [36].

2) Input Preprocessing: Some LM designs first preprocess the input operands to simplify the circuit or to improve the accuracy of Mitchell's logarithmic approximation.

a) Hardware-oriented: As a common technique, truncation has been considered in iterative and noniterative LMs to generate smaller bit-width operands for processing [35]–[39]. In particular, Yin et al. proposed a dynamic range LM for signed and unsigned multiplication by dynamically truncating the input bits and setting the least significant bit of the truncated operand to '1' to compensate for the negative errors introduced by Mitchell's approximation [38], as sketched after this paragraph. Pilipović et al. split the input operands into less and more significant sections, and trimmed the less significant bits when the more significant section contains at least one non-zero bit; otherwise, the more significant section is trimmed [40]. Instead of using the S2U units, Kim et al. approximated two's complement with one's complement for signed multiplication to reduce computation complexity [39].
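The following simplified software model captures the dynamic truncation idea (our reading of [38]; the actual design also covers signed operands and hardware-friendly truncation widths): the t bits starting at the leading '1' are kept, the rest are zeroed, and the last kept bit is forced to '1'.

```python
def dynamic_truncate(n: int, t: int) -> int:
    """Keep the t most significant bits of n starting at its leading '1',
    zero the rest, and set the last kept bit to '1' as error compensation."""
    k = n.bit_length() - 1
    if k + 1 <= t:                 # operand already fits in t bits
        return n
    drop = k + 1 - t               # number of low bits to discard
    kept = (n >> drop) | 1         # truncate; force the kept LSB to '1'
    return kept << drop            # restore the original weight

print(bin(dynamic_truncate(0b10100110, 4)))   # 0b10110000
```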
b) Accuracy-oriented: Mahalingam and Ranganathan decomposed the two input operands into four to decrease the probability that a bit '1' occurs in the decomposed inputs, thus reducing the switching power [41]. The hybrid-encoding LMs developed in [42] and [43] first decompose the input operands into two sections, and then apply logarithm-based and radix-4 Booth multiplication, respectively, to generate the less and more significant bits of the product.
3) LOD Designs: In FxP LMs, the LOD unit at Stage 1 accounts for about 49% and 54% of the area and energy, respectively [44]. It detects the left-most '1' and generates a one-hot encoded output. For hardware efficiency, the LOD design in [45] achieves a 3× speedup by first grouping the input bits into sets of four, and then using OR gates and 4-bit LODs to detect the leading '1'. Approximate 32-bit LOD designs are constructed from 4-bit LOD units in [44] using two approaches. The first is to approximate the 4-bit LODs with a single fixed bias or with dynamic biases determined by a PE-controlled multiplexer. The second is to approximately simplify a 32-bit LOD design by using an exact 16-bit LOD: the 32-bit input is partitioned into two halves, and the more significant half is processed by the exact 16-bit LOD if it contains at least one '1'; otherwise, the less significant half is processed. To reduce large errors for small inputs, a scaling scheme using bit-shift operations is further adopted [44].
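The grouping idea can be expressed behaviorally as below (our sketch; the cited designs are gate-level circuits): a nibble-level OR first locates the group holding the leading '1', and a 4-bit LOD then resolves the position within that group.

```python
def lod32(n: int) -> int:
    """Behavioral 32-bit leading-one detector: return a one-hot word
    marking the left-most '1' of n (0 if n == 0)."""
    for g in range(7, -1, -1):                # most significant nibble first
        nibble = (n >> (4 * g)) & 0xF
        if nibble:                            # group-level OR: a '1' is here
            for i in range(3, -1, -1):        # 4-bit LOD within the group
                if nibble & (1 << i):
                    return 1 << (4 * g + i)
    return 0

print(hex(lod32(0x0004F000)))   # 0x40000
```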
4) Logarithm Conversion: Mitchell's method generates a single-sided error distribution that leads to error accumulation. To produce a double-sided error distribution, a nearest-one logarithmic approximation finds the nearest power of two for N [33]. When N − 2^k < 2^{k+1} − N, it uses the same logarithm as Mitchell's method; otherwise, N is represented as N = 2^{k+1}(1 − y), where 0 ≤ y < 1, and the logarithm is approximated by \log_2 N \approx k + 1 − y. In a circuit implementation, instead of using LODs, a nearest-one detector (NOD) was designed to detect the nearest power of two. Inspired by this method, the logarithmic design in [34] generates a double-sided error distribution for FP LMs.
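A behavioral sketch of this conversion for positive integers (ours, after [33]) is:

```python
def nearest_one_log2(n: int) -> float:
    """Double-sided logarithm approximation: linearize around the nearest
    power of two rather than the next-lower one."""
    k = n.bit_length() - 1
    if n - (1 << k) < (1 << (k + 1)) - n:     # 2**k is nearer: Mitchell branch
        return k + (n - (1 << k)) / (1 << k)                   # k + x (under)
    return (k + 1) - ((1 << (k + 1)) - n) / (1 << (k + 1))     # k + 1 - y (over)

print(nearest_one_log2(15))   # 3.9375, versus the exact log2(15), about 3.9069
```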
5) Addition Units: To further improve hardware efficiency, approximate designs have been considered for the mantissa addition, including three types of approximate adders utilized in [46]: the lower-part-OR adder (LOA) [47], the approximate mirror adder-A3 (MAA3) [48] and the set-one adder (SOA) [46]. The LOA calculates the less significant bits by using OR gates. In the MAA3, one of the two inputs is approximately taken as the less significant bits of the sum. The SOA sets the less significant bits to '1's. Ansari et al. used a modified SOA that sets the less significant bits to alternating '1's and '0's [33], [44].
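As an example of these approximate adders, a software model of the LOA is sketched below (ours; the carry-in follows the original LOA [47], where the AND of the most significant lower bits approximates the carry into the exact part).

```python
def loa_add(a: int, b: int, l: int) -> int:
    """Lower-part-OR adder: OR the l least significant bits and add the
    upper bits exactly, with an approximate carry-in."""
    mask = (1 << l) - 1
    lower = (a & mask) | (b & mask)             # OR replaces the lower adder
    cin = (a >> (l - 1)) & (b >> (l - 1)) & 1   # AND of the top lower bits
    upper = (a >> l) + (b >> l) + cin           # exact upper addition
    return (upper << l) | lower

print(loa_add(182, 103, 4))   # 279, versus the exact sum 285
```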
designed to compensate errors due to the logarithmic approx-
IV. A PPLICATIONS imation for improvements in both hardware and accuracy.
LMs have been considered to improve the hardware ef- For emerging NN applications, a state-of-the-art 4-bit train-
ficiency of error tolerant systems, such as those for image ing strategy based on a logarithmic radix-4 format shows
processing and machine learning. a great potential of using the LNS with a small bit width
Applications in image processing include multiplication for a significant hardware improvement. There is also some
[43], [46], sharpening [37], [42], compression [43], smoothing evidence that the use of logarithmic multipliers is likely to im-
[40], [43], [49], and matching [32]. Two main evaluation met- prove accuracy for classification. However, it is a challenge to
rics for image processing are the peak signal-to-noise ratio and generalize this result. An analysis on the relationship between
the structural similarity. Compared with exact multiplication, the error characteristics of LMs and the various features of
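As a usage illustration (ours, reusing the signed_lm_multiply sketch from Section III; the cited works evaluate fixed-point hardware models instead), the squared distance in the assignment step can route its multiplications through an LM:

```python
def squared_distance(p, q):
    """Squared Euclidean distance with LM-based multiplications."""
    return sum(signed_lm_multiply(pi - qi, pi - qi) for pi, qi in zip(p, q))

# Assign a point to the nearer of two integer-valued centroids:
point, c0, c1 = (3, 7), (1, 5), (8, 2)
label = 0 if squared_distance(point, c0) <= squared_distance(point, c1) else 1
print(label)   # 0: the point is closer to c0
```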
V. CONCLUSIONS AND PROSPECTS

Logarithmic multipliers significantly improve the hardware efficiency of error-tolerant applications by converting multiplication to addition and shifting operations. In this paper, logarithmic multipliers are briefly reviewed from different design perspectives.

Compared with non-logarithmic approximate multipliers, Mitchell's logarithmic multiplier benefits from its simple structure, but suffers a large accuracy loss due to the underestimated product. Aiming for optimized accuracy, the iterative logarithmic algorithm incurs a large circuit area and a long delay for error compensation. Hybrid logarithmic multipliers that collaboratively use Mitchell's approximation and a more accurate computation method to encode the less and more significant input bits, respectively, are worth further investigation for a better trade-off between hardware and accuracy. Moreover, dedicated approximate circuits can be designed to compensate for errors due to the logarithmic approximation, for improvements in both hardware and accuracy.

For emerging NN applications, a state-of-the-art 4-bit training strategy based on a logarithmic radix-4 format shows the great potential of using the LNS with a small bit width for a significant hardware improvement. There is also some evidence that the use of logarithmic multipliers is likely to improve classification accuracy; however, it is a challenge to generalize this result. An analysis of the relationship between the error characteristics of LMs and the various features of NN applications may help understand this important issue. The verification and test of LMs remain open for future research.

REFERENCES

[1] N. G. Kingsbury and P. J. Rayner, "Digital filtering using logarithmic arithmetic," Electron. Lett., vol. 7, no. 2, pp. 56-58, 1971.
[2] B. Parhami, "Computing with logarithmic number system arithmetic: Implementation methods and performance benefits," Comput. Electr. Eng., vol. 87, p. 106800, 2020.
[3] V. Paliouras and T. Stouraitis, "Logarithmic number system for low-power arithmetic," in PATMOS, 2000, pp. 285-294.
[4] C. Basetas, I. Kouretas, and V. Paliouras, "Low-power digital filtering based on the logarithmic number system," in PATMOS, 2007, pp. 546-555.
[5] S. A. Alam and O. Gustafsson, "Design of finite word length linear-phase FIR filters in the logarithmic number system domain," VLSI Des., vol. 2014, 2014.
[6] X. Sun, N. Wang, C.-Y. Chen, J. Ni, A. Agrawal et al., "Ultra-low precision 4-bit training of deep neural networks," NIPS, vol. 33, pp. 1796-1807, 2020.
[7] X. Sun, J. Choi, C.-Y. Chen, N. Wang, S. Venkataramani et al., "Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks," NIPS, vol. 32, 2019.
[8] E. H. Lee, D. Miyashita, E. Chai, B. Murmann, and S. S. Wong, "LogNet: Energy-efficient neural networks using logarithmic computation," in ICASSP, 2017, pp. 5900-5904.
[9] J. Han and M. Orshansky, "Approximate computing: An emerging paradigm for energy-efficient design," in ETS, 2013, pp. 1-6.
[10] M. Traiola, A. Virazel, P. Girard, M. Barbareschi, and A. Bosio, "A survey of testing techniques for approximate integrated circuits," Proc. IEEE, vol. 108, no. 12, pp. 2178-2194, 2020.
[11] H. Jiang, F. J. H. Santiago, H. Mo, L. Liu, and J. Han, "Approximate arithmetic circuits: A survey, characterization, and recent applications," Proc. IEEE, vol. 108, no. 12, pp. 2108-2135, 2020.
[12] V. Paliouras and T. Stouraitis, "Low-power properties of the logarithmic number system," in ARITH, 2001, pp. 229-236.
[13] O. Gustafsson and N. Hellman, "Approximate floating-point operations with integer units by processing in the logarithmic domain," in ARITH, 2021, pp. 45-52.
[14] J. Liang, J. Han, and F. Lombardi, "New metrics for the reliability of approximate and probabilistic adders," TC, vol. 62, no. 9, pp. 1760-1771, 2012.
[15] P. Kulkarni, P. Gupta, and M. Ercegovac, "Trading accuracy for power with an underdesigned multiplier architecture," in VLSID, 2011, pp. 346-351.
[16] W. Liu, T. Zhang, E. McLarnon, M. O'Neill, P. Montuschi, and F. Lombardi, "Design and analysis of majority logic-based approximate adders and multipliers," TETC, vol. 9, no. 3, pp. 1609-1624, 2019.
[17] H. Jiang, J. Han, F. Qiao, and F. Lombardi, "Approximate radix-8 Booth multipliers for low-power and high-performance operation," TC, vol. 65, no. 8, pp. 2638-2644, 2015.
[18] H. Waris, C. Wang, and W. Liu, "Hybrid low radix encoding-based approximate Booth multipliers," TCAS II, vol. 67, no. 12, pp. 3367-3371, 2020.
[19] W. Liu, L. Qian, C. Wang, H. Jiang, J. Han, and F. Lombardi, "Design of approximate radix-4 Booth multipliers for error-tolerant computing," TC, vol. 66, no. 8, pp. 1435-1441, 2017.
[20] W. Liu, T. Cao, P. Yin, Y. Zhu, C. Wang, E. E. Swartzlander, and F. Lombardi, "Design and analysis of approximate redundant binary multipliers," TC, vol. 68, no. 6, pp. 804-819, 2018.
[21] T. Zhang, H. Jiang, H. Mo, W. Liu, F. Lombardi, L. Liu, and J. Han, "Design of majority logic-based approximate Booth multipliers for error-tolerant applications," TNANO, vol. 21, pp. 81-89, 2022.
[22] G. Zervakis, K. Tsoumanis, S. Xydis, D. Soudris, and K. Pekmestzi, "Design-efficient approximate multiplication circuits through partial product perforation," TVLSI, vol. 24, no. 10, pp. 3105-3117, 2016.
[23] K. Bhardwaj, P. S. Mane, and J. Henkel, "Power- and area-efficient approximate Wallace tree multiplier for error-resilient systems," in ISQED, 2014, pp. 263-269.
[24] S. Hashemi, R. I. Bahar, and S. Reda, "DRUM: A dynamic range unbiased multiplier for approximate applications," in ICCAD, 2015, pp. 418-425.
[25] H. Jiang, C. Liu, F. Lombardi, and J. Han, "Low-power approximate unsigned multipliers with configurable error recovery," TCAS I, vol. 66, no. 1, pp. 189-202, 2018.
[26] C. Liu, J. Han, and F. Lombardi, "A low-power, high-performance approximate multiplier with configurable partial error recovery," in DATE, 2014, pp. 1-4.
[27] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, "Design and analysis of approximate compressors for multiplication," TC, vol. 64, no. 4, pp. 984-994, 2014.
[28] S. Venkatachalam and S.-B. Ko, "Design of power and area efficient approximate multipliers," TVLSI, vol. 25, no. 5, pp. 1782-1786, 2017.
[29] M. S. Ansari, H. Jiang, B. F. Cockburn, and J. Han, "Low-power approximate multipliers using encoded partial products and approximate compressors," JETC, vol. 8, no. 3, pp. 404-416, 2018.
[30] D. Esposito, A. G. M. Strollo, E. Napoli, D. De Caro, and N. Petra, "Approximate multipliers based on new approximate compressors," TCAS I, vol. 65, no. 12, pp. 4169-4182, 2018.
[31] J. N. Mitchell, "Computer multiplication and division using binary logarithms," IRE Trans. Electron. Comput., no. 4, pp. 512-517, 1962.
[32] Z. Babić, A. Avramović, and P. Bulić, "An iterative logarithmic multiplier," Microprocess. Microsyst., vol. 35, no. 1, pp. 23-33, 2011.
[33] M. S. Ansari, B. F. Cockburn, and J. Han, "An improved logarithmic multiplier for energy-efficient neural computing," TC, vol. 70, no. 4, pp. 614-625, 2020.
[34] Z. Niu, H. Jiang, M. S. Ansari, B. F. Cockburn, L. Liu, and J. Han, "A logarithmic floating-point multiplier for the efficient training of neural networks," in GLSVLSI, 2021, pp. 65-70.
[35] S. E. Ahmed, S. Kadam, and M. Srinivas, "An iterative logarithmic multiplier with improved precision," in ARITH, 2016, pp. 104-111.
[36] H. Kim, M. S. Kim, A. A. Del Barrio, and N. Bagherzadeh, "A cost-efficient iterative truncated logarithmic multiplication for convolutional neural networks," in ARITH, 2019, pp. 108-111.
[37] S. E. Ahmed and M. Srinivas, "An improved logarithmic multiplier for media processing," J. Signal Process. Syst., vol. 91, no. 6, pp. 561-574, 2019.
[38] P. Yin, C. Wang, H. Waris, W. Liu, Y. Han, and F. Lombardi, "Design and analysis of energy-efficient dynamic range approximate logarithmic multipliers for machine learning," TSUSC, vol. 6, no. 4, pp. 612-625, 2020.
[39] M. S. Kim, A. A. Del Barrio, L. T. Oliveira, R. Hermida, and N. Bagherzadeh, "Efficient Mitchell's approximate log multipliers for convolutional neural networks," TC, vol. 68, no. 5, pp. 660-675, 2018.
[40] R. Pilipović, P. Bulić, and U. Lotrič, "A two-stage operand trimming approximate logarithmic multiplier," TCAS I, vol. 68, no. 6, pp. 2535-2545, 2021.
[41] V. Mahalingam and N. Ranganathan, "Improving accuracy in Mitchell's logarithmic multiplication using operand decomposition," TC, vol. 55, no. 12, pp. 1523-1535, 2006.
[42] R. Pilipović and P. Bulić, "On the design of logarithmic multiplier using radix-4 Booth encoding," IEEE Access, vol. 8, pp. 64578-64590, 2020.
[43] U. Lotrič, R. Pilipović, and P. Bulić, "A hybrid radix-4 and approximate logarithmic multiplier for energy efficient image processing," Electron., vol. 10, no. 10, p. 1175, 2021.
[44] M. S. Ansari, S. Gandhi, B. F. Cockburn, and J. Han, "Fast and low-power leading-one detectors for energy-efficient logarithmic computing," IET Comput. Digit. Tech., vol. 15, no. 4, pp. 241-250, 2021.
[45] K. H. Abed and R. E. Siferd, "VLSI implementations of low-power leading-one detector circuits," in SECON, 2006, pp. 279-284.
[46] W. Liu, J. Xu, D. Wang, C. Wang, P. Montuschi, and F. Lombardi, "Design and evaluation of approximate logarithmic multipliers for low power error-tolerant applications," TCAS I, vol. 65, no. 9, pp. 2856-2868, 2018.
[47] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, "Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications," TCAS I, vol. 57, no. 4, pp. 850-862, 2009.
[48] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, "Low-power digital signal processing using approximate adders," TCAD, vol. 32, no. 1, pp. 124-137, 2012.
[49] D. Nandan, J. Kanungo, and A. Mahajan, "An error-efficient Gaussian filter for image processing by using the expanded operand decomposition logarithm multiplication," JAIHC, pp. 1-8, 2018.
[50] T. Cheng, Y. Masuda, J. Chen, J. Yu, and M. Hashimoto, "Logarithm-approximate floating-point multiplier is applicable to power-efficient neural network training," Integration, vol. 74, pp. 19-31, 2020.
