
An Iterative Mitchell's Algorithm Based Multiplier

2008


Zdenka Babić, Faculty of Electrical Engineering, University of Banja Luka, Patre 5, Banja Luka, BiH, zdenka@etfbl.net
Aleksej Avramović, Faculty of Electrical Engineering, University of Banja Luka, Patre 5, Banja Luka, BiH, alexej@spinter.net
Patricio Bulić, Faculty of Computer and Information Science, University of Ljubljana, Tržaška c. 25, Ljubljana, Slovenia, patricio.bulic@fri.uni-lj.si

Abstract— This paper presents a new multiplier that can achieve arbitrary accuracy. The multiplier is based on the same number representation as Mitchell’s algorithm, but does not use logarithm approximation. The proposed iterative algorithm is simple and efficient, reducing the error percentage as far as required, down to the exact result. The hardware solution involves only adders and shifters, so it is economical in gates and power. Parallel circuits are used for error correction. The error summary for operands ranging from 8 to 16 bits indicates a very low error percentage with only two parallel correction circuits.

Keywords— Computer arithmetic, digital signal processing, multiplier, logarithmic number system.

I. INTRODUCTION

Multiplication has always been a hardware-, time- and power-consuming arithmetic operation, especially for large operands. This bottleneck is even more pronounced in digital signal processing applications that involve a huge number of multiplications. In many signal processing applications the results of arithmetic operations do not have to be exactly accurate. For example, in signal compression techniques, quantization is usually performed after the cosine or some other transform. Therefore, calculating the true values of the transform coefficients is not necessary, and rounded products in the signal transformation are acceptable. Also, many digital signal processing systems can tolerate some extra introduced noise.
In these signal processing applications, where speed of calculation is more important than accuracy, the logarithmic number system (LNS) seems to be a suitable method for multiplication. The main advantage of this method is the substitution of multiplication with addition, after converting the operands into logarithms. But this simple idea has a significant weakness: the necessity to approximate the logarithm and antilogarithm. Therefore, logarithm-based solutions are a trade-off between time consumption and accuracy. In the well-known Mitchell’s algorithm (MA) [1] for multiplication in LNS, a high error is caused by the piecewise straight-line approximation of the logarithm and antilogarithm curves. Later, many methods for MA error correction were introduced with more or less success [2], [3], [4], [5], [6]. LNS multipliers can be divided into two categories: one based on methods that use lookup tables and interpolation, and the other based on Mitchell’s algorithm, although some MA-based methods also use a lookup table. MA-based solutions avoid lookup tables in order to save hardware area. This paper presents a new iterative solution for multiplication, where error correction is realized with parallel correction circuits.

978-1-4244-3555-5/08/$25.00 ©2008 IEEE

This paper is further organized as follows: Section II presents the basic Mitchell’s algorithm and its modifications, with their benefits and weaknesses; Section III describes the proposed iterative solution; Section IV discusses hardware implementations of the proposed algorithm; Section V gives an overview of the experimental results; and Section VI concludes the paper.

II. MA BASED MULTIPLIERS

The logarithmic number system was introduced to simplify multiplication, especially in cases when accuracy requirements are not rigorous. One of the most significant multiplication methods in LNS is Mitchell’s algorithm, which approximates the logarithm with a piecewise straight-line function.
MA multiplies two operands by finding their logarithms, adding them and taking the antilogarithm of the sum. Approximation of the logarithm and antilogarithm is essential, and it is derived from the binary representation of numbers:

$$N = 2^{k}\Big(1 + \sum_{i=j}^{k-1} 2^{i-k} Z_i\Big) = 2^{k}(1 + x) \tag{1}$$

where $k$ is the characteristic number, i.e. the position of the most significant bit with the value '1', $Z_i$ is the bit value at the $i$-th position, $x$ is the fraction or mantissa, and $j$ depends on the number’s precision. The logarithm of the product is

$$\log_2(N_1 \cdot N_2) = k_1 + k_2 + \log_2(1 + x_1) + \log_2(1 + x_2) \tag{2}$$

Approximating $\log_2(1 + x_1)$ by $x_1$ (and likewise for $x_2$), the logarithm of the product of the two numbers is expressed as the sum of their characteristic numbers and mantissas:

$$\log_2(N_1 \cdot N_2) \approx k_1 + k_2 + x_1 + x_2 \tag{3}$$

The antilogarithm uses a similar approximation. The final MA approximation of the product, depending on the carry bit from the sum of mantissas, is given by:

$$(N_1 \cdot N_2)_{MA} = \begin{cases} 2^{k_1+k_2}(1 + x_1 + x_2), & x_1 + x_2 < 1 \\ 2^{k_1+k_2+1}(x_1 + x_2), & x_1 + x_2 \ge 1 \end{cases} \tag{4}$$

The sum of the characteristic numbers determines the MSB of the product. The sum of mantissas is added to complete the final result. MA produces a significant error. The relative error increases with the number of bits with the value '1' in the mantissas. The maximum possible relative error of MA multiplication is around 11% and the average error is around 3.8%. Mitchell analyzed this error and proposed analytical expressions for error correction. In summary, the most significant advantage of MA is its simplicity and efficiency, i.e. low power consumption; the most significant disadvantage is its high error percentage.
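Equation (4) can be checked numerically. The following is a minimal Python sketch, assuming positive integer operands; `leading_one` and `mitchell_multiply` are illustrative names, not part of the paper. Because the mantissa satisfies $x \cdot 2^{k} = N - 2^{k}$, both cases of (4) reduce to shifts and additions of the residues left after removing the leading one, so no fractional arithmetic is needed.

```python
def leading_one(n: int) -> int:
    """Return k, the position of the most significant '1' bit of n (Eq. (1))."""
    if n <= 0:
        raise ValueError("operands must be positive integers")
    return n.bit_length() - 1


def mitchell_multiply(n1: int, n2: int) -> int:
    """Mitchell's approximate product, Eq. (4), in integer arithmetic.

    The mantissa x of N = 2**k * (1 + x) satisfies x * 2**k = N - 2**k,
    so (x1 + x2) * 2**(k1 + k2) is a sum of two shifted residues.
    """
    k1, k2 = leading_one(n1), leading_one(n2)
    r1, r2 = n1 - (1 << k1), n2 - (1 << k2)  # residues after removing the leading one
    scaled_sum = (r1 << k2) + (r2 << k1)     # (x1 + x2) * 2**(k1 + k2)
    if scaled_sum < (1 << (k1 + k2)):        # case x1 + x2 < 1
        return (1 << (k1 + k2)) + scaled_sum
    return scaled_sum << 1                   # case x1 + x2 >= 1: 2**(k1+k2+1) * (x1 + x2)


if __name__ == "__main__":
    n1, n2 = 234, 134
    approx = mitchell_multiply(n1, n2)
    error = 100 * (n1 * n2 - approx) / (n1 * n2)
    print(approx, f"{error:.2f}%")  # 30720 2.03%
```

For the operands of Example 1 this reproduces Papprox = 30720 and Er = 2.03%. Since $\log_2(1+x) \ge x$ on $[0, 1)$, the approximation never exceeds the true product, which the test below also checks for small operands.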
Algorithm 1 Mitchell’s Algorithm
1) N1, N2: n-bit binary multiplicands; Papprox = 0: 2n-bit approximate product
2) Calculate k1: leading one position of N1
3) Calculate k2: leading one position of N2
4) Calculate x1: shift N1 to the left by n − k1 digits
5) Calculate x2: shift N2 to the left by n − k2 digits
6) Calculate k12 = k1 + k2
7) Calculate x12 = x1 + x2
8) Decode k12 and insert '1' in that position of Papprox
9) Append x12 immediately after this one in Papprox
10) N1 · N2 = Papprox

A step-by-step example illustrating Mitchell’s algorithm based multiplication is shown in Example 1.

Example 1:
N1 = 234 = 11101010
N2 = 134 = 10000110
Ptrue = 31356
k1 = 0111, x1 = 1101010
k2 = 0111, x2 = 0000110
k1 + k2 = 1110
x1 + x2 = 1110000
(log N1N2)approx = 1110.1110000
Papprox = 111100000000000 = 30720
Er = 2.03%

Numerous attempts have been made to improve the accuracy of MA. Hall et al. [4] derived different equations for error correction of the logarithm and antilogarithm approximations in four separate regions, depending on the mantissa value, reducing the average error to 2% but increasing the complexity of realization. Abed and Siferd [5], [6] derived correction equations with coefficients that are powers of two, reducing the error while keeping the solution simple. Among the many methods that use lookup tables for error correction in MA, McLaren’s method [2], which uses a lookup table with 64 correction coefficients calculated depending on the mantissa values, can be singled out as one with satisfactory accuracy and complexity. A recent approach to MA error correction, which reduces the number of bits with the value '1' in the mantissas by operand decomposition, was presented by Mahalingam and Ranganathan [3]. Their method decreases the error percentage of MA by 44.7% on average.

III. PROPOSED SOLUTION

As already mentioned, the basic disadvantage of Mitchell’s algorithm and similar solutions comes from the logarithm approximation. Therefore, the proposed solution avoids the logarithm approximation and introduces an iterative algorithm that can make the error as small as wanted, or even yield the exact result. The proposed solution is an iterative but not a recursive algorithm, so it can be realized with parallel circuits for error reduction. From the binary representation of numbers in (1), we can derive an exact expression for the product:

$$P_{true} = N_1 \cdot N_2 = 2^{k_1}(1 + x_1) \cdot 2^{k_2}(1 + x_2) = 2^{k_1+k_2}(1 + x_1 + x_2) + 2^{k_1+k_2}\, x_1 x_2 \tag{5}$$

The similarity with MA is evident: the error in MA is caused by neglecting the second term in (5). To avoid the approximation error, we have to take into account the following relation:

$$x \cdot 2^{k} = N - 2^{k} \tag{6}$$

The combination of (5) and (6) gives:

$$P_{true} = N_1 \cdot N_2 = 2^{k_1+k_2} + (N_1 - 2^{k_1})\,2^{k_2} + (N_2 - 2^{k_2})\,2^{k_1} + (N_1 - 2^{k_1})(N_2 - 2^{k_2}) \tag{7}$$

Let

$$P_0 = 2^{k_1+k_2} + (N_1 - 2^{k_1})\,2^{k_2} + (N_2 - 2^{k_2})\,2^{k_1} \tag{8}$$

be the first approximation of the product. It is evident that

$$P_{true} = P_0 + (N_1 - 2^{k_1})(N_2 - 2^{k_2}) \tag{9}$$

If we approximate the product with

$$P_{approx}^{(0)} = P_0 \tag{10}$$

then the proposed method is very similar to MA; in fact, $P_0$ coincides with the first case of the MA approximation (4). Mitchell proposed an exact correction term as in (9), but multiplying these residues via the logarithm approximation was not sufficient to decrease the error significantly. Avoiding the logarithm approximation enables parallel error correction and therefore a more accurate product. For this reason, we propose an iterative calculation of the correction terms as follows. The absolute error after the first approximation is

$$E^{(0)} = P_{true} - P_{approx}^{(0)} = P_{true} - P_0 = (N_1 - 2^{k_1})(N_2 - 2^{k_2}) \tag{11}$$

Note that $E^{(0)} \ge 0$.
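The exact decomposition (7) to (9) can be verified with a short Python sketch, assuming positive integer operands; `first_approximation` is an illustrative name, not from the paper. It computes $P_0$ from (8) with shifts and additions only, and checks that the neglected term is exactly the non-negative residue product $E^{(0)}$ of (11).

```python
def first_approximation(n1: int, n2: int) -> tuple[int, int, int]:
    """Return (P0, r1, r2): P0 from Eq. (8) and the residues r = N - 2**k."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("operands must be positive integers")
    k1, k2 = n1.bit_length() - 1, n2.bit_length() - 1
    r1, r2 = n1 - (1 << k1), n2 - (1 << k2)  # remove the leading ones
    p0 = (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
    return p0, r1, r2


# Eq. (9): N1 * N2 == P0 + E(0), with E(0) = r1 * r2 >= 0, for all 8-bit operands.
for a in range(1, 256):
    for b in range(1, 256):
        p0, r1, r2 = first_approximation(a, b)
        assert a * b == p0 + r1 * r2 and r1 * r2 >= 0

p0, r1, r2 = first_approximation(234, 134)
print(p0, r1 * r2)  # 30720 636
```

For the operands of Example 1, $P_0$ equals Mitchell’s result 30720 because $x_1 + x_2 < 1$, and $E^{(0)} = 106 \cdot 6 = 636 = 31356 - 30720$.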
The two multiplicands in (11) are binary numbers that can be obtained simply by removing the leading '1' from the numbers N1 and N2, so we can repeat the proposed multiplication procedure with the new multiplicands $(N_1 - 2^{k_1})$ and $(N_2 - 2^{k_2})$:

$$E^{(0)} = C^{(1)} + E^{(1)} \tag{12}$$

where $C^{(1)}$ is the approximate value of $E^{(0)}$ and $E^{(1)}$ is the absolute error when approximating $E^{(0)}$. The combination of (9) and (12) gives

$$P_{true} = P_0 + C^{(1)} + E^{(1)} \tag{13}$$

We can now add the approximate value of $E^{(0)}$ to the approximate product as a correction term, which decreases the approximation error:

$$P_{approx}^{(1)} = P_0 + C^{(1)} \tag{14}$$

If we repeat this multiplication procedure with $i$ correction terms, we can approximate the product as

$$P_{approx}^{(i)} = P_0 + C^{(1)} + C^{(2)} + \dots + C^{(i)} = P_0 + \sum_{j=1}^{i} C^{(j)} \tag{15}$$

The procedure can be repeated, achieving an error as small as necessary, or until at least one of the mantissas becomes zero. Then the final result is exact: $P_{approx} = P_{true}$. The number of iterations required for the exact result is equal to the number of bits with the value '1' in the operand that has fewer such bits. The proposed iterative MA based multiplication is given in Algorithm 2.

Algorithm 2 Iterative MA Based Algorithm with i correction terms
1) N1, N2: n-bit binary multiplicands; P0 = 0: 2n-bit first approximation; C(i) = 0: 2n-bit correction terms; Papprox = 0: 2n-bit product
2) Calculate k1: leading one position of N1
3) Calculate k2: leading one position of N2
4) Calculate (N1 − 2^k1)·2^k2: shift (N1 − 2^k1) to the left by k2 digits
5) Calculate (N2 − 2^k2)·2^k1: shift (N2 − 2^k2) to the left by k1 digits
6) Calculate k12 = k1 + k2
7) Calculate 2^(k1+k2): decode k12
8) Calculate P0: add 2^(k1+k2), (N1 − 2^k1)·2^k2 and (N2 − 2^k2)·2^k1
9) Repeat i times:
   a) Set N1 = N1 − 2^k1, N2 = N2 − 2^k2
   b) Calculate k1: leading one position of N1
   c) Calculate k2: leading one position of N2
   d) Calculate (N1 − 2^k1)·2^k2: shift (N1 − 2^k1) to the left by k2 digits
   e) Calculate (N2 − 2^k2)·2^k1: shift (N2 − 2^k2) to the left by k1 digits
   f) Calculate k12 = k1 + k2
   g) Calculate 2^(k1+k2): decode k12
   h) Calculate C(i): add 2^(k1+k2), (N1 − 2^k1)·2^k2 and (N2 − 2^k2)·2^k1
10) Papprox = P0 + C(1) + ... + C(i)

One of the advantages of the proposed solution is the possibility to achieve arbitrary accuracy by selecting the number of iterations, i.e. the number of parallel correction circuits. Two step-by-step examples illustrating the iterative MA based multiplication with two correction terms are shown in Example 2 and Example 3, respectively. In Example 2 the exact result is achieved with two correction terms, while in Example 3 the relative error is decreased below 1%.

Example 2:
N1 = 234 = 11101010
N2 = 134 = 10000110
Ptrue = 31356
k1 = 0111, N1 − 2^k1 = 01101010
k2 = 0111, N2 − 2^k2 = 00000110
k1 + k2 = 1110
(N1 − 2^k1)·2^k2 = 11010100000000
(N2 − 2^k2)·2^k1 = 00001100000000
2^(k1+k2) = 100000000000000
P0 = 111100000000000 = 30720
Er(0) = 2.03%
N1(1) = 01101010
N2(1) = 00000110
k1(1) = 0110, N1(1) − 2^k1(1) = 00101010
k2(1) = 0010, N2(1) − 2^k2(1) = 00000010
k1(1) + k2(1) = 1000
(N1(1) − 2^k1(1))·2^k2(1) = 10101000
(N2(1) − 2^k2(1))·2^k1(1) = 10000000
2^(k1(1)+k2(1)) = 100000000
C(1) = 1000101000 = 552
Papprox(1) = P0 + C(1) = 111101000101000 = 31272
Er(1) = 0.27%
N1(2) = 00101010
N2(2) = 00000010
k1(2) = 0101, N1(2) − 2^k1(2) = 00001010
k2(2) = 0001, N2(2) − 2^k2(2) = 00000000
k1(2) + k2(2) = 0110
(N1(2) − 2^k1(2))·2^k2(2) = 10100
(N2(2) − 2^k2(2))·2^k1(2) = 00000
2^(k1(2)+k2(2)) = 1000000
C(2) = 1010100 = 84
Papprox(2) = P0 + C(1) + C(2) = 111101001111100 = 31356
Er(2) = 0.0%

Example 3:
N1 = 234 = 11101010
N2 = 255 = 11111111
Ptrue = 59670
k1 = 0111, N1 − 2^k1 = 01101010
k2 = 0111, N2 − 2^k2 = 01111111
k1 + k2 = 1110
(N1 − 2^k1)·2^k2 = 11010100000000
(N2 − 2^k2)·2^k1 = 11111110000000
2^(k1+k2) = 100000000000000
P0 = 1011010010000000 = 46208
Er(0) = 22.56%
N1(1) = 01101010
N2(1) = 01111111
k1(1) = 0110, N1(1) − 2^k1(1) = 00101010
k2(1) = 0110, N2(1) − 2^k2(1) = 00111111
k1(1) + k2(1) = 1100
(N1(1) − 2^k1(1))·2^k2(1) = 101010000000
(N2(1) − 2^k2(1))·2^k1(1) = 111111000000
2^(k1(1)+k2(1)) = 1000000000000
C(1) = 10101001000000 = 10816
Papprox(1) = P0 + C(1) = 1101111011000000 = 57024
Er(1) = 4.43%
N1(2) = 00101010
N2(2) = 00111111
k1(2) = 0101, N1(2) − 2^k1(2) = 00001010
k2(2) = 0101, N2(2) − 2^k2(2) = 00011111
k1(2) + k2(2) = 1010
(N1(2) − 2^k1(2))·2^k2(2) = 0101000000
(N2(2) − 2^k2(2))·2^k1(2) = 1111100000
2^(k1(2)+k2(2)) = 10000000000
C(2) = 100100100000 = 2336
Papprox(2) = P0 + C(1) + C(2) = 1110011111100000 = 59360
Er(2) = 0.52%

IV. HARDWARE IMPLEMENTATION

In order to evaluate the device utilization and the performance of the proposed multiplier, we implemented different multipliers on a Xilinx xc3s500e-4fg320 FPGA [7]. We implemented five 16-bit multipliers: a sequential implementation of the proposed multiplier, a multiplier with no correction terms, and three multipliers with one, two and three correction terms (i.e. two, three and four basic blocks), respectively.

A. Basic Block

A basic block is the proposed multiplier with no correction terms. The task of the basic block is to calculate one approximate product according to Equation (8). The 16-bit basic block is presented in Figure 1. It consists of two leading-one detector units (LOD), two encoders, two 32-bit barrel shifters, a decoder unit and two 32-bit adders. The two input operands are given to the LODs and the encoders. The LOD units remove the leading one from the operands, which are then passed to the barrel shifters. The barrel shifters shift the residues according to Equation (8). The decoder unit decodes k1 + k2, i.e. puts the leading one in the product. The leading one and the two shifted residues are then added to form the approximate product. The basic block is used in further implementations to calculate P0 and C(i).

B. Sequential Implementation

A sequential implementation of the proposed iterative multiplier consists of a data-path unit and a control unit. The core of the data-path unit is one basic block from Figure 1. The control unit executes Algorithm 2.

C. Parallel Implementation

To decrease the time overhead of the sequential implementation, we implemented multipliers with parallel correction circuits, built as a cascade of basic blocks. The block diagram of the proposed logarithmic multiplier with one parallel error correction circuit is shown in Figure 2. The multiplier is composed of two basic blocks, of which the first one calculates the first approximation of the product (P0) while the second one calculates the error correction term C(1).

D. Device Utilization

For design entry we used Xilinx ISE 10.1.02 WebPACK, with the design written in VHDL. The design was synthesized with Xilinx XST Release 10.1.02 for Linux. Device utilization (number of slices, number of 4-input LUTs and number of input-output blocks) for all five implemented multipliers is given in Table 1.
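The sequential and parallel implementations compute the same quantity, Eq. (15); they differ only in whether one basic block is reused under a control unit or several blocks are cascaded. A behavioral Python sketch of that cascade follows, assuming unsigned integer operands. `basic_block` and `iterative_multiply` are hypothetical names, and returning a zero term when an operand or residue is zero is an assumption of this sketch (the residue product in (11) is then zero, so nothing is lost). It models arithmetic only, not timing or bit widths.

```python
def basic_block(n1: int, n2: int) -> tuple[int, int, int]:
    """Behavioral model of one basic block (Fig. 1), Eq. (8).

    Leading-one detection gives k1, k2; removing the leading ones gives the
    residues; the residues are shifted, 2**(k1 + k2) is decoded, and all
    three are added. Returns (approximate product, residue 1, residue 2).
    """
    if n1 == 0 or n2 == 0:
        return 0, 0, 0  # assumption: a zero operand contributes nothing
    k1, k2 = n1.bit_length() - 1, n2.bit_length() - 1
    r1, r2 = n1 - (1 << k1), n2 - (1 << k2)
    return (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1), r1, r2


def iterative_multiply(n1: int, n2: int, correction_terms: int) -> int:
    """Algorithm 2 / Eq. (15): P0 plus `correction_terms` correction terms.

    Each loop pass corresponds to one cascaded correction block that
    approximates the residue product left over by the previous block.
    """
    product, r1, r2 = basic_block(n1, n2)  # P0
    for _ in range(correction_terms):
        c, r1, r2 = basic_block(r1, r2)    # C(i)
        product += c
    return product


if __name__ == "__main__":
    for n1, n2 in ((234, 134), (234, 255)):
        exact = n1 * n2
        for t in range(3):
            approx = iterative_multiply(n1, n2, t)
            print(n1, n2, t, approx, f"{100 * (exact - approx) / exact:.2f}%")
```

With zero, one and two correction terms this gives 30720, 31272 and 31356 for the operands of Example 2, and 46208, 57024 and 59360 for those of Example 3 (0.52% below the true 59670). Because each block strips one '1' bit from each residue, n − 1 correction terms always give the exact product for n-bit operands.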
Maximum combinational path delays along with total power consumption for the basic block and all three parallel implementations are given in Table 2. The estimated maximum frequency for the sequential implementation was 69.851 MHz.

Table 1. Device utilization.
Multiplier           4-input LUTs   Slices   IOBs
Basic Block                   353      182     96
Basic Block + 1CT             736      381     96
Basic Block + 2CT            1088      577     96
Basic Block + 3CT            1438      751     96
Sequential                    523      265     96

Table 2. Synthesis results.
Multiplier           Max. combinational delay (ns)   Total power (W)
Basic Block                                 24.818
Basic Block + 1CT                           32.214
Basic Block + 2CT                           37.261
Basic Block + 3CT                           41.555

Fig. 1. Block diagram of a basic block of the proposed iterative multiplier.

V. ERROR ANALYSIS

To evaluate the proposed algorithm, it is applied to all combinations of n-bit non-negative numbers. The error percentage is calculated from the well-known equation:

$$E_i = \frac{P_{true} - P_{approx}}{P_{true}} \cdot 100\% \tag{16}$$

For evaluation, the average error percentage is used:

$$AE = \frac{1}{N}\sum_{i=1}^{N} E_i \tag{17}$$

where $N$ is the number of multiplications performed. For example, for 12-bit numbers, all combinations of numbers from 1 to 4095 are multiplied and the average error percentage is calculated. Error calculation is done in four cases: without a parallel error correction circuit, with one parallel correction, with two parallel corrections and with three parallel corrections.

Fig. 2. Block diagram of the proposed multiplier with one parallel error correction circuit. The multiplier is composed of two basic Mitchell’s logarithmic multipliers, of which the first one calculates an approximate product while the second one calculates an error correction term.

Table 3. Error percentage rate [%]: share of products whose relative error is below the given threshold.
                  8 bits               12 bits              16 bits
Error          1CT   2CT   3CT      1CT   2CT   3CT      1CT   2CT   3CT
< 0.1%        32.9  79.9  99.0     20.6  71.6  98.2     19.3  70.6  98.0
< 0.5%        54.8  96.9   100     48.1  95.7   100     47.4  95.5   100
< 1%          69.9  99.6   100     65.6  99.4   100     65.2  99.4   100

Table 4. Average relative errors [%] for 0, 1, 2 and 3 correction terms.
No. bits   Basic MA   1 C. term   2 C. terms   3 C. terms
8          8.9131     0.8337      0.0708       0.0048
9          9.1336     0.8980      0.0845       0.0069
10         9.2595     0.9369      0.0936       0.0086
11         9.3301     0.9597      0.0994       0.0098
12         9.3692     0.9726      0.1029       0.0106
13         9.3906     0.9799      0.1049       0.0111
14         9.4023     0.9840      0.1060       0.0114
15         9.4086     0.9862      0.1067       0.0116
16         9.4124     0.9874      0.1070       0.0117

Table 3 presents the error percentage rate for the various cases. Table 4 gives a more precise view of how the algorithm can be tuned to the wanted error percentage. These results are compared with the results from [3], since it is the latest paper with a complete overview of various solutions. Its authors presented combinations of their own proposal with other solutions and gave a similar error table. Comparing the 8-bit and 16-bit average error percentages, we can notice that our solution with three iterations outperforms the best proposed combination of operand decomposition and Mitchell’s error correction term. The iterative approach improves the average error percentage and the error rate compared to the basic MA multiplication. With only three correction terms, all multiplication results have less than 0.5% error.

The parallel implementation of the iterative MA multiplier with only one correction circuit roughly doubles the area required when compared to the original MA multiplier, but the power consumption increases only by 2% (one correction term) to 16% (three correction terms). This is still significantly less than the area and power required for a standard multiplier. The maximum combinational delay increases by about 30% with the first correction circuit and by roughly 12 to 16% with each further one (Table 2), but this can be significantly improved by pipelining the three main stages of the basic MA based multiplier and pipelining the correction circuits.

To evaluate the properties of various numbers of parallel correction circuits, a Gaussian filter is implemented for removing 2% salt & pepper noise.
The filter is implemented with the proposed multiplier, with various numbers of correction term circuits. Error evaluation is done with the average error percentage (AEP) and the mean square error (MSE). Since the proposed multiplier is an integer multiplier, it can be used with integer-valued kernels and signals. In this example, the average AEP and MSE for 1CT are 0.4 and 0.05%, respectively, and both are zero for 2CT, so this multiplication method could be useful for implementing integer transforms.

VI. CONCLUSIONS

In this paper, we have investigated and proposed a new approach to improve the accuracy of Mitchell’s algorithm based multiplication. The proposed method is based on iteratively calculating the correction terms. We have shown that the correction terms can be calculated in parallel in hardware.

REFERENCES

[1] J.N. Mitchell, Computer multiplication and division using binary logarithms, IRE Transactions on Electronic Computers, vol. EC-11, pp. 512-517, August 1962.
[2] D.J. McLaren, Improved Mitchell-based logarithmic multiplier for low-power DSP applications, Proceedings of IEEE International SOC Conference 2003, pp. 53-56, 17-20 September 2003.
[3] V. Mahalingam, N. Ranganathan, Improving Accuracy in Mitchell’s Logarithmic Multiplication Using Operand Decomposition, IEEE Transactions on Computers, Vol. 55, No. 12, pp. 1523-1535, December 2006.
[4] E.L. Hall, D.D. Lynch, S.J. Dwyer III, Generation of Products and Quotients Using Approximate Binary Logarithms for Digital Filtering Applications, IEEE Transactions on Computers, Vol. C-19, No. 2, pp. 97-105, February 1970.
[5] K.H. Abed, R.E. Siferd, CMOS VLSI Implementation of a Low-Power Logarithmic Converter, IEEE Transactions on Computers, Vol. 52, No. 11, pp. 1421-1433, November 2003.
[6] K.H. Abed, R.E. Siferd, VLSI Implementation of a Low-Power Antilogarithmic Converter, IEEE Transactions on Computers, Vol. 52, No. 9, pp. 1221-1228, September 2003.
[7] Xilinx Inc., Spartan-3E FPGA Family: Complete Data Sheet, DS312, http://www.xilinx.com/support/documentation/data_sheets/ds312.pdf, April 18, 2008.