Floating Point Ds335
Product Specification
Introduction
The Xilinx Floating-Point core provides designers with the means to perform floating-point arithmetic on an FPGA. The core can be customized to allow optimization for operation, wordlength, latency, and interface.
Features
- Available for Virtex-II, Virtex-II Pro, Virtex-4, Virtex-5, Spartan-II, Spartan-3, and Spartan-3E FPGA family members
- Supported operators:
  - multiply
  - add/subtract
  - divide
  - square-root
  - compare
  - conversion from floating-point to fixed-point
  - conversion from fixed-point to floating-point
  - conversion between floating-point types
- Compliance with IEEE-754 Standard (with only minor documented deviations)
- Support for DSP48 on Virtex-4 FPGAs and DSP48E on Virtex-5 FPGAs
- Parameterized fraction and exponent wordlengths
- Optimizations for speed and latency
- Fully synchronous design using a single clock
- For use with the CORE Generator, which is available in the Xilinx ISE v8.2i
Overview
The Xilinx Floating-Point core allows a range of floating-point arithmetic operations to be performed on FPGAs. The operation is specified when the core is generated, and each variant has a common interface. This interface is shown in Figure 1. When a user selects an operation that requires only one operand, the B input is omitted.
[Figure 1: Floating-Point Operator core interface, RESULT = A op B]
2006 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. The QinetiQ logo is a trademark of QinetiQ Ltd. All other trademarks are the property of their respective owners. Xilinx is providing this design, code, or information "as is." By providing the design, code, or information as one possible implementation of this feature, application, or standard, Xilinx makes no representation that this implementation is free from any claims of infringement. You are responsible for obtaining any rights you may require for your implementation. Xilinx expressly disclaims any warranty whatsoever with respect to the adequacy of the implementation, including but not limited to any warranties or representations that this implementation is free from claims of infringement and any implied warranties of merchantability or fitness for a particular purpose.
www.xilinx.com
Functional Description
The floating-point and fixed-point representations employed by the core are described in "Floating-Point Number Representation" and "Fixed-Point Number Representation".
Floating-Point Number Representation
The value of a floating-point number is given by:
$$v = (-1)^{s} \, 2^{E} \, b_0.b_1 b_2 \cdots b_{w_f-1}$$
The binary bits, $b_i$, have weighting $2^{-i}$, where the most significant bit $b_0$ is a constant 1. As such, the combination is bounded such that $1 \le b_0.b_1 b_2 \cdots b_{p-1} < 2$ (here $p = w_f$) and the number is said to be normalized. To provide increased dynamic range, this quantity is scaled by a positive or negative power of 2 (denoted here as $E$). The sign bit provides a value that is negative when $s = 1$, and positive when $s = 0$. The binary representation of a floating-point number contains three fields as shown in Figure 2.
[Figure 2: Bit fields of the floating-point representation. From most to least significant: sign s (1 bit), exponent e (w_e bits), fraction f (w_f - 1 bits).]
As $b_0$ is a constant, only the fractional part is retained, that is, $f = b_1 \ldots b_{w_f-1}$. This requires only $w_f - 1$ bits. Of the remaining bits, one bit is used to represent the sign, and $w_e = w - w_f$ bits represent the exponent. The exponent field, $e$, employs a biased unsigned integer representation, whose value is given by:
$$e = \sum_{i=0}^{w_e-1} e_i \, 2^{i}$$
The index, $i$, of each bit within the exponent field is given in Figure 2. The value of the exponent, $E$, is obtained by removing the bias, that is, $E = e - (2^{w_e-1} - 1)$.
In reality, $w_f$ is not the wordlength of the fraction, but of the fraction with the hidden bit, $b_0$, included. This terminology has been adopted to provide commonality with that used to describe fixed-point parameters (as employed by Xilinx System Generator™ for DSP).
Special Values
A number of values for s, e and f have been reserved for representing special numbers, such as Not a Number (NaN), Infinity (∞), zero (0), and denormalized numbers. These special values are summarized in Table 1.
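Table 1: Special Values
Special value | s field | e field | f field
NaN | undefined | $2^{w_e} - 1$ | most significant bit of fraction set (that is, f = 10...00)
±∞ | sign of infinity | $2^{w_e} - 1$ | zero (that is, f = 00...00)
±0 | sign of zero | 0 | zero (that is, f = 00...00)
denormalized | sign of number | 0 | any non-zero field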
Note that in Table 1 the sign bit is undefined when a result is a NaN. Also, infinity and zero are signed. Where possible, the sign is handled in the same way as for finite non-zero numbers. For example, -0 + (-0) = -0, -0 + 0 = 0 and -∞ + (-∞) = -∞. A meaningless operation such as -∞ + ∞, however, will raise an invalid operation exception and produce a NaN as a result.
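The field layout and special values described above can be sketched in software. The following Python fragment is a minimal illustration, not part of the core deliverables: the function name decode_float and the returned dictionary keys are assumptions made for this example. It splits a word into s, e and f, classifies it according to Table 1 (treating any non-zero f with an all-ones exponent as a NaN, as in IEEE-754), and evaluates normalized numbers using the bias defined above.

```python
def decode_float(word, width, fraction_width):
    """Decode a `width`-bit floating-point word.

    fraction_width is w_f, which counts the hidden bit b_0, so only
    fraction_width - 1 fraction bits are actually stored.
    """
    stored = fraction_width - 1            # stored fraction bits
    exp_width = width - fraction_width     # w_e = w - w_f
    f = word & ((1 << stored) - 1)
    e = (word >> stored) & ((1 << exp_width) - 1)
    s = (word >> (width - 1)) & 1

    e_max = (1 << exp_width) - 1           # all-ones exponent field
    if e == e_max:
        kind = "nan" if f else "infinity"
    elif e == 0:
        kind = "denormalized" if f else "zero"
    else:
        kind = "normalized"

    value = None                           # only defined here for normalized numbers
    if kind == "normalized":
        bias = (1 << (exp_width - 1)) - 1  # E = e - (2^(w_e-1) - 1)
        # Python floats limit this sketch to exponents of about 11 bits.
        value = (-1) ** s * (1 + f / (1 << stored)) * 2.0 ** (e - bias)
    return {"s": s, "e": e, "f": f, "kind": kind, "value": value}
```

For the single format (width 32, fraction width 24), the word 0x3FC00000 decodes to s = 0, e = 127 and a value of 1.5.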
IEEE-754 Support
The Xilinx Floating-Point core complies with much of the IEEE-754 Standard. The deviations generally provide a better trade-off of resources against functionality. Specifically, the core deviates in the following ways:
- Non-Standard Wordlengths
- Denormalized Numbers
- Rounding Modes
- Signalling and Quiet NaNs
Non-Standard Wordlengths
The Xilinx Floating-Point core supports a greater range of fraction and exponent wordlengths than defined in the IEEE-754 Standard.
Standard formats commonly implemented by programmable processors:
- Single Format - uses 32 bits, with a 24-bit fraction and 8-bit exponent.
- Double Format - uses 64 bits, with a 53-bit fraction and 11-bit exponent.
Less commonly implemented standard formats are:
- Single Extended - wordlength extensions of 43 bits and above
- Double Extended - wordlength extensions of 79 bits and above
The Xilinx core supports formats with fraction and exponent wordlengths outside of these standard wordlengths.
Denormalized Numbers
Denormalized numbers are not supported by the Xilinx Floating-Point core. To provide robustness, the core treats denormalized numbers as zero (which takes on the sign of the denormalized number).
Denormalized numbers are those where $b_0$ is 0. As such, $b_0.b_1 b_2 \cdots b_{p-1} < 1$, which for a given exponent wordlength allows numbers to be represented that are smaller than otherwise possible. But note that as the value becomes smaller, it is represented with fewer bits and the relative rounding error introduced by each operation increases.
An alternative way of increasing dynamic range, which uses fewer resources, is to increase the wordlength of the exponent. The wordlength of the format can be maintained by increasing the exponent wordlength at the expense of the fraction.
Note: The support for denormalized numbers cannot be switched off on some processors. Therefore, there may be very small differences between values generated by the Floating-Point core and a program running on a conventional processor when numbers are very small. If such differences must be avoided, the arithmetic model on the conventional processor should include a simple check for denormalized numbers. This check should set the output of an operation to zero when denormalized numbers are detected to correctly reflect what happens in the FPGA implementation.
Rounding Modes
Currently, only the default rounding mode, Round to Nearest, as defined by the IEEE-754 Standard, is supported.
Signalling and Quiet NaNs
The IEEE-754 Specification requires provision of Signalling and Quiet NaNs. However, the Xilinx Floating-Point core treats all NaNs as Quiet NaNs. When any NaN is supplied as one of the operands to the core, the result will be a Quiet NaN, and an invalid operation exception will not be raised (as would be the case for signalling NaNs).
The exception to this rule is floating-point to fixed-point conversion. For detailed information, see the behavior of INVALID_OP.
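The check for denormalized numbers recommended in the Note under Denormalized Numbers could be added to a software model along the following lines. This is a minimal Python sketch for the single format only, assuming the model holds values as Python floats; the function name flush_denormal_single is an assumption made for this example.

```python
import struct


def flush_denormal_single(x):
    """Return x rounded to single format, with denormals replaced by signed zero."""
    # Reinterpret the single-precision encoding of x as a 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0 and fraction != 0:
        bits &= 0x80000000  # denormalized: keep only the sign bit
    (y,) = struct.unpack("<f", struct.pack("<I", bits))
    return y
```

Applying such a function to the operands and to the result of each modelled operation mimics the core's treatment of denormalized numbers as zero.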
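Fixed-Point Number Representation
[Figure 3: Bit fields of the fixed-point representation. Integer field: bits w-1 down to w_f; fraction field: bits w_f-1 down to 0.]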
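In Figure 3, the bit position has been labelled with an index i. Based upon this, and with the most significant bit $b_{w-1}$ acting as a two's complement sign bit, the value of a fixed-point number is given by:
$$v = -b_{w-1} \, 2^{w-1-w_f} + b_{w-2} \ldots b_{w_f}.b_{w_f-1} \ldots b_1 b_0 = -b_{w-1} \, 2^{w-1-w_f} + \sum_{i=0}^{w-2} 2^{i-w_f} \, b_i$$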
For example, a 32-bit signed integer representation is obtained when a width of 32 and a fraction width of 0 are specified. Round to Nearest is employed within the conversion operations.
Note: To provide for the sign bit, the width of the integer field must be at least 1, requiring that the fractional width be no larger than width-1.
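As an illustration of the fixed-point interpretation, the following Python sketch evaluates a word of given width and fraction width. It assumes a two's complement encoding, consistent with the signed integer example above; the function name fixed_to_value is an assumption made for this example.

```python
def fixed_to_value(word, width, fraction_width):
    """Interpret `word` as a two's complement fixed-point number."""
    word &= (1 << width) - 1          # keep only the `width` low bits
    if word >> (width - 1):           # sign bit set: subtract 2^width
        word -= 1 << width
    return word / (1 << fraction_width)
```

With a width of 32 and a fraction width of 0 the word 0xFFFFFFFF evaluates to -1, and with a width of 4 and a fraction width of 2 the word 0110 evaluates to 1.5.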
Port Description
The ports employed by the core are shown in Figure 1. They are described in more detail in Table 2. All control signals are active high.
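Table 2: Core Ports
Name | Width | Direction | Description
A (1) | w | INPUT | Operand A
B (1) | w | INPUT | Operand B: Only present on binary operation.
OPERATION (1) | 6 | INPUT | Operation: Specifies the operation to be performed. Implemented when the core is configured for both add and subtract operations, or as a programmable comparator.
OPERATION_ND | 1 | INPUT | Operation New Data: Asserted when operands (and OPERATION, if present) are valid.
OPERATION_RFD | 1 | OUTPUT | Operation Ready For Data: Asserted by the core when it is ready to accept new operands.
SCLR | 1 | INPUT | Synchronous Clear
CE | 1 | INPUT | Clock Enable
CLK | 1 | INPUT | Clock
RESULT | w | OUTPUT | Result of operation
UNDERFLOW | 1 | OUTPUT | Underflow: Set high by core when underflow occurs. Supplied in synchronism with associated RESULT.
OVERFLOW | 1 | OUTPUT | Overflow: Set high by core when overflow occurs. Supplied in synchronism with associated RESULT.
INVALID_OPERATION | 1 | OUTPUT | Invalid Operation: Set high by core when operands cause an invalid operation. Supplied in synchronism with associated RESULT.
DIVIDE_BY_ZERO | 1 | OUTPUT | Divide By Zero: Set high by a divide operation to indicate that a division by zero was performed. Supplied in synchronism with associated RESULT.
RDY | 1 | OUTPUT | Output Ready: Set high by core when RESULT is valid.
(1) A, B and OPERATION are not registered on the input to the core. Should this be required, registers can be added to these inputs externally to the core.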
A
Operand A input.
B
Operand B input.
CLK
All signals are synchronous to the CLK input.
CE
When CE is deasserted, the clock is disabled, and the state of the core and its outputs are maintained.
SCLR
When SCLR is asserted, the core control is synchronously set to its initial state. Any incomplete results are discarded, and RDY will not be generated for them. While SCLR is asserted, both OPERATION_RFD and RDY are deasserted. The core is ready for new input one cycle after SCLR is deasserted, at which point OPERATION_RFD is asserted.
OPERATION
OPERATION is present when add and subtract operations are selected together, or when a programmable comparator is selected. The operations are binary encoded as specified in Table 3.
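Table 3: Encoding of OPERATION
FP operation | OPERATION (5 downto 0)
Add | 000000
Subtract | 000001
Compare (Programmable): Unordered | 000100
Compare (Programmable): Less Than | 001100
Compare (Programmable): Equal | 010100
Compare (Programmable): Less Than or Equal | 011100
Compare (Programmable): Greater Than | 100100
Compare (Programmable): Not Equal | 101100
Compare (Programmable): Greater Than or Equal | 110100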
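A testbench or software driver may find it convenient to build the OPERATION word programmatically. The Python sketch below reproduces the values of Table 3; the observation that the compare codes follow the pattern (code << 3) | 100 is inferred from the table rather than stated by the specification, and the names COMPARE_CODES and encode_operation are assumptions made for this example.

```python
# Compare codes occupy OPERATION(5 downto 3); values follow Table 3.
COMPARE_CODES = {
    "unordered": 0,
    "less_than": 1,
    "equal": 2,
    "less_than_or_equal": 3,
    "greater_than": 4,
    "not_equal": 5,
    "greater_than_or_equal": 6,
}


def encode_operation(name):
    """Return the 6-bit OPERATION value for an operation name."""
    if name == "add":
        return 0b000000
    if name == "subtract":
        return 0b000001
    # Inferred pattern: compare code in the upper three bits, 100 below.
    return (COMPARE_CODES[name] << 3) | 0b100
```

For example, format(encode_operation("less_than"), "06b") gives 001100.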
OPERATION_ND
OPERATION_ND should be asserted when operands are valid on inputs A and B and the FP operation is valid on OPERATION (should it be required). Deasserting OPERATION_ND will prevent the initiation of new operations and the subsequent assertion of RDY.
Note: OPERATION_ND is required to synchronize operations when the core is configured to perform a multi-cycle divide or square root.
OPERATION_RFD
OPERATION_RFD is asserted by the core to indicate that it is ready to accept new operands on inputs A, B, and OPERATION. A new operation will be initiated by the core when both OPERATION_ND and OPERATION_RFD are asserted together.
RESULT
If the operation is compare, then the valid bits within the result depend upon the compare operation selected. If the operation is one of those listed in Table 3, then only the least significant bit of the result indicates whether the comparison is true or false. If the operation is condition code, then 4 bits provide the results of the comparison using the encoding summarized in Table 4. See the IEEE-754 Standard for a more complete listing of the meanings of all the valid comparison results.
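Table 4: RESULT Encoding for Compare Operations
Compare Operation | RESULT(3) | RESULT(2) | RESULT(1) | RESULT(0) | Result
Programmable | | | | 0 | A OP B = False
Programmable | | | | 1 | A OP B = True
Condition Code | Unordered | > | < | EQ | Meaning
Condition Code | 0 | 0 | 0 | 1 | A = B
Condition Code | 0 | 0 | 1 | 0 | A < B
Condition Code | 0 | 0 | 1 | 1 | A <= B
Condition Code | 0 | 1 | 0 | 0 | A > B
Condition Code | 0 | 1 | 0 | 1 | A >= B
Condition Code | 0 | 1 | 1 | 0 | A <> B
Condition Code | 1 | See Standard | See Standard | See Standard | A, B or both are NaN.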
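The compare RESULT bits can be unpacked in software as sketched below. The bit assignment used here (bit 3 = Unordered, bit 2 = greater than, bit 1 = less than, bit 0 = EQ) is an assumption read from the column order of Table 4, and the function names are chosen for this example only.

```python
def decode_programmable(result):
    """Programmable or fixed compare: only RESULT(0) is meaningful."""
    return bool(result & 0b0001)


def decode_condition_code(result):
    """Condition-code compare: unpack RESULT(3 downto 0) into named flags."""
    return {
        "unordered": bool(result & 0b1000),
        "greater": bool(result & 0b0100),
        "less": bool(result & 0b0010),
        "equal": bool(result & 0b0001),
    }
```

A caller would typically test the unordered flag first, since it indicates that at least one operand was a NaN.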
The following signals provide exception information. Additional detail on their behavior can be found in the IEEE-754 Standard.
UNDERFLOW
Underflow is signalled when the operation generates a non-zero result which is too small to be represented with the chosen precision. The result is set to zero. Underflow is detected after rounding.
Note: A number that becomes denormalized before rounding will be set to zero and underflow signalled.
OVERFLOW
Overflow is signalled when the operation generates a result that is too large to be represented with the chosen precision. The output is set to a correctly signed ∞.
INVALID_OP
Invalid operation is signalled when the operation performed is invalid. According to the IEEE-754 Standard, the following are invalid operations:
1. Any operation on a signalling NaN. (Note that this is not relevant here.)
2. Addition or subtraction of infinite values where the sign of the result cannot be determined. For example, magnitude subtraction of infinities such as (+∞) + (-∞).
3. Multiplication where 0 × ∞.
4. Division where 0/0 or ∞/∞.
5. Square root if the operand is less than zero.
6. When the input of a conversion cannot be correctly signalled by the result (for example NaN or infinity).
When an invalid operation occurs, the associated result is a Quiet NaN. In the case of floating-point to fixed-point conversion, NaN and infinity raise an invalid operation exception. If the operand is out of range, or an infinity, then an overflow exception is raised. By analyzing the two exception signals it is possible to determine which of the three types of operand were converted. (See Table 5.)
Table 5: Invalid Operation Summary
Operand | Invalid Operation | Overflow | Result
+ Out of Range | 0 | 1 | 011...11
- Out of Range | 0 | 1 | 100...00
+ Infinity | 1 | 1 | 011...11
- Infinity | 1 | 1 | 100...00
NaN | 1 | 0 | 100...00
When the operand is a NaN, the result is set to the most negative representable number. When the operand is infinity or an out-of-range floating-point number, the result is saturated to the most positive or most negative number, depending upon the sign of the operand.
Note: Floating-point to fixed-point conversion does not treat a NaN as a Quiet NaN, because NaN is not representable within the resulting fixed-point format, and so can only be indicated through an invalid operation exception.
DIVIDE_BY_ZERO
Division of a number by zero is signalled when a divide operation is performed where the divisor is zero and the dividend is a finite non-zero number. The result in this circumstance is a correctly signed ∞.
RDY
RDY is asserted by the core to indicate that RESULT is valid. RDY can be used to qualify the result of a multi-cycle operation (i.e., divide or square root operations with rate greater than 1).
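The floating-point to fixed-point behavior summarized in Table 5 can be captured in a small reference model. The Python sketch below is an illustration only: it assumes a two's complement result and that Round to Nearest resolves ties to even (the IEEE-754 default); the function name float_to_fixed and the returned tuple layout are assumptions made for this example.

```python
import math
from fractions import Fraction


def float_to_fixed(x, width, fraction_width):
    """Return (result_word, invalid_op, overflow) following Table 5."""
    most_pos = (1 << (width - 1)) - 1      # 011...11
    most_neg = -(1 << (width - 1))         # 100...00
    mask = (1 << width) - 1

    if math.isnan(x):
        return most_neg & mask, 1, 0
    if math.isinf(x):
        return (most_pos if x > 0 else most_neg) & mask, 1, 1

    # Exact scaling, then round to nearest with ties to even.
    scaled = round(Fraction(x) * (1 << fraction_width))
    if scaled > most_pos:
        return most_pos & mask, 0, 1       # + out of range: saturate
    if scaled < most_neg:
        return most_neg & mask, 0, 1       # - out of range: saturate
    return scaled & mask, 0, 0
```

Examining the two flags returned for a saturated result distinguishes an out-of-range operand (overflow only) from an infinity (both flags) and a NaN (invalid operation only).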
Example Timing
An example of signal timing is given in Figure 4 for square-root with latency 4 and rate 3. The result is provided four cycles after an active OPERATION_ND. In this example, new inputs are applied every three cycles, in accordance with the maximum rate and OPERATION_RFD output. (Data could be applied less frequently, in which case OPERATION_RFD would stay High until OPERATION_ND was asserted with the new input.) The RDY output indicates when RESULT, and any exception flags, are valid. In this example, an overflow exception has been generated with result R2.
[Figure 4: Example timing for square-root with latency 4 and rate 3]
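The handshake in the timing example can be approximated with a short cycle-based model. The Python sketch below is a simplification built on stated assumptions: an operation is accepted on any cycle where OPERATION_ND and OPERATION_RFD are both high, OPERATION_RFD then stays low for rate - 1 cycles, and RDY is asserted exactly latency cycles after acceptance. The function name simulate_handshake is an assumption made for this example; the cycle-accurate VHDL model remains the reference.

```python
def simulate_handshake(latency, rate, nd_cycles, n_cycles):
    """Return (accepted, rdy): cycle indices of accepted inputs and RDY pulses.

    nd_cycles is the set of cycles on which OPERATION_ND is asserted.
    """
    accepted, rdy = [], []
    next_rfd = 0                      # first cycle at which RFD is high again
    for cycle in range(n_cycles):
        rfd = cycle >= next_rfd
        if rfd and cycle in nd_cycles:
            accepted.append(cycle)
            next_rfd = cycle + rate   # RFD low for the next rate - 1 cycles
            rdy.append(cycle + latency)
    # Report only RDY pulses that fall inside the simulated window.
    return accepted, [c for c in rdy if c < n_cycles]
```

With latency 4, rate 3 and OPERATION_ND held high for 12 cycles, inputs are accepted on cycles 0, 3, 6 and 9, and RDY is asserted on cycles 4, 7 and 10 within the simulated window.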
- Cancel: Cancels generation of the core and returns to the first screen of the GUI.
- View Data Sheet: Displays this document, a PDF file of the core product specification.
Opening the Floating-Point Main Screen
1. Start the CORE Generator.
2. Select Floating-Point 3.0 from the Math Functions category at the left side of the application window.
3. Do one of the following to display the Floating-Point main configuration screen:
   - Double-click Floating-Point 3.0 in the Math Functions category.
   - Select Floating-Point in the Math Functions category; then click Customize at the right side of the CORE Generator window.
Main Configuration Screen
The main configuration screen allows the following parameters to be specified:
- Component Name
- Operation Type
Component Name
The component name is used as the base name of the output files generated for the core. Names must start with a letter and be composed using the following characters: a to z, 0 to 9, and _.
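A script that generates cores in bulk may wish to check component names before invoking the tool. The following Python sketch encodes the naming rule above, assuming that only lowercase letters are permitted (as the listed character set suggests); the function name is_valid_component_name is an assumption made for this example.

```python
import re

# Must start with a letter; remaining characters drawn from a-z, 0-9 and _.
_COMPONENT_NAME = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_component_name(name):
    """Return True when `name` satisfies the component naming rule."""
    return _COMPONENT_NAME.fullmatch(name) is not None
```

For example, fp_add_single is accepted, whereas 1abc and fp-add are rejected.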
Operation Type
The floating-point operation may be one of the following:
- Add/Subtract
- Multiply
- Divide
- Square-root
- Compare
- Fixed-to-float
- Float-to-fixed
- Float-to-float
When Add/Subtract is selected, it is possible for the core to perform both operations, or just add or subtract. When both are selected, the operation performed on a particular set of operands is controlled by the OPERATION input (with encoding defined earlier in Table 3).
When Add/Subtract or Multiply is selected, the level of embedded multiplier usage can be specified as described in the Penultimate Configuration Screen section.
When Compare is selected, the compare operation may be programmable or fixed. If programmable, then the compare operation performed should be supplied via the OPERATION input (with encoding defined earlier in Table 3). If a fixed operation is required, then the operation type should be selected.
When Float-to-float conversion is selected, and the exponent and fraction widths of the input and result are the same, the core provides a means to condition numbers, i.e., convert denormalized numbers to zero, and signalling NaNs to quiet NaNs.
Second and Third Configuration Screens
Depending on the configuration you select from the first screen, the second and third configuration screens let you specify the precision of the operand and result.
Precision of the Operand and Results
This parameter defines the number of bits used to represent quantities. The type of the operands and results depends on the operation requested. For fixed-point conversion operations, either the operand or result is fixed-point. For all other operations, the output is specified as a floating-point type.
Note: For compare, depending upon the operation selected, RESULT(3 downto 0) is used to indicate the result of the comparison operation.
Table 6 defines the general limits of the format widths.
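Table 6: General Limits of Width and Fraction Width
[Table residue; row labels and most column labels are not recoverable. Surviving values, two per column: Min 4, 4; Max 64, 63; Max 16, 64; Max 64, 64.]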
There are also a number of further limits for specific cases:
- The exponent width (i.e., width - fraction width) should be chosen to support normalization of the fractional part. This can be calculated using:
  exponent width = ceil[log2(fraction width + 3)] + 1
  For example, a 24-bit fractional part requires an exponent of at least 6 bits (that is, ceil[log2(27)] + 1). The GUI enforces these limits.
- For the logic-assisted multiplier (that is, when multiplier usage is medium), only double precision format is supported.
- For conversion operations, the exponent width of the floating-point input or output can be calculated using:
  exponent width = ceil[log2(width + 3)] + 1
  For example, a 32-bit integer will require a minimum exponent of 7 bits.
A summary of the width limits imposed by exponent width is provided in Table 7.
Table 7: Summary of Exponent Width Limits
Exponent Width
4 5 6 7 8
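The exponent width formulas given above can be evaluated with integer arithmetic, which avoids floating-point rounding in the logarithm. The Python sketch below is an illustration; the function names are assumptions made for this example, and the second function is simply the formula inverted and capped at an assumed 64-bit upper limit, so its output is derived here rather than copied from Table 7.

```python
def min_exponent_width(n):
    """ceil(log2(n + 3)) + 1, where n is a fraction width (or, for
    conversion operations, the fixed-point width)."""
    # For m >= 1, ceil(log2(m)) equals (m - 1).bit_length().
    return (n + 3 - 1).bit_length() + 1


def max_fraction_width(exponent_width, cap=64):
    """Largest n satisfying min_exponent_width(n) <= exponent_width."""
    return min((1 << (exponent_width - 1)) - 3, cap)
```

Both worked examples from the text are reproduced: min_exponent_width(24) returns 6 and min_exponent_width(32) returns 7.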
Penultimate Configuration Screen
The penultimate configuration screen lets you specify the following:
- Architecture Optimizations
- Family Optimizations
- Cycles Per Operation (Rate)
Architecture Optimizations
For addition/subtraction on Virtex-5 FPGAs, it is possible to specify a latency optimized architecture, or speed optimized architecture. The latency optimized architecture offers reduced latency at the expense of increased resources.
Family Optimizations
Multiplier Usage: Allows the type and level of embedded multiplier usage to be specified.
Multiplier Usage
The level of embedded multiplier usage can be specified. The level and type of multiplier usage also depend upon the operation and FPGA family. Table 8 summarizes these options for multiplication.
Table 8: Impact of Family and Multiplier Usage on the Implementation of the Multiplier
Multiplier Usage | Virtex-4 | Virtex-5
No usage | Logic | Logic
Medium usage | DSP48 + logic (1) in multiplier body | DSP48E + logic (1) in multiplier body
Full usage | DSP48 used in multiplier body | DSP48E used in multiplier body
Max usage | DSP48 multiplier body and rounder | DSP48E multiplier body and rounder
(1) Logic-assisted multiplier variant is only available for single and double precision in Virtex-4 FPGAs and single precision in Virtex-5 FPGAs.
Family | Format | Implementation
Virtex-4 | Other | Logic; Not supported
Virtex-4 | Single | Logic; 4 DSP48
Virtex-4 | Double | Logic; 3 DSP48
Virtex-5 | Other | Logic; Not supported
Virtex-5 | Single | Logic; 2 DSP48E
Virtex-5 | Double | Logic; 3 DSP48E
Latency
This parameter describes the number of cycles between an operand input and result output. The latency of all operators (apart from the logic-assisted, double-precision multipliers on Virtex-4 devices) can be set between 0 and a maximum value that is dependent upon the parameters chosen. The maximum latency of the Floating-Point core is tabulated for a range of width and operation types in Table 10 through Table 19.
The maximum latency of the divide and square root operations is fraction width + 4, and for the compare operation it is three cycles. The float-to-float conversion operation is three cycles when either mantissa or exponent width is being reduced, otherwise it is two cycles. Note that it is two cycles, even when the input and result widths are the same, as the core provides conditioning in this situation (see Operation Type for further details).
Note: The maximum latency of certain operations has been increased over Floating Point Operator v2.0 to increase maximum clock frequency. If the previous maximum latency value is specified, then an equivalent implementation will be obtained.
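The closed-form maximum latencies stated above (as opposed to the tabulated ones) can be expressed directly. The Python sketch below is an illustration only; the function and constant names are assumptions made for this example, and the exponent width is taken as width minus fraction width, as defined earlier in this document.

```python
MAX_LATENCY_COMPARE = 3  # compare operation: three cycles


def max_latency_div_sqrt(fraction_width):
    """Divide and square root: fraction width + 4 cycles."""
    return fraction_width + 4


def max_latency_flt_to_flt(a_width, a_fraction_width,
                           result_width, result_fraction_width):
    """Float-to-float: 3 cycles if fraction or exponent narrows, else 2."""
    a_exponent = a_width - a_fraction_width
    result_exponent = result_width - result_fraction_width
    narrowing = (result_fraction_width < a_fraction_width
                 or result_exponent < a_exponent)
    return 3 if narrowing else 2
```

For example, a single-precision divide has a maximum latency of 28 cycles, and a double-to-single conversion has a maximum latency of 3 cycles.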
Table 10: Latency of Floating-Point Multiplication using Logic Only
Fraction Width
4 to 5 6 to 11 12 to 23 24 to 47 (single) 48 to 64 (double)
Fraction Width
4 to 17 18 to 34 (single) 35 to 51 52 to 64 (double)
Full Usage
6
Max Usage
8 11 16 23
91
10 15
172
22
Full Usage
6
Max Usage
8 9 11 13 16 19 23
81
8 10 12 15 18 22
Table 14: Latency of Floating-Point Addition using Medium Usage and DSP48/DSP48E
DSP48E
16 15
Table 15: Latency of Floating-Point Addition using Logic on Families Other than Virtex-5 FPGAs
Fraction Width | Maximum Latency (clock cycles): Virtex-II, Virtex-II Pro, Spartan-3E, Virtex-4
4, 5 | 9
6 to 14 | 10
15 | 11
16, 17 | 12
18 to 29 | 13
30 to 62 | 14
63, 64 | 15
Table 16: Latency of Floating-Point Addition using Logic and Low-Latency Optimization on Virtex-5 FPGAs
Table 17: Latency of Floating-Point Addition using Logic and Speed Optimization on Virtex-5 FPGAs
Operand Width
4 to 8 9 to 32 31 to 64
Cycles Per Operation (Rate)
This parameter describes the minimum number of cycles that must elapse between inputs. This rate can be specified. A value of 1 allows operands to be applied on every clock cycle, and results in a fully-parallel circuit. A value greater than 1 enables hardware reuse. The number of slices consumed by the core reduces as the number of cycles per operation is increased. A value of 2 approximately halves the number of slices used. A fully sequential implementation is obtained when the value is equal to fraction width+1 for the square-root operation, and fraction width+2 for the divide operation.
Final Configuration Screen
The final configuration screen lets you specify the Optional Control and Exception Pins.
Optional Control and Exception Pins
Pins for the following signals are optional:
- Control Signals: OPERATION_ND, OPERATION_RFD, RDY, CE and SCLR control signals are optional.
- Exception Signals: UNDERFLOW, OVERFLOW, INVALID_OPERATION and DIVIDE_BY_ZERO signals are optional. The DIVIDE_BY_ZERO signal is only available when the divide operation is selected.
VHDL Interface
The Floating-Point core can be generated directly from a component instantiation in VHDL using XST. A component declaration has been provided in xilinxcorelib (as employed by XST). Also, a package of constant definitions has been provided to allow parameter values to be used that have meaningful names.
Note: If the generics are out of range, then the core will fail to synthesize. If this happens, either simulate the VHDL behavioral model (see VHDL Simulation) to obtain the reason for failure, or use the GUI to identify a suitable set of generics.
The valid generics, with their limits and default values, are listed in Table 20.
Table 20: Parameter File Information
VHDL Generic | Valid Values | Default Value
C_FAMILY | virtex2, virtex2p, virtex4, virtex5, spartan3. Use one of these families for derivatives (such as spartan3e). | virtex2
C_HAS_ADD | FLT_PT_TRUE, FLT_PT_FALSE. Can be used with C_HAS_SUBTRACT to obtain add and subtract capability. When both are selected the OPERATION port is used to specify the required operation as defined in Table 3. | FLT_PT_FALSE
C_HAS_SUBTRACT | FLT_PT_TRUE, FLT_PT_FALSE. Can be used with C_HAS_ADD to obtain add and subtract capability. (See further comments under C_HAS_ADD.) | FLT_PT_FALSE
C_HAS_MULTIPLY | FLT_PT_TRUE, FLT_PT_FALSE | FLT_PT_FALSE
C_HAS_DIVIDE | FLT_PT_TRUE, FLT_PT_FALSE | FLT_PT_FALSE
C_HAS_SQRT | FLT_PT_TRUE, FLT_PT_FALSE | FLT_PT_FALSE
C_HAS_COMPARE | FLT_PT_TRUE, FLT_PT_FALSE | FLT_PT_FALSE
C_HAS_FIX_TO_FLT | FLT_PT_TRUE, FLT_PT_FALSE | FLT_PT_FALSE
C_HAS_FLT_TO_FIX | FLT_PT_TRUE, FLT_PT_FALSE | FLT_PT_FALSE
C_HAS_FLT_TO_FLT | FLT_PT_TRUE, FLT_PT_FALSE | C_HAS_FLT_TO_FLT_DEFAULT
C_A_WIDTH | Integer with range dependent upon other parameters as defined in Precision of the Operand and Results. | 32
C_A_FRACTION_WIDTH | Integer with range dependent upon other parameters as defined in Precision of the Operand and Results. | 24
C_B_WIDTH | Must be same as C_A_WIDTH. | 32
C_B_FRACTION_WIDTH | Must be same as C_A_FRACTION_WIDTH. | 24
C_RESULT_WIDTH | Must be same as C_A_WIDTH for all operations, other than conversion. | 32
C_RESULT_FRACTION_WIDTH | Must be same as C_A_FRACTION_WIDTH for all operations, other than conversion. | 24
C_COMPARE_OPERATION | FLT_PT_PROGRAMMABLE, FLT_PT_LESS_THAN, FLT_PT_EQUAL, FLT_PT_LESS_THAN_OR_EQUAL, FLT_PT_GREATER_THAN, FLT_PT_NOT_EQUAL, FLT_PT_GREATER_THAN_OR_EQUAL, FLT_PT_UNORDERED, FLT_PT_CONDITION_CODE. Specify when operation is compare. | FLT_PT_LESS_THAN
C_LATENCY | Integer with range dependent upon other parameters as defined in Final Configuration Screen. | FLT_PT_MAX_LATENCY
C_OPTIMIZATION | FLT_PT_SPEED_OPTIMIZED, FLT_PT_LOW_LATENCY. | FLT_PT_SPEED_OPTIMIZED
C_MULT_USAGE | FLT_PT_NO_USAGE, FLT_PT_MEDIUM_USAGE, FLT_PT_FULL_USAGE, FLT_PT_MAX_USAGE. Use as described in Multiplier Usage. | FLT_PT_FULL_USAGE
C_HAS_CE | FLT_PT_TRUE, FLT_PT_FALSE | FLT_PT_FALSE
C_HAS_SCLR | FLT_PT_TRUE, FLT_PT_FALSE | FLT_PT_FALSE
C_HAS_OPERATION_ND | FLT_PT_TRUE, FLT_PT_FALSE | FLT_PT_FALSE
C_HAS_OPERATION_RFD | FLT_PT_TRUE, FLT_PT_FALSE | FLT_PT_FALSE
C_HAS_RDY | FLT_PT_TRUE, FLT_PT_FALSE | FLT_PT_FALSE
C_HAS_UNDERFLOW | FLT_PT_TRUE, FLT_PT_FALSE | FLT_PT_FALSE
C_HAS_OVERFLOW | FLT_PT_TRUE, FLT_PT_FALSE | FLT_PT_FALSE
C_HAS_INVALID_OP | FLT_PT_TRUE, FLT_PT_FALSE | FLT_PT_FALSE
C_HAS_DIVIDE_BY_ZERO | FLT_PT_TRUE, FLT_PT_FALSE | FLT_PT_FALSE
Examples
The following is an example of an instantiation in VHDL that generates a single-precision adder with a full set of exception and control signals.

library xilinxcorelib;                             -- XST version
use xilinxcorelib.floating_point_v3_0_consts.all;  -- constants package
use xilinxcorelib.floating_point_v3_0_comp.all;    -- component declaration
.....
fp_add_single: floating_point_v3_0
  generic map (
    C_FAMILY                => "virtex4",  -- C_FAMILY is a string generic
    C_HAS_ADD               => FLT_PT_TRUE,
    C_A_WIDTH               => 32,
    C_A_FRACTION_WIDTH      => 24,
    C_B_WIDTH               => 32,
    C_B_FRACTION_WIDTH      => 24,
    C_RESULT_WIDTH          => 32,
    C_RESULT_FRACTION_WIDTH => 24,
    C_HAS_SCLR              => FLT_PT_TRUE,
    C_HAS_OPERATION_ND      => FLT_PT_TRUE,
    C_HAS_OPERATION_RFD     => FLT_PT_TRUE,
    C_HAS_RDY               => FLT_PT_TRUE,
    C_HAS_UNDERFLOW         => FLT_PT_TRUE,
    C_HAS_OVERFLOW          => FLT_PT_TRUE,
    C_HAS_INVALID_OP        => FLT_PT_TRUE
  )
  port map (
    -- Signal names on the right-hand side are illustrative.
    A             => a_in,
    B             => b_in,
    OPERATION_ND  => nd_in,
    OPERATION_RFD => rfd_out,
    CLK           => clk_in,
    SCLR          => sclr_in,
    RESULT        => result_out,
    UNDERFLOW     => underflow_out,
    OVERFLOW      => overflow_out,
    INVALID_OP    => invalid_op_out,
    RDY           => rdy_out
  );
VHDL Component Declaration
This component declaration is provided within xilinxcorelib in package floating_point_v3_0_comp. It is included below for reference purposes. Note that there are a number of generics over and above those defined in Table 20. The values of these additional generics should not be changed from the defaults, and can be left unspecified when the component is instantiated (as done in the example). There are also a number of ports in addition to those listed in Table 2. These should be left unconnected. Inputs will default to suitable values.
component floating_point_v3_0 is
  generic (
    C_FAMILY                : string  := C_FAMILY_DEFAULT;
    C_HAS_ADD               : integer := C_HAS_ADD_DEFAULT;
    C_HAS_MULTIPLY          : integer := C_HAS_MULTIPLY_DEFAULT;
    C_HAS_DIVIDE            : integer := C_HAS_DIVIDE_DEFAULT;
    C_HAS_SQRT              : integer := C_HAS_SQRT_DEFAULT;
    C_HAS_COMPARE           : integer := C_HAS_COMPARE_DEFAULT;
    C_HAS_FIX_TO_FLT        : integer := C_HAS_FIX_TO_FLT_DEFAULT;
    C_HAS_FLT_TO_FIX        : integer := C_HAS_FLT_TO_FIX_DEFAULT;
    C_HAS_FLT_TO_FLT        : integer := C_HAS_FLT_TO_FLT_DEFAULT;
    C_A_WIDTH               : integer := C_A_WIDTH_DEFAULT;
    C_A_FRACTION_WIDTH      : integer := C_A_FRACTION_WIDTH_DEFAULT;
    C_B_WIDTH               : integer := C_B_WIDTH_DEFAULT;
    C_B_FRACTION_WIDTH      : integer := C_B_FRACTION_WIDTH_DEFAULT;
    C_RESULT_WIDTH          : integer := C_RESULT_WIDTH_DEFAULT;
    C_RESULT_FRACTION_WIDTH : integer := C_RESULT_FRACTION_WIDTH_DEFAULT;
    C_COMPARE_OPERATION     : integer := C_COMPARE_OPERATION_DEFAULT;
    C_LATENCY               : integer := C_LATENCY_DEFAULT;
    C_OPTIMIZATION          : integer := C_OPTIMIZATION_DEFAULT;
    C_MULT_USAGE            : integer := C_MULT_USAGE_DEFAULT;
    C_RATE                  : integer := C_RATE_DEFAULT;
    C_HAS_ACLR              : integer := C_HAS_ACLR_DEFAULT;
    C_HAS_CE                : integer := C_HAS_CE_DEFAULT;
    C_HAS_SCLR              : integer := C_HAS_SCLR_DEFAULT;
    C_HAS_A_NEGATE          : integer := C_HAS_A_NEGATE_DEFAULT;
    C_HAS_B_NEGATE          : integer := C_HAS_B_NEGATE_DEFAULT;
    C_HAS_A_ND              : integer := C_HAS_A_ND_DEFAULT;
    C_HAS_A_RFD             : integer := C_HAS_A_RFD_DEFAULT;
    C_HAS_B_ND              : integer := C_HAS_B_ND_DEFAULT;
    C_HAS_B_RFD             : integer := C_HAS_B_RFD_DEFAULT;
    C_HAS_OPERATION_ND      : integer := C_HAS_OPERATION_ND_DEFAULT;
    C_HAS_OPERATION_RFD     : integer := C_HAS_OPERATION_RFD_DEFAULT;
    C_HAS_RDY               : integer := C_HAS_RDY_DEFAULT;
    C_HAS_CTS               : integer := C_HAS_CTS_DEFAULT;
    C_HAS_UNDERFLOW         : integer := C_HAS_UNDERFLOW_DEFAULT;
    C_HAS_OVERFLOW          : integer := C_HAS_OVERFLOW_DEFAULT;
    C_HAS_INVALID_OP        : integer := C_HAS_INVALID_OP_DEFAULT;
    C_HAS_INEXACT           : integer := C_HAS_INEXACT_DEFAULT;
    C_HAS_DIVIDE_BY_ZERO    : integer := C_HAS_DIVIDE_BY_ZERO_DEFAULT;
    C_HAS_STATUS            : integer := C_HAS_STATUS_DEFAULT;
    C_HAS_EXCEPTION         : integer := C_HAS_EXCEPTION_DEFAULT;
    C_STATUS_EARLY          : integer := C_STATUS_EARLY_DEFAULT
  );
  port (
    A              : in  std_logic_vector(C_A_WIDTH-1 downto 0);
    B              : in  std_logic_vector(C_B_WIDTH-1 downto 0) := (others => '0');
    A_NEGATE       : in  std_logic := '0';
    B_NEGATE       : in  std_logic := '0';
    OPERATION      : in  std_logic_vector(5 downto 0) := (others => '0');
    A_ND           : in  std_logic := '1';
    A_RFD          : out std_logic;
    B_ND           : in  std_logic := '1';
    B_RFD          : out std_logic;
    OPERATION_ND   : in  std_logic := '1';
    OPERATION_RFD  : out std_logic;
    CLK            : in  std_logic;
    SCLR           : in  std_logic := '0';
    ACLR           : in  std_logic := '0';
    CE             : in  std_logic := '0';
    RESULT         : out std_logic_vector(C_RESULT_WIDTH-1 downto 0);
    STATUS         : out std_logic_vector(2 downto 0);
    EXCEPTION      : out std_logic;
    UNDERFLOW      : out std_logic;
    OVERFLOW       : out std_logic;
    INVALID_OP     : out std_logic;
    INEXACT        : out std_logic;
    DIVIDE_BY_ZERO : out std_logic;
    RDY            : out std_logic;
    CTS            : in  std_logic := '1'
  );
end component;
VHDL Constants Package
Default generic values and constant terms for valid values of the generics are provided within xilinxcorelib as the package floating_point_v3_0_consts. Some useful constants and the default generics are as follows:
constant FLT_PT_TRUE                     : integer := 1;
constant FLT_PT_FALSE                    : integer := 0;
constant FLT_PT_SPEED_OPTIMIZED          : integer := 1;
constant FLT_PT_NO_USAGE                 : integer := 0;
constant FLT_PT_MEDIUM_USAGE             : integer := 1;
constant FLT_PT_FULL_USAGE               : integer := 2;
constant FLT_PT_MAX_USAGE                : integer := 3;
constant FLT_PT_MAX_LATENCY              : integer := 1000;
-- Compare operation values
constant FLT_PT_UNORDERED                : integer := 0;
constant FLT_PT_LESS_THAN                : integer := 1;
constant FLT_PT_EQUAL                    : integer := 2;
constant FLT_PT_LESS_THAN_OR_EQUAL       : integer := 3;
constant FLT_PT_GREATER_THAN             : integer := 4;
constant FLT_PT_NOT_EQUAL                : integer := 5;
constant FLT_PT_GREATER_THAN_OR_EQUAL    : integer := 6;
constant FLT_PT_CONDITION_CODE           : integer := 7;
constant FLT_PT_PROGRAMMABLE             : integer := 8;
constant FLT_PT_OPERATION_WIDTH          : integer := 6;
-- defaults for generics
constant C_FAMILY_DEFAULT                : string  := "virtex2";
constant C_HAS_ADD_DEFAULT               : integer := FLT_PT_FALSE;
constant C_HAS_MULTIPLY_DEFAULT          : integer := FLT_PT_FALSE;
constant C_HAS_DIVIDE_DEFAULT            : integer := FLT_PT_FALSE;
constant C_HAS_SQRT_DEFAULT              : integer := FLT_PT_FALSE;
constant C_HAS_COMPARE_DEFAULT           : integer := FLT_PT_FALSE;
constant C_HAS_FIX_TO_FLT_DEFAULT        : integer := FLT_PT_FALSE;
constant C_HAS_FLT_TO_FIX_DEFAULT        : integer := FLT_PT_FALSE;
constant C_A_WIDTH_DEFAULT               : integer := 32;
constant C_A_FRACTION_WIDTH_DEFAULT      : integer := 24;
constant C_B_WIDTH_DEFAULT               : integer := 32;
constant C_B_FRACTION_WIDTH_DEFAULT      : integer := 24;
constant C_RESULT_WIDTH_DEFAULT          : integer := 32;
constant C_RESULT_FRACTION_WIDTH_DEFAULT : integer := 24;
constant C_COMPARE_OPERATION             : integer := FLT_PT_LESS_THAN;
constant C_LATENCY_DEFAULT               : integer := FLT_PT_MAX_LATENCY;
constant C_OPTIMIZATION_DEFAULT          : integer := FLT_PT_SPEED_OPTIMIZED;
constant C_MULT_USAGE_DEFAULT            : integer := FLT_PT_FULL_USAGE;
constant C_RATE_DEFAULT                  : integer := 1;
constant C_HAS_ACLR_DEFAULT              : integer := FLT_PT_FALSE;
constant C_HAS_CE_DEFAULT                : integer := FLT_PT_FALSE;
constant C_HAS_SCLR_DEFAULT              : integer := FLT_PT_FALSE;
constant C_HAS_A_NEGATE_DEFAULT          : integer := FLT_PT_FALSE;
constant C_HAS_B_NEGATE_DEFAULT          : integer := FLT_PT_FALSE;
constant C_HAS_A_ND_DEFAULT              : integer := FLT_PT_FALSE;
constant C_HAS_A_RFD_DEFAULT             : integer := FLT_PT_FALSE;
constant C_HAS_B_ND_DEFAULT              : integer := FLT_PT_FALSE;
constant C_HAS_B_RFD_DEFAULT             : integer := FLT_PT_FALSE;
constant C_HAS_OPERATION_ND_DEFAULT      : integer := FLT_PT_FALSE;
constant C_HAS_OPERATION_RFD_DEFAULT     : integer := FLT_PT_FALSE;
constant C_HAS_RDY_DEFAULT               : integer := FLT_PT_FALSE;
constant C_HAS_CTS_DEFAULT               : integer := FLT_PT_FALSE;
constant C_HAS_UNDERFLOW_DEFAULT         : integer := FLT_PT_FALSE;
constant C_HAS_OVERFLOW_DEFAULT          : integer := FLT_PT_FALSE;
constant C_HAS_INVALID_OP_DEFAULT        : integer := FLT_PT_FALSE;
constant C_HAS_INEXACT_DEFAULT           : integer := FLT_PT_FALSE;
constant C_HAS_DIVIDE_BY_ZERO_DEFAULT    : integer := FLT_PT_FALSE;
constant C_HAS_STATUS_DEFAULT            : integer := FLT_PT_FALSE;
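As a rough illustration of how the width defaults listed above relate to one another, the sketch below derives the exponent width implied by the single-precision defaults. It assumes that the total width is the sum of the exponent width and the fraction width, which matches the defaults of 32 and 24 bits leaving the 8 exponent bits of IEEE-754 single precision. The package flt_pt_example_pkg, the constant A_EXPONENT_WIDTH_EXAMPLE, and the entity flt_pt_example_check are hypothetical names introduced only for this example and are not part of the core.

```vhdl
-- Hypothetical example package; not part of the Xilinx core.
package flt_pt_example_pkg is
  -- Values copied from the generic defaults listed above.
  constant C_A_WIDTH_DEFAULT          : integer := 32;
  constant C_A_FRACTION_WIDTH_DEFAULT : integer := 24;

  -- Assumption: total width = exponent width + fraction width.
  constant A_EXPONENT_WIDTH_EXAMPLE   : integer :=
    C_A_WIDTH_DEFAULT - C_A_FRACTION_WIDTH_DEFAULT;
end package flt_pt_example_pkg;

use work.flt_pt_example_pkg.all;

-- Hypothetical check entity: fails at simulation start if the
-- derived exponent width is not the 8 bits of single precision.
entity flt_pt_example_check is
end entity flt_pt_example_check;

architecture sim of flt_pt_example_check is
begin
  assert A_EXPONENT_WIDTH_EXAMPLE = 8
    report "unexpected exponent width for the default format"
    severity failure;
end architecture sim;
```

The same subtraction applied to a 24-bit total wordlength with a 17-bit fraction would give a 7-bit exponent under the same assumption.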
Simulation
VHDL Simulation
A cycle-accurate, bit-true VHDL simulation model of the Xilinx Floating-Point core is provided in the xilinxcorelib library. For multi-cycle divide or square-root operations (that is, when RATE > 1), the RESULT and exception flags of the model may differ from those of the core between valid outputs. RDY indicates when RESULT and the exception flags are valid and can be used to qualify these outputs from the model. Also note that the sign of a NaN is undefined, so the model and the core may differ in this respect.
The xilinxcorelib library can be compiled for your particular simulator using COMPXLIB. See the ISE documentation for further details.
If the core has been directly instantiated within VHDL, the xilinxcorelib library is already referenced in the code, and making the library available to the simulator results in the behavioral model being used to simulate the core.
Note: When direct instantiation is used, XST employs its own version of xilinxcorelib. The XST library is made available to XST when the IP download is installed.
Alternatively, a simulation wrapper file can be generated by CORE Generator. Within the wrapper file, the generics on the component instance are set to the same values used to generate the core. For further details on how to generate a simulation wrapper file, see the VHDL Design Flow within the CORE Generator documentation.
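The advice to qualify RESULT with RDY can be sketched as a small self-checking testbench. This is only an illustration under stated assumptions: the stimulus process is a hypothetical stand-in for the core outputs (the component name and port list of a generated core depend on the generation settings and are not reproduced here), and the names rdy_qualify_tb, stimulus, and monitor are invented for the example. The stand-in drives RESULT to unknown values between valid outputs to mimic the period in which the model and core may differ.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench; stands in for a real core instance.
entity rdy_qualify_tb is
end entity rdy_qualify_tb;

architecture sim of rdy_qualify_tb is
  signal clk    : std_logic := '0';
  signal rdy    : std_logic := '0';
  signal result : std_logic_vector(31 downto 0) := (others => '0');
  signal done   : boolean := false;
begin
  -- Free-running 100 MHz clock, held low once the stimulus finishes.
  clk <= not clk after 5 ns when not done else '0';

  -- Stand-in for the core: RESULT is only meaningful while RDY = '1'.
  stimulus : process
  begin
    for i in 1 to 3 loop
      rdy    <= '0';
      result <= (others => 'X');   -- between valid outputs
      wait until rising_edge(clk);
      wait until rising_edge(clk);
      rdy    <= '1';
      result <= x"3F800000";       -- 1.0 in single precision
      wait until rising_edge(clk);
    end loop;
    rdy <= '0';
    wait until rising_edge(clk);
    done <= true;
    wait;
  end process stimulus;

  -- Checker: samples RESULT only on clock edges where RDY is asserted.
  monitor : process (clk)
  begin
    if rising_edge(clk) then
      if rdy = '1' then
        assert result = x"3F800000"
          report "unexpected RESULT while RDY is asserted"
          severity error;
      end if;
    end if;
  end process monitor;
end architecture sim;
```

Because the checker ignores RESULT whenever RDY is low, the unknown values driven between valid outputs never trigger the assertion. Exception flags could be gated in the same way.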
Verilog Simulation
A Verilog model of the Xilinx Floating-Point core is not supplied within the Verilog version of xilinxcorelib. However, a Verilog structural simulation model for a specific core can be generated by CORE Generator. See the Generation Panel under Project Options for the controls to enable this. For further details, see the Verilog Design Flow within the CORE Generator documentation.
Number: 1 0
LUTs: 92 339 438 166 195 66 13 82 460 172 308 160
FFs: 110 401 430 176 190 89 52 24 751 186 447 161
1. Maximum frequency obtained with map switches -ol high and -cm speed, and par switches -pl high and -rl high.
Table 22: Characterization of 17-Bit Fraction and 24-Bit Total Wordlength on Virtex-4 FPGA
Number: 2 1 0
LUTs: 89 103 345 388 166 217 87 33 79 460 200 308 159
Maximum frequency (MHz), -10 speed grade: 396 354 267 324 349 346 322 497 338 313 290 325 304
1. Maximum frequency obtained with map switches -ol high and -cm speed, and par switches -pl high and -rl high.
Table 23: Characterization of 17-Bit Fraction and 24-Bit Total Wordlength on Virtex-5 FPGA
Number: 2 1 0
LUTs: 74 93 340 341 139 166 74 35 66 457 179 332 135
Maximum frequency (MHz), -1 speed grade: 450 406 328 399 355 396 418 478 394 394 343 430 411
1. Maximum frequency obtained with map switches -ol high and -cm speed, and par switches -pl high and -rl high.
Single-Precision Format
The resource requirements and maximum clock rates achievable with single-precision format on Spartan-3E FPGAs are summarized in Table 24, on Virtex-4 in Table 25, and on Virtex-5 in Table 26.
Table 24: Characterization of Single-Precision Format on Spartan-3E FPGA
Number: 4 0 0
LUTs: 185 630 580 221 251 15 100 824 234 513 214
FFs: 275 696 591 227 237 101 24 1,370 229 787 206
1. Maximum frequency obtained with map switches -ol high and -cm speed, and par switches -pl high and -rl high.
Table 25: Characterization of Single-Precision Format on Virtex-4 FPGA
Number: 5 4 1 0 4 0
LUTs: 116 139 509 641 372 578 226 282 43 97 824 262 513 213
FFs: 235 259 562 698 466 594 233 238 101 24 1,370 229 787 206
1. Maximum frequency obtained with map switches -ol high and -cm speed, and par switches -pl high and -rl high.
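Table 26: Characterization of Single-Precision Format on Virtex-5 FPGA
Number: 3 2 1 0 2 0 0
LUTs: 88 126 294 641 267 429 536 181 218 44 80 788 227 542 175
Maximum frequency (MHz), -1 speed grade: 450 429 375 357 410 395 372 398 373 466 393 365 316 398 388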
1. Maximum frequency obtained with map switches -ol high and -cm speed, and par switches -pl high and -rl high.
Double-Precision Format
The resource requirements and maximum clock rates achievable with double-precision format on Spartan-3E FPGAs are summarized in Table 27, on Virtex-4 in Table 28, and on Virtex-5 in Table 29.
Table 27: Characterization of Double-Precision Format on Spartan-3E FPGA
Number: 16 0 0
LUTs: 681 2,296 1,272 563 523 95 164 3,335 437 1,904 444
Maximum frequency (MHz), -4 speed grade: 104 124 192 103 145 168 172 126 120 132 131
1. Maximum frequency obtained with map switches -ol high and -cm speed, and par switches -pl high and -rl high.
Table 28: Characterization of Double-Precision Format on Virtex-4 FPGA
Number: 17 16 9 0 3 0
LUTs: 551 550 1,332 2,311 1,220 1,274 565 523 121 161 3,335 437 1,904 445
FFs: 759 774 1,658 2,457 1,139 1,139 506 447 113 24 6,002 401 3,234 392
1. Maximum frequency obtained with map switches -ol high and -cm speed, and par switches -pl high and -rl high.
Table 29: Characterization of Double-Precision Format on Virtex-5 FPGA
Number: 13 12 0 3 0 0
LUTs: 416 424 2,309 821 804 1,045 402 396 107 142 3,228 354 1,940 355
FFs: 654 669 2,457 976 1,060 1,185 504 446 131 24 6,002 399 3,234 391
1. Maximum frequency obtained with map switches -ol high and -cm speed, and par switches -pl high and -rl high.
Ordering Information
This core may be downloaded from the Xilinx IP Center for use with the Xilinx CORE Generator v8.2i. The Xilinx CORE Generator is bundled with the ISE Foundation Series Development software at no additional charge. Information about additional Xilinx LogiCORE modules is available on the Xilinx IP Center or by contacting your local Xilinx sales representative.
Support
Support is provided by Xilinx, Inc. at www.xilinx.com/support.
References
1. ANSI/IEEE, IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985. IEEE-754.
Revision History
This table shows the revision history of this document.
Date      Version  Revision
04/28/05  1.0      Initial Xilinx release.
07/27/05  1.1      Document modified to include minor corrections and section on simulation.
01/18/06  2.0      Updated to version 2.0 of core, Xilinx tools v8.1i.
09/28/06  3.0      Updated to version 3.0 of core, Xilinx tools v8.2i.