Ch02 Data Representation
These slides are derived from those of Null & Lobur + the work of others.
Chapter 2: Data Representation
Chapter 2 Objectives
Understand the fundamentals of numerical data representation and manipulation in digital computers, both integer and floating point.
Master the skill of converting between various radix systems.
Understand how errors can occur in computations because of overflow and truncation.
Gain familiarity with the most popular character codes.
Understand the concepts of error detecting and correcting codes.
2.1 Introduction
REVIEW:
A bit is the most basic unit of information in a computer. It is a state of on or off in a digital circuit. Sometimes these states are high or low voltage instead of on or off.
A byte is a group of eight bits. A byte is the smallest addressable unit of computer storage (one that can be found via its location).
A word is a contiguous group of bytes. Words can be any number of bits (16, 32, and 64 bits are common).
A group of four bits is called a nibble (or nybble). Bytes, therefore, consist of two nibbles: a high-order nibble and a low-order nibble.
Our decimal system is the base-10 system. It uses powers of 10 for each position in a number.
Any integer quantity can be represented exactly using any base (or radix).
1×2⁴ + 1×2³ + 0×2² + 0×2¹ + 1×2⁰
= 16 + 8 + 0 + 0 + 1 = 25
When the radix of a number is something other than 10, the base is denoted by a subscript.
Sometimes, the subscript 10 is added for emphasis: 11001₂ = 25₁₀
Subtraction Method
Note: The book also talks about the Multiplication Method, which you may prefer.
HEX
The binary numbering system is the most important radix system for digital computers. But it's difficult to read long strings of binary numbers, and even a modest decimal number becomes a very long binary number. For example: 11010100011011₂ = 13595₁₀. For compactness and ease of reading, binary values are usually expressed using the hexadecimal, or base-16, numbering system. The hexadecimal numbering system uses the numerals 0 through 9 and the letters A through F. The decimal number 12 is C₁₆. The decimal number 26 is 1A₁₆. It is easy to convert between base 16 and base 2, because 16 = 2⁴. Thus, to convert from binary to hexadecimal, all we need to do is group the binary digits into groups of four.
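Since 16 = 2⁴, the grouping trick is easy to mechanize. Here is a minimal sketch in Python (the function name is mine):

```python
def bin_to_hex(bits: str) -> str:
    # Pad on the left so the length is a multiple of 4 bits.
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    digits = "0123456789ABCDEF"
    # Each 4-bit group (nibble) maps directly to one hex digit, because 16 = 2**4.
    return "".join(digits[int(bits[i:i+4], 2)] for i in range(0, len(bits), 4))

print(bin_to_hex("11010100011011"))  # prints 351B, i.e., 13595 in decimal
```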
There are three ways in which signed binary numbers may be expressed:
Signed magnitude, one's complement (we won't do much with this here), and two's complement.
The simplicity of this system makes it possible for digital circuits to carry out arithmetic operations. We will describe these circuits in Chapter 3.
Let's see how the addition rules work with signed magnitude numbers . . .
In the second bit, we have a carry, so we note it above the third bit.
Example: Using signed magnitude binary arithmetic, find the sum of 107 and 46. The carry from the seventh bit overflows and is discarded, giving us the erroneous result: 107 + 46 = 25.
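The overflow in this example comes from truncating the carry out of the 7-bit magnitude field. A small sketch of that step (the helper name is mine), assuming an 8-bit word with one sign bit and seven magnitude bits:

```python
def sm_add_magnitudes(a: int, b: int, bits: int = 7) -> int:
    # Add the magnitudes; a carry out of the top bit is simply discarded,
    # which is exactly where the erroneous 107 + 46 = 25 comes from.
    return (a + b) & ((1 << bits) - 1)

print(sm_add_magnitudes(107, 46))  # prints 25, not 153
```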
Mixed sign addition (or subtraction) is done the same way. Example: Using signed magnitude binary arithmetic, find the sum of +46 and -25.
The sign of the result gets the sign of the number that is larger. Note the borrows from the second and sixth bits.
Another disadvantage: it allows two different representations for zero: positive zero and negative zero.
So computer systems employ complement systems for numeric value representation.
Using the same number of bits, unsigned integers can express twice as many nonnegative values as signed numbers.
We can use unsigned magnitude to represent both positive and negative numbers by using a bias, or excess, representation: the entire numbering system is shifted up by some positive amount.
To convert a number from excess-8 to decimal, subtract 8 from the stored value to get the real value: 1001 in excess-8 is 1001 − 1000 = 0001₂ = +1.
To convert a number from decimal to excess-8, add the bias: +6 = 0110₂, and 0110 + 1000 = 1110, which is the excess-8 number.
To use this representation, numbers have to be shifted before they are stored, then shifted back when retrieved. That is unnecessarily complicated for regular use, but it is useful for representing exponents in our floating-point representation (shown next).
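The shift-then-store idea can be sketched in a few lines of Python (the names are mine):

```python
BIAS = 8  # excess-8

def to_excess8(value: int) -> str:
    # Shift up by the bias, then store as a plain 4-bit unsigned pattern.
    return format(value + BIAS, "04b")

def from_excess8(pattern: str) -> int:
    # Subtract the bias to recover the true value.
    return int(pattern, 2) - BIAS

print(to_excess8(6))         # prints 1110, as in the example above
print(from_excess8("1001"))  # prints 1
```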
Computers use a form of scientific notation for floating-point representation. Numbers written in scientific notation have three components:
The one-bit sign field is the sign of the stored value. The size of the exponent field determines the range of values that can be represented. The size of the significand determines the precision of the representation.
The IEEE-754 single precision floating point standard uses an 8-bit exponent and a 23-bit significand. The IEEE-754 double precision standard uses an 11-bit exponent and a 52-bit significand.
For illustrative purposes (and to keep life VERY simple), we will use a 14-bit model with a 5-bit exponent and an 8-bit significand.
Using this information, we put 110₂ (= 6₁₀) in the exponent field and 1 in the significand as shown.
Another problem with our system is that we have made no allowances for negative exponents. We have no way to express 0.5 (= 2⁻¹)! (There is no sign in the exponent field!)
All of these problems can be fixed with no changes to our basic model.
In our simple instructional model, we will not use the implied bit mechanism; the 1 will BE there.
To provide for negative exponents, we'll use a biased exponent (remember our detour??). We have a 5-bit exponent, and we'll use 16 for our bias. This is called excess-16 representation. In our model, exponent field values less than 16 mean that the exponent we're representing is negative.
Sign bit = 0 (positive). Exponent = −2 (01110 − 10000 = −2). Significand = .10000000. We shift the radix point 2 positions to the left, giving 0.001₂ = +0.125.
Sign bit = 1 (negative). Exponent = 3 (10011 − 10000 = 3). Significand = .11010100. We shift the radix point 3 positions to the right, giving 110.101₂ = −6.625.
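The decoding steps in these two examples can be sketched as a small Python function (the name is mine), assuming our 14-bit model: a sign bit, a 5-bit excess-16 exponent, and an 8-bit significand with the radix point just left of its first bit:

```python
def decode14(sign: int, exp_field: str, sig_field: str) -> float:
    exponent = int(exp_field, 2) - 16        # excess-16: subtract the bias
    significand = int(sig_field, 2) / 2**8   # radix point left of the first bit
    return (-1)**sign * significand * 2**exponent

print(decode14(0, "01110", "10000000"))  # prints 0.125
print(decode14(1, "10011", "11010100"))  # prints -6.625
```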
The double precision standard has a bias of 1023 for its 11-bit exponent.
The special exponent value for a double precision number is 2047, instead of the 255 used by the single precision standard.
Both the 14-bit model that we have used and the IEEE-754 floating point standard allow two representations for zero.
Zero is indicated by all zeros in the exponent and the significand, but the sign bit can be either 0 or 1.
Example:
Find the sum of 12₁₀ and 1.25₁₀ using the 14-bit floating-point model. We find 12₁₀ = 0.1100 × 2⁴, and 1.25₁₀ = 0.101 × 2¹ = 0.000101 × 2⁴. Thus, our sum is 0.110101 × 2⁴.
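The alignment step in this sum can be sketched as follows (the names are mine; significands are fractions in [0, 1), exponents are unbiased):

```python
def align_and_add(sig_a: float, exp_a: int, sig_b: float, exp_b: int):
    # Make operand A the one with the larger exponent.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    # Align: shifting the smaller significand right raises its exponent,
    # e.g., 0.101 x 2^1 becomes 0.000101 x 2^4.
    sig_b /= 2 ** (exp_a - exp_b)
    return sig_a + sig_b, exp_a

sig, exp = align_and_add(0.75, 4, 0.625, 1)  # 12 = 0.1100 x 2^4, 1.25 = 0.101 x 2^1
print(sig * 2**exp)  # prints 13.25
```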
Example:
Find the product of 12₁₀ and 1.25₁₀ using the 14-bit floating-point model. We find 12₁₀ = 0.1100 × 2⁴, and 1.25₁₀ = 0.101 × 2¹. Thus, our product is
0.0111100 × 2⁵ = 0.1111 × 2⁴.
The normalized product requires a stored exponent of 20₁₀ = 10100₂ (4 + 16, using our excess-16 bias).
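Multiplication can be sketched the same way (the names are mine): multiply the significands, add the exponents, then renormalize.

```python
def fp_multiply(sig_a: float, exp_a: int, sig_b: float, exp_b: int):
    # Assumes nonzero operands with significands in [0.5, 1).
    sig, exp = sig_a * sig_b, exp_a + exp_b
    while sig < 0.5:           # renormalize so the significand is in [0.5, 1)
        sig, exp = sig * 2, exp - 1
    return sig, exp

sig, exp = fp_multiply(0.75, 4, 0.625, 1)  # 12 x 1.25
print(sig, exp)      # prints 0.9375 4, i.e., 0.1111 x 2^4
print(sig * 2**exp)  # prints 15.0
```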
Binary-coded decimal (BCD) was one of these early codes. It was used by IBM mainframes in the 1950s and 1960s.
With more parity bits, we can not only detect an error, but correct it, or detect two errors. We can also have odd parity, where the total number of 1s is odd.
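Both parity flavors can be sketched in Python (the function names are mine):

```python
def even_parity_bit(data: str) -> str:
    # Even parity: choose the bit that makes the total number of 1s even.
    return "1" if data.count("1") % 2 else "0"

def odd_parity_bit(data: str) -> str:
    # Odd parity: choose the bit that makes the total number of 1s odd.
    return "0" if data.count("1") % 2 else "1"

print(even_parity_bit("1101101"))  # five 1s, so prints 1
print(odd_parity_bit("1101101"))   # prints 0
```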
Chapter 2 Summary
Computers store data in the form of bits, bytes, and words using the binary numbering system.
Signed integers can be stored in one's complement, two's complement, or signed magnitude representation.
Floating-point numbers are usually coded using the IEEE 754 floating-point standard. Floating-point numbers may have errors in representation, overflow, and underflow. Character data is stored using ASCII, EBCDIC, or Unicode.
Error detecting and correcting codes are necessary because we can expect no transmission or storage medium to be perfect.
Longer data streams require more economical and sophisticated error detection mechanisms. Cyclic redundancy checking (CRC) codes provide error detection for large blocks of data.
You will fully understand why modulo 2 arithmetic is so handy after you study digital circuits in Chapter 3.
As with traditional division, we note that the dividend is divisible once by the divisor. We place the divisor under the dividend and perform modulo 2 subtraction.
Now we bring down the next bit of the dividend. We see that 00101 is not divisible by 1101. So we place a zero in the quotient.
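Because subtraction modulo 2 is just XOR, the long-division steps described here can be sketched compactly (the function name and the example message are mine):

```python
def mod2_remainder(dividend: str, divisor: str) -> str:
    bits = list(dividend)
    for i in range(len(bits) - len(divisor) + 1):
        if bits[i] == "1":  # the divisor "goes into" the current prefix once
            for j, d in enumerate(divisor):
                bits[i + j] = str(int(bits[i + j]) ^ int(d))
    # What is left in the low-order positions is the remainder.
    return "".join(bits[-(len(divisor) - 1):])

print(mod2_remainder("10011010", "1101"))  # prints 001
```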
Thus, to provide data integrity over the long term, error correcting codes are required.
Because the mathematics of Hamming codes is much simpler than that of Reed-Solomon codes, we discuss Hamming codes in detail.
The minimum Hamming distance for a code is the smallest Hamming distance between all pairs of words in the code.
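A quick sketch of both definitions (the function names are mine):

```python
from itertools import combinations

def hamming_distance(a: str, b: str) -> int:
    # Number of bit positions in which two equal-length code words differ.
    return sum(x != y for x, y in zip(a, b))

def min_distance(code):
    # Smallest pairwise distance over all pairs of code words.
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))

print(min_distance(["000", "011", "101", "110"]))  # 3-bit even-parity code: prints 2
```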
Thus, a Hamming distance of 2k + 1 is required to be able to correct k errors in any data word. Hamming distance is provided by adding a suitable number of parity bits to a data word.
(n + 1) × 2ᵐ ≤ 2ⁿ
Because n = m + r, we can rewrite the inequality as: (m + r + 1) × 2ᵐ ≤ 2ᵐ⁺ʳ, or (m + r + 1) ≤ 2ʳ.
This inequality gives us a lower limit on the number of check bits that we need in our code words.
This means to build a code with 4-bit data words that will correct single-bit errors, we must add 3 check bits. Finding the number of check bits is the hard part. The rest is easy.
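The search for the smallest r satisfying (m + r + 1) ≤ 2ʳ can be sketched as follows (the function name is mine):

```python
def check_bits(m: int) -> int:
    # Smallest r with m + r + 1 <= 2**r: the lower limit on the number
    # of check bits needed to correct single-bit errors in m data bits.
    r = 1
    while m + r + 1 > 2**r:
        r += 1
    return r

print(check_bits(4))  # prints 3: 4-bit data words need 3 check bits
print(check_bits(8))  # prints 4
```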
1 = 2⁰            5 = 2² + 2⁰            9 = 2³ + 2⁰
2 = 2¹            6 = 2² + 2¹            10 = 2³ + 2¹
3 = 2¹ + 2⁰       7 = 2² + 2¹ + 2⁰       11 = 2³ + 2¹ + 2⁰
4 = 2²            8 = 2³                 12 = 2³ + 2²
1 (= 2⁰) contributes to all of the odd-numbered digits. 2 (= 2¹) contributes to the digits 2, 3, 6, 7, 10, and 11. . . . And so forth . . .
The completed code word is shown above. Bit 1 checks digits 3, 5, 7, 9, and 11, so its value is 1. Bit 4 checks digits 5, 6, 7, and 12, so its value is 1. Bit 8 checks digits 9, 10, 11, and 12, so its value is also 1. Using the Hamming algorithm, we can not only detect single-bit errors in this code word, but also correct them!
Suppose an error occurs in bit 5, as shown above. Our parity bit values are: Bit 1 checks digits 3, 5, 7, 9, and 11. Its value is 1, but it should be zero. Bit 2 checks digits 2, 3, 6, 7, 10, and 11. The zero is correct. Bit 4 checks digits 5, 6, 7, and 12. Its value is 1, but it should be zero. Bit 8 checks digits 9, 10, 11, and 12. This bit is correct.
We have erroneous parity bits in positions 1 and 4. With two parity bits that don't check, we know that the error is in the data, and not in a parity bit. Which data bit is in error? We find out by adding the bit positions of the erroneous parity bits: simply, 1 + 4 = 5. This tells us that the error is in bit 5. If we change bit 5 to a 1, all parity bits check and our data is restored.
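This detect-and-locate procedure can be sketched for a 12-bit code word with even parity (the function name and the example code word are mine, not the slide's):

```python
def syndrome(codeword: str) -> int:
    # Sum the positions of the parity groups that fail their even-parity
    # check; the sum is 0 for no error, else the erroneous bit position.
    s = 0
    for p in (1, 2, 4, 8):  # parity bits sit at positions 1, 2, 4, 8
        ones = sum(int(codeword[i - 1])
                   for i in range(1, len(codeword) + 1) if i & p)
        if ones % 2:
            s += p
    return s

good = "011100101010"            # a hypothetical valid code word
bad = good[:4] + "1" + good[5:]  # flip bit 5
print(syndrome(good), syndrome(bad))  # prints 0 5
```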
Chapter 2 Summary
Computers store data in the form of bits, bytes, and words using the binary numbering system.
Hexadecimal numbers are formed using four-bit groups called nibbles (or nybbles).
Signed integers can be stored in one's complement, two's complement, or signed magnitude representation.
Floating-point numbers are usually coded using the IEEE 754 floating-point standard.
Floating-point operations are not necessarily commutative or distributive.
Character data is stored using ASCII, EBCDIC, or Unicode.
Error detecting and correcting codes are necessary because we can expect no transmission or storage medium to be perfect.
CRC, Reed-Solomon, and Hamming codes are three important error control codes.