IGCSE Computer Science - Chapter 1 (Data Representation)


DATA REPRESENTATION Zin Mar soe

DATA & INFORMATION


➢ Data is numbers, symbols or alphanumeric characters in their raw form, before
processing.
➢ Information is data that has been organised or classified so that it has meaning for
the receiver.
➢ Analogue data is the kind of data humans use, such as sound or light waves and impulses on
our skin.
➢ Computers are only capable of processing digital data.
DENARY AND BINARY NUMBER
➢ The denary number system is the one we use in daily life. It uses the digits 0 to 9 and is
called a base-10 number system.
➢ The binary number system is a base-2 number system that uses only the two values 0 and 1.
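A minimal Python sketch of moving between the two systems: repeated division by 2 gives the binary digits, and adding the place values (1, 2, 4, 8, ...) of the 1 bits gives the denary value back. The function names and the fixed 8-bit width are illustrative choices, not part of the syllabus.

```python
def denary_to_binary(n: int, bits: int = 8) -> str:
    """Convert a non-negative denary number to a binary string of `bits` digits."""
    if not 0 <= n < 2 ** bits:
        raise ValueError(f"{n} does not fit in {bits} bits")
    digits = []
    for _ in range(bits):
        digits.append(str(n % 2))  # the remainder is the next bit (least significant first)
        n //= 2
    return "".join(reversed(digits))


def binary_to_denary(binary: str) -> int:
    """Add up the place values (1, 2, 4, 8, ...) of every 1 bit."""
    total = 0
    for place, bit in enumerate(reversed(binary)):
        if bit == "1":
            total += 2 ** place
    return total


print(denary_to_binary(214))  # 11010110
print(binary_to_denary("01100111"))  # 103
```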
BINARY ADDITION
➢ In binary addition, a carry is needed whenever the total in a column is greater than 1.
➢ Overflow error: this occurs when a calculation produces a result that is greater than the
computer can deal with or store.

Example: 214 + 103

  1 1 0 1 0 1 1 0   (214)
+ 0 1 1 0 0 1 1 1   (103)
1 0 0 1 1 1 1 0 1   (317: the carry out of the leftmost column becomes a ninth bit)
OVERFLOW ERROR
➢ The maximum denary value of an 8-bit binary number is 255 (which is 2^8 − 1).
➢ The generation of a 9th bit is a clear indication that the sum has exceeded this
value.
➢ This is known as an overflow error and in this case is an indication that a number
is too big to be stored in the computer using 8 bits.
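The column-by-column method, including the check for a ninth bit, can be sketched in Python. This is an illustration assuming 8-bit registers held as strings of 0s and 1s; `add_8bit` is a hypothetical helper name.

```python
def add_8bit(a: str, b: str) -> tuple[str, bool]:
    """Add two 8-bit binary strings right to left with carries.

    Returns the 8-bit result and True if a ninth bit (overflow) was generated.
    """
    result = []
    carry = 0
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column = int(bit_a) + int(bit_b) + carry  # 0, 1, 2 or 3
        result.append(str(column % 2))  # bit written in this column
        carry = column // 2  # carry 1 into the next column if the total is > 1
    overflow = carry == 1  # a carry out of the 8th column is the 9th bit
    return "".join(reversed(result)), overflow


print(add_8bit("11010110", "01100111"))  # ('00111101', True)
```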
SIGN-AND-MAGNITUDE (SM)
➢ The leftmost bit is the sign bit. If the sign bit is “0”, the number is positive. If the sign
bit is “1”, the number is negative.
➢ The remaining bits represent the magnitude of the number in the usual unsigned binary
format.

(Figures: positive signed binary numbers and negative signed binary numbers)


If we have 4 bits to represent a signed binary number, 1 bit is used for the sign and 3 bits
for the magnitude, so the numbers −7 to +7 can be represented (with two codes for zero: 0000
and 1000).
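A small Python sketch of 4-bit sign-and-magnitude, assuming the layout described above (1 sign bit, 3 magnitude bits). The function names are hypothetical.

```python
def to_sign_magnitude(n: int, bits: int = 4) -> str:
    """Sign bit (0 = positive, 1 = negative) followed by the magnitude in unsigned binary."""
    magnitude_bits = bits - 1
    if abs(n) >= 2 ** magnitude_bits:
        raise ValueError(f"{n} is outside the range for {bits}-bit sign-and-magnitude")
    sign = "1" if n < 0 else "0"
    return sign + format(abs(n), f"0{magnitude_bits}b")


def from_sign_magnitude(code: str) -> int:
    """Read the magnitude from the remaining bits, then apply the sign bit."""
    magnitude = int(code[1:], 2)
    return -magnitude if code[0] == "1" else magnitude


print(to_sign_magnitude(5), to_sign_magnitude(-5))  # 0101 1101
print(from_sign_magnitude("1000"))  # 0: "negative zero" still means zero
```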
TWO’S COMPLEMENT (METHOD-1)
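In two’s complement, the leftmost bit has the place value −128.
To find −109:
128 − 109 = 19
Put 1 in the −128 column for the negative sign, then write 19 in the remaining 7 bits.

−128  64  32  16   8   4   2   1
   1   0   0   1   0   0   1   1

−128 + 16 + 2 + 1 = −109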
To find −28:
128 − 28 = 100
Put 1 in the −128 column for the negative sign, then write 100 in the remaining 7 bits.

−128  64  32  16   8   4   2   1
   1   1   1   0   0   1   0   0

−128 + 64 + 32 + 4 = −28
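Method 1 can be sketched in Python as follows, assuming 8-bit numbers and negative inputs only. `place_value_sum` is a hypothetical helper that checks the answer by adding the place values with −128 for the leftmost bit.

```python
def twos_complement_method1(n: int) -> str:
    """8-bit two's complement of a negative number using the -128 column (Method 1)."""
    if not -128 <= n <= -1:
        raise ValueError("this sketch handles negative numbers from -128 to -1")
    remainder = 128 + n  # e.g. -109 -> 128 - 109 = 19
    return "1" + format(remainder, "07b")  # 1 in the -128 column, remainder in the other 7 bits


def place_value_sum(code: str) -> int:
    """Check the answer: -128 for the leftmost bit plus the ordinary value of the other 7 bits."""
    return -128 * int(code[0]) + int(code[1:], 2)


print(twos_complement_method1(-109))  # 10010011
print(twos_complement_method1(-28))  # 11100100
print(place_value_sum("10010011"))  # -109
```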
TWO’S COMPLEMENT (METHOD-2)
Step 1: Write the absolute value of the given number in binary, using all 8 bits. The
leading 0 indicates that it is positive.

Step 2: Take the complement of each bit by changing zeroes to ones and ones to
zeroes.

Step 3: Add 1 to your result. This is the two’s complement representation of the
negative integer.

1. Convert to binary value without sign

2. Flip → 1 to 0, 0 to 1

3. Add 1 → +1
Example: the two’s complement of −69

Place value   2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
              128   64   32   16    8    4    2    1

69              0    1    0    0    0    1    0    1

Flip            1    0    1    1    1    0    1    0

Add 1 (−69)     1    0    1    1    1    0    1    1
EXAMPLE: FIND THE TWO’S COMPLEMENT OF -17 WITH METHOD-2
Step 1: 17₁₀ = 0001 0001₂

Step 2: Take the complement: 1110 1110

Step 3: Add 1: 1110 1110 + 1 = 1110 1111.

Find the two’s complement for

a. -11

b. -43

c. -123
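The three steps of Method 2 can be sketched in Python, for example to check answers to the exercises above. This assumes 8-bit numbers and negative inputs only; the function name is hypothetical.

```python
def twos_complement_method2(n: int) -> str:
    """8-bit two's complement of a negative number: write |n|, flip the bits, add 1 (Method 2)."""
    if not -128 <= n <= -1:
        raise ValueError("this sketch handles negative numbers from -128 to -1")
    positive = format(-n, "08b")  # Step 1: absolute value written in 8 bits
    flipped = "".join("1" if bit == "0" else "0" for bit in positive)  # Step 2: flip each bit
    result = int(flipped, 2) + 1  # Step 3: add 1
    return format(result, "08b")


print(twos_complement_method2(-17))  # 11101111
print(twos_complement_method2(-69))  # 10111011
```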
TWO’S COMPLEMENT BINARY BACK TO BASE TEN

To convert 1110 1111 back to base ten:

Step 1: Subtract 1: 1110 1111 - 1 = 1110 1110

Step 2: Take the complement again (flip every bit): 0001 0001

Step 3: Change from base 2 back to base 10: 16 + 1 = 17

Step 4: Rewrite this as a negative integer: -17
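The reverse steps can be sketched in Python, assuming an 8-bit pattern given as a string (spaces allowed). The function name is a hypothetical choice.

```python
def twos_complement_to_denary(code: str) -> int:
    """Convert an 8-bit two's complement pattern back to denary using the steps above."""
    code = code.replace(" ", "")
    if len(code) != 8 or set(code) - {"0", "1"}:
        raise ValueError("expected 8 binary digits")
    if code[0] == "0":
        return int(code, 2)  # leading 0: an ordinary positive number
    minus_one = int(code, 2) - 1  # Step 1: subtract 1
    flipped = minus_one ^ 0b11111111  # Step 2: flip all 8 bits
    return -flipped  # Steps 3 and 4: the denary value, rewritten as negative


print(twos_complement_to_denary("1110 1111"))  # -17
```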


TWO’S COMPLEMENT BINARY SUBTRACTION
Example: 95 − 68, calculated as 95 + (−68)
First convert the two numbers into binary:
95 = 0 1 0 1 1 1 1 1
68 = 0 1 0 0 0 1 0 0
Now find the two’s complement of 68: flip the bits to get 1 0 1 1 1 0 1 1, then add 1:
−68 = 1 0 1 1 1 1 0 0
Then add 95 and −68:

  0 1 0 1 1 1 1 1   (95)
+ 1 0 1 1 1 1 0 0   (−68)
1 0 0 0 1 1 0 1 1

The additional ninth bit is simply ignored, leaving the binary number 0 0 0 1 1 0 1 1 (denary
equivalent 27, which is the correct result of the subtraction).
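The subtraction method can be sketched in Python, assuming both inputs fit in 7 bits (0 to 127) so the result fits in an 8-bit two's complement register. The function name is hypothetical.

```python
def subtract_8bit(a: int, b: int) -> int:
    """Calculate a - b by adding the two's complement of b, ignoring any ninth bit."""
    if not (0 <= a <= 127 and 0 <= b <= 127):
        raise ValueError("this sketch expects a and b between 0 and 127")
    minus_b = (b ^ 0b11111111) + 1  # flip the 8 bits of b, then add 1
    total = a + minus_b  # may produce a ninth bit
    result = total & 0b11111111  # keep only 8 bits: the ninth bit is ignored
    # a leading 1 in the 8-bit result means the answer is negative
    return result - 256 if result & 0b10000000 else result


print(subtract_8bit(95, 68))  # 27
print(subtract_8bit(68, 95))  # -27
```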
LOGICAL SHIFT
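Multiply → shift left
e.g. multiplying by 2^2 (= 4) means shifting left two places

Divide → shift right

e.g. dividing by 2^2 (= 4) means shifting right two places (any remainder is lost)

In a logical shift, the empty positions are filled with 0s, and bits shifted out of the register
are lost.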
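A Python sketch of logical shifts on an 8-bit register. The `& 0b11111111` mask is an assumption used to model the fixed register width, since Python integers themselves have no size limit.

```python
def logical_shift_left(value: int, places: int) -> int:
    """Shift an 8-bit value left; 0s fill from the right and bits pushed off the left are lost."""
    return (value << places) & 0b11111111


def logical_shift_right(value: int, places: int) -> int:
    """Shift an 8-bit value right; 0s fill from the left and bits pushed off the right are lost."""
    return value >> places


print(logical_shift_left(0b00010101, 2))  # 84 (21 x 4)
print(logical_shift_right(0b01010100, 2))  # 21 (84 / 4)
print(logical_shift_right(0b00010111, 2))  # 5 (23 / 4, remainder lost)
print(logical_shift_left(0b11000000, 1))  # 128, not 384: the leftmost 1 was lost
```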


MEASURING MEMORY SIZE
Unit               Memory value
1 bit (b)          A single 1 or 0
1 nibble           4 bits
1 byte (B)         8 bits
1 KB (kilobyte)    1024 bytes
1 MB (megabyte)    1024 KB
1 GB (gigabyte)    1024 MB
1 TB (terabyte)    1024 GB
1 PB (petabyte)    1024 TB
1 EB (exabyte)     1024 PB
➢ There are two systems of storage units: SI units (based on 1000 bytes) and IEC units (based
on 1024 bytes).
➢ According to the SI (International System of Units) standard, there are 1000 bytes in a
kilobyte; this is also called base 10.
➢ The IEC (International Electrotechnical Commission) standard has 1024 bytes in a kibibyte;
this is called base 2.
➢ A KiB is 1024 bytes, whereas a kB is 1000 bytes.
Example
You can buy a 3 TB (3 terabyte) drive, but when you plug it in to your PC, the PC will say it is
2.7 TiB (Windows may actually use the incorrect shorthand 2.7 TB, missing the little “i”). So for
every 3 TB you buy, it feels as if you lose about 300 GB. In reality you don’t lose anything. You
bought 3 TB, or 3,000,000,000,000 bytes of storage, and you get all 3,000,000,000,000 bytes.
That is 3000 gigabytes, or 3 terabytes, of storage: we simply divide by 1000 to get to the next
unit. But Windows divides by 1024 (which is 2 to the power 10) at each step instead, which gives
about 2794 gibibytes, or about 2.73 tebibytes.
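The drive example can be checked with a few lines of Python. The constant and function names are illustrative; the arithmetic simply divides by 1000 for SI units and by 1024 for IEC units.

```python
BYTES_PER_TB = 1000 ** 4  # SI terabyte (base 10)
BYTES_PER_GIB = 1024 ** 3  # IEC gibibyte (base 2)
BYTES_PER_TIB = 1024 ** 4  # IEC tebibyte (base 2)


def tb_to_gib(terabytes: float) -> float:
    return terabytes * BYTES_PER_TB / BYTES_PER_GIB


def tb_to_tib(terabytes: float) -> float:
    return terabytes * BYTES_PER_TB / BYTES_PER_TIB


print(f"{3 * BYTES_PER_TB:,} bytes")  # 3,000,000,000,000 bytes
print(f"{tb_to_gib(3):.0f} GiB")  # 2794 GiB
print(f"{tb_to_tib(3):.2f} TiB")  # 2.73 TiB
```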
Unit   With a base of 10   With a base of 2   With a base of 1024   With a base of 1000
KB     10^3                2^10               1024^1                1000^1
MB     10^6                2^20               1024^2                1000^2
GB     10^9                2^30               1024^3                1000^3
TB     10^12               2^40               1024^4                1000^4
PB     10^15               2^50               1024^5                1000^5

A terabyte (TB) is exactly 1,000,000,000,000 bytes: 10^12 or 1000^4.

A tebibyte (TiB) is 1,099,511,627,776 bytes: 2^40 or 1024^4.


USING BINARY NUMBER
➢ A register is a small piece of memory built into the central processing unit (CPU) of the
computer system.
➢ Values and instructions are held there temporarily.
➢ Registers can be read and written extremely quickly, e.g. to hold the small amounts of data
used in calculations.
➢ There are different types of register, such as processor registers and hardware registers.
➢ Examples of processor registers are the program counter (PC), the accumulator and the memory
address register (MAR).
➢ Hardware registers are specific to different types of hardware and are used to convey
a signal.
➢ Example: a robot arm has various motors to perform different operations, such as raising the
arm, opening the grip and closing the grip; the binary value held in a register determines which
operations are carried out. Digital instruments are another example.
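A hypothetical sketch of the robot arm idea: each bit of an 8-bit hardware register is assumed to switch one motor on or off. The bit layout and names below are invented for illustration and do not describe any real device.

```python
# Hypothetical layout: each bit of an 8-bit hardware register switches one robot arm motor.
RAISE_ARM = 0b00000001
LOWER_ARM = 0b00000010
OPEN_GRIP = 0b00000100
CLOSE_GRIP = 0b00001000


def register_value(*operations: int) -> str:
    """Combine the chosen operations into one 8-bit register value."""
    register = 0
    for operation in operations:
        register |= operation  # set the bit for this motor to 1
    return format(register, "08b")


print(register_value(RAISE_ARM, OPEN_GRIP))  # 00000101
```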
BINARY CODED DECIMAL (BCD)
The Binary Coded Decimal (BCD) system uses a 4-bit code to represent each denary
digit.

For example, the denary number 3165 would be 0011 0001 0110 0101 in BCD format
(3 = 0011, 1 = 0001, 6 = 0110, 5 = 0101).
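A minimal Python sketch of BCD, assuming non-negative whole numbers written as space-separated 4-bit groups. The function names are hypothetical.

```python
def to_bcd(number: int) -> str:
    """Encode each denary digit separately as a 4-bit group."""
    if number < 0:
        raise ValueError("this sketch only handles non-negative numbers")
    return " ".join(format(int(digit), "04b") for digit in str(number))


def from_bcd(bcd: str) -> int:
    """Decode each 4-bit group back to one denary digit."""
    digits = []
    for group in bcd.split():
        digit = int(group, 2)
        if digit > 9:
            raise ValueError(f"{group} is not a valid BCD digit")
        digits.append(str(digit))
    return int("".join(digits))


print(to_bcd(3165))  # 0011 0001 0110 0101
```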
HEXADECIMAL NUMBER SYSTEM
➢ A number system with a base of 16: each place value is 16 times the one to its right
(1, 16, 256, 4096, ...).
➢ It uses 16 symbols: the digits 0 to 9 and the letters A to F.

Denary  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Hex     0 1 2 3 4 5 6 7 8 9  A  B  C  D  E  F

➢ Computers do not actually process hexadecimal; it is converted into binary before
processing.
➢ Programmers work with hexadecimal because it is easier for humans to read than binary.
➢ It is a much shorter way of representing a byte of data (two hex digits instead of eight
bits), which makes it easier to read and understand.
➢ It is easier to debug than binary.
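Because one hex digit stands for exactly 4 bits, conversion is a matter of grouping. A short Python sketch (function names are illustrative):

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits: str) -> str:
    """Split the binary number into groups of 4 bits; each group becomes one hex digit."""
    bits = bits.replace(" ", "")
    bits = bits.zfill((len(bits) + 3) // 4 * 4)  # pad with leading 0s to a multiple of 4
    return "".join(HEX_DIGITS[int(bits[i:i + 4], 2)] for i in range(0, len(bits), 4))


def hex_to_binary(hex_number: str) -> str:
    """Each hex digit becomes exactly 4 bits."""
    return " ".join(format(HEX_DIGITS.index(digit), "04b") for digit in hex_number.upper())


print(binary_to_hex("1011 1011"))  # BB
print(hex_to_binary("A5"))  # 1010 0101
```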
USING HEXADECIMAL NUMBER
Colours within HTML or CSS
A 6-digit hex colour code should be considered in three parts:
▪ The first two digits represent the amount of red in the colour (max FF, or 255)
▪ The next two digits represent the amount of green in the colour (max FF, or 255)
▪ The final two digits represent the amount of blue in the colour (max FF, or 255)
By changing the intensities of red, green and blue, we can create almost any colour.
E.g. orange can be represented as #FFA500, which is 255 red, 165 green and 0 blue.
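Splitting a colour code into its three parts can be sketched in Python, assuming the `#RRGGBB` form described above. The function names are hypothetical.

```python
def hex_to_rgb(colour: str) -> tuple[int, int, int]:
    """Split a #RRGGBB code into its red, green and blue amounts (0 to 255 each)."""
    colour = colour.lstrip("#")
    if len(colour) != 6:
        raise ValueError("expected six hex digits, e.g. #FFA500")
    red = int(colour[0:2], 16)  # first two digits
    green = int(colour[2:4], 16)  # next two digits
    blue = int(colour[4:6], 16)  # final two digits
    return red, green, blue


def rgb_to_hex(red: int, green: int, blue: int) -> str:
    """Write each amount as two uppercase hex digits."""
    return f"#{red:02X}{green:02X}{blue:02X}"


print(hex_to_rgb("#FFA500"))  # (255, 165, 0)
print(rgb_to_hex(255, 165, 0))  # #FFA500
```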
MAC Addresses
▪ A Media Access Control (MAC) address is a number which uniquely identifies a device on a
network. It is written as six pairs of hex digits: NN:NN:NN:DD:DD:DD
▪ The MAC address relates to the network interface card (NIC) inside the device.
▪ e.g. D5-BE-E9-8D-44-9C
▪ NN:NN:NN is the OUI (Organizationally Unique Identifier): the manufacturer’s (vendor’s)
identification number.
▪ DD:DD:DD is the serial number of the device, which is unique for that manufacturer.
▪ An address assigned by the manufacturer is a UAA (Universally Administered Address);
sometimes an LAA (Locally Administered Address), set locally, is used instead.
▪ Expressing MAC addresses in hexadecimal format makes them easier to read and
work with.
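A Python sketch that splits a MAC address into the manufacturer part and the device serial part. It assumes six pairs of hex digits separated by `-` or `:`; `split_mac` is a hypothetical name.

```python
import string


def split_mac(mac: str) -> tuple[str, str]:
    """Return (manufacturer part NN:NN:NN, device serial part DD:DD:DD)."""
    pairs = mac.upper().replace("-", ":").split(":")
    if len(pairs) != 6 or any(
        len(pair) != 2 or any(ch not in string.hexdigits for ch in pair) for pair in pairs
    ):
        raise ValueError("expected six pairs of hex digits")
    return ":".join(pairs[:3]), ":".join(pairs[3:])


print(split_mac("D5-BE-E9-8D-44-9C"))  # ('D5:BE:E9', '8D:44:9C')
```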
HTTP Error Message
▪ Standard Windows error message codes are given in hexadecimal notation.
▪ HTTP error messages also use numeric codes, e.g. code 404 (meaning “Not Found”).
DATA STORAGE
Character Set
▪ A character set is all the characters and symbols that can be represented by
a computer system.
▪ Each character and symbol is assigned a unique value.
ASCII (American Standard Code for Information Interchange)
Standard ASCII uses 7 bits (128 characters); extended ASCII uses 8 bits, giving a possible 256
characters, suitable for standard English.

Unicode
Originally used 16 bits, allowing up to 65,536 different characters to be represented; it now
also uses 32-bit codes, allowing over a million characters, which is more than enough for every
language and many special characters.
ASCII AND UNICODE
➢ Text is converted to binary to be processed by a computer
➢ Unicode allows for a greater range of characters and symbols than ASCII,
including different languages and emojis
➢ Unicode requires more bits per character than ASCII.
➢ The standard ASCII character set consists of 7-bit codes (0 to 127 in denary
or 00 to 7F in hexadecimal) that represent the letters, numbers and characters
found on a standard keyboard, together with 32 control codes (that use codes 0
to 31 (denary) or 00 to 1F (hexadecimal)).
➢ Extended ASCII uses 8-bit codes (0 to 255 in denary or 0 to FF in hexadecimal).
➢ The main disadvantage of ASCII is that it does not represent characters in non-Western
languages.
➢ Unicode can represent all languages of the world, thus supporting many
operating systems, search engines and internet browsers used globally.
➢ The first 128 (English) characters are the same, but Unicode can support several
thousand different characters in total.
➢ Unicode characters are encoded as 16-bit or 32-bit codes.
➢ The aims of Unicode were to create a universal standard that covered all languages and all
writing systems, and to produce a more efficient coding system than ASCII.
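Python's built-in `ord` returns a character's Unicode code point, which for codes 0 to 127 is the same as its ASCII code. A small sketch (the helper name is hypothetical) showing why some characters do not fit in ASCII:

```python
def character_code(character: str) -> tuple[int, str, int]:
    """Return the character's code in denary, in hex, and the number of bits it needs."""
    code = ord(character)  # Unicode code point; 0 to 127 match standard ASCII
    return code, format(code, "02X"), code.bit_length()


print(character_code("A"))  # (65, '41', 7): fits in 7-bit standard ASCII
print(character_code("é"))  # (233, 'E9', 8): needs 8 bits
print(character_code("€"))  # (8364, '20AC', 14): too big for 8-bit ASCII, needs Unicode
print(len("€".encode("utf-16-be")))  # 2 bytes (16 bits) in a 16-bit Unicode encoding
```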
PICTURES
➢ Real-world images are analogue data.
➢ Analogue data needs to be converted into digital data for a computer to process it.
➢ Images are made up of pixels. A pixel is a tiny dot on the screen.
➢ If an image is black and white, each pixel will be either black (1) or white (0).
➢ Each pixel can only be one single colour at a time.
➢ If an image has more than two colours, each colour is made from red, green and blue
(RGB); this needs more bits per pixel, giving larger files but higher quality.
➢ Each pixel’s colour is determined by the binary value that represents it,
e.g. 11101 might be dark green.
➢ CMYK stands for “Cyan Magenta Yellow Black”. These are the four basic colours used
for printing colour images.
➢ The RGB colour mode is best for digital work, while CMYK is used for print products.
RESOLUTION
➢ Resolution is the concentration of pixels within a specific area, i.e. an image. The greater
the number of pixels within a specific area, the higher the image quality.
➢ The image resolution is the number of pixels in the image: how many pixels wide and how
many pixels high the image is (e.g. 1920 × 1080).
➢ Image metadata includes details relevant to the image.
➢ The colour depth is the number of bits used to represent the colour of each pixel.
➢ The file size and quality of the image increase as the resolution and colour depth
increase.
REPRESENTATION OF (BITMAP) IMAGES
➢ Bitmap images are made up of pixels (picture elements); an image is a two-dimensional
matrix of pixels. The colour of each pixel can be made up of red, green and blue.
➢ Each pixel can be represented as a binary number:
▪ a black and white image only requires 1 bit per pixel – this means that each pixel can be one of
two colours, corresponding to either 1 or 0
▪ if each pixel is represented by 2 bits, then each pixel can be one of four colours (2^2 = 4),
corresponding to 00, 01, 10, or 11
▪ if each pixel is represented by 3 bits, then each pixel can be one of eight colours (2^3 = 8),
corresponding to 000, 001, 010, 011, 100, 101, 110, 111.
➢ The number of bits used to represent each colour is called the colour depth.
➢ An 8-bit colour depth means that each pixel can be one of 256 colours (2^8 = 256).
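The rule "n bits give 2^n colours" can be sketched in Python; `bit_patterns` lists every code a pixel could hold at a given colour depth. Both function names are illustrative.

```python
def colours_available(colour_depth: int) -> int:
    """Each extra bit doubles the number of possible colours: 2 ** colour_depth."""
    return 2 ** colour_depth


def bit_patterns(colour_depth: int) -> list[str]:
    """List every binary code a pixel can hold at this colour depth."""
    return [format(i, f"0{colour_depth}b") for i in range(2 ** colour_depth)]


print(colours_available(1), colours_available(3), colours_available(8))  # 2 8 256
print(bit_patterns(2))  # ['00', '01', '10', '11']
```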
IMPACT OF HIGH-RESOLUTION IMAGES
➢ The main drawback of using high resolution images is the increase in file size.
➢ As the number of pixels used to represent the image is increased, the size of the file
will also increase.
➢ This impacts on how many images can be stored on, for example, a hard drive.
➢ It also impacts on the time to download an image from the internet or the time to
transfer images from device to device.
➢ A certain amount of reduction in resolution of an image is possible before the loss of
quality becomes noticeable.
SOUND
➢ Analogue sound signals are continuous, whereas digital signals are discrete.
➢ For sound to be converted to binary so that it can be processed by a computer, the sound
wave is sampled.
➢ The sample rate is the number of samples taken in a second.
➢ The sample resolution is the number of bits per sample.
➢ The accuracy of the recording and the file size increase as the sample rate and
resolution increase.
SOUND SAMPLING
➢ The process of converting an analogue signal into a digital one is known as
sampling. In other words, sampling means measuring the amplitude of the sound
wave. This is done using an analogue to digital converter (ADC).
➢ Sampling involves taking a sample of the analogue signal at set intervals. In the sampling
diagram:

• The red line represents the analogue sound wave.

• The black line represents the digital signal and the sampling process.
REPRESENTATION OF SOUND
➢ Each sound wave has a frequency, wavelength and amplitude. The amplitude
specifies the loudness of the sound.
➢ Sound waves vary continuously. This means that sound is analogue.
➢ Computers cannot work with analogue data, so sound waves need to be sampled in
order to be stored in a computer.
➢ Sampling means measuring the amplitude of the sound wave.
➢ This is done using an analogue to digital converter (ADC).
➢ To convert the analogue data to digital, the sound waves are sampled at regular time
intervals.
➢ The amplitude of the sound cannot be measured precisely, so approximate values are
stored.
➢ 4 binary bits can be used to represent each amplitude value (for example, 9 would be
binary value 1001).
➢ Increasing the number of possible values used to represent sound amplitude also
increases the accuracy of the sampled sound (for example, using a range of 0 to 127
gives a much more accurate representation of the sound sample than using a range of
0 to 10).

➢ The number of bits per sample is known as the sampling resolution (also known as the
bit depth).
➢ Sampling rate is the number of sound samples taken per second. This is measured in
hertz (Hz), where 1 Hz means ‘one sample per second’.
➢ The higher the sampling rate and/or sampling resolution, the greater the file size.
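A simplified Python sketch of what an ADC does: measure a pure tone at regular intervals (the sampling rate) and store each amplitude as the nearest of 2^n whole-number levels (the sampling resolution). The linear mapping of the amplitude range −1 to 1 onto the levels is an assumption for illustration; real converters differ.

```python
import math


def sample_sine_wave(frequency_hz: float, sample_rate_hz: int,
                     resolution_bits: int, duration_s: float) -> list[int]:
    """Sample a pure tone at regular intervals and store each amplitude as a whole number."""
    levels = 2 ** resolution_bits  # how many amplitude values the bits can represent
    samples = []
    for i in range(int(sample_rate_hz * duration_s)):
        t = i / sample_rate_hz  # time of this sample, in seconds
        amplitude = math.sin(2 * math.pi * frequency_hz * t)  # between -1.0 and 1.0
        # approximate the amplitude with the nearest of the available levels
        samples.append(round((amplitude + 1) / 2 * (levels - 1)))
    return samples


print(sample_sine_wave(1, 8, 4, 1))  # 8 samples, each between 0 and 15
```

Raising `sample_rate_hz` gives more samples per second, and raising `resolution_bits` gives finer levels; both make the list (and so the file) larger.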
RECORD A SOUND CLIP

➢ The amplitude of the sound wave is first determined at set time intervals (the sampling
rate)
➢ This gives an approximate representation of the sound wave
➢ Each sample of the sound wave is then encoded as a series of binary digits.
BENEFITS AND DRAWBACKS OF USING A LARGER SAMPLING RESOLUTION

Benefits                  Drawbacks
Larger dynamic range      Produces larger file size
Better sound quality      Takes longer to transmit / download music files
Less sound distortion     Requires greater processing power


CALCULATION OF FILE SIZE

The file size of an image is calculated as:
➢ image resolution (in pixels) × colour depth (in bits)
The size of a mono sound file is calculated as:
➢ sample rate (in Hz) × sample resolution (in bits) × length of sample (in seconds)
For a stereo sound file, you would then multiply the result by two:
➢ sample rate (in Hz) × sample resolution (in bits) × length of sample (in seconds) × 2
Each formula gives the size in bits; divide by 8 to get bytes, then by 1024 for each step up to
KiB, MiB and GiB.
EXAMPLES:
1. A photograph is 1024 × 1080 pixels and uses a colour depth of 32 bits. How
many photographs of this size would fit onto a memory stick of 64 GiB?
2. A camera detector has an array of 2048 by 2048 pixels and uses a colour
depth of 16 bits. Find the size of an image taken by this camera in MiB.
3. An audio CD has a sample rate of 44 100 Hz and a sample resolution of 16 bits.
The music being sampled uses two channels to allow for stereo recording.
Calculate the file size for a 60-minute recording in MiB.

Answers:
1. 15 534 photos
2. 8 MiB
3. about 605.6 MiB
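The three worked examples can be checked with the formulas above in Python. The function and constant names are illustrative; sizes are computed in bits, then divided by 8 and by powers of 1024.

```python
BITS_PER_BYTE = 8
MIB = 1024 ** 2  # bytes in a mebibyte
GIB = 1024 ** 3  # bytes in a gibibyte


def image_size_bits(width: int, height: int, colour_depth: int) -> int:
    """Image resolution (width x height pixels) x colour depth."""
    return width * height * colour_depth


def sound_size_bits(sample_rate_hz: int, resolution_bits: int,
                    seconds: int, channels: int = 1) -> int:
    """Sample rate x sample resolution x length, x 2 for stereo."""
    return sample_rate_hz * resolution_bits * seconds * channels


# 1. How many 1024 x 1080, 32-bit photographs fit on a 64 GiB memory stick?
photo_bytes = image_size_bits(1024, 1080, 32) // BITS_PER_BYTE
print(64 * GIB // photo_bytes)  # 15534

# 2. A 2048 x 2048 image with a 16-bit colour depth, in MiB
print(image_size_bits(2048, 2048, 16) / BITS_PER_BYTE / MIB)  # 8.0

# 3. 60 minutes of stereo audio at 44 100 Hz, 16 bits, in MiB
audio_bits = sound_size_bits(44_100, 16, 60 * 60, channels=2)
print(round(audio_bits / BITS_PER_BYTE / MIB, 1))  # 605.6
```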
DATA COMPRESSION
➢ Data compression makes the data in a file smaller in size. This means that less storage
space is needed.
➢ The file will also be quicker to transmit from one device to another.
➢ Data compression is done by using compression algorithms (a step-by-step set of
instructions).
➢ These algorithms normally manipulate the data so that repeating data is removed, either
on a temporary or permanent basis, depending on the method used.
➢ Data compression is needed because it means less bandwidth is required, less
storage space is required and transmission time is shorter.
➢ There are two main methods for compressing data: lossy and lossless.
LOSSY COMPRESSION
➢ In lossy compression method, data that is deemed redundant or unnecessary is removed
permanently in the compression process, so it is effectively “lost”.
➢ This way the size of the file is reduced.
➢ Used for multimedia such as audio, video and image files.
➢ This is mostly done when streaming these files, as a file can be streamed much more
effectively if it is smaller in size.
LOSSLESS COMPRESSION
➢ In the lossless compression method, no data is lost in the process.
➢ The compressed data can be reversed to reconstruct the data file exactly as it was.
➢ There are many different lossless compression algorithms; most work by using a shorthand to
store the data that can then be reconstructed when the file is opened.
➢ If a lossless compression method is used on a music file, it will not lose any of the data from
the file.
➢ A lossless method might look for repeating patterns in the music or file, then store how
many times each is repeated. This way repeating data is reduced.
➢ When the music or data file is opened, the file or music can be reconstructed.
➢ Lossless compression is also used for text files, and for music files when the highest
quality is wanted.
RUN-LENGTH ENCODING (RLE)
▪ Run-length encoding (RLE) can be used for lossless compression of a number of different file
formats:
▪ it is a form of lossless/reversible file compression
▪ it reduces the size of a string of adjacent, identical data (e.g. repeated colours in an image)
▪ a repeating string is encoded into two values:
➢ the first value represents the number of identical data items (e.g. characters) in the run
➢ the second value represents the code of the data item (such as the ASCII code if it is a
keyboard character)
▪ RLE is only effective where there is a long run of repeated units/bits.
USING RLE ON TEXT DATA
‘aaaaabbbbccddddd’

a a a a a b b b b c c d d d d d

05 97 04 98 02 99 05 100
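A minimal RLE sketch in Python following the scheme above: each run becomes a run length followed by the ASCII code. Capping runs at 255 is an assumption so each count fits in one byte; the function names are hypothetical.

```python
def rle_encode(text: str) -> list[int]:
    """Encode each run as two values: the run length, then the ASCII code of the character."""
    encoded = []
    i = 0
    while i < len(text):
        run_length = 1
        # extend the run while the next character matches (capped at 255 to fit in one byte)
        while (i + run_length < len(text) and text[i + run_length] == text[i]
               and run_length < 255):
            run_length += 1
        encoded += [run_length, ord(text[i])]
        i += run_length
    return encoded


def rle_decode(encoded: list[int]) -> str:
    """Rebuild the text: repeat each character code by its run length."""
    pairs = zip(encoded[0::2], encoded[1::2])
    return "".join(chr(code) * count for count, code in pairs)


codes = rle_encode("aaaaabbbbccddddd")
print(codes)  # [5, 97, 4, 98, 2, 99, 5, 100]
print(rle_decode(codes))  # aaaaabbbbccddddd
```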


ISSUE OF RLE
▪ One issue occurs with a string such as ‘cdcdcdcdcd’ where RLE compression isn’t very
effective.
▪ To cope with this, we use a flag. A flag preceding data indicates that what follows is
the number of repeating units, then the repeated unit.
▪ When a flag is not used, the next byte is taken at face value, as a run of 1.
String aaaaaaaa bbbbbbbbbb c d c d c d eeeeeeee

Code 08 97 10 98 01 99 01 100 01 99 01 100 01 99 01 100 08 101

• The original string contains 32 characters and would occupy 32 bytes of storage.
• The coded version contains 18 values and would require 18 bytes of storage.
• Introducing a flag (255 in this case) produces:

255 08 97 255 10 98 99 100 99 100 99 100 255 08 101

• This has 15 values and would, therefore, require 15 bytes of storage. This is a reduction in
file size of about 53%
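The flagged version can be sketched in Python. Two assumptions are made here: the flag value 255 never occurs as data, and only runs of 3 or more are flagged (a flagged run costs 3 values, so shorter runs are stored at face value). With these choices the sketch reproduces the 15-value code above.

```python
FLAG = 255  # flag value used in the example; assumes 255 never appears as a data value
MIN_RUN = 3  # assumption: only runs of 3 or more are flagged (a flagged run costs 3 values)


def rle_encode_flagged(text: str) -> list[int]:
    """Flag + run length + code for long runs; short runs are stored at face value."""
    encoded = []
    i = 0
    while i < len(text):
        run_length = 1
        while (i + run_length < len(text) and text[i + run_length] == text[i]
               and run_length < 255):
            run_length += 1
        code = ord(text[i])
        if run_length >= MIN_RUN:
            encoded += [FLAG, run_length, code]
        else:
            encoded += [code] * run_length  # each character taken as a run of 1
        i += run_length
    return encoded


def rle_decode_flagged(encoded: list[int]) -> str:
    """A flag means 'count, then code' follows; any other value is a single character."""
    chars = []
    i = 0
    while i < len(encoded):
        if encoded[i] == FLAG:
            count, code = encoded[i + 1], encoded[i + 2]
            chars.append(chr(code) * count)
            i += 3
        else:
            chars.append(chr(encoded[i]))
            i += 1
    return "".join(chars)


original = "aaaaaaaa" + "bbbbbbbbbb" + "cdcdcd" + "eeeeeeee"
codes = rle_encode_flagged(original)
print(codes)  # [255, 8, 97, 255, 10, 98, 99, 100, 99, 100, 99, 100, 255, 8, 101]
print(len(original), len(codes))  # 32 15
```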
BLACK AND WHITE IMAGE

The 8 × 8 grid would need 64 bytes; the compressed RLE format has 30 values, and
therefore needs only 30 bytes to store the image.
COLOURED IMAGES

2 0 0 0  4 0 255 0  3 0 0 0  6 255 255 255  1 0 0 0  2 0 255 0  4 255 0 0  4 0 255 0

1 255 255 255  2 255 0 0  1 255 255 255  4 0 255 0  4 255 0 0  4 0 255 0  4 255 255 255  2 0 255 0

1 0 0 0  2 255 255 255  2 255 0 0  2 255 255 255  3 0 0 0  4 0 255 0  2 0 0 0

(92 bytes: each group of four values is a run length followed by the R, G and B values)
▪ The original image (8 × 8 pixels) would need 3 bytes per pixel (to include all three
RGB values).
▪ Therefore, the uncompressed file for this image is 8 × 8 × 3 = 192 bytes.
▪ The RLE code has 92 values, which means the compressed file will be 92 bytes in size.
▪ This gives a file reduction of about 52%.
▪ It should be noted that the file reductions in reality will not be as large as this due to other
data which needs to be stored with the compressed file.
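The same idea for coloured pixels can be sketched in Python: each run is stored as four values (run length, then R, G, B). This assumes the pixels are read row by row as one continuous sequence, which appears to be how the runs in the example cross row boundaries; the names are hypothetical.

```python
Pixel = tuple[int, int, int]  # (red, green, blue), each 0 to 255


def rle_encode_pixels(pixels: list[Pixel]) -> list[int]:
    """Store each run as four values: run length, then the R, G and B of the repeated pixel."""
    encoded = []
    i = 0
    while i < len(pixels):
        run_length = 1
        while (i + run_length < len(pixels) and pixels[i + run_length] == pixels[i]
               and run_length < 255):
            run_length += 1
        encoded.append(run_length)
        encoded.extend(pixels[i])
        i += run_length
    return encoded


BLACK, GREEN = (0, 0, 0), (0, 255, 0)
pixels = [BLACK] * 2 + [GREEN] * 4 + [BLACK] * 3
print(rle_encode_pixels(pixels))  # [2, 0, 0, 0, 4, 0, 255, 0, 3, 0, 0, 0]
print(len(pixels) * 3)  # 27 bytes uncompressed, versus 12 values above
```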
KEY NOTES

▪ Lossless compression reduces the file size without permanent loss of data, e.g. run
length encoding (RLE)
▪ Lossy compression reduces the file size by permanently removing data, e.g. reducing
resolution or colour depth, reducing sample rate or resolution
Thank You !!
