Data Type and Data Structure
Data Type and Data Structure
Data Type and Data Structure
The primary purpose of most computer programs is to store and retrieve data rather than to
perform calculations. There are many different ways to organize data for storage and
retrieval, and each type of organization is well suited to solving certain kinds of problems
and poorly suited to solving problems of other kinds. Consequently, the way in which a
program's data is organized may have a profound effect on the program's running time
and memory requirements. The finite set of values along with set of rules for different
operations is called data type and the study of data organization is called data structures and
is the primary focus of this topic.
Learning objectives
After studying this chapter, student should be able to:
Give examples of data types and state how their represented in the memory.
Appreciate and use concepts of data structure as array, pointer, structure, linked
list, …
Appreciate the advantages of abstract data types as a software development
strategy.
To develop significant facility with the Abstract Data Type like Stack, Queue, and
binary tree from the client perspective.
Implement Abstract Data Type used data structures
Contents
I. INTRODUCTION TO DATA TYPE....................................................................................................... 2
II. PRIMITIVE DATA TYPES ................................................................................................................... 2
III. COMPOSITE DATA TYPE ....................................................................................................... 3
IV. ABSTRACT DATA TYPES ...................................................................................................... 8
To control the range of numbers and storage space, C has three classes of integer storage
namely short int, int and long int. All three data types have signed and unsigned forms. A
short int requires half the amount of storage than normal integer. Unlike signed
integer, unsigned integers are always positive and use all the bits for the magnitude of
the number. Therefore, the range of an unsigned integer will be from 0 to 65535. The long
integers are used to declare a longer range of values and it occupies 4 bytes of storage space.
Data Type Bytes Range Format
short int or signed short int 2 -32768 to +32767 %d
unsigned short int 2 0 to 65535 %u
long int or signed long int or long 4 -2147483648 to +2147483647 %ld
unsigned long int or unsigned long 4 0 to 4294967295 %Ld
The floating point data type is used to store fractional numbers (real numbers) with 6
digits of precision. Floating point numbers are denoted in C by the keyword float. When the
accuracy of the floating point number is insufficient, we can use the double to define the
number. The double is same as float but with longer precision and takes double space (8
bytes) than float. To extend the precision further we can use long double which occupies
12 bytes of memory space.
Data Type Bytes Range Format
float 4 -3.4e 38 to +3.4e38 %f
double 8 -1.7e 308 to +1.7e 308 %lf
long double 12 -1.7e 4932 to +1.7e 4932 %Lf
In computer science, a composite data type is any data type which can be constructed in a
program using its programming language's primitive data types and other composite types.
The act of constructing a composite type is known as composition.
III.1 Record
A record also called a structure is a group of related data items stored in fields, each with its
own name and datatype. Suppose you have various data about an employee such as name,
salary, and hire date. These items are logically related but dissimilar in type. A record
containing a field for each item lets you treat the data as a logical unit. Thus, records make it
easier to organize and represent information.
Such a declaration can be as follow:
pseudocode In C
Type employee = record struct person
Name: string {
Salary: number char name[50];
float salary;
sex: character
char sex;
endrecord };
The size of a structure depends on the data type of its each field. For instance, for example
with the structure defined above,
Sizeof (struct person) = 50 * sizeof(char) + sizeof(float) + sizeof(char)
= 50*1 + 4 + 1
= 55 bytes
III.2 Array data type
An array is a sequenced collection of elements of the same data type with a single
identifier name. Arrays can have multiple axes (more than one axis). Each axis is a
dimension. Thus a single dimension array is also known as a vector. A two dimension array
is commonly known as a matrix (a spreadsheet like Excel is a two dimension array).
Example: Tab is an 21 45 65 32 75 98 15 34 11 20
array
We refer to the individual values as members (or elements) of the array. There is
only one identifier name assigned to the array. The position of an element in an array is called
index and is written in bracket. If Tab is an array, Tab[i] represent the element at the ith
position. Example: Tab[4] = 32, Tab[7] = 15
Declaration
In pseudocode In C
Age = array[1 to 5] of integer int age[5];
Age is an array of 5 integers. Notice that in C the index is not defined by the user but the first
is always 0
Here, the size of array age is 5 times the size of int (ie 20 bytes) because there are 5
elements. Suppose, the starting address of age[0] is 2120 and the size of int be 4 bytes. Then,
the next address (address of a[1]) will be 2124, address of a[2] will be 2128 and so on.
In pseudocode In C
Age[2] ← 4 Age[2] = 4;
Read(Age[1]) Scanf(“%d”, &Age[1])
Multidimensional Arrays
Multidimensional array is also called arrays of arrays or matrix. For example:
In pseudocode In C
A=array[1to 3, 1 to 6] of real float A[3][6];
Here, A is an array of two dimension, which is an example of multidimensional array. This
array has 3 rows and 6 columns In C the first element of the array is A[0][0], then the next
A[0][1], A[0][2], …
For better understanding of multidimensional arrays, array elements of above example can be
thinked of as below:
Col 1 Col 2 Col 3 Col 4 Col 5 Col 6
Row 1 A[1,1] A[1,2] A[1,3] A[1,4] A[1,5] A[1,6]
Row 2 A[2,1] A[2,2] A[2,3] A[2,4] A[2,5] A[2,6]
Row 3 A[3,1] A[3,2] A[3,3] A[3,4] A[3,5] A[3,6]
In pseudocode In C
FOR i FROM 1 TO 3 DO for (i=0; i<3; i++)
FOR j FROM 1 TO 6 DO for (j=0; j<6; j++)
Print(A[i, j]) printf(“%d”, A[i][j]);
ENDFOR
ENDFOR
III.3 String
A string is any finite sequence of characters (i.e., letters, numerals, symbols and punctuation
marks). An important characteristic of each string is its length, which is the number of
characters in it.
In C, a string is represented as an array of characters
In pseudocode In C
Name : string Char name[20]
Here the variable name cannot be more than 20 characters. In the variable name contain
“placide”, then we have name[0]=’p’, name[1]=’l’, name[2]=’a’ and so on.
III.4 Pointer
Pointers are widely used in programming; they are used to refer to memory location of
another variable without using variable identifier itself. They are mainly used in linked lists
and call by reference functions. The figure below illustrates the idea of pointers. As you can
see below; Yptr is pointing to memory address 100.
You can find out the memory address of a variable by simply using the address operator &.
Here is an example of its use: &v = FFD2
The above expression should be read as “address of v”, and it returns the memory address of
the variable v.
Pointer Declaration
Like all other C variables, pointers must be declared before they are used. The syntax for
pointer declaration is as follows:
Datatype * identifier;
Examples int* p; // p is a pointer to an integer
double* offset; // offset is a pointer to a double
Note that the prefix *defines the variable to a pointer. In the above example, p is the type
“pointer to integer” and offset is the type “pointer to double”.
Once a pointer has been declared, it can be assigned an address. This is usually done with the
address operator. For example,
After this assignment, we say that p is “referring to” the variable count or “pointing to” the
variable count. The pointer p contains the memory address of the variable count.
For the purpose of this document, we shall define a list as a pointer to a structure of type cell
like this:
Type list = record struct list
Element: integer {
next: list int element;
endrecord struct list next;
};
In a doubly linked list, each node contains, besides the next-node link, a second link field
pointing to the previous node in the sequence. The two links may be called forward(s) and
backwards, or next and prev(ious).
There exist two main types of abstract data types: Linear ADT (vector, queue, stack, …)
and NON-Linear ADT (binary tree, …)
IV.2 Common examples of ADT
IV.2.1 The Queue
a) Definition
A queue is a linear collection of items, where an item to be added to the queue must be
placed at the rear (tail) of the queue and items that are removed from the queue must be
removed from the front. A queue is referred to as a FIFO data structure: First In, First Out.
b) The operations
c) Example
Figure b: The simple queue after the fourth item is added and before an item is
removed
a) The definition
A stack is a linear collection of similar items, where an item to be
added to the stack must be placed on top of the stack and items that
are removed from the stack must be removed from the top. The top
of the stack is known as the top. The term push means to add an
item to the stack, and the term pop means to remove an item from
the stack. A stack is referred to as a LIFO data structure: Last In,
First Out.
b) The operations
There are actually very few operations on a stack.
Operation Signature specification
Push() stack x item → stack add an item to the stack
Pop() stack → item remove an item from the stack
Top() stark → item access item at the top of the stack
emptyStack() Stack → Boolean test if the stack is empty
fullStack() stark → Boolean test if the stack is full
c) Using a stack to evaluate postfix
Infix, prefix and posfix notation
In most programming languages, mathematical expressions are written with the operator
between the two operands, as in 1 + 2. This format is called infix. An alternative used by
some calculators is called postfix. In postfix, the operator follows the operands, as in 1 2 +.
Example: Let’s consider the following expression: x / y + 4*z
The prefix expression is: + / x y * 4 z
The postfix expression is: x y / 4 z * +
The reason postfix is sometimes useful is that there is a natural way to evaluate a postfix
expression using a stack:
1. Starting at the beginning of the expression, get one term (operator or operand) at a time.
o If the term is an operand, push it on the stack.
o If the term is an operator, pop two operands off the stack, perform the
operation on them, and push the result back on the stack.
2. When you get to the end of the expression, there should be exactly one operand left on
the stack. That operand is the result.
d) Implementation of a stack
Implementation of a stack can also be done using an array or a linked list
IV.2.3 Binary Tree
a) Abstract idea of a tree:
A tree is another data structure that you can use to store information.
Unlike stacks and queues, which are linear data structures, trees are
hierarchical data structures. Here is an example of a tree holding letters:
b) Tree Vocabulary
The left subtree of a node contains only nodes with keys less
than the node's key.
The right subtree of a node contains only nodes with keys
greater than the node's key.
The left and right subtree each must also be a binary search tree.
There must be no duplicate nodes.
The major advantage of binary search trees over other data structures is
that the related sorting algorithms and search algorithms such as in-order traversal can be
very efficient. Binary search trees are a fundamental data structure used to construct more
abstract data structures such as sets, multisets, and associative arrays.
f) Tree’s operations:
As mentioned, there are different kinds of trees (e.g., binary search trees, 2-3 trees, AVL
trees, tries, just to name a few). What operations we will need for a tree, and how they work,
depends on what kind of tree we use. However, there are some common operations we can
mention:
Add(): Places an element in the tree (where elements end up depends on the kind
of tree).
Remove(): Removes something from the tree (how the tree is reorganized after a
removal depends on the kind of tree).
IsMember: Reports whether some element is in the tree.
Other operations may be necessary, depending on the kind of tree we use.
g) Tree Traversals
A tree traversal is a specific order in which to trace the nodes of a tree. To perform a traversal
of a data structure, we use a method of visiting every node in some predetermined order.
Traversals can be used
to test data structures for equality
to display a data structure
to construct a data structure of a give size
to copy a data structure
There are 3 common tree traversals.
1. in-order: left, root, right
2. pre-order: root, left, right
3. post-order: left, right, root
In order to illustrate few of the binary tree traversals, let us
consider the below binary tree:
Therefore, the Preorder traversal of the above tree will outputs: 15, 5, 3, 12, 10, 6, 7, 13, 16,
20, 18, 23
2) Inorder traversal: To traverse a binary tree in Inorder, following operations are carried-
out
(i) Traverse the left most subtree starting at the left external node,
(ii) Visit the root, and
(iii) Traverse the right subtree starting at the left external node.
Therefore, the Inorder traversal of the above tree will outputs: 3, 5, 6, 7, 10, 12, 13, 15, 16,
18, 20, 23
3) Postorder traversal: To traverse a binary tree in Postorder, following operations are
carried-out
(i) Traverse all the left external nodes starting with the left most subtree which is then
followed by bubble-up all the internal nodes,
(ii) Traverse the right subtree starting at the left external node which is then followed
by bubble-up all the internal nodes, and
(iii) Visit the root.
Therefore, the Postorder traversal of the above tree will outputs: 3, 7, 6, 10, 13, 12, 5, 18, 23,
20, 16, 15
Another example
up of two parts: an array (the actual table where the data to be searched is stored) and a
mapping function, known as a hash function. The hash function provides a way for assigning
numbers to the input data such that the data can then be stored at the array index
corresponding to the assigned number.
Let's take a simple example. First, we start with a hash table array of strings (we'll use strings
as the data being stored and searched in this example). Let's say the hash table
Hash Table size is 10:
0 Next we need a hash function. There are many possible ways to construct a
1 hash function. Some hash functions will take an integer key and turn it into an
2 index. A common hash method is the division method.
3
4 Let's say you had the following numbers or keys that you Hash Table
5 wanted to map into an array of 10 elements: 123456, 123467, 0 123450
6 123450 1
7 2
To apply the division method, you could divide the number by
8 3
10 (or the maximum number of elements in the array) and use
9 4
the remainder (the modulo) as an index. The following would
5
result:
6 123456
123456 % 10 = 6 (the remainder is 6 when dividing by 10) 7 123467
123467 % 10 = 7 (the remainder is 7) 8
123450 % 10 = 0 (the remainder is 0) 9
These numbers would be inserted into the array at positions 6, 7, and 0 respectively. It might
look something like this:
Case of non-integer keys
Hash Table
For now, let's assume a simple hash function that takes a string as input. 0 Placide
The returned hash value will be the sum of the ASCII characters that 1
make up the string mod the size of the table: 2
Again, the idea is that we will insert items into the hash table using the 3 Spark
key and applying the hash function(s) to get the index. 4
5
Placide --> 690 % 10 --> 0
6
Spark --> 593 % 10 --> 3
Steve --> 519 % 10 --> 9 7
8
A problem occurs when two keys yield the same index. For Instance, say 9 Steve, John
we wanted to include: John --> 399 % 10 --> 9.
We have a collision because Steve is already stored at array index 9. We need a method to
resolve this. The resolution comes in how you create your hash table. In our case we will use
linked list to solve the problem.
Applications of a Hash Table
Hash tables are good in situations where you have enormous amounts of data from which you
would like to quickly search and retrieve information. A few typical hash table
implementations would be in the following situations:
For driver's license record's. With a hash table, you could quickly get information
about the driver (ie. name, address, age) given the licence number.
EXERCISES
1. For the following binary trees perform the following:Pre-order traversal, In-order
traversal and Post-order traversal
2. Give the inorder and postorder traversal for the tree whose preorder traversal is A B C - -
D - - E - F - -. The letters correspond to labeled internal nodes; the minus signs to
external nodes.
3. Give the preorder, inorder and postorder traversals of the following binary trees.
4. For each tree shown in the Figure below show the order in which the nodes are visited
during the following tree traversals:
a) preorder traversal,
b) inorder traversal (if defined),
c) postorder traversal, and
5. What values are dequeued by the 6. What are the value popped by the following
following algorithm? algorithm?
Enqueue(5) Push(5)
Enqueue(4) Push(4)
Enqueue(4) Push(4)
Dequeue() Pop()
Dequeue() Pop()
Enqueue() Push(3)
Pop ()
Pop ()
7. A letter means push and an asterisk means pop in the following sequence. Give the
sequence of values returned by the pop operations when this sequence of operations is
performed on an initially empty LIFO stack.
EAS*Y*QUE***ST***IO*N***
8. A letter means enqueue and an asterisk means dequeue in the following sequence. Give
the sequence of values returned by the get operation when this sequence of operations is
performed on an initially empty FIFO queue.
EAS*Y*QUE***ST***IO*N***
9. Introduce int variables x and y and int* pointer variables p and q. Set x to 2, y to 8, p to
the address of x, and q to the address of y. Then print the following information:
(1) The address of x and the value of x.
(2) The value of p and the value of *p.
(3) The address of y and the value of y.
(4) The value of q and the value of *q.
(5) The address of p (not its contents!).
(6) The address of q (not its contents!).