Unit 1
Unit 1
DATA STRUCTURES
Data may be organized in many different ways. The logical or mathematical model of a
particular organization of data is called a data structure.
1. Primitive data Structures: Primitive data structures are the fundamental data types which
are supported by a programming language. Basic data types such as integer, real, character
and Boolean are known as Primitive data Structures. These data types consists of characters
that cannot be divided and hence they also called simple data types.
2. Non- Primitive data Structures: Non-primitive data structures are those data structures
which are created using primitive data structures. Examples of non-primitive data structures
is the processing of complex numbers, linked lists, stacks, trees, and graphs.
POINTERS
A pointer is a variable which contains the address in memory of another variable.
The two most important operator used with the pointer type are
& - The unary operator & which gives the address of a variable
* - The indirection or dereference operator * gives the content of the object pointed to
by a pointer.
Declaration
int i, *pi;
pi = &i;
Here, &i returns the address of i and assigns it as the value of pi
3. Pointer is dangerous when use of explicit type casts in converting between pointer types
Ex: pi = malloc (sizeof (int));
pf = (float*) pi;
4. In some system, pointers have the same size as type int, since int is the default type specifier,
some programmers omit the return type when defining a function. The return type defaults to
int which can later be interpreted as a pointer. This has proven to be a dangerous practice on
some computer and the programmer is made to define explicit types for functions.
1. malloc( ):
The function malloc allocates a user- specified amount of memory and a pointer to the start of
the allocated memory is returned.
If there is insufficient memory to make the allocation, the returned value is NULL.
Syntax:
Where,
data_ty x= (data_type *) malloc(size);
pe *x;
x is a pointer variable of data_type
size is the number of bytes
2. calloc( ):
The function calloc allocates a user- specified amount of memory and initializes the allocated
memory to 0 and a pointer to the start of the allocated memory is returned.
If there is insufficient memory to make the allocation, the returned value is NULL.
Syntax:
data_type *x;
x= (data_type *) calloc(n, size);
Where,
x is a pointer variable of type int
n is the number of block to be allocated
size is the number of bytes in each block
Ex: int *x
x= calloc (10, sizeof(int));
The above example is used to define a one-dimensional array of integers. The capacity of this
array is n=10 and x [0: n-1] (x [0, 9]) are initially 0
Macro CALLOC
#define CALLOC (p, n, s)\
if ( ! ((p) = calloc (n, s)))\
{\
fprintf(stderr, “Insuffiient memory”);\
exit(EXIT_FAILURE);\
}\
3. realloc( ):
Before using the realloc( ) function, the memory should have been allocated using
malloc( ) or calloc( ) functions.
The function relloc( ) resizes memory previously allocated by either mallor or calloc,
which means, the size of the memory changes by extending or deleting the allocated
memory.
If the existing allocated memory need to extend, the pointer value will not change.
If the existing allocated memory cannot be extended, the function allocates a new block
and copies the contents of existing memory block into new memory block and then
deletes the old memory block.
When realloc is able to do the resizing, it returns a pointer to the start of the new block
and when it is unable to do the resizing, the old block is unchanged and the function
returns the value NULL
Syntax:
data_type *x;
x= (data_type *) realloc(p, s );
The size of the memory block pointed at by p changes to S. When s > p the additional s-p
memory block have been extended and when s < p, then p-s bytes of the old block are freed.
Macro REALLOC
#define REALLOC(p,S)\
if (!((p) = realloc(p,s))) \
{\
fprintf(stderr, "Insufficient memory");\
exit(EXIT_FAILURE);\
}\
4. free( )
Dynamically allocated memory with either malloc( ) or calloc ( ) does not return on its own.
The programmer must use free( ) explicitly to release space.
Syntax:
free(ptr);
This statement cause the space in memory pointer by ptr to be deallocated
ALGORITHM SPECIFICATION
Algorithm specification in data structure
Definition
An algorithm is a finite set of instructions that, if followed, accomplishes a
particular task. In addition, all algorithms must satisfy the following criteria:
(1) Input. There are zero or more quantities that are externally supplied.
(2) Output. At least one quantity is produced.
(3) Definiteness. Each instruction is clear and unambiguous.
(4) Finiteness. If we trace out the instructions of an algorithm, then for all cases,
the algorithm terminates after a finite number of steps.
(5) Effectiveness. Every instruction must be basic enough to be carried out, in principle, by a person
using only pencil and paper. It is not enough that each operation be definite as in (3); it also must be
feasible.
Describing Algorithms
Natural language
English, Chinese
Instructions must be definite and effectiveness
Graphic representation
Flowchart
work well only if the algorithm is small and simple
Pseudo language
Readable
Instructions must be definite and effectiveness
Combining English and C++
In this text
Translating a Problem into an Algorithm
Problem
Devise a program that sorts a set of n>= 1 integers
Step I - Concept
From those integers that are currently unsorted, find the smallest and place it
next in the sorted list
Step II - Algorithm
for (i= 0; i< n; i++){
Examine list[i] to list[n-1] and suppose that the smallest integer is list[min];
Interchange list[i] and list[min];
}
Algorithm for Selection Sort
Initialize the Array:
Define an array arr of size 10 and initialize it with specific integer values.
Set the Size of the Array:
Define an integer n and set it to the size of the array (10 in this case).
Outer Loop:
Iterate over the array with an index i from 0 to n-2:
Set position to i (assume the current index is the minimum).
Inner Loop:
For each i, iterate with an index j from i+1 to n-1:
If arr[position] is greater than arr[j], update position to j (find the index of the smallest
element).
Swap Elements:
After the inner loop, check if position has changed:
If it has changed (i.e., a smaller element was found), swap arr[i] with arr[position].
Print the Sorted Array:
After sorting is complete, iterate through the array and print each element.
DATA ABSTRACTION
The process of separating logical properties of data from implementation details of data.
Data Type
• A data type is a collection of objects and a set of operations that act on those objects.
ARRAYS
An Array is defined as, an ordered set of similar data items. All the data items of an
array are stored in consecutive memory locations.
The data items of an array are of same type and each data items can be accessed using
the same name but different index value.
An array is a set of pairs, <index, value >, such that each index has a value associated
with it. It can be called as corresponding or a mapping
Ex: <index, value>
< 0 , 25 > list[0]=25
< 1 , 15 > list[1]=15
< 2 , 20 > list[2]=20
< 3 , 17 > list[3]=17
< 4 , 35 > list[4]=35
Here, list is the name of array. By using, list [0] to list [4] the data items in list can be
accessed.
Array in C
Declaration: A one dimensional array in C is declared by adding brackets to the name of a
variable.
Ex: int list[5], *plist[5];
The array list[5], defines 5 integers and in C array start at index 0, so list[0], list[1],
list[2], list[3], list[4] are the names of five array elements which contains an integer
value.
The array *plist[5], defines an array of 5 pointers to integers. Where, plist[0], plist[1],
plist[2], plist[3], plist[4] are the five array elements which contains a pointer to an
integer.
Implementation:
When the complier encounters an array declaration, list[5], it allocates five consecutive
memory locations. Each memory is enough large to hold a single integer.
The address of first element of an array is called Base Address. Ex: For list[5] the
address of list[0] is called the base address.
If the memory address of list[i] need to compute by the compiler, then the size of the
int would get by sizeof (int), then memory address of list[i] is as follows:
Output:
Address Content
12244868 0
12344872 1
12344876 2
12344880 3
12344884 4
DYNAMICALLY ALLOCATED ARRAYS
While writing computer programs, if finds ourselves in a situation where we cannot determine
how large an array to use, then a good solution to this problem is to defer this decision to run
time and allocate the array when we have a good estimate of the required array size.
Example:
int i, n, *list;
printf(“Enter the number of numbers to generate:”);
scanf(“%d”, &n);
if(n<1)
{
fprintf (stderr, “Improper value of n \n”);
exit(EXIT_FAILURE);
}
MALLOC (list, n*sizeof(int));
The programs fails only when n<1 or insufficient memory to hold the list of numbers that are
to be sorted.
Array-of-arrays representation
C find element x[i][j] by first accessing the pointer in x[i].
Where x[i] = α+ i* sizeof(int), which give the address of the zeroth element of row i of the
array.
Then adding j*sizeof(int) to this pointer ( x[i] ) , the address of the [j]th element of row i is
determined.
x[i] = α+ i* sizeof(int)
x[j] = α+ j* sizeof(int)
x[i][j] = x[i]+ i* sizeof(int)
int **myArray;
myArray = make2dArray(5,10);
myArray[2][4]=6;
The second line allocates memory for a 5 by 10 two-dimensional array of integers and the
third line assigns the value 6 to the [2][4] element of this array.
STRUCTURES
In C, a way to group data that permits the data to vary in type. This mechanism is called the
structure, for short struct.
A structure (a record) is a collection of data items, where each item is identified as to its type
and name.
Syntax: struct
{ data_type member 1;
data_type member 2;
………………………
………………………
data_type member n;
} variable_name;
Ex: struct {
char name[10];
int age;
float salary;
} Person;
The above example creates a structure and variable name is Person and that has three fields:
name = a name that is a character array
age = an integer value representing the age of the person
salary = a float value representing the salary of the individual
Ex: strcpy(Person.name,“james”);
Person.age = 10;
Person.salary = 35000;
Type-Defined Structure
The structure definition associated with keyword typedef is called Type-Defined Structure.
Syntax 1: typedef struct
{
data_type member 1;
data_type member 2;
………………………
………………………
data_type member n;
}Type_name;
Where,
typedef is the keyword used at the beginning of the definition and by using typedef
user defined data type can be obtained.
struct is the keyword which tells structure is defined to the complier
The members are declare with their data_type
Type_name is not a variable, it is user defined data_type.
In above example, humanBeing is the name of the type and it is a user defined data type.
This statement declares the variable person1 and person2 are of type humanBeing.
Structure Operation
The various operations can be performed on structures and structure members.
1. The structures are defined separately and a variable of structure type is declared inside
the definition of another structure. The accessing of the variable of a structure type that are
nested inside another structure in the same way as accessing other member of that structure
Example: The following example shows two structures, where both the structure are defined
separately.
typedef struct {
int month;
int day;
int year;
}date;
typedef struct {
char name[10];
int age;
float salary;
date dob;
} humanBeing;
humanBeing person1;
A person born on February 11, 1944, would have the values for the date struct set as:
person1.dob.month = 2;
person1.dob.day = 11;
person1.dob.year = 1944;
2. The complete definition of a structure is placed inside the definition of another structure.
Example:
typedef struct {
char name[10];
int age;
float salary;
struct {
int month;
int day;
int year;
} date;
} humanBeing;
SELF-REFERENTIAL STRUCTURES
A self-referential structure is one in which one or more of its components is a pointer to itself.
Self-referential structures usually require dynamic storage management routines (malloc and
free) to explicitly obtain and release memory.
Consider as an example:
typedef struct {
char data;
struct list *link ;
} list;
Each instance of the structure list will have two components data and link.
Data: is a single character,
Link: link is a pointer to a list structure. The value of link is either the address in
memory of an instance of list or the null pointer.
Consider these statements, which create three structures and assign values to their respective
fields:
Structures item1, item2 and item3 each contain the data item a, b, and c respectively, and the
null pointer. These structures can be attached together by replacing the null link field in item
2 with one that points to item 3 and by replacing the null link field in item 1 with one that points
to item 2.
item1.link = &item2;
item2.1ink = &item3;
Unions:
A union is similar to a structure, it is collection of data similar data type or dissimilar.
Syntax: union{
data_type member 1;
data_type member 2;
………………………
………………………
data_type member n;
}variable_name;
Example:
union{
int children;
int beard;
} u;
Union Declaration:
A union declaration is similar to a structure, but the fields of a union must share their memory
space. This means that only one field of the union is "active" at any given time.
union{
char name;
int age;
float salary;
}u;
The major difference between a union and a structure is that unlike structure members which
are stored in separate memory locations, all the members of union must share the same memory
space. This means that only one field of the union is "active" at any given time.
Example:
#include <stdio.h>
union job {
char name[32];
float salary;
int worker_no;
}u;
int main( ){
printf("Enter name:\n");
scanf("%s", &u.name);
printf("Enter salary: \n");
scanf("%f", &u.salary);
printf("Displaying\n Name :%s\n",u.name);
printf("Salary: %.1f",u.salary);
return 0;
}
POLYNOMIALS
What is a polynomial?
“A polynomial is a sum of terms, where each term has a form axe , where x is the variable, a is
the coefficient and e is the exponent.”
The largest (or leading) exponent of a polynomial is called its degree. Coefficients that are
zero are not displayed. The term with exponent equal to zero does not show the variable since
x raised to a power of zero is 1.
Now if a is a variable and is of type polynomial and n < MAX_DEGREE, the polynomial
A(x) = Σai xi would be represented as:
a.degree = n
a.coef[i] = an-i , 0 ≤ i ≤ n
In this representation, the coefficients is stored in order of decreasing exponents, such that
a.coef [i] is the coefficient of xn-i provided a term with exponent n-i exists;
Otherwise, a.coef [i] =0. This representation leads to very simple algorithms for most of the
operations, it wastes a lot of space.
To preserve space an alternate representation that uses only one global array, terms to store
all polynomials.
Polynomial Addition
C function is written that adds two polynomials, A and B to obtain D =A + B.
To produce D (x), padd( ) is used to add A (x) and B (x) term by term. Starting at
position avail, attach( ) which places the terms of D into the array, terms.
If there is not enough space in terms to accommodate D, an error message is printed to
the standard error device & exits the program with an error condition
void padd(int startA, int finishA, int startB, int finishB, int *startD,int *finishD)
{ /* add A(x) and B(x) to obtain D(x) */
float coefficient;
*startD = avail;
while (startA <= finishA && startB <= finishB)
switch(COMPARE(terms[startA].expon, terms[startB].expon))
{
case -1: /* a expon < b expon */
attach (terms [startB].coef, terms[startB].expon);
startB++;
break;
if (coefficient)
attach (coefficient, terms[startA].expon);
startA++;
startB++;
break;
case 1: /* a expon > b expon */
attach (terms [startA].coef, terms[startA].expon);
startA++;
}
Analysis of padd( ):
The number of non-zero terms in A and B is the most important factors in analyzing the time
complexity.
Let m and n be the number of non-zero terms in A and B, If m >0 and n > 0, the while loop is
entered. Each iteration of the loop requires O(1) time. At each iteration, the value of startA or
startB or both is incremented. The iteration terminates when either startA or startB exceeds
finishA or finishB.
The number of iterations is bounded by m + n -1
𝑛 𝑛
A(x) = ∑𝑖=0 𝑥 2𝑖 and B(x) = ∑𝑖=0 𝑥 2𝑖+1
The time for the remaining two for loops is bounded by O(n + m) because we cannot iterate
the first loop more than m times and the second more than n times. So, the asymptotic
computing time of this algorithm is O(n +m).
SPARSE MATRICES
A matrix contains m rows and n columns of elements as illustrated in below figures. In this
figure, the elements are numbers. The first matrix has five rows and three columns and the
second has six rows and six columns. We write m x n (read "m by n") to designate a matrix
with m rows and n columns. The total number of elements in such a matrix is mn. If m equals
n, the matrix is square.
Important Note:
A sparse matrix can be represented in 1-Dimension, 2- Dimension and 3- Dimensional array.
When a sparse matrix is represented as a two-dimensional array as shown in
Figure B, more space is wasted.
Example: consider the space requirements necessary to store a 1000 x 1000 matrix that has
only 2000 non-zero elements. The corresponding two-dimensional array requires space for
1,000,000 elements. The better choice is by using a representation in which only the nonzero
elements are stored.
Sparse Matrix Representation
An element within a matrix can characterize by using the triple <row,col,value> This
means that, an array of triples is used to represent a sparse matrix.
Organize the triples so that the row indices are in ascending order.
The operations should terminate, so we must know the number of rows and columns,
and the number of nonzero elements in the matrix.
The below figure shows the representation of matrix in the array “a” a[0].row contains
the number of rows, a[0].col contains the number of columns and a[0].value contains
the total number of nonzero entries.
Positions 1 through 8 store the triples representing the nonzero entries. The row index is
in the field row, the column index is in the field col, and the value is in the field value.
The triples are ordered by row and within rows by columns.
a[0] 6 6 8 b[0] 6 6 8
[1] 0 0 15 [1] 0 0 15
[2] 0 3 22 [2] 0 4 91
[3] 0 5 -15 [3] 1 1 11
[4] 1 1 11 [4] 2 1 3
[5] 1 2 3 [5] 2 5 28
[6] 2 3 -6 [6] 3 0 22
[7] 4 0 91 [7] 3 2 -6
[8] 5 2 28 [8] 5 0 -15
Fig (a): Sparse matrix stored as triple Fig (b): Transpose matrix stored as triple
Transposing a Matrix
To transpose a matrix, interchange the rows and columns. This means that each element
a[i][j] in the original matrix becomes element a[j][i] in the transpose matrix.
If we process the original matrix by the row indices it is difficult to know exactly where to
place element <j, i, value> in the transpose matrix until we processed all the elements that
precede it.
This can be avoided by using the column indices to determine the placement of elements in
the transpose matrix. This suggests the following algorithm:
The columns within each row of the transpose matrix will be arranged in ascending order.
void transpose (term a[], term b[])
{ /* b is set to the transpose of a */
int n, i, j, currentb;
n = a[0].value; /* total number of elements */
b[0].row = a[0].col; /* rows in b = columns in a */
b[0].col = a[0].row; /* columns in b = rows in a */
b[0].value = n;
if (n > 0)
{ currentb = 1;
for (i = 0; i < a[O].col; i++)
for (j= 1; j<=n; j++)
if (a[j].col == i)
{
b[currentb].row = a[j].col;
b[currentb].col = a[j].row;
b[currentb].value = a[j].value;
currentb++;
}
}
}
Transpose of a sparse matrix
STRING
BASIC TERMINOLOGY:
Each programming languages contains a character set that is used to communicate with the
computer. The character set include the following:
Alphabet: ABCDEFGHIJKLMNOPQRSTUVW XYZ
Digits: 0123456789
Special characters: + - / * ( ) , . $ = „ _ (Blank space)
Concatenation: Let S1 and S2 be the strings. The string consisting of the characters of S1
followed by the character S2 is called Concatenation of S1 and S2.
Ex: „THE‟ // „END‟ = „THEEND‟
„THE‟ // „ ‟ // „END‟ = „THE END‟
Substring: A string Y is called substring of a string S if there exist string X and Z such that
S = X // Y // Z
If X is an empty string, then Y is called an Initial substring of S, and Z is an empty string then
Y is called a terminal substring of S.
Ex: „BE OR NOT‟ is a substring of „TO BE OR NOT TO BE‟
„THE‟ is an initial substring of „THE END‟
STRINGS IN C
In C, the strings are represented as character arrays terminated with the null character \0.
Declaration 1:
#define MAX_SIZE 100 /* maximum size of string */
char s[MAX_SIZE] = {“dog”};
char t[MAX_SIZE] = {“house”};
s[0] s[1] s[2] s[3] t[0] t[1] t[2] t[3] t[4] t[4]
d o g \0 h o u s e \0
The above figure shows how these strings would be represented internally in memory.
Declaration 2:
char s[ ] = {“dog”};
char t[ ] = {“house”};
Using these declarations, the C compiler will allocate just enough space to hold each word
including the null character.
There are several useful operations we could specify for strings. Some of these operations are
similar to those required for other ADTs: creating a new empty string, reading a string or printing
it out, appending two strings together (called concatenation), or copying a string.
However, there are other operations that are unique to our new ADT, including comparing
strings, inserting a substring into a string, removing a substring from a string, or finding a pattern
in a string. Which contains our specification of the string ADT.
Now suppose that the first string contains "amobile" and the second contains "uto" .
We want to insert "uto" starting at position 1 of the first string, thereby producing the word "automobile."
Pattern Matching
Now let us develop an algorithm for a more sophisticated application of strings. Assume that we have two strings, string
and pat, where pat is a pattern to be searched for in string.
The easiest way to determine if pat is in string is to use the built-in function strstr.
If we have the following declarations:
Simulation of nfind
Suppose pat = "aab" and string = "ababbaabaa." Figure shows how nfind compares the characters from pat with those of
string.
The end of the string and pat arrays are held by lasts and lastp, respectively.
First nfind compares string [endmatch} and pat [lastp}.
If they match, nfind uses i and 7 to move through the two strings until a mismatch occurs or until all of pat has been
matched.
The variable start is used to reset i if a mismatch occurs.
Knuth, Morris, and Pratt(KMP) have developed a pattern matching algorithm that works
in this way and has linear complexity.
Using their example, suppose
pat = 'abcabcacab'
Sxsc AQWEQ