Singly Linked Lists Algorithm

Download as pdf or txt
Download as pdf or txt
You are on page 1of 18

Singly Linked Lists

Introduction
A linked list is data structure based on connected nodes rather than a
backbone array. Each node in a linked list has an element (variable value)
and a pointer to the next node in the list (variable next). That is why is
called a linked list, because the nodes in it are “chained” to one another by
pointers. The linked list has only two components: an element counter to
keep count of how many elements are stored in the list (variable count), and a pointer to the first node
in the list (variable named head). From the first node we can reach any other node in the list.

Similar to a vector, the first node in a linked list is


called the head node or just the head, while the last
node is called the tail node or just the tail. Unlike a
vector, in a linked list there are just as many nodes as
there are elements, so memory is not wasted as much
as in a vector.1 Also, whenever a new element must be
added, only a single new node is created and inserted to the list in the specified position, no shifting of
elements, or growth operation takes place. Similarly, the remove operation just deletes the node
without needing to shift the elements.

As we have learned, a vector is an excellent data structure that, when used properly, provides a very
efficient way to store and access elements. Part of this efficiency is due to the locality of data in
memory (i.e., elements are stored in a contiguous chunk of memory as opposed to in fragmented parts),
as well as due to array-based indexing (i.e., random / immediate access). Nevertheless, those
advantages may be overshadowed by the need to find
and allocate a large chunk of memory that may not even
be used by the data structure in its entirety (e.g., as
when doubling the underlying array capacity to store just
one more element). In addition, some operations like
removing and inserting an element at the head or at the
middle of the vector not only are linear-time (O(n))
operations, but they require several write-access to
memory to shift elements around, or copy them into a
new array, which typically are more costly than read-
access.

1
A pointer is a value in memory that represents a memory address, and thus, it occupies a small, fixed amount of
space in memory no matter to which data type it points to. How much memory a pointer occupies exactly
depends on the computer architecture, but typically it occupies roughly 4 bytes (equivalent to an int) in a 32-bit
machine, or 8 bytes (equivalent to a long) in a 64-bits machine.
That said, linked lists are not necessarily better or more efficient than vectors, it all depends for what
they are used. For instance, linked lists, due to their linked nature, have sequential access to their
elements, that is, to reach the tenth (10th) element in a linked list you will have to traverse the list from
the first (1st) node, to the second (2nd) node, then to the third (3rd) node, and so on until you reach the
tenth (10th) node. Vectors, on the other hand, just jump to the desired slot instantly, no matter the
position. Sequential access is slow for fetching elements, and thus, it makes searching and sorting
algorithms VERY inefficient. Yet, linked lists are better suited for situations where memory is very
fragmented or limited, and when writing-access to memory is costly or inefficient (i.e., linked lists do not
shift elements around like vectors do, thus, they could be more efficient in that particular aspect).

Linked lists like vectors are data structures with a vast range of uses. They may be used to represent
bounded or unbounded queues, lists, or stacks, which in turn may be used to store items when we do
not know the number of items beforehand or we know the number of records will increase eventually.

Operations
Like a vector, the linked list data structure allows us to insert, remove, or fetch elements to and from the
list it represents. To do that, it provides a set of methods which allow us to insert, remove, or fetch an
element at the beginning (or head) of the list, or at the end (or tail) of the list. Optionally, an
implementation may include an operation to insert, remove, or fetch an element in the ith position (or at
the middle), or an operation that allows us to replace (set) an element at a given position rather than
insert a new one, to name a few. This document discusses the insert, remove and fetch operations on a
linked list.

Insertion
Insertion on a linked list occurs by creating a new node with the element value and inserting it in the list
in the specified position. For the operation to proceed, the position must be a valid position. A valid
position is that between the first element (0) or the head, and the position just past the last element
(count) or the tail.

If the position is valid, we traverse the list from node to node until we reach the node just before the
position we want. This node is pointed by the pointer ptr. Then, we make the new node point to the
node next to the node that ptr is pointing to, and then make the node pointed by ptr point to the new
node, effectively inserting the new node at the specified position in the list.

Those steps work for every position except when inserting at the head (or position 0). The reason why it
does not work on the head is because there is no node before the head node. So inserting at the head is
a special case that must be treated separately from the rest. When inserting a node at the head we just
make the new node point to the first node on the list, and then make the pointer to that first node
(called head) point to the new node. Remember that a linked list consists of two components an
element counter called count and a pointer to the first element called head? Well, that head pointer is
the one we change to point to the new node. This all may seem confusing at first but keep reading and
you will understand better.
After successfully inserting an element we should increment the variable count by one to indicate that
the list holds one more element than it did before. Doing this is also necessary to determine which
range of positions is valid later on.

Inserting at the Head


Inserting an element at the head of the list is the most simple and most efficient insert operation in a
linked list. Since there is a variable head pointing to the first node on the list, it takes a fixed amount of
steps to insert a node at that position no matter how many nodes there are on the list. Due to this fact,
inserting an element at the head of the list is a O(1) or constant-time operation on the average case.
(Read Appendix A to learn more about the Big-O notation.)

Imagine we have a list with just one element as


shown and that we want to add the element ‘H’ to
the head of the list.

The first step to insert a new element on the list is


to create a new node with the element ‘H’ as its
value, and its next pointer pointing to NULL.

The second step is to make the next pointer in the


new node to point to the node that the head
pointer is pointing to (i.e., the first node). At this
time both the new node next pointer and the head
pointer are pointing at the same node.

The third step is to make the head pointer to point


to the new node. After this step, the new node has
been effectively inserted at the head of the list.

As the last step, we increase the element count by


one to indicate that the list holds one more
element than it did before.
The previous example worked for a list with just one element. Would the steps to insert a new element
at the head be different if the list contained more than one element? Would they be different if the list
was empty or had no elements? Draw a linked list with more elements and follow the steps to see the
answer for yourself. Here is an example diagram for when the list is empty:

Initial state.

Step 1.

Step 2.

Step 3.

Final state.
(It works!)
Inserting at the Tail
Contrary to inserting an element at the head, inserting an element at the tail requires that we reach the
last (or tail node). Since access in a linked list is sequential, this means that we need to traverse the list
from the first node up to the last node. Due to this fact, inserting an element at the tail of the list is an
O(n) or linear-time operation on the average case. (Read Appendix A to learn more about the Big-O
notation.)

Imagine we have a list with three elements as shown


and that we want to add the element ‘H’ to the tail
of the list.

The first step to insert a new element on the list it is


to create a new node with the element ‘H’ as its
value, and its next pointer pointing to NULL.

The second step is to traverse the list from the first


node up to the last node. The pointer ptr is used to
mark our progress as we jump from node to node
beginning at the first node (or head node). So we
start with ptr = head.

On each iteration we say that ptr is going to point to


the next node pointed by ptr. Remember that a
node contains the element value and a pointer
called next that points to the next node on the list.
So each iteration we say that ptr = ptr->next.

To reach the last node we keep count of how many


jumps we do, and use the ptr pointer to keep track
of on which node on the list we are. For the last
node we must do count-1 jumps. In this example,
count is 3 so we make 3-1 = 2 jumps, from B to M,
and from M to Z to reach the tail.
When we reach the last node, we make the new
node point to where the last node is pointing. In this
example, it points to NULL.

NOTE: In some situations (e.g., sublists, other


implementations) the last node may not always
point to NULL.

Then, we make the node pointed by ptr (i.e., the last


node), to point to the new node, effectively making
the new node the tail node.

As the last step, we increase the element count by


one to indicate that the list holds one more element
than it did before.

Inserting at the Middle


Similar to inserting an element at the tail, inserting an element at the middle requires that we reach the
node just before the specified position (node at position i-1). For example, if we want to insert a node
at position 8, we must reach the node at position 7. Since access in a linked list is sequential, this means
that we need to traverse the list from the first node up to the node we want. Due to this fact, inserting
an element at the middle of the list is an O(n) or linear-time operation on the average case. (Read
Appendix A to learn more about the Big-O notation.)

Imagine we have a list with three elements as shown


and that we want to add the element ‘H’ at position
2 on the list.

The node at position 0 has the value B, the node at


position 1 has M, and the node at position 2 has Z.
The new node with value H should displace (not
replace) the node with value Z.
The first step to insert a new element on the list it is
to create a new node with the element ‘H’ as its
value, and its next pointer pointing to NULL.

The second step is to traverse the list from the first


node (position 0) up to the node at position 1 (2 – 1
= 1). The pointer ptr is used to mark our progress as
we jump from node to node beginning at the first
node (or head node). So we start with ptr = head.

On each iteration we jump to the next node until we


reach the node at position 1. In this example it only
took one jump, but it may take more for different
positions.

When ptr reaches the node we want, we make the


new node point to where the node pointed by ptr
was pointing. In this example, that node holds the
value Z.

Then, we make the node pointed by ptr (i.e., the


node at position 1) point to the new node,
effectively making the new node the node at
position 2 as we wanted.

As the last step, we increase the element count by


one to indicate that the list holds one more element
than it did before.
Insert Implementations

Inserting at the Middle

bool addMiddle(int index, T value)


{
if(index >= 0 && index <= count)
{
Node<T>* newNode = new (nothrow) Node<T>(value, NULL) ;

if(newNode == NULL) { return false ; }

if(index == 0)
{
newNode->next = head ;
head = newNode ;
}
else
{
Node<T>* ptr = getNode(index-1) ;
newNode->next = ptr->next ;
ptr->next = newNode ;
}

count++ ;
return true ;
}

return false ;
}

Inserting at the Head

bool addFirst(T value)


{
return addMiddle(0, value) ;
}

Inserting at the Tail

bool addLast(T value)


{
return addMiddle(count, value) ;
}
Traversing the List

Node<T>* getNode(int index)


{
Node<T>* ptr = head ;

for(int i=0 ; i<index ; i++)


{
ptr = ptr->next ;
}

return ptr ;
}

Removal
Removal on a linked list occurs by re-chaining the nodes and deleting the node in the specified position.
For the operation to proceed, the position must be a valid position. A valid position is that between the
first element (0) or the head, and the position of the last element (count-1) or the tail.

If the position is valid, we traverse the list from node to node until we reach the node just before the
position we want to remove (ptr). Then, we create a temporary pointer (tmp) to the node at the position
we want to remove (i.e., the node next to ptr), and then make node ptr point to the node next to the
one pointed by tmp, basically skipping the node tmp. After that, we delete the content of the node
pointed by tmp and return the allocated space to memory.

Those steps work for every position except when removing at the head (or position 0). The reason why
it does not work on the head is because there is no node before the head node. So, removing at the
head is a special case that must be treated separately from the rest. When removing a node at the head
we just create a temporary pointer (tmp) point to the first node on the list (just like the pointer head
does), then we make the head pointer point to the node next to the node pointed by tmp, basically
skipping the first node. After that, we delete the content of the node pointed by tmp (i.e., the ex-first
node) and return the allocated space to memory. It may seem confusing at first but keep reading and
you will understand better.

After successfully removing an element we should decrement the variable count by one to indicate that
the list holds one less element than it did before. Doing this is also necessary to determine which range
of positions is valid later on.
Removing at the Head
Removing an element at the head of the list is one of the most simple and most efficient operations in a
linked list. Since there is a variable head pointing to the first node on the list, it takes a fixed amount of
steps to remove a node at that position no matter how many nodes there are on the list. Due to this
fact, removing an element at the head of the list is a O(1) or constant-time operation on the average
case. (Read Appendix A to learn more about the Big-O notation.)

Imagine we have a list with three elements as


shown and that we want to remove the element
‘B’ at the head.

The first step is to create a temporary pointer


(tmp) to keep track of the node we are going to
delete.

The second step is to make the pointer head point


to the second node on the list effectively making it
the head node.

The third step is to delete the node pointed by tmp


to recover the memory allocated to it. This step is
very, VERY important, because if we omit it, the
memory occupied by that node is never recovered
and we end up with what is known as a memory
leak.

As the last step, we decrease the element count by


one to indicate that the list holds one less element
than it did before.
Removing at the Tail
Similar to inserting an element at the tail, removing an element at the tail requires that we reach the
second to last (or penultimate node). Since access in a linked list is sequential, this means that we need
to traverse the list from the first node up to the penultimate node. Due to this fact, removing an
element at the tail of the list is an O(n) or linear-time operation on the average case. (Read Appendix A
to learn more about the Big-O notation.)

Imagine we have a list with three elements as


shown and that we want to remove the element ‘Z’
at the tail.

The first step is to traverse the list from node to


node until we reach the node just before the tail.

The pointer ptr is used to mark our progress as we


jump from node to node beginning at the first
node (or head node). So we start with ptr = head.

On each iteration we say that ptr is going to point


to the next node pointed by ptr. So each iteration
we say that ptr = ptr->next.

The next step is to create a temporary pointer


(tmp) to keep track of the node we are going to
delete which is the node next to ptr.

Then, we make the node pointed by ptr to point to


where the node pointed by tmp is pointing. In this
example, it points to NULL.

NOTE: In some situations (e.g., sublists, other


implementations) the last node may not always
point to NULL.
The next step is to delete the node pointed by tmp
to recover the memory allocated to it. This step is
very, VERY important, because if we omit it, the
memory occupied by that node is never recovered
and we end up with what is known as a memory
leak.

As the last step, we decrease the element count by


one to indicate that the list holds one less element
than it did before.

Removing at the Middle


Similar to removing an element at the tail, removing an element at the middle requires that we reach
the node just before the specified position (node at position i-1). For example, if we want to remove a
node at position 8, we must reach the node at position 7. Since access in a linked list is sequential, this
means that we need to traverse the list from the first node up to the node we want. Due to this fact,
removing an element at the middle of the list is an O(n) or linear-time operation on the average case.
(Read Appendix A to learn more about the Big-O notation.)

Imagine we have a list with three elements as


shown and that we want to remove the element
‘M’ at position 1 on the list.

The node at position 0 has the value B, the node at


position 1 has M, and the node at position 2 has Z.

The first step is to traverse the list from node to


node until we reach the node just before the one
we want to remove.

The pointer ptr is used to mark our progress as we


jump from node to node beginning at the first
node (or head node). So we start with ptr = head.

The next step is to create a temporary pointer


(tmp) to keep track of the node we are going to
delete which is the node next to ptr.
Then, we make the node pointed by ptr to point to
where the node pointed by tmp is pointin,
effectively skipping the node tmp.

The next step is to delete the node pointed by tmp


to recover the memory allocated to it. This step is
very, VERY important, because if we omit it, the
memory occupied by that node is never recovered
and we end up with what is known as a memory
leak.

As the last step, we decrease the element count by


one to indicate that the list holds one less element
than it did before.

Remove Implementations

Remove at the Middle

bool removeMiddle(int index)


{
if(index >= 0 && index < count)
{
if(index == 0)
{
Node<T>* tmp = head ;
head = head->next ;
delete tmp ;
}
else
{
Node<T>* ptr = getNode(index-1) ;
Node<T>* tmp = ptr->next ;
ptr->next = tmp->next ;
delete tmp ;
}

count-- ;
return true ;
}

return false ;
}
Remove at the Head

bool removeFirst()
{
return removeMiddle(0) ;
}

Remove at the Tail

bool removeLast()
{
return removeMiddle(count-1) ;
}

Fetching
Access on a list is sequential. To fetch an element in a specific position we need to traverse the list from
the head node up to the node we want. Once we get the node at that position we extract the element
value from it. Of course, the position we fetch must be a valid position. A valid position is that between
the first element (at position 0) and the last element (at position count-1). Fetching the element at the
head always takes a fixed amount of steps so, it is an O(1) or constant-time operation. Fetching an
element at the tail requires to traverse all the nodes on the list so it is an O(n) or linear-time operation
on the average case, while fetching an element at the middle requires a similar process which in the
average case also makes it an O(n) or linear-time operation on the average case. (Read Appendix A to
learn more about the Big-O notation.)

Fetch Implementations

Fetching at the Middle

T getMiddle(int index)
{
if(index >= 0 && index < count)
{
Node<T>* ptr = getNode(index) ;

return ptr->value ;
}

return NULL ;
}
Fetching at the Head

T getFirst()
{
return getMiddle(0) ;
}

Fetching at the Tail

T getLast()
{
return getMiddle(count-1) ;
}

Advanced Topics
Iterators
Fetching the ith element in a linked list takes ith steps due to sequential access. This may not seem as
much for short lists but when the element count reaches large quantity it becomes very inefficient.
Furthermore, when iterating through a list as to display every element, or process each individual
element in whichever way, we may be tempted to use the following snippet of code:

Iterating through a linked list using an inefficient method

int count = list.getCount() ;

for(int i=0 ; i<count ; i++)


{
cout << list.getMiddle(i) << endl ;
}

At first sight it looks fine, but if we analyze it we can easily see that it becomes very, VERY inefficient to
do it this way. For example, imagine that the element count is 10. If we follow the code, it takes one
jump for getMidddle(0) to reach the first position, then it takes two jumps for getMiddle(1) for the
second position, then three jumps for getMiddle(2) to the third position, and so on. So, in total, it takes
1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55 jumps to iterate through a list of only 10 elements!
In general, it would take [n (n + 1) / 2] steps for a list with n elements using that method, that is, O(n2) or
quadratic-time operation. In contrast, iterating through a vector of n elements using the same method
just takes n steps, that is, O(n) or linear-time operation, due to its random access property.

Iterators are an internal mechanism used by many data structures, particularly linked list, to optimize
iteration-based operations. Rather than iterate through a list using the getMiddle() operation,
whenever a data structure needs to iterate through its elements we obtain a list Iterator object, and use
its interface to iterate through the list more efficiently. The interface of an Iterator object consists of
two methods:

Iterator Interface

template <class T>

class Iterator<T>
{
public:

virtual bool hasNext() = 0 ;


virtual T getNext() = 0 ;
} ;

The following sample code does the same as the previous example, but using an Iterator:

Iterating through a linked list using an Iterator

Iterator<T>* i = list.getIterator() ;

while(i->hasNext())
{
cout << i->getNext() << endl ;
}

Linked List getIterator() Method Implementation

Iterator<T>* getIterator()
{
return new LinkedListIterator<T>(head) ;
}
In the case of a linked list the iterator will keep track of the last element it visited so consecutives calls to
the getNext() method will return the next value without starting from the first element like the
getMiddle() method does. Here is a simple implementation for the Iterator interface for linked lists:

Iterator Implementation for a Linked List

template <class T>

class LinkedListIterator<T> : public Iterator<T>


{
private:

Node<T>* ptr ;

public:

LinkedListIterator<T>(Node<T>* startNode)
{
ptr = startNode ;
}

bool hasNext()
{
return (ptr != NULL) ;
}

T getNext()
{
T value = ptr->value ;
ptr = ptr->next ;

return value ;
}
} ;

Optionally, an Iterator interface may include methods that allow inserting, and removing from and to
the current position of the iterator. Another common method could be one that allow us to traverse
the list bidirectionally (e.g., hasPrevious(), getPrevious()), but that is normally an exclusive operation of
double linked lists.
Exercises
1. Figure a way to know how many elements are in a linked list without using the variable count.

2. Unlike vectors, linked lists do not have an array, so indexing is not possible. Figure a way to optimize
the performance of a linked list to access the last node in O(1) or constant-time? Figure a way to
optimize the performance of a linked list to reach the ith position in no more than n/2 steps? Where
n is the number of elements in the list.

3. In future documents you are going to learn that the circular linked list and the double linked list
implementations are possible answers for the previous exercise. Can you think of other possible
answers? Explain your answer(s) and implement your solution(s).

4. Due to the fact that the head node has no previous node, the standard algorithm to insert or
remove a node at the head of a linked list does not work properly, so, we must treat this as special
case like the code snippet shows. Is there a way to modify the code so there is no need to treat
inserting or removing a node at the head of the linked list like a special case? Explain your answer(s)
and implement your solution(s).

References
 D.S. Malik. C++ Programming: Program Design Including Data Structures, 5th Edition. Course
Technology, 2010. ISBN-10: 0538798092 ISBN-13: 978-0538798099
 R. Lafore, Data Structures and Algorithms in Java, 2nd Edition. Sams, 2002 [Classic]. ISBN-10:
0672324539, ISBN-13: 978-0672324536
 T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 3rd Edition. The
MIT Press, 2009. ISBN-10: 0262033844, ISBN-13: 978-0262033848
 From Wikipedia, the free encyclopedia:
o http://en.wikipedia.org/wiki/Linked_list
o http://en.wikipedia.org/wiki/Queue_(data_structure)
o http://en.wikipedia.org/wiki/Stack_(data_structure)
o http://en.wikipedia.org/wiki/Iterator

Created by: Henry F. Bruckman Vargas


Last Revised: 2013/09/07

You might also like