Computer Science Data Structure Nodes Reference
In computer science, a linked list (or more clearly, "singly linked list") is a data structure that
consists of a sequence of nodes, each of which contains a reference (i.e., a link) to the next
node in the sequence.
Figure: A linked list whose nodes contain two fields: an integer value and a link to the next node
Linked lists are among the simplest and most common data structures. They can be used to
implement several other common abstract data structures, including stacks, queues, associative
arrays, and symbolic expressions, though it is not uncommon to implement the other data
structures directly without using a list as the basis of implementation.
The principal benefit of a linked list over a conventional array is that the list elements can easily
be added or removed without reallocation or reorganization of the entire structure because the
data items need not be stored contiguously in memory or on disk. Linked lists allow insertion
and removal of nodes at any point in the list, and can do so with a constant number of
operations if the link preceding the node being added or removed is maintained during list
traversal.
On the other hand, simple linked lists by themselves do not allow random access to the data
other than the first node's data, or any form of efficient indexing. Thus, many basic operations,
such as obtaining the last node of the list (assuming that the last node is not maintained as
a separate node reference in the list structure), finding a node that contains a given datum, or
locating the place where a new node should be inserted, may require scanning most or all of
the list elements.
Contents
1 History
2 Basic concepts and nomenclature
o 2.1 Linear and circular lists
o 2.2 Singly, doubly, and multiply linked lists
o 2.3 Sentinel nodes
o 2.4 Empty lists
o 2.5 Hash linking
o 2.6 List handles
o 2.7 Combining alternatives
3 Tradeoffs
o 3.1 Linked lists vs. dynamic arrays
o 3.2 Singly linked linear lists vs. other lists
o 3.3 Doubly linked vs. singly linked
o 3.4 Circularly linked vs. linearly linked
o 3.5 Using sentinel nodes
4.2.1 Algorithms
5 Linked lists using arrays of nodes
6 Language support
7 Internal and external storage
o 7.1 Example of internal and external storage
8 Speeding up search
9 Related data structures
10 Notes
11 References
12 External links
History
LISP, standing for list processor, was created by John McCarthy in 1958 while he was at MIT,
and in 1960 he published its design in a paper in the Communications of the ACM entitled
"Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I". One
of LISP's major data structures is the linked list. By the early 1960s, the utility of both linked lists
and languages which use these structures as their primary data representation was well
established. Bert Green of the MIT Lincoln Laboratory published a review article entitled
"Computer languages for symbol manipulation" in IRE Transactions on Human Factors in
Electronics in March 1961 which summarized the advantages of the linked list approach. A later
review article, "A Comparison of list-processing computer languages" by Bobrow and Raphael,
appeared in Communications of the ACM in April 1964.
The TSS/360 operating system, developed by IBM for the System 360/370 machines, used a
doubly linked list for its file system catalog. The directory structure was similar to Unix, where
a directory could contain files and/or other directories and extend to any depth. A utility, flea,
was created to fix file system problems after a crash, since modified portions of the file catalog
were sometimes in memory when a crash occurred. Problems were detected by comparing the
forward and backward links for consistency. If a forward link was corrupt, then if a backward link
to the affected node was found, the forward link was set to the node with the backward link. A
humorous comment in the source code where this utility was invoked stated "Everyone knows a
flea collar gets rid of bugs in cats".
The field of each node that contains the address of the next node is usually called
the next link or next pointer. The remaining fields are known as
the data, information, value, cargo, or payload fields.
The head of a list is its first node, and the tail is the last node (or a pointer thereto). In Lisp and
some derived languages, the tail may be called the CDR (pronounced could-er) of the list, while
the payload of the head node may be called the CAR.
Figure: A singly linked list whose nodes contain two fields: an integer value and a link to the next node
In a doubly linked list, each node contains, besides the next-node link, a second link field
pointing to the previous node in the sequence. The two links may be called forward(s)
and backwards, or next and prev(ious).
Figure: A doubly linked list whose nodes contain three fields: an integer value, the link forward to the next node, and the link backward to the previous node
The technique known as XOR-linking allows a doubly linked list to be implemented using a
single link field in each node. However, this technique requires the ability to do bit operations on
addresses, and therefore may not be available in some high-level languages.
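As a rough illustration of XOR-linking, the sketch below simulates addresses with array indices, since Python does not expose raw pointers. The class name XorList, the use of index 0 as the null link, and the method names are all assumptions made for this example, not part of any library.

```python
class XorList:
    """Doubly linked list storing one link per node: prev_index XOR next_index.

    Nodes live in parallel Python lists; index 0 is reserved as the null link
    (an assumption of this sketch, standing in for a null address).
    """

    def __init__(self):
        self.data = [None]  # slot 0 is never a real node
        self.link = [0]
        self.head = 0
        self.tail = 0

    def append(self, value):
        idx = len(self.data)
        self.data.append(value)
        self.link.append(self.tail ^ 0)  # prev = old tail, next = null
        if self.tail:
            # The old tail stored prev ^ 0; XOR in the index of its new successor.
            self.link[self.tail] ^= idx
        else:
            self.head = idx
        self.tail = idx

    def _walk(self, start):
        prev, cur = 0, start
        while cur:
            yield self.data[cur]
            # The neighbour we did not come from is link XOR the one we came from.
            prev, cur = cur, self.link[cur] ^ prev

    def forward(self):
        return list(self._walk(self.head))

    def backward(self):
        return list(self._walk(self.tail))
```

The same walking routine serves both directions, because the stored link is symmetric in the two neighbours; only the starting end differs.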
In a multiply linked list, each node contains two or more link fields, each field being used to
connect the same set of data records in a different order (e.g., by name, by department, by date
of birth, etc.). (While doubly linked lists can be seen as special cases of multiply linked lists, the
fact that the two orders are opposite to each other leads to simpler and more efficient
algorithms, so they are usually treated as a separate case.)
In the case of a circular doubly linked list, the only change is that the end, or "tail", of the
list is linked back to the front, or "head", of the list, and vice versa.
Sentinel nodes
Empty lists
An empty list is a list that contains no data records. This is usually the same as saying that it
has zero nodes. If sentinel nodes are being used, the list is usually said to be empty when it has
only sentinel nodes.
Hash linking
The link fields need not be physically part of the nodes. If the data records are stored in an array
and referenced by their indices, the link field may be stored in a separate array with the same
indices as the data records.
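A minimal sketch of this arrangement, assuming made-up record values and the hypothetical names names, next_index, and in_list_order: the data records sit in one array, and the links sit in a second array with the same indices.

```python
# Data records, addressed by index (values invented for illustration).
names = ["Jones", "Smith", "Adams"]
# next_index[i] is the index of the record that follows record i; -1 ends the list.
next_index = [1, -1, 0]
# Index of the first record in list order.
head = 2


def in_list_order(head, next_index, records):
    """Return the records in linked order by following the separate link array."""
    ordered = []
    i = head
    while i != -1:
        ordered.append(records[i])
        i = next_index[i]
    return ordered
```

Because the links are outside the records, the same data array could be threaded in several orders by keeping several link arrays.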
List handles
Since a reference to the first node gives access to the whole list, that reference is often called
the address, pointer, or handle of the latter. Algorithms that manipulate linked lists usually get
such handles to the input lists and return the handles to the resulting lists. In fact, in the context
of such algorithms, the word "list" often means "list handle". In some situations, however, it may
be convenient to refer to a list by a handle that consists of two links, pointing to its first and last
nodes.
Combining alternatives
The alternatives listed above may be arbitrarily combined in almost every way, so one may have
circular doubly linked lists without sentinels, circular singly linked lists with sentinels, etc.
Tradeoffs
As with most choices in computer programming and design, no method is well suited to all
circumstances. A linked list data structure might work well in one case, but cause problems in
another. This is a list of some of the common tradeoffs involving linked list structures.
Operation                         Linked list              Array   Dynamic array
Insertion/deletion at beginning   Θ(1)                     N/A     Θ(n)
Insertion/deletion in middle      search time + Θ(1)[1]    N/A     Θ(n)
A dynamic array is a data structure that allocates all elements contiguously in memory, and
keeps a count of the current number of elements. If the space reserved for the dynamic array is
exceeded, it is reallocated and (possibly) copied, an expensive operation.
Linked lists have several advantages over dynamic arrays. Insertion or deletion of an element at
a specific point of a list, assuming that we have a pointer to the node (before the one to be
removed, or before the insertion point) already, is a constant-time operation, whereas insertion
in a dynamic array at random locations will require moving half of the elements on average, and
all the elements in the worst case. While one can "delete" an element from an array in constant
time by somehow marking its slot as "vacant", this causes fragmentation that impedes the
performance of iteration.
Moreover, arbitrarily many elements may be inserted into a linked list, limited only by the total
memory available, while a dynamic array will eventually fill up its underlying array data structure
and have to reallocate. Reallocation is an expensive operation, and one that may not even be
possible if memory is fragmented, although its cost can be averaged over insertions, so that the
cost of an insertion is still amortized O(1), the same as for linked lists. Similarly, an array
from which many elements are removed may have to be resized in order to avoid wasting too
much space.
On the other hand, dynamic arrays (as well as fixed-size array data structures) allow constant-
time random access, while linked lists allow only sequential access to elements. Singly linked
lists, in fact, can only be traversed in one direction. This makes linked lists unsuitable for
applications where it's useful to look up an element by its index quickly, such as heapsort.
Sequential access on arrays and dynamic arrays is also faster than on linked lists on many
machines, because they have optimal locality of reference and thus make good use of data
caching.
Another disadvantage of linked lists is the extra storage needed for references, which often
makes them impractical for lists of small data items such as characters or boolean values,
because the storage overhead for the links may exceed by a factor of two or more the size of
the data. In contrast, a dynamic array requires only the space for the data itself (and a very
small amount of control data).[note 1] It can also be slow, and with a naïve allocator, wasteful, to
allocate memory separately for each new element, a problem generally solved using memory
pools.
Some hybrid solutions try to combine the advantages of the two representations. Unrolled linked
lists store several elements in each list node, increasing cache performance while decreasing
memory overhead for references. CDR coding does both of these as well, by replacing references
with the actual data referenced, which extends off the end of the referencing record.
A good example that highlights the pros and cons of dynamic arrays vs. linked lists is a
program that solves the Josephus problem. The Josephus problem is an
election method that works by having a group of people stand in a circle. Starting at a
predetermined person, you count around the circle n times. Once you reach the nth person,
you take them out of the circle and have the members close the circle. Then you count around
the circle the same n times and repeat the process until only one person is left. That person
wins the election. If the people are viewed as connected nodes in a circular linked list, the
linked list can delete nodes easily, since it only has to rearrange the links between nodes.
However, the linked list is poor at finding the next person to remove, and must walk through
the list until it reaches that person. A dynamic array, on the other hand, is poor at
deleting nodes (or elements), since it cannot remove one element without individually shifting
all the following elements down by one. However, it is exceptionally easy to find the nth person
in the circle by directly referencing them by their position in the array.
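The contrast described above can be sketched in Python as follows. Both functions are illustrative only: the names josephus_linked and josephus_array, the numbering of people from 1, and the choice to start counting at person 1 are assumptions of this sketch.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def josephus_linked(count, step):
    """Circular linked list version: cheap removal, slow walk to the next victim."""
    head = Node(1)
    prev = head
    for value in range(2, count + 1):
        prev.next = Node(value)
        prev = prev.next
    prev.next = head  # close the circle; prev now precedes person 1
    remaining = count
    while remaining > 1:
        for _ in range(step - 1):  # reaching the victim costs O(step) link hops
            prev = prev.next
        prev.next = prev.next.next  # unlinking the victim is O(1)
        remaining -= 1
    return prev.value


def josephus_array(count, step):
    """Dynamic array version: direct indexing, but each removal shifts elements."""
    people = list(range(1, count + 1))
    index = 0
    while len(people) > 1:
        index = (index + step - 1) % len(people)  # O(1) jump to the victim
        people.pop(index)  # O(n) shift of the elements after index
    return people[0]
```

Both functions should name the same survivor for the same inputs; they differ only in where the work is spent.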
The list ranking problem concerns the efficient conversion of a linked list representation into an
array. Although trivial for a conventional computer, solving this problem by a parallel algorithm is
complicated and has been the subject of much research.
A singly linked linear list is a recursive data structure, because it contains a
pointer to a smaller object of the same type. For that reason, many operations on singly linked
linear lists (such as merging two lists, or enumerating the elements in reverse order) often have
very simple recursive algorithms, much simpler than any solution using iterative commands.
While one can adapt those recursive solutions for doubly linked and circularly linked lists, the
procedures generally need extra arguments and more complicated base cases.
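To make the point about recursion concrete, here is a small Python sketch of the two operations mentioned above. The Node class and the helper names (merge, reversed_values, from_values, to_values) are assumptions of this example; the recursion depth grows with the list length, so this is only suitable for short lists.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def merge(a, b):
    """Merge two sorted lists given by their first nodes (or None) and return the new head."""
    if a is None:
        return b
    if b is None:
        return a
    if a.data <= b.data:
        a.next = merge(a.next, b)  # the rest of a is a smaller list of the same type
        return a
    b.next = merge(a, b.next)
    return b


def reversed_values(node):
    """Enumerate the elements in reverse order: the rest of the list first, then this node."""
    if node is None:
        return []
    return reversed_values(node.next) + [node.data]


def from_values(values):
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def to_values(node):
    out = []
    while node is not None:
        out.append(node.data)
        node = node.next
    return out
```

In both functions the base case is simply the empty list, which is what makes the singly linked linear case so compact.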
Linear singly linked lists also allow tail-sharing, the use of a common final portion of sub-list as
the terminal portion of two different lists. In particular, if a new node is added at the beginning of
a list, the former list remains available as the tail of the new one — a simple example of
a persistent data structure. Again, this is not true with the other variants: a node may never
belong to two different circular or doubly linked lists.
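Tail-sharing can be sketched in a few lines of Python. The names Node, cons, and to_values are assumptions of this example; cons simply builds a new first node in front of an existing list without modifying it.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def cons(data, tail):
    """Return a new list whose first node holds data and whose tail is the given list."""
    return Node(data, tail)


def to_values(node):
    out = []
    while node is not None:
        out.append(node.data)
        node = node.next
    return out


shared = cons(2, cons(3, None))  # the common final portion
list_a = cons(1, shared)         # 1 -> 2 -> 3
list_b = cons(0, shared)         # 0 -> 2 -> 3, reusing the same two nodes
```

Neither list_a nor list_b copies the nodes of shared, and shared itself is still usable as a list afterwards, as long as no one modifies the shared nodes in place.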
In particular, end-sentinel nodes can be shared among singly linked non-circular lists. One may
even use the same end-sentinel node for every such list. In Lisp, for example, every proper list
ends with a link to a special node, denoted by nil or (), whose CAR and CDR links point to itself.
Thus a Lisp procedure can safely take the CAR or CDR of any list.
Indeed, the advantages of the fancy variants often lie in the simplicity of the
algorithms, not in their efficiency. A circular list, in particular, can usually be emulated by a linear
list together with two variables that point to the first and last nodes, at no extra cost.
With a circular list, a pointer to the last node gives easy access also to the first node, by
following one link. Thus, in applications that require access to both ends of the list (e.g., in the
implementation of a queue), a circular structure allows one to handle the structure by a single
pointer, instead of two.
A circular list can be split into two circular lists, in constant time, by giving the addresses of the
last node of each piece. The operation consists in swapping the contents of the link fields of
those two nodes. Applying the same operation to any two nodes in two distinct lists joins the two
lists into one. This property greatly simplifies some algorithms and data structures, such as
the quad-edge and face-edge.
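The split and join operation described above amounts to a single swap of two link fields. The following Python sketch, with the assumed helper names make_circular, swap_links, and cycle_from, shows one circular list being split in two and then joined again.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = self  # a single node forms a circle by itself


def make_circular(values):
    """Build a circular singly linked list and return its nodes in order."""
    nodes = [Node(v) for v in values]
    for i, node in enumerate(nodes):
        node.next = nodes[(i + 1) % len(nodes)]
    return nodes


def swap_links(p, q):
    """Swap the link fields of p and q in constant time.

    If p and q are in the same circle, the circle splits in two, with p and q
    as the last nodes of the two pieces. If they are in different circles,
    the circles merge into one.
    """
    p.next, q.next = q.next, p.next


def cycle_from(node):
    """Collect the data once around the circle, starting at node."""
    out = [node.data]
    cur = node.next
    while cur is not node:
        out.append(cur.data)
        cur = cur.next
    return out
```

For example, with nodes = make_circular([1, 2, 3, 4, 5, 6]), calling swap_links(nodes[2], nodes[5]) leaves two circles holding 1, 2, 3 and 4, 5, 6, and calling it a second time restores the original circle.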
The simplest representation for an empty circular list (when such a thing makes sense) is a null
pointer, indicating that the list has no nodes. With this choice, many algorithms have to test for
this special case, and handle it separately. By contrast, the use of null to denote an
empty linear list is more natural and often creates fewer special cases.
Sentinel nodes use up extra space (especially in applications that use many short
lists), and they may complicate other operations (such as the creation of a new empty list).
However, if the circular list is used merely to simulate a linear list, one may avoid some of this
complexity by adding a single sentinel node to every list, between the last and the first data
nodes. With this convention, an empty list consists of the sentinel node alone, pointing to itself
via the next-node link. The list handle should then be a pointer to the last data node, before the
sentinel, if the list is not empty; or to the sentinel itself, if the list is empty.
The same trick can be used to simplify the handling of a doubly linked linear list, by turning it
into a circular doubly linked list with a single sentinel node. However, in this case, the handle
should be a single pointer to the dummy node itself.[2]
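A minimal sketch of a circular doubly linked list with a single sentinel node, assuming the hypothetical class names DNode and DList. The point of interest is that insertion and removal contain no tests for an empty list or for the ends of the list.

```python
class DNode:
    def __init__(self, data=None):
        self.data = data
        self.prev = self  # a fresh node is a one-element circle
        self.next = self


class DList:
    def __init__(self):
        self.sentinel = DNode()  # the list handle points at the dummy node

    def insert_after(self, node, data):
        new = DNode(data)
        new.prev, new.next = node, node.next
        node.next.prev = new
        node.next = new
        return new

    def push_front(self, data):
        return self.insert_after(self.sentinel, data)

    def push_back(self, data):
        return self.insert_after(self.sentinel.prev, data)

    def remove(self, node):
        # Works for any data node, including the first, the last, and the only one.
        node.prev.next = node.next
        node.next.prev = node.prev

    def is_empty(self):
        return self.sentinel.next is self.sentinel

    def values(self):
        out = []
        node = self.sentinel.next
        while node is not self.sentinel:
            out.append(node.data)
            node = node.next
        return out
```

The cost is one extra node per list, which matches the space tradeoff noted above for applications with many short lists.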
Our node data structure will have two fields. We also keep a variable firstNode which always
points to the first node in the list, or is null for an empty list.
record Node {
    data       // The data being stored in the node
    Node next  // A reference to the next node, null for last node
}
record List {
    Node firstNode  // points to first node of list; null for empty list
}
Traversal of a singly linked list is simple, beginning at the first node and following each next link
until we come to the end:
node := list.firstNode
while node not null
    (do something with node.data)
    node := node.next
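For readers who prefer something runnable, the records and the traversal loop above might look like this in Python. The names Node, LinkedList, first_node, and traverse are this sketch's own renderings of the pseudocode, and collecting the data into a Python list stands in for "do something with node.data".

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data  # the data being stored in the node
        self.next = next  # reference to the next node, None for the last node


class LinkedList:
    def __init__(self):
        self.first_node = None  # None for an empty list


def traverse(lst):
    """Follow each next link from first_node until the end of the list."""
    values = []
    node = lst.first_node
    while node is not None:
        values.append(node.data)
        node = node.next
    return values
```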
Inserting a node after an existing node in a singly linked list takes two link assignments: the
new node's next link takes over the existing node's next link, and the existing node's next link
is then set to the new node. The diagram shows how it works. Inserting a node before an existing
one cannot be done directly; instead, you have to keep track of the previous node and insert a
node after it.
Inserting at the beginning of the list requires a separate function. This requires
updating firstNode.
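The two insertion operations can be sketched in Python as below. The function names insert_after and insert_beginning and the helper to_values are assumptions of this example; the order of the two assignments in insert_after matters, since the old successor must be saved before it is overwritten.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


class LinkedList:
    def __init__(self):
        self.first_node = None


def insert_after(node, new_node):
    """Insert new_node directly after node."""
    new_node.next = node.next  # first take over the old successor
    node.next = new_node       # then link the new node in


def insert_beginning(lst, new_node):
    """Insert new_node before the current first node and update first_node."""
    new_node.next = lst.first_node
    lst.first_node = new_node


def to_values(lst):
    out = []
    node = lst.first_node
    while node is not None:
        out.append(node.data)
        node = node.next
    return out
```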
Similarly, we have functions for removing the node after a given node, and for removing a node
from the beginning of the list. The diagram demonstrates the former. To find and remove a
particular node, one must again keep track of the previous element.
Notice that removeBeginning() sets list.firstNode to null when removing the last node in the list.
Since we can't iterate backwards, efficient "insertBefore" or "removeBefore" operations are not
possible.
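A possible Python rendering of the two removal operations follows. The names remove_after and remove_beginning mirror the function names used in the text; the guards against a missing node and the returned removed node are additions of this sketch.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


class LinkedList:
    def __init__(self):
        self.first_node = None


def remove_after(node):
    """Unlink and return the node after the given node, or None if there is none."""
    obsolete = node.next
    if obsolete is not None:
        node.next = obsolete.next  # point past the removed node
    return obsolete


def remove_beginning(lst):
    """Unlink and return the first node; first_node becomes None when the list empties."""
    obsolete = lst.first_node
    if obsolete is not None:
        lst.first_node = obsolete.next
    return obsolete
```

To remove a particular node with these functions, a caller walks the list while remembering the previous node and then calls remove_after on it.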
Appending one linked list to another can be inefficient unless a reference to the tail is kept as
part of the List structure, because we must traverse the entire first list in order to find the tail,
and then append the second list to this. Thus, if two linearly linked lists are each of length n, list
appending has asymptotic time complexity of O(n). In the Lisp family of languages, list
appending is provided by the append procedure.
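The difference between the two cases can be sketched as follows. The names append_by_scan, TailedList, push_back, and extend_with are hypothetical; the first function must walk the whole first list, while the second keeps a reference to the tail and so splices in constant time.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def append_by_scan(first_a, first_b):
    """Append list b to list a without a tail reference: O(length of a)."""
    if first_a is None:
        return first_b
    node = first_a
    while node.next is not None:  # traverse the entire first list to find its tail
        node = node.next
    node.next = first_b
    return first_a


class TailedList:
    """List structure that keeps a reference to the tail as well as the head."""

    def __init__(self):
        self.first_node = None
        self.last_node = None

    def push_back(self, data):
        node = Node(data)
        if self.last_node is None:
            self.first_node = node
        else:
            self.last_node.next = node
        self.last_node = node

    def extend_with(self, other):
        """Splice other's nodes behind our tail in O(1); the nodes are shared, not copied."""
        if other.first_node is None:
            return
        if self.last_node is None:
            self.first_node = other.first_node
        else:
            self.last_node.next = other.first_node
        self.last_node = other.last_node


def to_values(node):
    out = []
    while node is not None:
        out.append(node.data)
        node = node.next
    return out
```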
Many of the special cases of linked list operations can be eliminated by including a dummy
element at the front of the list. This ensures that there are no special cases for the beginning of
the list and renders both insertBeginning() and removeBeginning() unnecessary. In this case,
the first useful data in the list will be found at list.firstNode.next.
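As a sketch of the dummy-element idea, the Python below gives every list a permanent first node that carries no data. The class name DummyHeadList and the helper to_values are assumptions; note that insertion and removal at the front are handled by the same insert_after and remove_after used everywhere else.

```python
class Node:
    def __init__(self, data=None, next=None):
        self.data = data
        self.next = next


class DummyHeadList:
    def __init__(self):
        self.first_node = Node()  # dummy element; real data starts at first_node.next


def insert_after(node, new_node):
    new_node.next = node.next
    node.next = new_node


def remove_after(node):
    obsolete = node.next
    if obsolete is not None:
        node.next = obsolete.next
    return obsolete


def to_values(lst):
    out = []
    node = lst.first_node.next  # skip the dummy element
    while node is not None:
        out.append(node.data)
        node = node.next
    return out
```

Inserting at the beginning becomes insert_after(lst.first_node, new_node), and removing the first data node becomes remove_after(lst.first_node), with no update of the list handle itself.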
Both types of circularly linked lists benefit from the ability to traverse the full list beginning at any
given node. This often allows us to avoid storing firstNode and lastNode, although if the list may
be empty we need a special representation for the empty list, such as a lastNode variable which
points to some node in the list or is null if it's empty; we use such a lastNode here. This
representation significantly simplifies adding and removing nodes with a non-empty list, but
empty lists are then a special case.
Algorithms
Assuming that someNode is some node in a non-empty circular singly linked list, this code
iterates through that list starting with someNode:
function iterate(someNode)
    if someNode ≠ null
        node := someNode
        do
            do something with node.value
            node := node.next
        while node ≠ someNode
Notice that the test "while node ≠ someNode" must be at the end of the loop. If the test were
moved to the beginning of the loop, the procedure would fail: node equals someNode before the
first iteration, so the body would never run, even when the list had only one node.
The function insertAfter(node, newNode) inserts a node "newNode" into a circular linked list
after a given node "node". If "node" is null, it assumes that the list is empty and links
"newNode" to itself.
Suppose that "L" is a variable pointing to the last node of a circular linked list (or null if the list is
empty). To append "newNode" to the end of the list, one may do
    insertAfter(L, newNode)
    L := newNode
To insert "newNode" at the beginning of the list, one may do
    insertAfter(L, newNode)
    if L = null
        L := newNode
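The circular insertion routine and the two idioms above can be sketched in Python as follows. Since Python cannot reassign the caller's variable L from inside a function, the assumed helpers append and prepend return the new value of the last-node handle instead; values is another helper invented for this example.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def insert_after(node, new_node):
    """Insert new_node after node in a circular list; node is None for an empty list."""
    if node is None:
        new_node.next = new_node  # a single node points to itself
    else:
        new_node.next = node.next
        node.next = new_node


def append(last, new_node):
    """Add new_node at the end and return the new last node."""
    insert_after(last, new_node)
    return new_node


def prepend(last, new_node):
    """Add new_node at the beginning and return the (possibly unchanged) last node."""
    insert_after(last, new_node)
    return new_node if last is None else last


def values(last):
    """List the data from the first node (last.next) around to the last node."""
    if last is None:
        return []
    out = []
    node = last.next
    while True:
        out.append(node.data)
        if node is last:
            break
        node = node.next
    return out
```

Appending and prepending perform the same link updates; the only difference is whether the handle moves to the new node.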
As an example, consider the following linked list record that uses arrays instead of pointers:
record Entry {
    integer next     // index of next entry in array
    integer prev     // previous entry (if double-linked)
    string name
    real balance
}
By creating an array of these structures, and an integer variable to store the index of the first
element, a linked list can be built:
integer listHead
Entry Records[1000]
Links between elements are formed by placing the array index of the next (or previous) cell into
the Next or Prev field within a given element. For example:
Index          Next   Prev   Name               Balance
2 (listHead)   4      -1     Adams, Adam        0.00
3                            Ignore, Ignatius   999.99
In the above example, listHead would be set to 2, the location of the first entry in the list.
Notice that entries 3 and 5 through 7 are not part of the list. These cells are available for any
additions to the list. By creating a ListFree integer variable, a free list could be created to keep
track of which cells are available. If all entries are in use, the size of the array would have to be
increased or some elements would have to be deleted before new entries could be stored in the
list.
The following code would traverse the list and display names and account balance:
i := listHead
while i ≥ 0 // loop through the list
    print i, Records[i].name, Records[i].balance // print entry
    i := Records[i].next
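The free-list idea mentioned above can be sketched in Python as below. This is a singly linked simplification: the class ArrayLinkedList, its method names, and the decision to chain unused cells through their next field are assumptions of this example, and the sample names and balances in any usage are invented.

```python
class Entry:
    def __init__(self):
        self.next = -1  # index of next entry in array, -1 for none
        self.name = ""
        self.balance = 0.0


class ArrayLinkedList:
    def __init__(self, capacity):
        self.records = [Entry() for _ in range(capacity)]
        # Chain every cell into the free list: 0 -> 1 -> ... -> capacity-1 -> -1.
        for i in range(capacity - 1):
            self.records[i].next = i + 1
        self.list_head = -1                       # the list itself starts empty
        self.list_free = 0 if capacity > 0 else -1

    def insert_front(self, name, balance):
        i = self.list_free
        if i < 0:
            raise MemoryError("all entries are in use")
        self.list_free = self.records[i].next     # take a cell off the free list
        entry = self.records[i]
        entry.name, entry.balance = name, balance
        entry.next = self.list_head
        self.list_head = i
        return i

    def remove_front(self):
        i = self.list_head
        if i < 0:
            raise IndexError("list is empty")
        self.list_head = self.records[i].next
        self.records[i].next = self.list_free     # return the cell to the free list
        self.list_free = i
        return i

    def names(self):
        out = []
        i = self.list_head
        while i >= 0:  # loop through the list
            out.append(self.records[i].name)
            i = self.records[i].next
        return out
```

When insert_front raises, the caller faces the choice described above: grow the array or delete some elements first.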
Deletion in a stack
procedure delete(var item : items);
{remove the top element from the stack and put it in item}
begin
  if top = 0 then stackempty
  else begin
    item := stack[top];
    top := top - 1;
  end;
end; {of delete}
These two procedures are so simple that they perhaps need no more explanation.
Procedure delete actually combines the functions TOP and DELETE. Stackfull and
stackempty are procedures which are left unspecified since they will depend upon the
particular application. Often a stackfull condition will signal that more storage needs
to be allocated and the program re-run. Stackempty is often a meaningful condition.
Deletion in a queue
procedure deleteq(var item : items);
{delete from the front of q and put into item}
begin
  if front = rear then queueempty
  else begin
    front := front + 1;
    item := q[front];
  end;
end; {of deleteq}
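For comparison, the two deletion procedures can be sketched in Python. The classes ArrayStack and ArrayQueue are hypothetical; slot 0 is left unused so that top = 0 and front = rear keep the same meaning as in the Pascal text, the add and addq methods are assumed counterparts that the text does not show, and exceptions stand in for the unspecified stackempty, stackfull, queueempty, and queuefull procedures.

```python
class ArrayStack:
    def __init__(self, capacity):
        self.stack = [None] * (capacity + 1)  # slot 0 unused, so top == 0 means empty
        self.top = 0

    def add(self, item):
        if self.top == len(self.stack) - 1:
            raise OverflowError("stackfull")
        self.top += 1
        self.stack[self.top] = item

    def delete(self):
        """Remove the top element and return it (combines TOP and DELETE)."""
        if self.top == 0:
            raise IndexError("stackempty")
        item = self.stack[self.top]
        self.top -= 1
        return item


class ArrayQueue:
    def __init__(self, capacity):
        self.q = [None] * (capacity + 1)
        self.front = 0  # index just before the first element
        self.rear = 0   # index of the last element

    def addq(self, item):
        if self.rear == len(self.q) - 1:
            raise OverflowError("queuefull")
        self.rear += 1
        self.q[self.rear] = item

    def deleteq(self):
        """Delete from the front of the queue and return the item."""
        if self.front == self.rear:
            raise IndexError("queueempty")
        self.front += 1
        return self.q[self.front]
```

As in the Pascal version, this linear queue never reuses the slots in front of front, so it can report queuefull even when earlier items have been deleted.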
Recursion is an elegant device for describing this traversal. A second form of traversal
is preorder:
procedure preorder(currentnode : treepointer);
{currentnode is a pointer to a node in a binary tree. For full
 tree traversal, pass preorder the pointer to the top of the tree}
begin {preorder}
  if currentnode <> nil
  then
    begin
      write(currentnode^.data);
      preorder(currentnode^.leftchild);
      preorder(currentnode^.rightchild);
    end {of if}
end; {of preorder}
In words we would say "visit a node, traverse left and continue again. When you
cannot continue, move right and begin again or move back until you can move right
and resume." At this point it should be easy to guess the next traversal method, which
is called postorder:
procedure postorder(currentnode : treepointer);
{currentnode is a pointer to a node in a binary tree. For full
 tree traversal, pass postorder the pointer to the top of the tree}
begin {postorder}
  if currentnode <> nil
  then
    begin
      postorder(currentnode^.leftchild);
      postorder(currentnode^.rightchild);
      write(currentnode^.data);
    end {of if}
end; {of postorder}
To understand the subject better let's look at an example.
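As one small illustration, the preorder and postorder procedures can be sketched in Python. The TreeNode class and the visit callback are assumptions of this sketch; the field names leftchild and rightchild follow the Pascal text, and the five-node tree used in the description below is made up.

```python
class TreeNode:
    def __init__(self, data, leftchild=None, rightchild=None):
        self.data = data
        self.leftchild = leftchild
        self.rightchild = rightchild


def preorder(node, visit):
    """Visit the node first, then its left subtree, then its right subtree."""
    if node is not None:
        visit(node.data)
        preorder(node.leftchild, visit)
        preorder(node.rightchild, visit)


def postorder(node, visit):
    """Visit the left subtree, then the right subtree, then the node itself."""
    if node is not None:
        postorder(node.leftchild, visit)
        postorder(node.rightchild, visit)
        visit(node.data)


def collect(traversal, root):
    """Run a traversal and return the visited data as a list."""
    out = []
    traversal(root, out.append)
    return out
```

For a tree with root A, left child B (with children D and E), and right child C, preorder visits A, B, D, E, C, while postorder visits D, E, B, C, A.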