Unit Iv: List Python Collections (Arrays) : Example

Download as pdf or txt
Download as pdf or txt
You are on page 1of 49

UNIT IV: List

Python Collections (Arrays)


There are four collection data types in the Python programming language:

 List is a collection which is ordered and changeable. Allows duplicate


members.
 Tuple is a collection which is ordered and unchangeable. Allows duplicate
members.
 Set is a collection which is unordered and unindexed. No duplicate
members.
 Dictionary is a collection which is unordered, changeable and indexed.
No duplicate members.

When choosing a collection type, it is useful to understand the properties of that


type. Choosing the right type for a particular data set could mean retention of
meaning, and, it could mean an increase in efficiency or security.

List
A list is a collection which is ordered and changeable. In Python lists are written
with square brackets.

Example
Create a List:

thislist = ["apple", "banana", "cherry"]


print(thislist)

Access Items
You access the list items by referring to the index number:
Example
Print the second item of the list:

thislist = ["apple", "banana", "cherry"]


print(thislist[1])
Try it Yourself »

Negative Indexing
Negative indexing means beginning from the end, -1 refers to the last item, -
2 refers to the second last item etc.

Example
Print the last item of the list:

thislist = ["apple", "banana", "cherry"]


print(thislist[-1])

Range of Indexes
You can specify a range of indexes by specifying where to start and where to
end the range.

When specifying a range, the return value will be a new list with the specified
items.

Example
Return the third, fourth, and fifth item:

thislist =
["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]
print(thislist[2:5]

Note: The search will start at index 2 (included) and end at index 5 (not
included).

Remember that the first item has index 0.

By leaving out the start value, the range will start at the first item:

Example
This example returns the items from the beginning to "orange":

thislist =
["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]
print(thislist[:4])

By leaving out the end value, the range will go on to the end of the list:

Example
This example returns the items from "cherry" and to the end:

thislist =
["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]
print(thislist[2:])
Try it Yourself »

Range of Negative Indexes


Specify negative indexes if you want to start the search from the end of the list:

Example
This example returns the items from index -4 (included) to index -1 (excluded)

thislist =
["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]
print(thislist[-4:-1])

Change Item Value


To change the value of a specific item, refer to the index number:
Example
Change the second item:

thislist = ["apple", "banana", "cherry"]


thislist[1] = "blackcurrant"
print(thislist)
Loop Through a List
You can loop through the list items by using a for loop:

Example
Print all items in the list, one by one:

thislist = ["apple", "banana", "cherry"]


for x in thislist:
print(x)

Check if Item Exists


To determine if a specified item is present in a list use the in keyword:

Example
Check if "apple" is present in the list:

thislist = ["apple", "banana", "cherry"]


if "apple" in thislist:
print("Yes, 'apple' is in the fruits list")

List Length
To determine how many items a list has, use the len() function:

Example
Print the number of items in the list:

thislist = ["apple", "banana", "cherry"]


print(len(thislist))
Add Items
To add an item to the end of the list, use the append() method:

Example
Using the append() method to append an item:

thislist = ["apple", "banana", "cherry"]


thislist.append("orange")
print(thislist)

To add an item at the specified index, use the insert() method:

Example
Insert an item as the second position:

thislist = ["apple", "banana", "cherry"]


thislist.insert(1, "orange")
print(thislist)

Remove Item
There are several methods to remove items from a list:

Example
The remove() method removes the specified item:

thislist = ["apple", "banana", "cherry"]


thislist.remove("banana")
print(thislist)

Example
The pop() method removes the specified index, (or the last item if index is not
specified):
thislist = ["apple", "banana", "cherry"]
thislist.pop()
print(thislist)

Example
The del keyword removes the specified index:

thislist = ["apple", "banana", "cherry"]


del thislist[0]
print(thislist)

Example
The del keyword can also delete the list completely:

thislist = ["apple", "banana", "cherry"]


del thislist

Example
The clear() method empties the list:

thislist = ["apple", "banana", "cherry"]


thislist.clear()
print(thislist)

Copy a List
You cannot copy a list simply by typing list2 = list1, because: list2 will only be
a reference to list1, and changes made in list1 will automatically also be made
in list2.

There are ways to make a copy, one way is to use the built-in List
method copy().

Example
Make a copy of a list with the copy() method:
thislist = ["apple", "banana", "cherry"]
mylist = thislist.copy()
print(mylist)

Another way to make a copy is to use the built-in method list().

Example
Make a copy of a list with the list() method:

thislist = ["apple", "banana", "cherry"]


mylist = list(thislist)
print(mylist)

Join Two Lists


There are several ways to join, or concatenate, two or more lists in Python.

One of the easiest ways are by using the + operator.

Example
Join two list:

list1 = ["a", "b" , "c"]


list2 = [1, 2, 3]

list3 = list1 + list2


print(list3)

Another way to join two lists are by appending all the items from list2 into list1,
one by one:

Example
Append list2 into list1:

list1 = ["a", "b" , "c"]


list2 = [1, 2, 3]

for x in list2:
list1.append(x)
print(list1)

Or you can use the extend() method, which purpose is to add elements from one
list to another list:

Example
Use the extend() method to add list2 at the end of list1:

list1 = ["a", "b" , "c"]


list2 = [1, 2, 3]

list1.extend(list2)
print(list1)

The list() Constructor


It is also possible to use the list() constructor to make a new list.

Example
Using the list() constructor to make a List:

thislist = list(("apple", "banana", "cherry")) # note the double round-


brackets
print(thislist)

List Methods
Python has a set of built-in methods that you can use on lists.

Method Description
append() Adds an element at the end of the list

clear() Removes all the elements from the list

copy() Returns a copy of the list

count() Returns the number of elements with the specified value

extend() Add the elements of a list (or any iterable), to the end of the current list

index() Returns the index of the first element with the specified value

insert() Adds an element at the specified position

pop() Removes the element at the specified position

remove() Removes the item with the specified value

reverse() Reverses the order of the list

sort() Sorts the list


Tuple
A tuple is a collection which is ordered and unchangeable. In Python tuples are
written with round brackets.

Example
Create a Tuple:

thistuple = ("apple", "banana", "cherry")


print(thistuple)

Access Tuple Items


You can access tuple items by referring to the index number, inside square
brackets:

Example
Print the second item in the tuple:

thistuple = ("apple", "banana", "cherry")


print(thistuple[1])

Negative Indexing
Negative indexing means beginning from the end, -1 refers to the last item, -
2 refers to the second last item etc.

Example
Print the last item of the tuple:

thistuple = ("apple", "banana", "cherry")


print(thistuple[-1])

Range of Indexes
You can specify a range of indexes by specifying where to start and where to
end the range.

When specifying a range, the return value will be a new tuple with the specified
items.

Example
Return the third, fourth, and fifth item:

thistuple =
("apple", "banana", "cherry", "orange", "kiwi", "melon", "mango")
print(thistuple[2:5])

Note: The search will start at index 2 (included) and end at index 5 (not
included).

Remember that the first item has index 0.

Range of Negative Indexes


Specify negative indexes if you want to start the search from the end of the
tuple:

Example
This example returns the items from index -4 (included) to index -1 (excluded)

thistuple =
("apple", "banana", "cherry", "orange", "kiwi", "melon", "mango")
print(thistuple[-4:-1])

Change Tuple Values


Once a tuple is created, you cannot change its values. Tuples
are unchangeable, or immutable as it also is called.

But there is a workaround. You can convert the tuple into a list, change the list,
and convert the list back into a tuple.
Example
Convert the tuple into a list to be able to change it:

x = ("apple", "banana", "cherry")


y = list(x)
y[1] = "kiwi"
x = tuple(y)

print(x)

Loop Through a Tuple


You can loop through the tuple items by using a for loop.

Example
Iterate through the items and print the values:

thistuple = ("apple", "banana", "cherry")


for x in thistuple:
print(x)

Check if Item Exists


To determine if a specified item is present in a tuple use the in keyword:

Example
Check if "apple" is present in the tuple:

thistuple = ("apple", "banana", "cherry")


if "apple" in thistuple:
print("Yes, 'apple' is in the fruits tuple")
Tuple Length
To determine how many items a tuple has, use the len() method:

Example
Print the number of items in the tuple:

thistuple = ("apple", "banana", "cherry")


print(len(thistuple))

Add Items
Once a tuple is created, you cannot add items to it. Tuples are unchangeable.

Example
You cannot add items to a tuple:

thistuple = ("apple", "banana", "cherry")


thistuple[3] = "orange" # This will raise an error
print(thistuple)

Create Tuple With One Item


To create a tuple with only one item, you have add a comma after the item,
unless Python will not recognize the variable as a tuple.

Example
One item tuple, remember the commma:

thistuple = ("apple",)
print(type(thistuple))

#NOT a tuple
thistuple = ("apple")
print(type(thistuple))

Remove Items
Note: You cannot remove items in a tuple.

Tuples are unchangeable, so you cannot remove items from it, but you can
delete the tuple completely:

Example
The del keyword can delete the tuple completely:

thistuple = ("apple", "banana", "cherry")


del thistuple
print(thistuple) #this will raise an error because the tuple no longer
exists

Join Two Tuples


To join two or more tuples you can use the + operator:

Example
Join two tuples:

tuple1 = ("a", "b" , "c")


tuple2 = (1, 2, 3)

tuple3 = tuple1 + tuple2


print(tuple3)

The tuple() Constructor


It is also possible to use the tuple() constructor to make a tuple.

Example
Using the tuple() method to make a tuple:

thistuple = tuple(("apple", "banana", "cherry")) # note the double round-


brackets
print(thistuple)

Tuple Methods
Python has two built-in methods that you can use on tuples.

Method Description

count() Returns the number of times a specified value occurs in a tuple

index() Searches the tuple for a specified value and returns the position of wher

Set
A set is a collection which is unordered and unindexed. In Python sets are
written with curly brackets.

Example
Create a Set:

thisset = {"apple", "banana", "cherry"}


print(thisset)
Try it Yourself »

Note: Sets are unordered, so you cannot be sure in which order the items will
appear.

Access Items
You cannot access items in a set by referring to an index, since sets are
unordered the items has no index.

But you can loop through the set items using a for loop, or ask if a specified
value is present in a set, by using the in keyword.

Example
Loop through the set, and print the values:

thisset = {"apple", "banana", "cherry"}

for x in thisset:
print(x)
Try it Yourself »

Example
Check if "banana" is present in the set:

thisset = {"apple", "banana", "cherry"}

print("banana" in thisset)
Try it Yourself »

Change Items
Once a set is created, you cannot change its items, but you can add new items.
Add Items
To add one item to a set use the add() method.

To add more than one item to a set use the update() method.

Example
Add an item to a set, using the add() method:

thisset = {"apple", "banana", "cherry"}

thisset.add("orange")

print(thisset)
Try it Yourself »

Example
Add multiple items to a set, using the update() method:

thisset = {"apple", "banana", "cherry"}

thisset.update(["orange", "mango", "grapes"])

print(thisset)
Try it Yourself »

Get the Length of a Set


To determine how many items a set has, use the len() method.

Example
Get the number of items in a set:
thisset = {"apple", "banana", "cherry"}

print(len(thisset))
Try it Yourself »

Remove Item
To remove an item in a set, use the remove(), or the discard() method.

Example
Remove "banana" by using the remove() method:

thisset = {"apple", "banana", "cherry"}

thisset.remove("banana")

print(thisset)
Try it Yourself »

Note: If the item to remove does not exist, remove() will raise an error.

Example
Remove "banana" by using the discard() method:

thisset = {"apple", "banana", "cherry"}

thisset.discard("banana")

print(thisset)
Try it Yourself »

Note: If the item to remove does not exist, discard() will NOT raise an error.

You can also use the pop(), method to remove an item, but this method will
remove the last item. Remember that sets are unordered, so you will not know
what item that gets removed.

The return value of the pop() method is the removed item.

Example
Remove the last item by using the pop() method:

thisset = {"apple", "banana", "cherry"}

x = thisset.pop()

print(x)

print(thisset)
Try it Yourself »

Note: Sets are unordered, so when using the pop() method, you will not know
which item that gets removed.

Example
The clear() method empties the set:

thisset = {"apple", "banana", "cherry"}

thisset.clear()

print(thisset)
Try it Yourself »

Example
The del keyword will delete the set completely:

thisset = {"apple", "banana", "cherry"}

del thisset

print(thisset)
Try it Yourself »

Join Two Sets


There are several ways to join two or more sets in Python.

You can use the union() method that returns a new set containing all items from
both sets, or the update() method that inserts all the items from one set into
another:
Example
The union() method returns a new set with all items from both sets:

set1 = {"a", "b" , "c"}


set2 = {1, 2, 3}

set3 = set1.union(set2)
print(set3)
Try it Yourself »

Example
The update() method inserts the items in set2 into set1:

set1 = {"a", "b" , "c"}


set2 = {1, 2, 3}

set1.update(set2)
print(set1)
Try it Yourself »

Note: Both union() and update() will exclude any duplicate items.

There are other methods that joins two sets and keeps ONLY the duplicates, or
NEVER the duplicates, check the full list of set methods in the bottom of this
page.

The set() Constructor


It is also possible to use the set() constructor to make a set.

Example
Using the set() constructor to make a set:

thisset = set(("apple", "banana", "cherry")) # note the double round-


brackets
print(thisset)
Try it Yourself »
Set Methods
Python has a set of built-in methods that you can use on sets.

Method Description

add() Adds an element to the set

clear() Removes all the elements from the set

copy() Returns a copy of the set

difference() Returns a set containing the difference between two or more sets

difference_update() Removes the items in this set that are also included in another, specifie

discard() Remove the specified item

intersection() Returns a set, that is the intersection of two other sets

intersection_update() Removes the items in this set that are not present in other, specified se

isdisjoint() Returns whether two sets have a intersection or not


issubset() Returns whether another set contains this set or not

issuperset() Returns whether this set contains another set or not

pop() Removes an element from the set

remove() Removes the specified element

symmetric_difference Returns a set with the symmetric differences of two sets


()

symmetric_difference inserts the symmetric differences from this set and another
_update()

union() Return a set containing the union of sets

update() Update the set with the union of this set and others

Test Yourself With Exercises


Dictionary
A dictionary is a collection which is unordered, changeable and indexed. In
Python dictionaries are written with curly brackets, and they have keys and
values.

Example
Create and print a dictionary:

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
print(thisdict)

Try it Yourself »

Accessing Items
You can access the items of a dictionary by referring to its key name, inside
square brackets:

Example
Get the value of the "model" key:

x = thisdict["model"]

Try it Yourself »

There is also a method called get() that will give you the same result:

Example
Get the value of the "model" key:

x = thisdict.get("model")

Change Values
You can change the value of a specific item by referring to its key name:

Example
Change the "year" to 2018:

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict["year"] = 2018

Try it Yourself »

Loop Through a Dictionary


You can loop through a dictionary by using a for loop.

When looping through a dictionary, the return value are the keys of the
dictionary, but there are methods to return the values as well.

Example
Print all key names in the dictionary, one by one:

for x in thisdict:
print(x)
Try it Yourself »

Example
Print all values in the dictionary, one by one:

for x in thisdict:
print(thisdict[x])
Try it Yourself »

Example
You can also use the values() function to return values of a dictionary:
for x in thisdict.values():
print(x)
Try it Yourself »

Example
Loop through both keys and values, by using the items() function:

for x, y in thisdict.items():
print(x, y)
Try it Yourself »

Check if Key Exists


To determine if a specified key is present in a dictionary use the in keyword:

Example
Check if "model" is present in the dictionary:

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
if "model" in thisdict:
print("Yes, 'model' is one of the keys in the thisdict dictionary")
Try it Yourself »

Dictionary Length
To determine how many items (key-value pairs) a dictionary has, use
the len() method.

Example
Print the number of items in the dictionary:
print(len(thisdict))
Try it Yourself »

Adding Items
Adding an item to the dictionary is done by using a new index key and assigning
a value to it:

Example
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict["color"] = "red"
print(thisdict)

Try it Yourself »

Removing Items
There are several methods to remove items from a dictionary:

Example
The pop() method removes the item with the specified key name:

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict.pop("model")
print(thisdict)
Try it Yourself »

Example
The popitem() method removes the last inserted item (in versions before 3.7, a
random item is removed instead):

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict.popitem()
print(thisdict)
Try it Yourself »

Example
The del keyword removes the item with the specified key name:

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
del thisdict["model"]
print(thisdict)
Try it Yourself »

Example
The del keyword can also delete the dictionary completely:

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
del thisdict
print(thisdict) #this will cause an error because "thisdict" no longer
exists.
Try it Yourself »

Example
The clear() method empties the dictionary:

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict.clear()
print(thisdict)
Try it Yourself »

Copy a Dictionary
You cannot copy a dictionary simply by typing dict2 = dict1, because: dict2 will
only be a reference to dict1, and changes made in dict1 will automatically also
be made in dict2.

There are ways to make a copy, one way is to use the built-in Dictionary
method copy().

Example
Make a copy of a dictionary with the copy() method:

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
mydict = thisdict.copy()
print(mydict)
Try it Yourself »

Another way to make a copy is to use the built-in method dict().

Example
Make a copy of a dictionary with the dict() method:

thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
mydict = dict(thisdict)
print(mydict)
Try it Yourself »
Nested Dictionaries
A dictionary can also contain many dictionaries, this is called nested
dictionaries.

Example
Create a dictionary that contain three dictionaries:

myfamily = {
"child1" : {
"name" : "Emil",
"year" : 2004
},
"child2" : {
"name" : "Tobias",
"year" : 2007
},
"child3" : {
"name" : "Linus",
"year" : 2011
}
}
Try it Yourself »

Or, if you want to nest three dictionaries that already exists as dictionaries:

Example
Create three dictionaries, than create one dictionary that will contain the other
three dictionaries:

child1 = {
"name" : "Emil",
"year" : 2004
}
child2 = {
"name" : "Tobias",
"year" : 2007
}
child3 = {
"name" : "Linus",
"year" : 2011
}

myfamily = {
"child1" : child1,
"child2" : child2,
"child3" : child3
}
Try it Yourself »

The dict() Constructor


It is also possible to use the dict() constructor to make a new dictionary:

Example
thisdict = dict(brand="Ford", model="Mustang", year=1964)
# note that keywords are not string literals
# note the use of equals rather than colon for the assignment
print(thisdict)

Try it Yourself »

Dictionary Methods
Python has a set of built-in methods that you can use on dictionaries.

Method Description

clear() Removes all the elements from the dictionary

copy() Returns a copy of the dictionary


fromkeys() Returns a dictionary with the specified keys and value

get() Returns the value of the specified key

items() Returns a list containing a tuple for each key value pair

keys() Returns a list containing the dictionary's keys

pop() Removes the element with the specified key

popitem() Removes the last inserted key-value pair

setdefault() Returns the value of the specified key. If the key does not exist: insert the key,
specified value

update() Updates the dictionary with the specified key-value pairs

values() Returns a list of all the values in the dictionary


Sorting, searching and algorithm analysis
Introduction
We have learned that in order to write a computer program which performs some task
we must construct a suitable algorithm. However, whatever algorithm we construct is
unlikely to be unique – there are likely to be many possible algorithms which can
perform the same task. Are some of these algorithms in some sense better than others?
Algorithm analysis is the study of this question.

In this chapter we will analyse four algorithms; two for each of the following common
tasks:

 sorting: ordering a list of values


 searching: finding the position of a value within a list

Algorithm analysis should begin with a clear statement of the task to be performed. This
allows us both to check that the algorithm is correct and to ensure that the algorithms
we are comparing perform the same task.

Although there are many ways that algorithms can be compared, we will focus on two
that are of primary importance to many data processing algorithms:

 time complexity: how the number of steps required depends on the size of the input
 space complexity: how the amount of extra memory or storage required depends on the size
of the input

Note

Common sorting and searching algorithms are widely implemented and already
available for most programming languages. You will seldom have to implement them
yourself outside of the exercises in these notes. It is nevertheless important for you to
understand these basic algorithms, because you are likely to use them within your own
programs – their space and time complexity will thus affect that of your own algorithms.
Should you need to select a specific sorting or searching algorithm to fit a particular
task, you will require a good understanding of the available options.
Sorting algorithms
The sorting of a list of values is a common computational task which has been studied
extensively. The classic description of the task is as follows:

Given a list of values and a function that compares two values, order the values in the list from
smallest to largest.

The values might be integers, or strings or even other kinds of objects. We will examine
two algorithms:

 Selection sort, which relies on repeated selection of the next smallest item
 Merge sort, which relies on repeated merging of sections of the list that are already sorted

Other well-known algorithms for sorting lists are insertion sort, bubble sort, heap
sort, quicksort and shell sort.

There are also various algorithms which perform the sorting task for restricted kinds of
values, for example:

 Counting sort, which relies on the values belonging to a small set of items
 Bucket sort, which relies on the ability to map each value to one of a small set of items
 Radix sort, which relies on the values being sequences of digits

If we restrict the task, we can enlarge the set of algorithms that can perform it. Among
these new algorithms may be ones that have desirable properties. For example, Radix
sort uses fewer steps than any generic sorting algorithm.

Selection sort

To order a given list using selection sort, we repeatedly select the smallest remaining
element and move it to the end of a growing sorted list.
To illustrate selection sort, let us examine how it operates on a small list of four
elements:

Initially the entire list is unsorted. We will use the front of the list to hold the sorted items
– to avoid using extra storage space – but at the start this sorted list is empty.

First we must find the smallest element in the unsorted portion of the list. We take the
first element of the unsorted list as a candidate and compare it to each of the following
elements in turn, replacing our candidate with any element found to be smaller. This
requires 3 comparisons and we find that element 1.5 at position 2 is smallest.

Now we will swap the first element of our unordered list with the smallest element. This
becomes the start of our ordered list:

We now repeat our previous steps, determining that 2.7 is the smallest remaining
element and swapping it with 3.8 – the first element of the current unordered section –
to get:
Finally, we determine that 3.8 is the smallest of the remaining unordered elements and
swap it with 7.2:

The table below shows the number of operations of each type used in sorting our
example list:

Sorted List Length Comparisons Swaps Assign smallest candidate

0 -> 1 3 1 3

1 -> 2 2 1 2

2 -> 3 1 1 2

Total 6 3 7

Note that the number of comparisons and the number of swaps are independent of the
contents of the list (this is true for selection sort but not necessarily for other sorting
algorithms) while the number of times we have to assign a new value to the smallest
candidate depends on the contents of the list.

More generally, the algorithm for selection sort is as follows:

1. Divide the list to be sorted into a sorted portion at the front (initially empty) and an unsorted
portion at the end (initially the whole list).
2. Find the smallest element in the unsorted list:

1. Select the first element of the unsorted list as the initial candidate.
2. Compare the candidate to each element of the unsorted list in turn, replacing the candidate
with the current element if the current element is smaller.
3. Once the end of the unsorted list is reached, the candidate is the smallest element.

3. Swap the smallest element found in the previous step with the first element in the
unsorted list, thus extending the sorted list by one element.
4. Repeat the steps 2 and 3 above until only one element remains in the unsorted list.

Note

The Selection sort algorithm as described here has two properties which are often
desirable in sorting algorithms.

The first is that the algorithm is in-place. This means that it uses essentially no extra
storage beyond that required for the input (the unsorted list in this case). A little extra
storage may be used (for example, a temporary variable to hold the candidate for the
smallest element). The important property is that the extra storage required should not
increase as the size of the input increases.

The second is that the sorting algorithm is stable. This means that two elements which
are equal retain their initial relative ordering. This becomes important if there is
additional information attached to the values being sorted (for example, if we are sorting
a list of people using a comparison function that compares their dates of birth). Stable
sorting algorithms ensure that sorting an already sorted list leaves the order of the list
unchanged, even in the presence of elements that are treated as equal by the
comparison.
Exercise 1

Complete the following code which will perform a selection sort in Python. ”...” denotes
missing code that should be filled in:

def selection_sort(items):
"""Sorts a list of items into ascending order using the
selection sort algoright.
"""
for step in range(len(items)):
# Find the location of the smallest element in
# items[step:].
location_of_smallest = step
for location in range(step, len(items)):
# TODO: determine location of smallest
...
# TODO: Exchange items[step] with items[location_of_smallest]
...
Exercise 2

Earlier in this section we counted the number


of comparisons, swaps and assignments used in our example.

1. How many swaps are performed when we apply selection sort to a list of N items?
2. How many comparisons are performed when we apply selection sort to a list of N items?
1. How many comparisons are performed to find the smallest element when the unsorted
portion of the list has M items?
2. Sum over all the values of M encountered when sorting the list of length N to find the
total number of comparisons.
3. The number of assignments (to the candidate smallest number) performed during the
search for a smallest element is at most one more than the number of comparisons. Use this
to find an upper limit on the total number of assignments performed while sorting a list of
length N.
4. Use the results of the previous question to find an upper bound on the total number of
operations (swaps, comparisons and assignments) performed. Which term in the number of
operations will dominate for large lists?

Merge sort

When we use merge sort to order a list, we repeatedly merge sorted sub-sections of the
list – starting from sub-sections consisting of a single item each.

We will see shortly that merge sort requires significantly fewer operations than selection
sort.

Let us start once more with our small list of four elements:
First we will merge the two sections on the left into the temporary storage. Imagine the
two sections as two sorted piles of cards – we will merge the two piles by repeatedly
taking the smaller of the top two cards and placing it at the end of the merged list in the
temporary storage. Once one of the two piles is empty, the remaining items in the other
pile can just be placed on the end of the merged list:

Next we copy the merged list from the temporary storage back into the portion of the list
originally occupied by the merged subsections:
We repeat the procedure to merge the second pair of sorted sub-sections:

Having reached the end of the original list, we now return to the start of the list and
begin to merge sorted sub-sections again. We repeat this until the entire list is a single
sorted sub-section. In our example, this requires just one more merge:

Notice how the size of the sorted sections of the list doubles after every iteration of
merges. After M steps the size of the sorted sections is 2M. Once 2M is greater than N,
the entire list is sorted. Thus, for a list of size N, we need M equals log2N interations to
sort the list.

Each iteration of merges requires a complete pass through the list and each element is
copied twice – once into the temporary storage and once back into the original list. As
long as there are items left in both sub-sections in each pair, each copy into the
temporary list also requires a comparison to pick which item to copy. Once one of the
lists runs out, no comparisons are needed. Thus each pass requires 2N copies and
roughly N comparisons (and certainly no more than N).
The total number of operations required for our merge sort algorithm is the product of
the number of operations in each pass and the number of passes – i.e. 2Nlog2N copies
and roughly Nlog2N comparisons.

The algorithm for merge sort may be written as this list of steps:

1. Create a temporary storage list which is the same size as the list to be sorted.
2. Start by treating each element of the list as a sorted one-element sub-section of the original
list.
3. Move through all the sorted sub-sections, merging adjacent pairs as follows:
1. Use two variables to point to the indices of the smallest uncopied items in the two sorted
sub-sections, and a third variable to point to the index of the start of the temporary
storage.
2. Copy the smaller of the two indexed items into the indicated position in the temporary
storage. Increment the index of the sub-section from which the item was copied, and the
index into temporary storage.
3. If all the items in one sub-section have been copied, copy the items remaining in the
other sub-section to the back of the list in temporary storage. Otherwise return to step 3
ii.
4. Copy the sorted list in temporary storage back over the section of the original list which
was occupied by the two sub-sections that have just been merged.
4. If only a single sorted sub-section remains, the entire list is sorted and we are done.
Otherwise return to the start of step 3.

Exercise 3

Write a Python function that implements merge sort. It may help to write a separate
function which performs merges and call it from within your merge sort implementation.

Python’s sorting algorithm

Python’s default sorting algorithm, which is used by the built-in sorted function as well
as the sort method of list objects, is called Timsort. It’s an algorithm developed by Tim
Peters in 2002 for use in Python. Timsort is a modified version of merge sort which uses
insertion sort to arrange the list of items into conveniently mergeable sections.

Note
Tim Peters is also credited as the author of The Zen of Python – an attempt to
summarise the early Python community’s ethos in a short series of koans. You can read
it by typing import this into the Python console.
Searching algorithms
Searching is also a common and well-studied task. This task can be described formally
as follows:

Given a list of values, a function that compares two values and a desired value, find the position
of the desired value in the list.

We will look at two algorithms that perform this task:

 linear search, which simply checks the values in sequence until the desired value is found
 binary search, which requires a sorted input list, and checks for the value in the middle of
the list, repeatedly discarding the half of the list which contains values which are definitely
either all larger or all smaller than the desired value

There are numerous other searching techniques. Often they rely on the construction of
more complex data structures to facilitate repeated searching. Examples of such
structures are hash tables (such as Python’s dictionaries) and prefix trees. Inexact
searches that find elements similar to the one being searched for are also an important
topic.

Linear search

Linear search is the most basic kind of search method. It involves checking each
element of the list in turn, until the desired element is found.

For example, suppose that we want to find the number 3.8 in the following list:
We start with the first element, and perform a comparison to see if its value is the value
that we want. In this case, 1.5 is not equal to 3.8, so we move onto the next element:

We perform another comparison, and see that 2.7 is also not equal to 3.8, so we move
onto the next element:

We perform another comparison and determine that we have found the correct element.
Now we can end the search and return the position of the element (index 2).

We had to use a total of 3 comparisons when searching through this list of 4 elements.
How many comparisons we need to perform depends on the total length of the list, but
also whether the element we are looking for is near the beginning or near the end of the
list. In the worst-case scenario, if our element is the last element of the list, we will have
to search through the entire list to find it.

If we search the same list many times, assuming that all elements are equally likely to
be searched for, we will on average have to search through half of the list each time.
The cost (in comparisons) of performing linear search thus scales linearly with the
length of the list.

Exercise 4

1. Write a function which implements linear search. It should take a list and an element as a
parameter, and return the position of the element in the list. If the element is not in the list,
the function should raise an exception. If the element is in the list multiple times, the function
should return the first position.

Binary search

Binary search is a more efficient search algorithm which relies on the elements in the
list being sorted. We apply the same search process to progressively smaller sub-lists
of the original list, starting with the whole list and approximately halving the search area
every time.

We first check the middle element in the list.

 If it is the value we want, we can stop.


 If it is higher than the value we want, we repeat the search process with the portion of the
list before the middle element.
 If it is lower than the value we want, we repeat the search process with the portion of the
list after the middle element.

For example, suppose that we want to find the value 3.8 in the following list of 7
elements:

First we compare the element in the middle of the list to our value. 7.2 is bigger than
3.8, so we need to check the first half of the list next.
Now the first half of the list is our new list to search. We compare the element in the
middle of this list to our value. 2.7 is smaller than 3.8, so we need to search the second
half of this sublist next.

The second half of the last sub-list is just a single element, which is also the middle
element. We compare this element to our value, and it is the element that we want.

We have performed 3 comparisons in total when searching this list of 7 items. The
number of comparisons we need to perform scales with the size of the list, but much
more slowly than for linear search – if we are searching a list of length N, the maximum
number of comparisons that we will have to perform is log2N.

Exercise 5

1. Write a function which implements binary search. You may assume that the input list will be
sorted. Hint: this function is often written recursively.

Algorithm complexity and Big O notation


We commonly express the cost of an algorithm as a function of the number N of
elements that the algorithm acts on. The function gives us an estimate of the number of
operations we have to perform in order to use the algorithm on N elements – it thus
allows us to predict how the number of required operations will increase as N increases.
We use a function which is an approximation of the exact function – we simplify it as
much as possible, so that only the most important information is preserved.

For example, we know that when we use linear search on a list of N elements, on
average we will have to search through half of the list before we find our item – so the
number of operations we will have to perform is N/2. However, the most important thing
is that the algorithm scales linearly – as N increases, the cost of the algorithm increases
in proportion to N, not N2 or N3. The constant factor of 1/2 is insignificant compared to
the very large differences in cost between – for example – N and N2, so we leave it out
when we describe the cost of the algorithm.

We thus write the cost of the linear search algorithm as O(N) – we say that the cost
is on the order of N, or just order N. We call this notation big O notation, because it uses
the capital O symbol (for order).

We have dropped the constant factor 1/2. We would also drop any lower-order terms
from an expression with multiple terms – for example, O(N3 + N2) would be simplified to
O(N3).

In the example above we calculated the average cost of the algorithm, which is also
known as the expected cost, but it can also be useful to calculate the best
case and worst case costs. Here are the best case, expected and worst case costs for
the sorting and searching algorithms we have discussed so far:

Algorithm Best case Expected Worst case

Selection sort O(N2) O(N2) O(N2)

Merge sort O(N log N) O(N log N) O(N log N)

Linear search O(1) O(N) O(N)

Binary search O(1) O(log N) O(log N)

What does O(1) mean? It means that the cost of an algorithm is constant, no matter
what the value of N is. For both these search algorithms, the best case scenario
happens when the first element to be tested is the correct element – then we only have
to perform a single operation to find it.

In the previous table, big O notation has been used to describe the time complexity of
algorithms. It can also be used to describe their space complexity – in which case the
cost function represents the number of units of space required for storage rather than
the required number of operations. Here are the space complexities of the algorithms
above (for the worst case, and excluding the space required to store the input):

Algorithm Space complexity

Selection sort O(1)

Merge sort O(N)

Linear search O(1)

Binary search O(1)

None of these algorithms require a significant amount of storage space in addition to


that used by the input list, except for the merge sort – which, as we saw in a previous
section, requires temporary storage which is the same size as the input (and thus
scales linearly with the input size).

Note

The Python wiki has a summary of the time complexities of common operations on
collections. You may also wish to investigate the collections module, which provides
additional collection classes which are optimised for particular tasks.
Note

Computational complexity theory studies the inherent complexity of tasks themselves.


Sometimes it is possible to prove that any algorithm that can perform a given task will
require some minimum number of steps or amount of extra storage. For example, it can
be shown that, given a list of arbitrary objects and only a comparison function with
which to compare them, no sorting algorithm can use fewer than O(N log N)
comparisons.
Exercise 6

1. We can see from the comparison tables above that binary search is more efficient than
linear search. Why would we ever use linear search? Hint: what property must a list have for
us to be able to use a binary search on it?
2. Suppose that each of the following functions shows the average number of operations
required to perform some algorithm on a list of length N. Give the big O notation for the time
complexity of each algorithm:
1. 4N2 + 2N + 2
2. N + log N
3. N log N
4. 3

Answers to exercises
Answer to exercise 1

Completed selection sort implementation:

def selection_sort(items):
"""Sorts a list of items into ascending order using the
selection sort algoright.
"""
for step in range(len(items)):
# Find the location of the smallest element in
# items[step:].
location_of_smallest = step
for location in range(step, len(items)):
# determine location of smallest
if items[location] < items[location_of_smallest]:
location_of_smallest = location
# Exchange items[step] with items[location_of_smallest]
temporary_item = items[step]
items[step] = items[location_of_smallest]
items[location_of_smallest] = temporary_item
Answer to exercise 2

1. N - 1 swaps are performed.


2. (N - 1) * N / 2 comparisons are performed.

1. M - 1 comparisons are performed finding the smallest element.


2. Summing M - 1 from 2 to N gives:
3. 1 + 2 + 3 + ... + (N - 1)
4.
5. = (N - 1) * N / 2
3. At most (N - 1) * N / 2 + (N - 1) assignements are performed.
4. At most N**2 + N - 2 operations are performed. For long lists the number of
operations grows as N**2 .
Answer to exercise 3
1. Here is an example program:
2. def merge(items, sections, temporary_storage):
3. (start_1, end_1), (start_2, end_2) = sections
4. i_1 = start_1
5. i_2 = start_2
6. i_t = 0
7.
8. while i_1 < end_1 or i_2 < end_2:
9. if i_1 < end_1 and i_2 < end_2:
10. if items[i_1] < items[i_2]:
11. temporary_storage[i_t] = items[i_1]
12. i_1 += 1
13. else: # the_list[i_2] >= items[i_1]
14. temporary_storage[i_t] = items[i_2]
15. i_2 += 1
16. i_t += 1
17.
18. elif i_1 < end_1:
19. for i in range(i_1, end_1):
20. temporary_storage[i_t] = items[i_1]
21. i_1 += 1
22. i_t += 1
23.
24. else: # i_2 < end_2
25. for i in range(i_2, end_2):
26. temporary_storage[i_t] = items[i_2]
27. i_2 += 1
28. i_t += 1
29.
30. for i in range(i_t):
31. items[start_1 + i] = temporary_storage[i]
32.
33.
34. def merge_sort(items):
35. n = len(items)
36. temporary_storage = [None] * n
37. size_of_subsections = 1
38.
39. while size_of_subsections < n:
40. for i in range(0, n, size_of_subsections * 2):
41. i1_start, i1_end = i, min(i + size_of_subsections, n)
42. i2_start, i2_end = i1_end, min(i1_end + size_of_subsections, n)
43. sections = (i1_start, i1_end), (i2_start, i2_end)
44. merge(items, sections, temporary_storage)
45. size_of_subsections *= 2
46.
47. return items
Answer to exercise 4

1. Here is an example program:


2. def linear_search(items, desired_item):
3. for position, item in enumerate(items):
4. if item == desired_item:
5. return position
6.
7. raise ValueError("%s was not found in the list." % desired_item)
Answer to exercise 5

1. Here is an example program:


2. def binary_search(items, desired_item, start=0, end=None):
3. if end == None:
4. end = len(items)
5.
6. if start == end:
7. raise ValueError("%s was not found in the list." % desired_item)
8.
9. pos = (end - start) // 2 + start
10.
11. if desired_item == items[pos]:
12. return pos
13. elif desired_item > items[pos]:
14. return binary_search(items, desired_item, start=(pos + 1), end=end)
15. else: # desired_item < items[pos]:
16. return binary_search(items, desired_item, start=start, end=pos)
Answer to exercise 6

1. The advantage of linear search is that it can be performed on an unsorted list – if we are
going to examine all the values in turn, their order doesn’t matter. It can be more efficient to
perform a linear search than a binary search if we need to find a value once in a large
unsorted list, because just sorting the list in preparation for performing a binary search could
be more expensive. If, however, we need to find values in the same large list multiple times,
sorting the list and using binary search becomes more worthwhile.
2. We drop all constant factors and less significant terms:
1. O(N2)
2. O(N)
3. O(N log N)
4. O(1)

You might also like