In Python, how can you check whether two iterables are equal?
If we have two lists and we want to know whether the items in these two lists are the same, we could use the equality operator (==):
>>> lines1 = ["Grains", "Kindred", "Zia"]
>>> lines2 = ["Grains", "Kindred", "Zia"]
>>> lines1 == lines2
True
The same thing works for comparing tuples:
>>> p = (3, 4, 8)
>>> q = (3, 5, 7)
>>> p == q
False
But what if we wanted to compare a list and a tuple?
We can't use a simple equality check for that:
>>> lines1 = ["Grains", "Kindred", "Zia"]
>>> lines2 = ("Grains", "Kindred", "Zia")
>>> lines1 == lines2
False
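Why False? In CPython (the reference implementation), a list's equality method only knows how to compare against another list, and a tuple's only knows how to compare against another tuple. When handed the other type, each one returns NotImplemented, and Python then falls back to an identity check. Here's a small sketch to see this for yourself:

```python
lines1 = ["Grains", "Kindred", "Zia"]
lines2 = ("Grains", "Kindred", "Zia")

# list.__eq__ only compares against other lists, so a tuple gets NotImplemented
print(lines1.__eq__(lines2))  # NotImplemented

# tuple.__eq__ likewise returns NotImplemented for a list...
print(lines2.__eq__(lines1))  # NotImplemented

# ...so == falls back to identity, which is False for two distinct objects
print(lines1 == lines2)  # False
```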
To compare the items in a list with the items in a tuple, we could convert the list to a tuple or the tuple to a list, and then check for equality:
>>> lines1 = ["Grains", "Kindred", "Zia"]
>>> lines2 = ("Grains", "Kindred", "Zia")
>>> lines1 == list(lines2)
True
This approach could also work for comparing any other type of iterable.
For example, we have two itertools.chain objects here:
>>> file1a = open("year1-first.txt")
>>> file1b = open("year1-second.txt")
>>> file2a = open("year2-first.txt")
>>> file2b = open("year2-second.txt")
>>> from itertools import chain
>>> year1 = chain(file1a, file1b)
>>> year2 = chain(file2a, file2b)
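These chain objects are iterables, but they're also iterators.
And the equality operator doesn't work very well with iterators: most iterators (including chain objects) don't define equality in terms of their contents, so == falls back to checking whether the two objects are the same object.
Even though looping over these two chain objects would show that they have the same items within them, comparing them with == returns False: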
>>> year1 == year2
False
So we could convert each of them to a list before we compare them:
>>> list(year1) == list(year2)
True
And that works.
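One thing to watch out for with this approach: converting an iterator to a list consumes it. Here's a small sketch that uses in-memory lists of lines as stand-ins for the files above (the data is made up for illustration):

```python
from itertools import chain

# Hypothetical stand-ins for the four files: each inner list is a file's lines
year1 = chain(["a\n", "b\n"], ["c\n"])
year2 = chain(["a\n", "b\n"], ["c\n"])

# chain objects don't compare by contents, so == is an identity check here
print(year1 == year2)  # False

# Converting both to lists compares their items...
print(list(year1) == list(year2))  # True

# ...but it also exhausts both iterators, so nothing is left afterward
print(list(year1))  # []
```

If you need the items again after comparing, hold on to the lists rather than the iterators.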
Converting two iterables to a list or a tuple before we compare them does work. But suppose we're comparing two very large iterables that, when they differ, usually differ within their first few elements. In that case this approach takes more memory and more time than necessary, because it builds both complete lists before comparing even the first items.
For example, maybe we're comparing two files that are usually either identical or differ within the first few lines.
We could read each whole file into memory at once and then compare their contents:
>>> with open("results1.csv") as file1, open("results2.csv") as file2:
...     same = (file1.read() == file2.read())
...
Or we could use the zip function with the strict=True argument (available in Python 3.10 and later) to compare line by line in a for loop:
>>> with open("results1.csv") as file1, open("results2.csv") as file2:
...     same = True
...     for line1, line2 in zip(file1, file2, strict=True):
...         if line1 != line2:
...             same = False
...             break
...
This style of checking for a non-matching value, setting a boolean, and breaking from a loop is actually what Python's any and all functions were designed to handle.
So we could do the same thing by using a generator expression and the built-in all function:
>>> with open("results1.csv") as file1, open("results2.csv") as file2:
...     same = all(
...         line1 == line2
...         for line1, line2 in zip(file1, file2, strict=True)
...     )
...
Either way, we're looping over both files line by line, comparing each pair of lines, discarding lines once we're done comparing them, and stopping as soon as we find a pair that isn't equal.
That's pretty nifty.
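If you'd rather get False than an exception when one iterable is longer than the other, you could swap zip for itertools.zip_longest and pad the shorter iterable with a sentinel object. This is a sketch under that assumption. The iterables_equal function and the _MISSING sentinel are hypothetical names, not part of the standard library:

```python
from itertools import zip_longest

# A private sentinel used as padding; no caller's data will contain it
_MISSING = object()


def iterables_equal(a, b):
    """Return True if a and b yield equal items in the same order.

    Returns False when one iterable is longer than the other and
    stops at the first mismatch.
    """
    return all(
        x is not _MISSING and y is not _MISSING and x == y
        for x, y in zip_longest(a, b, fillvalue=_MISSING)
    )


print(iterables_equal(["Grains", "Kindred", "Zia"], ("Grains", "Kindred", "Zia")))  # True
print(iterables_equal([3, 4, 8], [3, 4]))  # False

# all() short-circuits: the mismatch at index 2 stops the loop early
numbers = iter(range(1_000_000))
print(iterables_equal(numbers, [0, 1, 99, 3]))  # False
print(next(numbers))  # 3
```

The last two lines show the early exit: all stops at the first unequal pair, so only three items of the million-item iterator were consumed.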
This all and zip approach can also be adapted to check for near equality.
For example, maybe we're reading two CSV files and comparing them, but we only care whether the first column of each row matches:
>>> import csv
>>> with open("results1.csv", newline="") as file1, open("results2.csv", newline="") as file2:
...     reader1, reader2 = csv.reader(file1), csv.reader(file2)
...     same = all(
...         row1[0] == row2[0]
...         for row1, row2 in zip(reader1, reader2, strict=True)
...     )
...
This zip and all style of comparison works well if you don't care about the types of the iterables you're comparing, and you only care whether they have the same items in the same order.
So if we want to check whether a list and a tuple (or any iterables) have the same items, we could use this approach.
All of these approaches so far assume that the order of the two iterables is significant. What if you need to compare two iterables while ignoring the order of their items?
>>> members = ["Gregory", "Barry", "Pablo", "Emily", "Donghee"]
>>> speakers = ["Barry", "Donghee", "Emily", "Gregory", "Pablo"]
Well, if the items are strings, numbers, or other hashable objects, then you could put the two iterables in sets and check whether those sets are equal:
>>> set(members) == set(speakers)
True
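Keep in mind that sets throw away duplicates and only accept hashable items. A quick sketch with made-up names:

```python
team1 = ["Barry", "Barry", "Emily"]
team2 = ["Barry", "Emily", "Emily"]

# Sets discard duplicates, so these compare equal even though
# "Barry" and "Emily" appear a different number of times
print(set(team1) == set(team2))  # True

# Sets also require hashable items, so a list of lists can't go in a set
try:
    set([["Barry"], ["Emily"]])
except TypeError:
    print("lists are unhashable")
```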
If your iterables might repeat items, and repeats are significant, then you could use collections.Counter:
>>> hand1 = ["AS", "AS", "KS", "KH", "QC"]
>>> hand2 = ["AS", "KH", "AS", "QC", "KS"]
>>> from collections import Counter
>>> Counter(hand1) == Counter(hand2)
True
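To see why Counter matters when repeats are significant, here's a sketch comparing two hypothetical lists that a set comparison would consider equal:

```python
from collections import Counter

team1 = ["Barry", "Barry", "Emily"]
team2 = ["Barry", "Emily", "Emily"]

# A set comparison ignores how many times each item appears...
print(set(team1) == set(team2))  # True

# ...while Counter compares the count of every item
print(Counter(team1) == Counter(team2))  # False
print(Counter(team1)["Barry"], Counter(team2)["Barry"])  # 2 1
```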
The sorted function would work as well if your items are orderable:
>>> sorted(hand1) == sorted(hand2)
True
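sorted and Counter have different requirements. Counter needs hashable items, while sorted needs items that can be compared with each other. Here's a sketch showing where each one fails, using made-up data:

```python
from collections import Counter

rows1 = [[1, 2], [0, 5], [1, 2]]
rows2 = [[0, 5], [1, 2], [1, 2]]

# Lists are unhashable, so Counter can't count them...
try:
    Counter(rows1)
except TypeError:
    print("Counter needs hashable items")

# ...but lists are orderable (compared item by item), so sorting works
print(sorted(rows1) == sorted(rows2))  # True

# Sorting fails when items can't be compared with each other
try:
    sorted([1, "a"])
except TypeError:
    print("sorted needs mutually orderable items")
```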
But sorting is slower than counting: sorting takes O(n log n) time, while building a Counter takes linear time on average. So I would prefer using Counter over sorting.
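If you don't know in advance whether your items will be hashable, you could combine the two approaches: try Counter first and fall back to sorting. This is a sketch of one way to do that. same_items_unordered is a hypothetical helper name, and the fallback still raises TypeError if the items are neither hashable nor orderable:

```python
from collections import Counter


def same_items_unordered(a, b):
    """Return True if a and b contain the same items, ignoring order.

    Uses Counter when the items are hashable and falls back to
    sorting when they aren't.
    """
    a, b = list(a), list(b)  # materialize so iterators can be used twice
    try:
        return Counter(a) == Counter(b)
    except TypeError:
        # Unhashable items: sorting still works if they're orderable
        return sorted(a) == sorted(b)


hand1 = ["AS", "AS", "KS", "KH", "QC"]
hand2 = ("AS", "KH", "AS", "QC", "KS")
print(same_items_unordered(hand1, hand2))  # True
print(same_items_unordered([[1, 2], [0, 5]], [[0, 5], [1, 2]]))  # True
print(same_items_unordered("abc", "abd"))  # False
```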
In Python, we love duck typing, which means we often care about the behavior of an object more than the type of that object.
Because of that, it's common to write code that works with lists, but also works with other types of iterables. So sometimes checking whether two iterables have the same items involves more than a simple equality check.