Checking whether iterables are equal in Python

Trey Hunner
4 min. read 3 min. video Python 3.9—3.13

In Python, how can you check whether two iterables are equal?

Simple equality checks

If we have two lists and want to know whether they contain the same items in the same order, we can use the equality operator (==):

>>> lines1 = ["Grains", "Kindred", "Zia"]
>>> lines2 = ["Grains", "Kindred", "Zia"]
>>> lines1 == lines2
True

The same thing works for comparing tuples:

>>> p = (3, 4, 8)
>>> q = (3, 5, 7)
>>> p == q
False

But what if we wanted to compare a list and a tuple?

A simple equality check won't work for that, because a list and a tuple are never equal to each other, even when they contain the same items:

>>> lines1 = ["Grains", "Kindred", "Zia"]
>>> lines2 = ("Grains", "Kindred", "Zia")
>>> lines1 == lines2
False

Comparing different types of iterables

To compare the items in a list with the items in a tuple, we could convert the list to a tuple or the tuple to a list, and then check for equality:

>>> lines1 = ["Grains", "Kindred", "Zia"]
>>> lines2 = ("Grains", "Kindred", "Zia")
>>> lines1 == list(lines2)
True

This approach also works for comparing any other type of finite iterable.

For example, we have two itertools.chain objects here:

>>> file1a = open("year1-first.txt")
>>> file1b = open("year1-second.txt")
>>> file2a = open("year2-first.txt")
>>> file2b = open("year2-second.txt")
>>> from itertools import chain
>>> year1 = chain(file1a, file1b)
>>> year2 = chain(file2a, file2b)

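These chain objects are iterables, but they're also iterators. And the equality operator doesn't work very well with iterators: they don't define item-by-item equality, so == falls back to checking whether both sides are the same object.

Even though looping over these two chain objects would give us the same items, an equality check says they're different: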

>>> year1 == year2
False

So we could convert each of them to a list before we compare them:

>>> list(year1) == list(year2)
True

And that works.
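
One thing to keep in mind is that converting an iterator to a list consumes it. Here is a small sketch of that behavior, using in-memory lists with made-up data in place of the four files:

```python
from itertools import chain

# In-memory stand-ins for the lines of the four files (made-up data).
year1 = chain(["jan\n", "feb\n"], ["mar\n"])
year2 = chain(["jan\n"], ["feb\n", "mar\n"])

# Iterators don't compare their items; == only checks identity here.
same_object = (year1 == year2)

# Converting to lists compares the items, but it also consumes both iterators.
same_items = (list(year1) == list(year2))

# Nothing is left to loop over after the conversion.
leftover = list(year1)

print(same_object, same_items, leftover)  # False True []
```

So if the items are needed again after the comparison, it may be worth keeping the lists around rather than the original iterators.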

Checking equality between large iterables

Converting two iterables to lists or tuples before comparing them does work. But if the iterables are very large, and they tend to differ within their first few elements when they differ at all, this kind of comparison uses more memory and more time than necessary.

For example, maybe we're comparing two files that are usually either completely the same or they differ within the first few lines.

We could read both files into memory all at once and then compare their contents:

>>> with open("results1.csv") as file1, open("results2.csv") as file2:
...     same = (file1.read() == file2.read())
...

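Or we could use the zip function with the strict=True argument (available in Python 3.10 and later) to compare line-by-line in a for loop. With strict=True, zip raises a ValueError if one file runs out of lines before the other: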

>>> with open("results1.csv") as file1, open("results2.csv") as file2:
...     same = True
...     for line1, line2 in zip(file1, file2, strict=True):
...         if line1 != line2:
...             same = False
...             break
...

This pattern of checking for a non-matching value, setting a boolean, and breaking out of a loop is exactly what Python's any and all functions were designed to handle.

So we could do the same thing by using a generator expression and the built-in all function:

>>> with open("results1.csv") as file1, open("results2.csv") as file2:
...     same = all(
...         line1 == line2
...         for line1, line2 in zip(file1, file2, strict=True)
...     )
...

Either way, we're looping over the two files line by line, comparing each pair of lines, discarding lines once we've compared them, and stopping as soon as we find a pair that differs.

That's pretty nifty.
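
If iterables of different lengths should simply compare as unequal, rather than raising the ValueError that strict=True produces, one option is itertools.zip_longest with a unique placeholder value. This is only a sketch: the helper name iterables_equal is made up for illustration, and it assumes none of the items define an unusual __eq__ that matches arbitrary objects.

```python
from itertools import zip_longest


def iterables_equal(iterable1, iterable2):
    """Return True if both iterables yield equal items in the same order."""
    # A unique placeholder pads the shorter iterable, so a length mismatch
    # shows up as an unequal pair instead of an exception.
    missing = object()
    return all(
        item1 == item2
        for item1, item2 in zip_longest(iterable1, iterable2, fillvalue=missing)
    )


print(iterables_equal(["a", "b"], ("a", "b")))       # True
print(iterables_equal(["a", "b"], ["a", "b", "c"]))  # False
print(iterables_equal(iter("abc"), iter("abd")))     # False
```

Like the zip-based versions above, this stops at the first unequal pair and never holds more than one pair of items at a time.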

Checking for near-equality

This all and zip approach can also be adapted to check for near equality.

For example, maybe we're reading two CSV files and we only care whether the first column of every row matches between them:

>>> import csv
>>> with open("results1.csv", newline="") as file1, open("results2.csv", newline="") as file2:
...     reader1, reader2 = csv.reader(file1), csv.reader(file2)
...     same = all(
...         row1[0] == row2[0]
...         for row1, row2 in zip(reader1, reader2, strict=True)
...     )
...

This zip and all style of comparison works well if you don't care about the types of the iterables you're comparing, and you only care whether they have the same items in the same order. So if we want to check whether a list and a tuple (or any iterables) have the same items, we could use this approach.
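
The same idea can be wrapped in a small helper that compares items by some derived value. The sketch below is one possible shape for that: equal_by and its key parameter are hypothetical names chosen for illustration, the CSV data is made up, and zip_longest is used so that a length mismatch counts as "not equal".

```python
import csv
import io
from itertools import zip_longest
from operator import itemgetter


def equal_by(iterable1, iterable2, key):
    """Return True if key(item) matches pairwise across both iterables."""
    missing = object()  # pads the shorter iterable so lengths must match
    keys1 = map(key, iterable1)  # map is lazy, so items are read one at a time
    keys2 = map(key, iterable2)
    return all(k1 == k2 for k1, k2 in zip_longest(keys1, keys2, fillvalue=missing))


# Made-up CSV data held in memory: only the first column matches.
rows1 = csv.reader(io.StringIO("alice,10\nbob,20\n"))
rows2 = csv.reader(io.StringIO("alice,99\nbob,0\n"))
print(equal_by(rows1, rows2, key=itemgetter(0)))  # True

# A list and a tuple compared case-insensitively.
print(equal_by(["Zia", "GRAINS"], ("zia", "grains"), key=str.casefold))  # True
print(equal_by(["Zia"], ["zia", "grains"], key=str.casefold))  # False
```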

Ignoring order when comparing iterables

All of these approaches so far assume that the order of the items in the two iterables is significant. What if you need to compare two iterables while ignoring the order of their items?

>>> members = ["Gregory", "Barry", "Pablo", "Emily", "Donghee"]
>>> speakers = ["Barry", "Donghee", "Emily", "Gregory", "Pablo"]

Well, if the items are strings, numbers, or other hashable objects, then you could put the two iterables in sets and check whether those sets are equal:

>>> set(members) == set(speakers)
True

If your iterables might repeat items, and repeats are significant, then you could use collections.Counter:

>>> hand1 = ["AS", "AS", "KS", "KH", "QC"]
>>> hand2 = ["AS", "KH", "AS", "QC", "KS"]
>>> from collections import Counter
>>> Counter(hand1) == Counter(hand2)
True
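
To see why repeats matter, here is a small sketch with two made-up hands that contain the same distinct cards but a different number of each. The set comparison treats them as equal, while the Counter comparison does not:

```python
from collections import Counter

# Same distinct cards, but different numbers of each (made-up data).
hand1 = ["AS", "AS", "KS"]
hand2 = ["AS", "KS", "KS"]

# Sets discard duplicates, so these look equal.
sets_equal = set(hand1) == set(hand2)

# Counters keep a count per item, so the difference is detected.
counters_equal = Counter(hand1) == Counter(hand2)

print(sets_equal, counters_equal)  # True False
```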

The sorted function would work as well if your items are orderable:

>>> sorted(hand1) == sorted(hand2)
True

But sorting takes O(n log n) time, while counting items with a Counter takes roughly linear time, so I would prefer using Counter over sorting whenever the items are hashable.
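
The two approaches can also be combined. The following is a rough sketch rather than a definitive recipe: the helper name same_items_any_order is made up, and it assumes the items are either hashable or orderable. It tries Counter first and falls back to sorting only when the items turn out to be unhashable (lists, for example).

```python
from collections import Counter


def same_items_any_order(iterable1, iterable2):
    """Compare items while ignoring order but respecting repeats."""
    # Materialize first, so a failed Counter attempt can't leave
    # an iterator half-consumed before the fallback runs.
    items1, items2 = list(iterable1), list(iterable2)
    try:
        return Counter(items1) == Counter(items2)
    except TypeError:
        # Unhashable items: fall back to sorting. This still raises
        # TypeError if the items can't be ordered either.
        return sorted(items1) == sorted(items2)


print(same_items_any_order(["AS", "KH", "AS"], ["KH", "AS", "AS"]))  # True
print(same_items_any_order([[1, 2], [3]], [[3], [1, 2]]))            # True
print(same_items_any_order(["AS", "AS"], ["AS", "KH"]))              # False
```

Unlike the zip-based comparisons, this needs all of the items in memory at once, since ignoring order means no pair can be ruled out early.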

Comparing iterables isn't just about equality

In Python, we love duck typing, which means we often care about the behavior of an object more than the type of that object.

Because of that, it's common to write code that works with lists, but also works with other types of iterables. So sometimes checking whether two iterables have the same items involves more than a simple equality check.
