Python groupby() pitfalls

Some conditions to keep in mind when using Python's groupby() function from the itertools library


Pitfall 1 - Iterable is not sorted

groupby() only groups consecutive items that share a key, so the iterable must be sorted on the same key you group by before you call groupby().

Below is a barebones example showing what happens when the iterable is not sorted:

>>> from itertools import groupby
>>> unsorted_iterable = ['red','orange','green','red','green']
>>> for key, group in list(groupby(unsorted_iterable)):
...     print(key)
...
red
orange
green
red
green
# Strings are repeated. This is NOT desired

The strings red and green appear a second time at the end of the output because their occurrences were not adjacent in the input. Now let’s look at a sorted iterable:

>>> from itertools import groupby
>>> sorted_iterable = ['red','red','orange','green','green']
>>> for key, group in list(groupby(sorted_iterable)):
...     print(key)
...
red
orange
green
# We get only the unique strings
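
When the input order is not under your control, one common approach is to sort before grouping. A minimal sketch that reuses the unsorted color list from above (the variable name keys is only illustrative):

```python
from itertools import groupby

unsorted_iterable = ['red', 'orange', 'green', 'red', 'green']

# sorted() places equal strings next to each other, so each key shows up once.
keys = [key for key, group in groupby(sorted(unsorted_iterable))]
print(keys)  # ['green', 'orange', 'red']
```

Note that the keys now come out in sorted (alphabetical) order rather than in order of first appearance.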

A more practical example would be to group some fruits by their color:

from itertools import groupby
fruits = [
    {'name': 'apple', 'color': 'red'},
    {'name': 'cherry','color': 'red'},
    {'name': 'orange','color': 'orange'},
    {'name': 'pear',  'color': 'green'},
    {'name': 'grape', 'color': 'green'}
]
for color, group in groupby(fruits, key=lambda fruit:fruit['color']):
    print(f"\nAll {color} fruits")
    print(list(group))

Note: the list of fruits is already ordered by color, since color is the key we wish to group by.

# Output
All red fruits
[{'name': 'apple', 'color': 'red'}, {'name': 'cherry', 'color': 'red'}]

All orange fruits
[{'name': 'orange', 'color': 'orange'}]

All green fruits
[{'name': 'pear', 'color': 'green'}, {'name': 'grape', 'color': 'green'}]
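
If the fruits do not arrive ordered by color, the same key function can be passed to both sorted() and groupby(). The sketch below assumes a hypothetical shuffled_fruits list and uses operator.itemgetter in place of the lambda; both names are illustrative and not part of the example above.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: the same fruits as above, but not ordered by color.
shuffled_fruits = [
    {'name': 'pear',   'color': 'green'},
    {'name': 'apple',  'color': 'red'},
    {'name': 'orange', 'color': 'orange'},
    {'name': 'grape',  'color': 'green'},
    {'name': 'cherry', 'color': 'red'},
]

by_color = itemgetter('color')  # one key function shared by sorted() and groupby()

# Sort on the grouping key first, then collect the fruit names per color.
grouped = {
    color: [fruit['name'] for fruit in group]
    for color, group in groupby(sorted(shuffled_fruits, key=by_color), key=by_color)
}
print(grouped)
```

Sharing one key function keeps the sort key and the grouping key from drifting apart.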

Pitfall 2 - Nested groupby() calls

While looping over the groupby object, each group is itself an iterator that shares the underlying iterable with groupby(). Once a group has been consumed, for example by calling list() on it, it has no values left to yield, so a nested groupby() call on that group has nothing to group.
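
The one-shot behavior can be seen in isolation with a small sketch (the names results, first_pass and second_pass are illustrative):

```python
from itertools import groupby

results = []
for key, group in groupby(['red', 'red', 'green']):
    first_pass = list(group)   # consumes the group iterator
    second_pass = list(group)  # the iterator is now exhausted
    results.append((key, first_pass, second_pass))

print(results)
# [('red', ['red', 'red'], []), ('green', ['green'], [])]
```

The second list() call on the same group always comes back empty.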

For this example, a new cost field is added to our list of fruits to be grouped by. We wish to group by the color first and then by the cost.

from itertools import groupby
fruits = [                               # New field
    {'name': 'apple', 'color': 'red',    'cost': 10},
    {'name': 'cherry','color': 'red',    'cost': 12},
    {'name': 'orange','color': 'orange', 'cost': 12},
    {'name': 'pear',  'color': 'green',  'cost': 15},
    {'name': 'grape', 'color': 'green',  'cost': 15}
]

for color, color_group in groupby(fruits, key=lambda fruit:fruit['color']):
    print("-"*20)
    print(f"All {color} fruits")
    print(list(color_group))  # Note: list() consumes the group iterator, leaving it empty
    for cost, cost_group in groupby(color_group, key=lambda fruit:fruit['cost']):
        print(f"\tAll {color} fruits that cost {cost} bucks")
        print("\t\t", list(cost_group))

In the output below, the inner loop never executes. Let’s find out why.

--------------------
All red fruits
[{'name': 'apple', 'color': 'red', 'cost': 10}, {'name': 'cherry', 'color': 'red', 'cost': 12}]
--------------------
All orange fruits
[{'name': 'orange', 'color': 'orange', 'cost': 12}]
--------------------
All green fruits
[{'name': 'pear', 'color': 'green', 'cost': 15}, {'name': 'grape', 'color': 'green', 'cost': 15}]

If we comment out the line print(list(color_group)), so that the group is not consumed before the nested call, the inner loop runs as well and the following output is produced:

--------------------
All red fruits
        All red fruits that cost 10 bucks
                 [{'name': 'apple', 'color': 'red', 'cost': 10}]
        All red fruits that cost 12 bucks
                 [{'name': 'cherry', 'color': 'red', 'cost': 12}]
--------------------
All orange fruits
        All orange fruits that cost 12 bucks
                 [{'name': 'orange', 'color': 'orange', 'cost': 12}]
--------------------
All green fruits
        All green fruits that cost 15 bucks
                 [{'name': 'pear', 'color': 'green', 'cost': 15}, {'name': 'grape', 'color': 'green', 'cost': 15}]
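
To print each color group and still group it by cost, one possible approach is to store the group in a list once and reuse that list. The sketch below uses the cost values shown in the output above; the names color_items and nested are illustrative.

```python
from itertools import groupby

fruits = [
    {'name': 'apple',  'color': 'red',    'cost': 10},
    {'name': 'cherry', 'color': 'red',    'cost': 12},
    {'name': 'orange', 'color': 'orange', 'cost': 12},
    {'name': 'pear',   'color': 'green',  'cost': 15},
    {'name': 'grape',  'color': 'green',  'cost': 15},
]

nested = {}
for color, color_group in groupby(fruits, key=lambda fruit: fruit['color']):
    color_items = list(color_group)  # materialize once; a list can be iterated again
    print(f"All {color} fruits: {color_items}")
    # Within each color the items are already ordered by cost here;
    # sort color_items by cost first if that is not guaranteed.
    nested[color] = {
        cost: [fruit['name'] for fruit in cost_group]
        for cost, cost_group in groupby(color_items, key=lambda fruit: fruit['cost'])
    }

print(nested)
```

Unlike the group iterator, the list survives the print call, so the nested groupby() still sees every item.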
