Pitfall 1 - Iterable is not sorted
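The iterable must be sorted on the same key you intend to group by before it is passed to the groupby() function. groupby() only merges consecutive items that share a key, so equal items that are not adjacent end up in separate groups.
Below is a barebones example showing what happens when the iterable is not sorted: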
>>> from itertools import groupby
>>> unsorted_iterable = ['red','orange','green','red','green']
>>> for key, group in groupby(unsorted_iterable):
...     print(key)
...
red
orange
green
red
green
The strings red and green each appear twice in the output, because their occurrences were not adjacent in the input. Now let's look at a sorted iterable:
>>> from itertools import groupby
>>> sorted_iterable = ['red','red','orange','green','green']
>>> for key, group in groupby(sorted_iterable):
...     print(key)
...
red
orange
green
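If the input cannot be assumed to be sorted, one option is to sort it right before grouping. The snippet below is a minimal sketch of that idea, reusing the unsorted list from above; the variable name keys is introduced here only for illustration.

```python
from itertools import groupby

unsorted_iterable = ['red', 'orange', 'green', 'red', 'green']

# sorted() places equal strings next to each other, so groupby()
# sees each distinct value as one consecutive run.
keys = [key for key, _group in groupby(sorted(unsorted_iterable))]

print(keys)  # ['green', 'orange', 'red']
```

Each color now appears once. The groups come out in sorted order rather than in the order the colors first appeared in the input.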
A more practical example is grouping some fruits by their color:
from itertools import groupby

fruits = [
    {'name': 'apple', 'color': 'red'},
    {'name': 'cherry', 'color': 'red'},
    {'name': 'orange', 'color': 'orange'},
    {'name': 'pear', 'color': 'green'},
    {'name': 'grape', 'color': 'green'}
]

# Group consecutive fruits that share the same color
for color, group in groupby(fruits, key=lambda fruit: fruit['color']):
    print(f"\nAll {color} fruits")
    print(list(group))
Note: the list of fruits here is already sorted by color, because color is the key we group by.
All red fruits
[{'name': 'apple', 'color': 'red'}, {'name': 'cherry', 'color': 'red'}]
All orange fruits
[{'name': 'orange', 'color': 'orange'}]
All green fruits
[{'name': 'pear', 'color': 'green'}, {'name': 'grape', 'color': 'green'}]
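When the records do not arrive in order, the same key function can be used for both sorting and grouping. The following is a sketch under that assumption; shuffled_fruits, by_color, and grouped are hypothetical names introduced here, and operator.itemgetter is just one convenient way to build the key function.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: the same fruits as above, but not sorted by color.
shuffled_fruits = [
    {'name': 'pear', 'color': 'green'},
    {'name': 'apple', 'color': 'red'},
    {'name': 'orange', 'color': 'orange'},
    {'name': 'cherry', 'color': 'red'},
    {'name': 'grape', 'color': 'green'},
]

# One key function, used for both sorted() and groupby(),
# so the sort order and the grouping key cannot drift apart.
by_color = itemgetter('color')

grouped = {
    color: [fruit['name'] for fruit in group]
    for color, group in groupby(sorted(shuffled_fruits, key=by_color), key=by_color)
}

print(grouped)
```

Because sorted() is stable, fruits of the same color keep their original relative order within each group.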
Pitfall 2 - Nested groupby() calls
When looping over a groupby object, each group is a one-shot iterator. Once it has been consumed, for example by calling list() on it, it has nothing left to yield, so a nested groupby() call on that same group finds nothing to group.
For this example, a new cost field is added to each fruit. We wish to group by color first and then by cost.
from itertools import groupby

# Sorted by color, and by cost within each color
fruits = [
    {'name': 'apple', 'color': 'red', 'cost': 10},
    {'name': 'cherry', 'color': 'red', 'cost': 12},
    {'name': 'orange', 'color': 'orange', 'cost': 12},
    {'name': 'pear', 'color': 'green', 'cost': 15},
    {'name': 'grape', 'color': 'green', 'cost': 15}
]

for color, color_group in groupby(fruits, key=lambda fruit: fruit['color']):
    print("-" * 20)
    print(f"All {color} fruits")
    print(list(color_group))
    for cost, cost_group in groupby(color_group, key=lambda fruit: fruit['cost']):
        print(f"\tAll {color} fruits that cost {cost} bucks")
        print("\t\t", list(cost_group))
In the output below, we find that the inner loop never executed. Let's find out why.
--------------------
All red fruits
[{'name': 'apple', 'color': 'red', 'cost': 10}, {'name': 'cherry', 'color': 'red', 'cost': 12}]
--------------------
All orange fruits
[{'name': 'orange', 'color': 'orange', 'cost': 12}]
--------------------
All green fruits
[{'name': 'pear', 'color': 'green', 'cost': 15}, {'name': 'grape', 'color': 'green', 'cost': 15}]
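The inner loop produced nothing because print(list(color_group)) had already consumed color_group, leaving the nested groupby() call with an empty iterator. The sketch below first demonstrates that a group can only be read once, then shows one possible fix: materialize each group into a list once and reuse that list. The names first_pass, second_pass, color_fruits, cost_fruits, and summary are introduced here for illustration only.

```python
from itertools import groupby

fruits = [
    {'name': 'apple', 'color': 'red', 'cost': 10},
    {'name': 'cherry', 'color': 'red', 'cost': 12},
    {'name': 'orange', 'color': 'orange', 'cost': 12},
    {'name': 'pear', 'color': 'green', 'cost': 15},
    {'name': 'grape', 'color': 'green', 'cost': 15},
]

# A group is a one-shot iterator: the second pass finds nothing.
grouped = groupby(fruits, key=lambda fruit: fruit['color'])
_color, first_group = next(grouped)
first_pass = list(first_group)   # the two red fruits
second_pass = list(first_group)  # [] because the group is exhausted

# Possible fix: turn each group into a list once, then reuse the list.
summary = {}  # hypothetical helper: {color: {cost: [names]}}
for color, color_group in groupby(fruits, key=lambda fruit: fruit['color']):
    color_fruits = list(color_group)
    print("-" * 20)
    print(f"All {color} fruits")
    for cost, cost_group in groupby(color_fruits, key=lambda fruit: fruit['cost']):
        cost_fruits = list(cost_group)
        print(f"\tAll {color} fruits that cost {cost} bucks")
        print("\t\t", cost_fruits)
        summary.setdefault(color, {})[cost] = [f['name'] for f in cost_fruits]
```

With each group materialized before it is reused, the inner loop runs, and the printed output should look like this: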
--------------------
All red fruits
All red fruits that cost 10 bucks
[{'name': 'apple', 'color': 'red', 'cost': 10}]
All red fruits that cost 12 bucks
[{'name': 'cherry', 'color': 'red', 'cost': 12}]
--------------------
All orange fruits
All orange fruits that cost 12 bucks
[{'name': 'orange', 'color': 'orange', 'cost': 12}]
--------------------
All green fruits
All green fruits that cost 15 bucks
[{'name': 'pear', 'color': 'green', 'cost': 15}, {'name': 'grape', 'color': 'green', 'cost': 15}]