Overview of itertools groupby
Let us understand how we can use `itertools.groupby` to take care of aggregations by key.
- `itertools.groupby` can be used to get the data grouped by a key.
- It supports use cases such as the following, by applying aggregate functions after grouping by key.
  - Get count by order status.
  - Get revenue for each order.
  - Get order count by month.
- `itertools.groupby` only groups consecutive elements that share a key, so we need to ensure the data is pre-sorted by that key. Otherwise the values associated with a key will not all end up in the same group.
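As a minimal sketch of the first use case, the following counts orders by status. The `orders` list, its `(order_id, order_status)` layout, and the helper `get_status` are hypothetical and exist only for illustration.

```python
import itertools

# Hypothetical sample data: (order_id, order_status) tuples.
orders = [
    (1, "CLOSED"),
    (2, "PENDING"),
    (3, "COMPLETE"),
    (4, "CLOSED"),
    (5, "COMPLETE"),
    (6, "COMPLETE"),
]


def get_status(order):
    """Return the status field, used as both sort key and grouping key."""
    return order[1]


# Sort by the grouping key first so that equal statuses are adjacent.
orders_sorted = sorted(orders, key=get_status)

# Aggregate each group by counting its elements.
count_by_status = {
    status: len(list(group))
    for status, group in itertools.groupby(orders_sorted, key=get_status)
}
print(count_by_status)  # {'CLOSED': 2, 'COMPLETE': 3, 'PENDING': 1}
```

Using the same function for sorting and grouping keeps the two steps consistent, which is what makes each status appear exactly once.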
In [1]:
import itertools as iter  # caution: this alias shadows the built-in iter() function
In [2]:
iter.groupby?
Init signature: iter.groupby(iterable, key=None)
Docstring:
make an iterator that returns consecutive keys and groups from the iterable

iterable
  Elements to divide into groups according to the key function.
key
  A function for computing the group category for each element.
  If the key function is not specified or is None, the element itself
  is used for grouping.
Type: type
Subclasses:
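To illustrate the `key` parameter described in the docstring, here is a small sketch that groups words by their first letter. The `words` list is hypothetical sample data, and it is already ordered by first letter, so no extra sort is needed.

```python
import itertools

# Hypothetical sample data, already ordered by first letter.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# The key function computes the group category for each element.
groups_by_letter = [
    (letter, list(group))
    for letter, group in itertools.groupby(words, key=lambda word: word[0])
]
print(groups_by_letter)
# [('a', ['apple', 'avocado']), ('b', ['banana', 'blueberry']), ('c', ['cherry'])]
```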
In [3]:
l = [1, 1, 3, 2, 1, 3, 2]
In [4]:
l_grouped = iter.groupby(l)
In [5]:
type(l_grouped)
Out[5]:
itertools.groupby
In [6]:
list(l_grouped)
Out[6]:
[(1, <itertools._grouper at 0x7fe2946ee2b0>), (3, <itertools._grouper at 0x7fe2946ee310>), (2, <itertools._grouper at 0x7fe2946ee2e0>), (1, <itertools._grouper at 0x7fe2946ee4f0>), (3, <itertools._grouper at 0x7fe2946ee1f0>), (2, <itertools._grouper at 0x7fe2946ee640>)]
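The keys 1, 3 and 2 each appear twice in the output above because `groupby` starts a new group whenever the key changes, and the list is not sorted. A short sketch that materializes every group while iterating makes these consecutive runs visible:

```python
import itertools

l = [1, 1, 3, 2, 1, 3, 2]

# Convert each group to a list while iterating over the groupby object.
runs = [(key, list(group)) for key, group in itertools.groupby(l)]
print(runs)
# [(1, [1, 1]), (3, [3]), (2, [2]), (1, [1]), (3, [3]), (2, [2])]
```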
In [7]:
l_sorted = sorted(l)
In [8]:
ls_grouped = iter.groupby(l_sorted)
In [9]:
list(ls_grouped)
Out[9]:
[(1, <itertools._grouper at 0x7fe2946ee730>), (2, <itertools._grouper at 0x7fe2946ee580>), (3, <itertools._grouper at 0x7fe2946ee190>)]
Note: `l_sorted` and `ls_grouped` are rebuilt in the following cells because `ls_grouped` is a one-shot iterator. Once it has been read by `list(ls_grouped)` it is exhausted and yields nothing more.
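A short sketch of this one-shot behavior, reusing the same list. The names `first_pass` and `second_pass` are introduced here only for illustration.

```python
import itertools

l_sorted = sorted([1, 1, 3, 2, 1, 3, 2])
ls_grouped = itertools.groupby(l_sorted)

# The first pass consumes the iterator and sees every group.
first_pass = [(key, list(group)) for key, group in ls_grouped]

# The second pass finds nothing left, because the iterator is exhausted.
second_pass = list(ls_grouped)

print(first_pass)   # [(1, [1, 1, 1]), (2, [2, 2]), (3, [3, 3])]
print(second_pass)  # []
```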
In [10]:
l_sorted = sorted(l)
In [11]:
ls_grouped = iter.groupby(l_sorted)
In [12]:
for e in ls_grouped:
    print(list(e[1]))
[1, 1, 1]
[2, 2]
[3, 3]
In [13]:
l_sorted = sorted(l)
ls_grouped = iter.groupby(l_sorted)
list(iter.starmap(lambda key, values: (key, len(list(values))), ls_grouped))
Out[13]:
[(1, 3), (2, 2), (3, 2)]
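The same pattern extends to other aggregate functions. As a sketch of the "revenue for each order" use case, the following sums subtotals per order id. The `order_items` list and its `(order_id, subtotal)` layout are hypothetical sample data.

```python
import itertools
from operator import itemgetter

# Hypothetical sample data: (order_id, subtotal) tuples.
order_items = [
    (2, 199.99),
    (1, 299.98),
    (4, 49.98),
    (2, 250.0),
    (4, 299.95),
    (2, 129.99),
]

# Sort by order_id so that all items of an order are adjacent.
get_order_id = itemgetter(0)
items_sorted = sorted(order_items, key=get_order_id)

# Sum the subtotals within each group, rounding to 2 decimal places.
revenue_per_order = [
    (order_id, round(sum(subtotal for _, subtotal in items), 2))
    for order_id, items in itertools.groupby(items_sorted, key=get_order_id)
]
print(revenue_per_order)
```

Only the aggregate applied to each group changes: `sum` over the subtotals replaces the `len(list(values))` used for counting.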