Python Collections Module Tutorial

Python's collections module implements specialized container datatypes that provide alternatives to Python’s general-purpose built-in containers: dict, list, set, and tuple.

This module has the following containers:

1. Counter()
2. namedtuple()
3. deque()
4. defaultdict()
5. OrderedDict()
6. UserDict()
7. UserString()
8. UserList()
9. ChainMap()

In my experience, of all these containers, Counter, defaultdict, OrderedDict, and deque are the most useful. The following sections explain how each of them works.

Counter

Time complexity: constructing a Counter is O(n), because it has to iterate over the input, but operations on individual elements (lookup, increment) remain O(1) on average.

code

from collections import Counter

items = ['B','B','A','B','C','A','B','B','A','C']
counter = Counter(items)
print(counter)

output

Counter({'B': 5, 'A': 3, 'C': 2})

Print all the items with their occurrence numbers

for item, count in counter.items():
    print(item, count)

output

B 5
A 3
C 2

Find which item is most common in a list

items = ['B','B','A','B','C','A','B','B','A','C']

counter = Counter(items)
print(counter.most_common(1))          # [('B', 5)]
print(counter.most_common(1)[0][0])    # B
print(counter.most_common(1)[0][1])    # 5

output

[('B', 5)]
B
5

We can also pass a dictionary to Counter

d = {'A': 3, 'B': 5, 'C': 2}
counter = Counter(d)
print(counter)

output

Counter({'B': 5, 'A': 3, 'C': 2})

You can also pass the elements and their counts directly as keyword arguments

counter = Counter(A=3, B=5, C=2)
print(counter)

output

Counter({'B': 5, 'A': 3, 'C': 2})

We can also rebuild the full list of elements from the counts above with elements()

counter = Counter(A=3, B=5, C=2)
print(sorted(counter.elements()))

output

['A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'C', 'C']

We can also add two Counter objects

A = Counter(A=3, B=5, C=2)
B = Counter(A=1, B=2, C=3)
C = A + B
print(C)

output

Counter({'B': 7, 'C': 5, 'A': 4})

We can also subtract one Counter from another

A = Counter(A=3, B=5, C=2)
B = Counter(A=1, B=2, C=3)
A.subtract(B)
print(A)

output

Counter({'B': 3, 'A': 2, 'C': -1})
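
Note the negative count for 'C' in that output. As a small sketch of how subtract() differs from the - operator (the variable names a, b, and diff are only illustrative): subtract() changes the Counter in place and keeps zero and negative counts, while the - operator returns a new Counter that contains only positive counts.

```python
from collections import Counter

a = Counter(A=3, B=5, C=2)
b = Counter(A=1, B=2, C=3)

# The - operator returns a new Counter and drops counts that are zero or negative.
diff = a - b
print(diff)        # Counter({'B': 3, 'A': 2})

# subtract() modifies a in place and keeps the negative count for 'C'.
a.subtract(b)
print(a['C'])      # -1
```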

Find the total of all counts

c = Counter(A=3, B=5, C=2)
print(sum(c.values()))  # 10

Now, let's look at some basic methods that come with Counter

items = ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'C', 'C']
c = Counter(items)

# print all the elements
print(list(c.elements()))
# print all the unique elements
print(list(c))
# print all the counts
print(list(c.values()))

output

['A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'C', 'C']
['A', 'B', 'C']
[3, 5, 2]
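
A few more Counter behaviours are worth knowing. The following is a minimal sketch with made-up data: looking up a key that was never counted returns 0 instead of raising a KeyError, update() adds counts from another iterable, and most_common(n) returns the n most frequent items.

```python
from collections import Counter

c = Counter(['A', 'A', 'A', 'B'])

# A missing key returns 0 and is not added to the Counter.
print(c['Z'])            # 0
print('Z' in c)          # False

# update() adds to the existing counts instead of replacing them.
c.update(['B', 'C'])
print(c)                 # Counter({'A': 3, 'B': 2, 'C': 1})

# most_common(2) returns the two most frequent items.
top_two = c.most_common(2)
print(top_two)           # [('A', 3), ('B', 2)]
```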

defaultdict

A dict is one of Python's built-in data structures. It stores data as key-value pairs.

Example:

d = {'a': 2, 'b': 5, 'c': 6}

Problem with Dictionary

Dictionaries work well until you look up a missing key. If you access a key that is not in the dictionary, Python raises a KeyError. Something like this:

d = {'a': 2, 'b': 5, 'c': 6}
d['z']  # 'z' is not in the dict, so this raises a KeyError

You will see something like this:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
    d['z']
KeyError: 'z'

There are two common ways to handle this problem:

  1. using get
  2. using defaultdict from the collections module.

Using get: if the key doesn't exist, get returns None instead of raising a KeyError

d = {'a': 2, 'b': 5, 'c': 6}

print(d.get('b'))    # 5
print(d.get('d'))    # None
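
get also accepts an optional second argument that is returned when the key is missing. A short sketch (the variable name missing is only illustrative):

```python
d = {'a': 2, 'b': 5, 'c': 6}

# The second argument is the fallback returned for a missing key.
missing = d.get('d', 0)
print(missing)        # 0

# get() never inserts the missing key into the dictionary.
print('d' in d)       # False
```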

Using defaultdict with int

from collections import defaultdict

d = defaultdict(int)
d['a'] = 1
d['b'] = 2
d['c'] = 3

print(d['a'])     # 1
print(d['d'])     # 0

The argument passed to defaultdict is a factory function that is called to create the default value for a missing key:

  • int: default will be an integer value of 0
  • str: default will be an empty string ""
  • list: default will be an empty list []

Using defaultdict with list

d = defaultdict(list)
d['a'].append(1)
d['a'].append(2)

d['c'].append(3)
d['c'].append(4)

print(d['a'])      # [1, 2]
print(d['c'])      # [3, 4]
print(d['d'])      # []
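
Two typical uses of these defaults are grouping and counting. The sketch below uses a made-up word list (words, groups, and length_counts are illustrative names). It also shows one thing to watch for: simply reading a missing key adds that key to a defaultdict.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']  # illustrative data

# Group words by their first letter; a missing letter starts as an empty list.
groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)
print(dict(groups))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Count how many words have each length; a missing length starts at 0.
length_counts = defaultdict(int)
for word in words:
    length_counts[len(word)] += 1
print(dict(length_counts))
# {5: 1, 7: 1, 6: 2, 9: 1}

# Reading a missing key creates it with the default value.
print('z' in groups)   # False
groups['z']
print('z' in groups)   # True
```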

We can also set a custom default value

d = defaultdict(lambda: 'Custom')
d['a'] = 1
d['b'] = 2

print(d['a'])   # 1
print(d['b'])   # 2
print(d['c'])   # Custom

We can also convert a normal dictionary to a defaultdict

normal_dict = {'a': 1, 'b': 2, 'c': 3}

# build a defaultdict from normal_dict
d = defaultdict(int, normal_dict)
print(d['a'])    # 1
print(d['d'])    # 0

Print the keys and values of a dictionary

normal_dict = {'a': 1, 'b': 2}

d = defaultdict(int, normal_dict)

for k, v in d.items():
    print(k, v)

output

a 1
b 2

OrderedDict

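OrderedDict remembers the order in which keys were inserted. Since Python 3.7 the built-in dict also preserves insertion order, so the remaining differences are that OrderedDict has extra reordering methods such as move_to_end(), and that comparing two OrderedDicts for equality takes the order of keys into account.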
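
A minimal sketch of what OrderedDict offers beyond remembering insertion order, using made-up keys and values: move_to_end() moves a key to the end, popitem(last=False) removes the oldest item, and equality between two OrderedDicts depends on key order.

```python
from collections import OrderedDict

od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
print(list(od))               # ['a', 'b', 'c']

# Move an existing key to the end.
od.move_to_end('a')
print(list(od))               # ['b', 'c', 'a']

# Remove and return the first (oldest) item.
first = od.popitem(last=False)
print(first)                  # ('b', 2)

# Two OrderedDicts with the same items in a different order are not equal,
# while the equivalent plain dicts are.
x = OrderedDict([('a', 1), ('b', 2)])
y = OrderedDict([('b', 2), ('a', 1)])
print(x == y)                 # False
print(dict(x) == dict(y))     # True
```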

deque

Python’s collections module provides a class called deque that’s specially designed to provide fast and memory-efficient ways to append and pop items from both ends of the underlying data structure.

A deque is preferred over a list when we need fast append and pop operations at both ends of the container. A deque provides O(1) time complexity for append and pop at either end, whereas a list takes O(n) time to insert or remove an item at the left end (list.insert(0, x) and list.pop(0)).

from collections import deque

d = deque()
# add items to the right
d.append(1)
d.append(2)
d.append(3)

print(d)     # deque([1, 2, 3])

Print all values of the deque:

for num in d:
    print(num)

output

1
2
3

Add item to the left

d.appendleft(4)
print(d)        # deque([4, 1, 2, 3])
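
Removing items works the same way at both ends. A short sketch, starting from the same values as above (right and left are illustrative names): pop() removes from the right and popleft() removes from the left.

```python
from collections import deque

d = deque([4, 1, 2, 3])

right = d.pop()        # removes and returns the rightmost item
left = d.popleft()     # removes and returns the leftmost item

print(right, left)     # 3 4
print(d)               # deque([1, 2])
```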

We can also pass any iterable to deque.

nums = [1, 2, 3, 4, 5]
d = deque(nums)
print(d)          # deque([1, 2, 3, 4, 5])

strs = "abcde"
d = deque(strs)
print(d)          # deque(['a', 'b', 'c', 'd', 'e'])
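
deque also takes an optional maxlen argument and has a rotate() method. A brief sketch with illustrative values (the name recent is hypothetical): once a bounded deque is full, appending on the right discards items from the left, which is handy for keeping only the most recent entries.

```python
from collections import deque

# Keep only the three most recent items.
recent = deque(maxlen=3)
for n in [1, 2, 3, 4, 5]:
    recent.append(n)
print(recent)     # deque([3, 4, 5], maxlen=3)

# rotate(1) shifts every item one step to the right;
# the last item wraps around to the front.
d = deque([1, 2, 3, 4, 5])
d.rotate(1)
print(d)          # deque([5, 1, 2, 3, 4])
```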

Author: Sadman Kabir Soumik
