Python Collections Module Tutorial
Python's collections module implements specialized container datatypes that provide alternatives to Python's general-purpose built-in containers: dict, list, set, and tuple.
This module has the following containers:
1. Counter()
2. namedtuple()
3. deque()
4. defaultdict()
5. OrderedDict()
6. UserDict()
7. UserString()
8. UserList()
9. ChainMap()
In my experience, out of all of these containers, Counter, defaultdict, OrderedDict, and deque are the most useful ones. The following sections explain how Counter, defaultdict, OrderedDict, and deque work.
Counter
Time complexity: constructing a Counter is O(n), because it has to iterate over the input, but lookups and updates of individual elements remain O(1) on average.
code
from collections import Counter

items = ['B','B','A','B','C','A','B','B','A','C']
counter = Counter(items)
print(counter)
output
Counter({'B': 5, 'A': 3, 'C': 2})
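A Counter behaves like a dict, with two differences worth knowing: looking up an element that was never counted returns 0 instead of raising KeyError, and update() adds to the existing counts rather than replacing them. A minimal sketch (the variable names and the extra 'Z' element are only for illustration):

```python
from collections import Counter

counter = Counter(['B', 'B', 'A', 'B', 'C', 'A', 'B', 'B', 'A', 'C'])

# A missing element has a count of 0; no KeyError is raised
# and the key is not added to the counter.
missing = counter['Z']
print(missing)          # 0
print('Z' in counter)   # False

# update() adds the new observations to the existing counts.
counter.update(['A', 'Z'])
print(counter['A'])     # 4
print(counter['Z'])     # 1
```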
Print all the items with their counts
for item, count in counter.items():
    print(item, count)
output
B 5
A 3
C 2
Find which item is most common in a list
items = ['B','B','A','B','C','A','B','B','A','C']

counter = Counter(items)
print(counter.most_common(1)) # [('B', 5)]
print(counter.most_common(1)[0][0]) # B
print(counter.most_common(1)[0][1]) # 5
output
[('B', 5)]
B
5
We can also pass a dictionary to Counter
d = {'A': 3, 'B': 5, 'C': 2}
counter = Counter(d)
print(counter)
output
Counter({'B': 5, 'A': 3, 'C': 2})
You can also pass elements and their counts directly as keyword arguments
counter = Counter(A=3, B=5, C=2)
print(counter)
output
Counter({'B': 5, 'A': 3, 'C': 2})
We can also expand a Counter back into its individual elements, each repeated as many times as its count
counter = Counter(A=3, B=5, C=2)
print(sorted(counter.elements()))
output
['A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'C', 'C']
We can also add two Counter objects
A = Counter(A=3, B=5, C=2)
B = Counter(A=1, B=2, C=3)
C = A + B
print(C)
output
Counter({'B': 7, 'C': 5, 'A': 4})
We can also subtract one Counter from another. subtract() updates the counter in place and keeps zero and negative counts
A = Counter(A=3, B=5, C=2)
B = Counter(A=1, B=2, C=3)
A.subtract(B)
print(A)
output
Counter({'B': 3, 'A': 2, 'C': -1})
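The - operator is a close relative of subtract() but behaves differently: it returns a new Counter and drops any element whose resulting count is zero or negative. A small sketch contrasting the two, using lowercase variable names chosen here for illustration:

```python
from collections import Counter

a = Counter(A=3, B=5, C=2)
b = Counter(A=1, B=2, C=3)

# The - operator returns a new Counter and keeps only positive counts,
# so 'C' (2 - 3 = -1) is dropped from the result.
diff = a - b
print(diff)      # Counter({'B': 3, 'A': 2})

# subtract() modifies the counter in place and keeps the negative count.
a.subtract(b)
print(a['C'])    # -1
```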
Find the total of all counts
c = Counter(A=3, B=5, C=2)
print(sum(c.values())) # 10
Now, let's look at some basic methods that come with Counter
items = ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'C', 'C']
c = Counter(items)

# print all the elements
print(list(c.elements()))
# print all the unique elements
print(list(c))
# print all the counts
print(list(c.values()))
output
['A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'C', 'C']
['A', 'B', 'C']
[3, 5, 2]
defaultdict
A dict is one of Python's built-in data structures. It stores data in the form of key-value pairs.
Example:
d = {'a': 2, 'b': 5, 'c': 6}
Problem with Dictionary
Dictionaries work well until you encounter missing keys. If you look up a key that is not in the dictionary, Python raises a KeyError. Something like this:
d = {'a': 2, 'b': 5, 'c': 6}
d['z'] # 'z' is not present in the dict, so this raises a KeyError
You will see something like this:
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
    d['z']
KeyError: 'z'
To overcome the above problem, we have a couple of options:
- using get
- using defaultdict from the collections module.
Using get: if the key doesn't exist, get returns None instead of raising a KeyError
d = {'a': 2, 'b': 5, 'c': 6}

print(d.get('b')) # 5
print(d.get('d')) # None
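get also accepts a second argument that is returned in place of None when the key is missing. A minimal sketch (the fallback value 0 is an arbitrary choice for illustration):

```python
d = {'a': 2, 'b': 5, 'c': 6}

# The second argument is returned when the key is missing.
value = d.get('d', 0)
print(value)       # 0

# get() never inserts the missing key into the dictionary.
print('d' in d)    # False
```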
Using defaultdict with int
from collections import defaultdict

d = defaultdict(int)
d['a'] = 1
d['b'] = 2
d['c'] = 3

print(d['a']) # 1
print(d['d']) # 0
- int: the default will be the integer 0
- str: the default will be an empty string ""
- list: the default will be an empty list []
d = defaultdict(list)
d['a'].append(1)
d['a'].append(2)

d['c'].append(3)
d['c'].append(4)

print(d['a']) # [1, 2]
print(d['c']) # [3, 4]
print(d['d']) # []
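A common use of defaultdict(list) is grouping items without first checking whether the key exists. The sketch below groups a made-up list of words by their first letter; it also shows a side effect to keep in mind: unlike get, indexing a missing key stores the default value in the dictionary.

```python
from collections import defaultdict

# Hypothetical sample data for illustration.
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

groups = defaultdict(list)
for word in words:
    # The first access to a new letter creates an empty list automatically.
    groups[word[0]].append(word)

print(dict(groups))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Merely reading a missing key inserts it with the default value.
_ = groups['z']
print('z' in groups)   # True
```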
We can also set a custom default value by passing a function that returns it
d = defaultdict(lambda: 'Custom')
d['a'] = 1
d['b'] = 2

print(d['a']) # 1
print(d['b']) # 2
print(d['c']) # Custom
We can also convert a normal dictionary to a defaultdict
normal_dict = {'a': 1, 'b': 2, 'c': 3}

# build a defaultdict from the normal dict
d = defaultdict(int, normal_dict)
print(d['a']) # 1
print(d['d']) # 0
Print the keys and values of a dictionary
normal_dict = {'a': 1, 'b': 2}

d = defaultdict(int, normal_dict)

for k, v in d.items():
    print(k, v)
output
a 1
b 2
OrderedDict
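OrderedDict is a dict subclass that remembers the order in which keys were inserted. Since Python 3.7, the built-in dict also guarantees insertion order, so ordering alone no longer separates the two. The remaining differences are that OrderedDict offers order-aware methods such as move_to_end() and popitem(last=False), and that comparing two OrderedDicts for equality takes the order of their keys into account.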
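Beyond remembering insertion order, OrderedDict provides a few order-aware operations. The sketch below uses move_to_end(), popitem(last=False), and order-sensitive equality, all part of the standard OrderedDict API; the keys and values are made up for illustration.

```python
from collections import OrderedDict

od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3

# Move an existing key to the end (or to the front with last=False).
od.move_to_end('a')
print(list(od))              # ['b', 'c', 'a']

# Remove and return the first item instead of the last one.
first = od.popitem(last=False)
print(first)                 # ('b', 2)

# Equality between two OrderedDicts takes order into account.
x = OrderedDict([('a', 1), ('b', 2)])
y = OrderedDict([('b', 2), ('a', 1)])
print(x == y)                # False
print(dict(x) == dict(y))    # True
```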
deque
Python's collections module provides a class called deque that is specially designed to provide fast and memory-efficient ways to append and pop items from both ends of the underlying data structure.
A deque is preferred over a list in cases where we need quick append and pop operations at both ends of the container. A deque provides O(1) time complexity for append and pop operations at either end, whereas a list is only O(1) (amortized) at its right end: inserting or removing at the left end of a list, for example with insert(0, x) or pop(0), is O(n) because every other element has to shift.
from collections import deque

d = deque()
d.append(1)
# add item to the right
d.append(2)
d.append(3)

print(d) # deque([1, 2, 3])
Print all values of the deque:
for num in d:
    print(num)
output
1
2
3
Add item to the left
d.appendleft(4)
print(d) # deque([4, 1, 2, 3])
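Removing items works symmetrically: pop() takes from the right end and popleft() takes from the left end, both in O(1). A minimal sketch that starts from the same four values (the variable names left and right are chosen here for illustration):

```python
from collections import deque

d = deque([4, 1, 2, 3])

left = d.popleft()   # removes and returns the leftmost item
right = d.pop()      # removes and returns the rightmost item

print(left)    # 4
print(right)   # 3
print(d)       # deque([1, 2])
```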
We can also pass any iterable to deque.
nums = [1, 2, 3, 4, 5]
d = deque(nums)
print(d) # deque([1, 2, 3, 4, 5])

strs = "abcde"
d = deque(strs)
print(d) # deque(['a', 'b', 'c', 'd', 'e'])
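deque also accepts an optional maxlen argument, which is handy when only the most recent items matter. Once the deque is full, each append on the right silently discards the oldest item on the left. A short sketch (the name recent and the size 3 are arbitrary choices for illustration):

```python
from collections import deque

# Keep only the three most recent values.
recent = deque(maxlen=3)
for n in [1, 2, 3, 4, 5]:
    recent.append(n)   # once full, the oldest item on the left is dropped

print(recent)          # deque([3, 4, 5], maxlen=3)
```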
Author: Sadman Kabir Soumik