A common task in programming and development is finding items that exist already or recognising duplicates. This is done by checking membership of an element in a collection.
Often a list
is used as it is mutable (can be changed and items added to it) and it is suited for single items. However there are other data structures that could help – a hash – which in python is a dict
and a set
.
Which data structure is best suited for this action for performance?
The Test Code
The full test code is not posted but the method for checking each type is here:
def check_dict_item(item):
if my_dict.get(item):
return True
return False
def check_set_item(item):
if item in my_set:
return True
return False
def check_list_item(item):
if item in my_list:
return True
return False
Note: it was slightly slowed added about 0.3x more latency when using
if item in my_dict.keys()
to check for membership with a dict.
The results
µs
: micro seconds (10^-6)ns
: nano seconds (10^-9)
Using timeit
Set (nano seconds 10^-9):
In [2]: %timeit membership.check_set()
575 ns ± 3.37 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [5]: %timeit -r 10 -n 1000 membership.check_set()
1.2 µs ± 127 ns per loop (mean ± std. dev. of 10 runs, 1,000 loops each)
List:
In [3]: %timeit membership.check_list()
396 µs ± 1.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [4]: %timeit -r 10 -n 1000 membership.check_list()
398 µs ± 1.31 µs per loop (mean ± std. dev. of 10 runs, 1,000 loops each)
Dict:
In [4]: %timeit membership.check_dict()
787 ns ± 6.19 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [3]: %timeit -r 10 -n 1000 membership.check_dict()
1.6 µs ± 224 ns per loop (mean ± std. dev. of 10 runs, 1,000 loops each)
Using time
Main method:
$ python3 membership.py
List: 0.0004038810729980469 (400 micro seconds)
Set: 1.9073486328125e-06 (1.9 micro seconds)
Dict: 2.1457672119140625e-06 (2.1 micro seconds)
Conclusion
It is very quick in all cases however the optimal choice is using a set()
.
The slowest option between the 3 is a list.