A common task in programming and development is finding items that exist already or recognising duplicates. This is done by checking membership of an element in a collection.
list is used as it is mutable (can be changed and items added to it) and it is suited for single items. However there are other data structures that could help – a hash – which in python is a
dict and a
Which data structure is best suited for this action for performance?
The Test Code
The full test code is not posted but the method for checking each type is here:
def check_dict_item(item): if my_dict.get(item): return True return False def check_set_item(item): if item in my_set: return True return False def check_list_item(item): if item in my_list: return True return False
Note: it was slightly slowed added about 0.3x more latency when using
if item in my_dict.keys()to check for membership with a dict.
µs: micro seconds (10^-6)
ns: nano seconds (10^-9)
Set (nano seconds 10^-9):
In : %timeit membership.check_set() 575 ns ± 3.37 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) In : %timeit -r 10 -n 1000 membership.check_set() 1.2 µs ± 127 ns per loop (mean ± std. dev. of 10 runs, 1,000 loops each)
In : %timeit membership.check_list() 396 µs ± 1.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) In : %timeit -r 10 -n 1000 membership.check_list() 398 µs ± 1.31 µs per loop (mean ± std. dev. of 10 runs, 1,000 loops each)
In : %timeit membership.check_dict() 787 ns ± 6.19 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) In : %timeit -r 10 -n 1000 membership.check_dict() 1.6 µs ± 224 ns per loop (mean ± std. dev. of 10 runs, 1,000 loops each)
$ python3 membership.py List: 0.0004038810729980469 (400 micro seconds) Set: 1.9073486328125e-06 (1.9 micro seconds) Dict: 2.1457672119140625e-06 (2.1 micro seconds)
It is very quick in all cases however the optimal choice is using a
The slowest option between the 3 is a list.