Even more efficient hashing of columns in a pandas dataframe

One of the joys of software development is that small changes can sometimes make solving the same problem orders of magnitude faster. Revisiting previous solutions with more experience can lead to even better results. In this post, I show how I improved on the previous implementation by a factor of 2.7.

2022-12-20 · 5 min · Maurice Borgmeier

Efficiently hashing columns in a pandas dataframe

One of the joys of software development is that small changes can sometimes make solving the same problem orders of magnitude faster. I experienced this recently when implementing a function to generate a hash over multiple columns in a dataframe. Today I’m going to show you how I came up with that solution.

2022-09-18 · 9 min · Maurice Borgmeier