Code Optimization: Filtering dataframes using exact matches in multiple columns

Filtering medium to large amounts of data to extract a relevant subset is a very common task in any data-related project. Often the data lives in pandas dataframes. In this post I want to compare some filtering options for exact matches across multiple columns. The idea is pretty simple: we have a dataframe with multiple columns and rows, as well as a list of conditions by which we want to extract data from it....

2023-11-17 · 8 min · Maurice Borgmeier
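
To make the problem from the teaser above concrete, here is a minimal sketch of one possible approach: an inner merge against a small dataframe of wanted combinations. The column names, the sample data, and the helper `filter_exact` are hypothetical and chosen only for illustration; this is not necessarily one of the options the post compares.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "country": ["DE", "DE", "US", "US", "FR"],
    "category": ["a", "b", "a", "b", "a"],
    "value": [1, 2, 3, 4, 5],
})

# Each dict is one exact-match combination of column values to keep.
conditions = [
    {"country": "DE", "category": "b"},
    {"country": "US", "category": "a"},
]


def filter_exact(frame: pd.DataFrame, conditions: list[dict]) -> pd.DataFrame:
    """Keep rows whose values match any of the given combinations exactly."""
    # Deduplicate so that a repeated condition does not duplicate rows.
    wanted = pd.DataFrame(conditions).drop_duplicates()
    # The inner merge keeps only rows that match a wanted combination
    # in all of the condition columns.
    return frame.merge(wanted, on=list(wanted.columns), how="inner")


result = filter_exact(df, conditions)
print(result)
```

Note that the merge returns a fresh index, so the original row labels are not preserved in this sketch.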

Even more efficient hashing of columns in a pandas dataframe

One of the joys of software development is that small changes can sometimes make solving the same problem orders of magnitude faster. Revisiting previous solutions with more experience can lead to even better results. I show you how I improved the previous implementation by a factor of 2.7.

2022-12-20 · 5 min · Maurice Borgmeier

Efficiently hashing columns in a pandas dataframe

One of the joys of software development is that small changes can sometimes make solving the same problem orders of magnitude faster. I experienced this recently when implementing a function to generate a hash over multiple columns in a dataframe. Today I’m going to show you how I came up with that solution.

2022-09-18 · 9 min · Maurice Borgmeier
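
For readers unfamiliar with the task, hashing over multiple columns can be sketched roughly as follows. This is a deliberately simple baseline, assuming SHA-256 and a separator-joined string representation of each row; the function name `hash_columns`, the separator, and the sample data are my own assumptions, and this is not the implementation described in the posts.

```python
import hashlib

import pandas as pd


def hash_columns(frame: pd.DataFrame, columns: list[str], sep: str = "\x1f") -> pd.Series:
    """Return one SHA-256 hex digest per row, computed over the given columns."""
    # Convert the selected columns to strings and join them row by row.
    # The separator keeps ("ab", "c") distinct from ("a", "bc").
    joined = frame[columns].astype(str).apply(sep.join, axis=1)
    return joined.map(lambda s: hashlib.sha256(s.encode("utf-8")).hexdigest())


# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "first": ["a", "b", "a"],
    "second": [1, 2, 1],
    "other": [10, 20, 30],
})

hashes = hash_columns(df, ["first", "second"])
print(hashes)
```

Rows with identical values in the selected columns get identical hashes, regardless of what the other columns contain. The row-wise `apply` is easy to read but slow on large dataframes, which is exactly the kind of cost that makes optimizing this function worthwhile.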