Filtering medium to large amounts of data to extract a relevant subset is a very common task in any data-related project. Often we do this with pandas dataframes. In this post I want to compare some filtering options for exact matches across multiple columns. The idea is pretty simple: we have a dataframe with multiple columns and rows, as well as a list of conditions by which we want to extract data from it....
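To make the setup concrete, here is a minimal sketch of one way to filter on exact matches across several columns: turn the list of conditions into a small dataframe and inner-join it against the data. The column names, the sample data and the helper `filter_exact_matches` are made up for illustration, and this is not necessarily one of the options compared in the post.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame(
    {
        "country": ["DE", "DE", "FR", "US", "US"],
        "year": [2020, 2021, 2020, 2020, 2021],
        "value": [1, 2, 3, 4, 5],
    }
)

# Each condition is one exact combination of column values we want to keep.
conditions = [
    {"country": "DE", "year": 2021},
    {"country": "US", "year": 2020},
]


def filter_exact_matches(frame, conditions):
    """Keep only the rows of `frame` that match one of the conditions exactly."""
    # Deduplicate the conditions so a repeated condition doesn't duplicate rows.
    wanted = pd.DataFrame(conditions).drop_duplicates()
    # An inner join on the condition columns drops every non-matching row.
    return frame.merge(wanted, on=list(wanted.columns), how="inner")


subset = filter_exact_matches(df, conditions)
print(subset)
```

The join keeps all columns of the original dataframe and only uses the condition columns as keys, so the same helper works for any number of columns as long as every condition names the same ones.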
Imagine you’re running a sports event with multiple competitions going on and you need to keep track of the top 10 fastest times across all disciplines. As each athlete finishes competing in one or more games, they want to know what their spot on the leaderboard is. What’s the fastest way to compute this across a range of competitions? Given an n x m matrix like the one you can see below, where the rows are the disciplines and the columns the top 10 spots, figure out where player p ranks in all disciplines based on their times....
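As a rough illustration of the problem, the sketch below uses a binary search per discipline. It assumes that each row is sorted from fastest to slowest time, that lower times are better, and that a time equal to an existing entry shares that entry's spot. The function name `leaderboard_ranks` and the sample times are hypothetical, and this is a baseline rather than the approach the post arrives at.

```python
from bisect import bisect_left


def leaderboard_ranks(top_times, player_times):
    """Return the player's 1-based rank per discipline, or None if off the board.

    top_times: one list per discipline, sorted ascending (fastest first).
    player_times: the player's time in each discipline, in the same order.
    """
    ranks = []
    for row, time in zip(top_times, player_times):
        # Number of leaderboard entries that are strictly faster than the player.
        position = bisect_left(row, time)
        # A position past the last spot means the player didn't make the board.
        ranks.append(position + 1 if position < len(row) else None)
    return ranks


# Two disciplines with three leaderboard spots each (made-up times in seconds).
top_times = [
    [9.8, 9.9, 10.1],
    [20.1, 20.4, 20.9],
]
player_times = [10.0, 21.5]

print(leaderboard_ranks(top_times, player_times))
```

Each lookup is logarithmic in the number of spots, so the total cost grows with the number of disciplines rather than with the size of the whole matrix.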
I published a new blog post on the tecRacer AWS Blog: https://www.tecracer.com/blog/2023/09/teaching-boto3-to-store-floats-and-datetime-objects-in-dynamodb.html
I published a new blog post on the tecRacer AWS Blog: https://www.tecracer.com/blog/2023/08/handling-errors-and-retries-in-stepfunctions.html
I published a new blog post on the tecRacer AWS Blog: https://www.tecracer.com/blog/2023/08/hive_cursor_error-in-athena-when-reading-parquet-files-written-by-pandas.html
I published a new blog post on the tecRacer AWS Blog: https://www.tecracer.com/blog/2023/06/advanced-credential-rotation-for-iam-users-with-a-grace-period.html
I published a new blog post on the tecRacer AWS Blog: https://www.tecracer.com/blog/2023/04/avoiding-memoryerrors-when-working-with-parquet-data-in-pandas.html
I published a new blog post on the tecRacer AWS Blog: https://www.tecracer.com/blog/2023/03/the-beating-heart-of-sqs-of-heartbeats-and-watchdogs.html
One of the joys of software development is that small changes can sometimes make solving the same problem orders of magnitude faster. Revisiting previous solutions with more experience can lead to even better results. I show you how I improved the previous implementation by a factor of 2.7.
I published a new blog post on the tecRacer AWS Blog: https://www.tecracer.com/blog/2022/12/introduction-to-asynchronous-interactions-with-the-aws-api-in-python.html