AWS-Blog: Building Data Aggregation Pipelines using Apache Airflow and Athena

Business insights are frequently generated from aggregated data, such as daily sales per market segment over time. In this blog post, we’ll use Apache Airflow to build a data aggregation pipeline that relies on Amazon Athena for the heavy lifting. We’ll also cover best practices that you should follow to build a production-ready system.

2024-09-23 · 7 min · Maurice Borgmeier

AWS-Blog: Making the TPC-H dataset available in Athena using Airflow

The TPC-H dataset is commonly used to benchmark data warehouses or, more generally, decision support systems. It describes a typical e-commerce workload and includes benchmark queries to enable performance comparison between different data warehouses. I think the dataset is also useful for teaching how to build different kinds of ETL or analytics workflows, so I decided to explore ways of making it available in Amazon Athena.

2024-08-29 · 7 min · Maurice Borgmeier

AWS-Blog: Enabling Apache Airflow to copy large S3 objects

If you’re trying to use Apache Airflow to copy large objects in S3, you might have encountered issues where S3 rejects the request with an InvalidRequest error. We will fix that in this post by writing a custom operator to handle the underlying problem.

2024-08-27 · 3 min · Maurice Borgmeier

AWS-Blog: You can't Opt-Out of Performance Tracking in the AWS Console

Even though I had opted out of performance measurement cookies, I noticed a lot of web requests in the AWS console that look like performance measurement. In this article, I investigate what’s being sent and what we can do about it.

2024-08-22 · 7 min · Maurice Borgmeier

AWS-Blog: Improving Accessibility by Generating Image-alt texts using GenAI

In this article, we’ll use GenAI to generate alternative texts for images in Markdown documents, which helps people who rely on screen readers access your content.

2024-08-21 · 7 min · Maurice Borgmeier