[ETR #48] Impress With This 3-Step Alerting Project


Extract. Transform. Read.

A newsletter from Pipeline

Hi past, present or future data professional!

I don’t know about you, but this email won’t be the only thing I scroll through today.

Incidentally, one of the most infinitely scroll-able (and doom scroll-able) sites, Reddit, was my source for a data project demonstrating alerting systems.

And the best part?

You can and should absolutely steal this approach to gain an understanding of how to monitor ingested data and trigger alerts. This project demonstrates practical skills in data extraction, processing, and automation – and it's easier than you might think.

The TL;DR of my use case: I extract content from the Reddit API to monitor a watch sale subreddit for a particular model (an Omega Seamaster 2531.80, if you happen to be a fellow watch enthusiast).

To do this, I ingested the 100 latest posts from the subreddit, used a string match to determine whether the model number was included in each post, and triggered an automated email with the seller’s link, which was ultimately delivered to my Gmail inbox.

Here's the basic process:

Extract Data

  • Get the Reddit info you want by making a single API call
  • Make specific requests to this endpoint: “https://oauth.reddit.com/r/{subreddit}/{new|hot|best}”
  • Filter the data: Are you looking at certain subreddits? Keywords? Users?
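The extract step can be sketched roughly like this. This is a minimal, hedged example, not the exact code from the story: it assumes you've already obtained a Reddit OAuth access token (via a registered script app), and the function and variable names here are my own placeholders.

```python
import json
import urllib.parse
import urllib.request


def parse_listing(listing: dict) -> list[dict]:
    """Unwrap a Reddit listing response into a plain list of post dicts."""
    return [child["data"] for child in listing["data"]["children"]]


def fetch_new_posts(subreddit: str, token: str, limit: int = 100) -> list[dict]:
    """Pull the newest posts from a subreddit in a single OAuth API call."""
    query = urllib.parse.urlencode({"limit": limit})
    req = urllib.request.Request(
        f"https://oauth.reddit.com/r/{subreddit}/new?{query}",
        headers={
            "Authorization": f"bearer {token}",
            # Reddit rejects requests with a blank/default User-Agent.
            "User-Agent": "watch-alert-demo/0.1",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return parse_listing(json.load(resp))
```

Reddit's listing responses nest each post under `data.children[i].data`, which is why the unwrapping helper exists; it also gives you a convenient seam to filter by subreddit, keyword, or user before anything else happens.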

Define Triggers

  • Set the rules that determine when you get an alert
  • Create a regular expression to search a field for a given string (psst: AI agents can help make regex writing less tedious)
  • Example: Email me when the “post” field in r/dataengineering mentions "Airflow."

Connect to Email

  • Automate the email send when your rules are met
  • Establish a custom subject and body containing the link or links to relevant posts
  • If you prefer Gmail, you can read the story (linked at the bottom) for a template to create and send a simple email
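The email step can look something like the sketch below. To be clear about assumptions: this is not the story's exact template, the addresses are placeholders, and sending through Gmail's SMTP server requires an app password generated in your Google account settings.

```python
import smtplib
from email.message import EmailMessage


def build_alert(matches: list[dict], sender: str, recipient: str) -> EmailMessage:
    """Compose an alert email whose body is one seller link per line."""
    msg = EmailMessage()
    msg["Subject"] = f"Watch alert: {len(matches)} matching post(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(post["url"] for post in matches))
    return msg


def send_alert(matches: list[dict], sender: str, recipient: str, app_password: str) -> None:
    """Send the alert via Gmail's SMTP server (requires an app password)."""
    msg = build_alert(matches, sender, recipient)
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(sender, app_password)
        server.send_message(msg)
```

Splitting composition from sending keeps the custom subject/body logic testable without touching the network.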

Although this is a simple application, this custom alerting system has real-world uses. Including this or a similar monitoring project in your portfolio shows you’re not only dedicated to building data pipelines, but that you have the professional instinct to be accountable for their output.

By creating alerts for specific events (errors, delays, unusual data), you can:

  • Catch problems early.
  • Reduce downtime.
  • Keep your data reliable.

This helps ensure smooth and efficient data systems.

For a complete walkthrough, including code, read the story on Medium: https://medium.com/pipeline-a-data-engineering-resource/engineering-custom-email-alerts-for-reddit-in-3-steps-710e5191bd3f

Happy alerting and thanks for ingesting,

-Zach Quinn

Extract. Transform. Read.

Reaching 20k+ readers on Medium and nearly 3k learners by email, I draw on my 4 years of experience as a Senior Data Engineer to demystify data science, cloud and programming concepts while sharing job hunt strategies so you can land and excel in data-driven roles. Subscribe for 500 words of actionable advice every Thursday.
