[ETR #39] Your Pipelines Will Fail On These 10 Days


Extract. Transform. Read.

A newsletter from Pipeline

Hi past, present or future data professional!

From 2014-2017 I lived in Phoenix, Arizona and enjoyed the state’s best resident privilege: No daylight saving time. If you’re unaware (and if you’re in one of the 48 US states that observe it, you’re really unaware), March 9th marked the start of daylight saving time, when we spring forward an hour.

If you think this messes up your microwave and oven clocks, just wait until you check on your data pipelines. Even though data teams are very aware of DST, this isn’t always something we account for when building and scaling pipelines.

To build DST-resistant pipelines, you need to define your schedule parameters with the switch between standard time and daylight time in mind. And even if you think your builds are properly calibrated before breaking for the weekend, I’d still remind the team that the DST change is coming and, if possible, designate an on-call position to respond to issues that shouldn’t wait until the next weekday.
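To make the DST shift concrete, here is a minimal sketch using Python's standard-library zoneinfo module (Python 3.9+, and it needs time zone data available on the machine). The 6:00 AM job, the 2025 dates and the helper names are my own assumptions for illustration, not settings from any particular scheduler.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library in Python 3.9+


def to_utc(local_naive: datetime, tz_name: str) -> datetime:
    """Interpret a naive wall-clock time in tz_name and convert it to UTC."""
    return local_naive.replace(tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)


def is_skipped_by_dst(local_naive: datetime, tz_name: str) -> bool:
    """True if the wall-clock time never occurs (it falls in the spring-forward gap)."""
    tz = ZoneInfo(tz_name)
    # A skipped time does not survive a round trip through UTC unchanged.
    round_trip = local_naive.replace(tzinfo=tz).astimezone(timezone.utc).astimezone(tz)
    return round_trip.replace(tzinfo=None) != local_naive


# Hypothetical 6:00 AM job, the day before and the day after the change
# (assumes the March 9 in question is March 9, 2025).
ny_before = to_utc(datetime(2025, 3, 8, 6, 0), "America/New_York")
ny_after = to_utc(datetime(2025, 3, 10, 6, 0), "America/New_York")

# Phoenix does not observe DST, so the same job keeps the same UTC time.
phx_before = to_utc(datetime(2025, 3, 8, 6, 0), "America/Phoenix")
phx_after = to_utc(datetime(2025, 3, 10, 6, 0), "America/Phoenix")

# A job set for 2:30 AM local time on the changeover day has no real slot.
gap = is_skipped_by_dst(datetime(2025, 3, 9, 2, 30), "America/New_York")

print(ny_before.time(), ny_after.time())    # New York run moves an hour in UTC
print(phx_before.time(), phx_after.time())  # Phoenix run stays put
print(gap)
```

If your scheduler runs on UTC, the New York job lands an hour off after the change unless you adjust it; if it runs on local time, anything scheduled inside the skipped hour is worth a second look.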

In addition to DST, a less frequent problem is creating schedules and variables that account for Leap Year. While you could be like one engineer I know and tell yourself it’s a “future me” problem, I’d recommend creating logic to check for instances of that extra February day in a given year. You can also use the datetime module’s .day attribute to output the correct day.
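One way to sketch that Leap Year check, using only the standard library. The function names are hypothetical, and falling back to Feb. 28 in a non-leap year is an assumption you may want to change for your own pipeline.

```python
import calendar
from datetime import date, timedelta


def last_day_of_february(year: int) -> int:
    """Step back one day from March 1 and read .day: 29 in a leap year, else 28."""
    return (date(year, 3, 1) - timedelta(days=1)).day


def same_day_previous_year(d: date) -> date:
    """Return the same calendar day one year earlier."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        # Feb. 29 has no match in a non-leap year.
        # Assumption: fall back to Feb. 28 rather than March 1.
        return d.replace(year=d.year - 1, day=28)


print(calendar.isleap(2024), last_day_of_february(2024))
print(calendar.isleap(2025), last_day_of_february(2025))
print(same_day_previous_year(date(2024, 2, 29)))
```

Letting date arithmetic find the end of February means you never hard-code 28 or 29 yourself.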

Much more common than either DST or Leap Year is what I call the “31 problem.” This is when you want to isolate date attributes but are lagging a day behind because of the few months that have 31 days.

For instance, say you need to create a file string that is supposed to say “March 31” but your date logic hasn’t accounted for the extra day in March, so your output becomes “April 31”, a date that doesn’t exist.
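Here is a minimal sketch of one way to avoid the “31 problem”: subtract a timedelta from the run date instead of adjusting month and day numbers by hand. This is my own illustration with hypothetical function names, not necessarily the approach in the article linked below.

```python
from datetime import date, timedelta


def previous_day_label(run_date: date) -> str:
    """Build a 'Month D' label for the day before run_date."""
    # timedelta handles month lengths, so April 1 rolls back to March 31.
    previous = run_date - timedelta(days=1)
    return f"{previous:%B} {previous.day}"


def is_valid_date(year: int, month: int, day: int) -> bool:
    """Guard for hand-built dates: date() raises ValueError for days that don't exist."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False


print(previous_day_label(date(2025, 4, 1)))  # the day before April 1
print(is_valid_date(2025, 4, 31))            # April has only 30 days
```

The month name comes from %B, which follows the process locale, so the label reads in English only under the default locale.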

To learn how to solve this problem and for more in-depth analyses of date issues (including code snippets), I encourage you to read “Why Your Data Pipelines Will Fail On These 10 Days Every Year (And What To Do About It)”.

Happy DST and thanks for ingesting,

-Zach Quinn

Reaching 20k+ readers on Medium and nearly 3k learners by email, I draw on my 4 years of experience as a Senior Data Engineer to demystify data science, cloud and programming concepts while sharing job hunt strategies so you can land and excel in data-driven roles. Subscribe for 500 words of actionable advice every Thursday.
