Extract. Transform. Read.

A newsletter from Pipeline

Hi past, present or future data professional!

From 2014 to 2017 I lived in Phoenix, Arizona, and enjoyed the state’s best resident privilege: no daylight saving time. If you’re unaware (and if you’re in one of the 48 US states that observe it, you’re really unaware), March 9th was the start of daylight saving time, when we spring forward an hour.

If you think this messes up your microwave and oven clocks, just wait until you check on your data pipelines. Even though data teams are very aware of DST, this isn’t always something we account for when building and scaling pipelines.

To build DST-resistant pipelines, you need to set your schedule parameters to account for the switch between standard time and daylight time. And even if you think your schedules are properly calibrated before breaking for the weekend, I’d still remind your team that DST is coming and, if possible, designate someone on call to respond to issues that shouldn’t wait until the next weekday.
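One way to act on this, sketched in Python, is to convert each local wall-clock run time to UTC and fail loudly when DST skips that time entirely. This assumes Python 3.9+ (for the zoneinfo module), that your system has time zone data installed, and that your scheduler can run on UTC; the helper name local_run_to_utc is hypothetical, not part of any scheduler’s API.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def local_run_to_utc(run_date: date, run_time: time, tz_name: str) -> datetime:
    """Convert a local wall-clock run time to UTC (hypothetical helper).

    Raises ValueError if the local time was skipped by a DST jump.
    Ambiguous fall-back times resolve to their first occurrence (fold=0).
    """
    tz = ZoneInfo(tz_name)
    local = datetime.combine(run_date, run_time, tzinfo=tz)
    utc = local.astimezone(timezone.utc)
    # Round-trip back to local time: a time that DST skipped
    # (e.g. 2:30 AM on spring-forward day) comes back changed.
    if utc.astimezone(tz).replace(tzinfo=None) != local.replace(tzinfo=None):
        raise ValueError(f"{local:%Y-%m-%d %H:%M} does not exist in {tz_name}")
    return utc


for tz_name in ("America/New_York", "America/Phoenix"):
    for day in (date(2025, 3, 8), date(2025, 3, 10)):
        print(tz_name, day, local_run_to_utc(day, time(1, 0), tz_name).time())
```

A 1:00 AM New York run lands at 06:00 UTC on March 8, 2025, but at 05:00 UTC on March 10, while a Phoenix run stays at 08:00 UTC on both days. Asking for 2:30 AM in New York on March 9 raises a ValueError instead of silently running at the wrong time.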

In addition to DST, a less frequent problem is building schedules and variables that account for leap years. While you could be like one engineer I know and tell yourself it’s a “future me” problem, I’d recommend creating logic that checks whether a given year includes that extra February day. You can also read the .day attribute of a Python datetime object to confirm you’re outputting the correct day.
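A minimal sketch of that check in Python, using only the standard library: calendar.isleap answers the leap-year question, and stepping back one day from March 1 finds February’s last day without hardcoding 28 or 29. The function names are illustrative.

```python
import calendar
from datetime import date, timedelta


def has_leap_day(year: int) -> bool:
    # calendar.isleap applies the Gregorian rule: divisible by 4,
    # except century years that aren't divisible by 400
    return calendar.isleap(year)


def last_day_of_february(year: int) -> date:
    # One day before March 1 is Feb 28 or Feb 29, depending on the year
    return date(year, 3, 1) - timedelta(days=1)


for year in (2024, 2025):
    feb_end = last_day_of_february(year)
    print(year, has_leap_day(year), feb_end.day)
```

This prints 2024 True 29 and 2025 False 28, with .day reading the day number straight off the computed date.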

Much more common than either DST or leap years is what I call the “31 problem.” This is when you want to isolate date attributes but end up lagging a day behind because some months have 31 days.

For instance, say you need to create a file name string that should read “March 31”, but your date logic doesn’t account for the extra day in March, so your output becomes “April 31”, a date that doesn’t exist.
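Here’s one hypothetical way this bug creeps in, sketched in Python: a daily job runs on April 1 to process the previous day’s data, takes the day number from the lagged date, but still takes the month from the run date. The function names buggy_label and safe_label are illustrative, the month names assume an English locale, and this isn’t necessarily the fix the full article recommends.

```python
import calendar
from datetime import date, timedelta


def buggy_label(run_date: date) -> str:
    # Bug: the day comes from the lagged date,
    # but the month still comes from the run date
    data_date = run_date - timedelta(days=1)
    return f"{calendar.month_name[run_date.month]} {data_date.day}"


def safe_label(run_date: date) -> str:
    # Derive the lagged date once, then read every attribute from it;
    # timedelta handles 30/31-day months and leap days for you
    data_date = run_date - timedelta(days=1)
    return f"{calendar.month_name[data_date.month]} {data_date.day}"


print(buggy_label(date(2025, 4, 1)))
print(safe_label(date(2025, 4, 1)))
```

Run on April 1, 2025, the buggy version prints April 31 while the safe version prints March 31. The design choice is simply to compute the target date first and never mix attributes from two different dates.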

To learn how to solve this problem and for more in-depth analyses of date issues (including code snippets), I encourage you to read “Why Your Data Pipelines Will Fail On These 10 Days Every Year (And What To Do About It)”.

Happy DST and thanks for ingesting,

-Zach Quinn
