Extract. Transform. Read.
A newsletter from Pipeline

Hi past, present or future data professional!

From 2014-2017 I lived in Phoenix, Arizona and enjoyed the state’s best resident privilege: No daylight saving time. If you’re unaware (and if you live in most other US states, you’re probably all too aware), March 9th marked the start of daylight saving time (DST), when we spring forward an hour. If you think this messes up your microwave and oven clocks, just wait until you check on your data pipelines.

Even though data teams are very aware of DST, it isn’t always something we account for when building and scaling pipelines. To build DST-resistant pipelines, you need to be explicit about whether your schedule parameters run on daylight time or standard time. And even if you think your builds are properly calibrated before breaking for the weekend, I’d still remind the team that DST is coming and, if possible, designate an on-call position to respond to issues that shouldn’t wait until the next weekday.

In addition to DST, a less frequent problem is creating schedules and variables that account for leap years. While you could be like one engineer I know and tell yourself it’s a “future me” problem, I’d recommend creating logic to check for that extra February day in a given year. You can also use the .day attribute of a datetime object to output the correct day.

Much more common than either DST or leap years is what I call the “31 problem.” This is when you want to isolate date attributes but end up a day behind because of the months that have 31 days. For instance, say you need to create a file string that is supposed to say “March 31” but your date logic hasn’t accounted for the extra day in March, so your output becomes “April 31”, a date that doesn’t exist.

To learn how to solve this problem and for more in-depth analyses of date issues (including code snippets), I encourage you to read “Why Your Data Pipelines Will Fail On These 10 Days Every Year (And What To Do About It)”.
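If you'd like to see the three date problems above in code, here is a minimal Python sketch. It isn't taken from the linked article, and the helper names (`utc_run_time`, `last_day_of_month`, `previous_day_label`) are hypothetical ones I made up for illustration. It assumes Python 3.9+ so that `zoneinfo` is available, a system with time zone data installed, and 2025 dates, when US clocks changed on March 9.

```python
import calendar
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library in Python 3.9+


def utc_run_time(run_date: date, local_hour: int, tz_name: str) -> datetime:
    """Convert a local wall-clock run time to UTC (hypothetical helper)."""
    # Attaching a named zone lets Python pick the standard or daylight
    # offset that applies on that specific date.
    local_run = datetime(
        run_date.year, run_date.month, run_date.day, local_hour,
        tzinfo=ZoneInfo(tz_name),
    )
    return local_run.astimezone(timezone.utc)


def last_day_of_month(year: int, month: int) -> date:
    """Return the final calendar day of a month (hypothetical helper)."""
    # monthrange returns (weekday of the 1st, number of days in the month),
    # so February yields 29 in a leap year and 28 otherwise.
    return date(year, month, calendar.monthrange(year, month)[1])


def previous_day_label(run_date: date) -> str:
    """Build a label such as 'March 31' for the day before run_date."""
    # Subtracting a timedelta rolls back across month boundaries, so the
    # month name and day number always come from the same real date.
    previous = run_date - timedelta(days=1)
    return f"{calendar.month_name[previous.month]} {previous.day}"


if __name__ == "__main__":
    # A 6 AM job in New York before and after the March 9, 2025 change.
    print(utc_run_time(date(2025, 3, 8), 6, "America/New_York"))
    print(utc_run_time(date(2025, 3, 10), 6, "America/New_York"))
    # Phoenix does not observe DST, so its UTC run time stays put.
    print(utc_run_time(date(2025, 3, 10), 6, "America/Phoenix"))

    print(calendar.isleap(2024), last_day_of_month(2024, 2))
    print(previous_day_label(date(2025, 4, 1)))
```

The New York job runs at 11:00 UTC on March 8 and 10:00 UTC on March 10, which is exactly the kind of one-hour shift that catches downstream dependencies off guard. For the “31 problem,” the key design choice is to do arithmetic on a full date first and only then pull out the month and day, rather than adjusting the day number and the month name separately.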
Happy DST and thanks for ingesting,
-Zach Quinn
Reaching 20k+ readers on Medium and over 3k learners by email, I draw on my 4 years of experience as a Senior Data Engineer to demystify data science, cloud and programming concepts while sharing job hunt strategies so you can land and excel in data-driven roles. Subscribe for 500 words of actionable advice every Thursday.