Extract. Transform. Read.
A newsletter from Pipeline

Hi past, present or future data professional!

From 2014-2017 I lived in Phoenix, Arizona and enjoyed the state’s best resident privilege: No daylight saving time. If you’re unaware (and if you're in almost any other US state, you’re really unaware), March 9th was the start of daylight saving time, when we spring forward an hour. If you think this messes up your microwave and oven clocks, just wait until you check on your data pipelines.

Even though data teams are very aware of DST, it isn’t always something we account for when building and scaling pipelines. To build DST-resistant pipelines, you need to set your schedule parameters explicitly for daylight time versus standard time. And even if you think your builds are properly calibrated before breaking for the weekend, I’d still remind the team that it’s DST and, if possible, designate an on-call position to respond to issues that shouldn’t wait until the next weekday.

In addition to DST, a less frequent problem is creating schedules and variables that account for Leap Year. While you could be like one engineer I know and tell yourself it’s a “future me” problem, I’d recommend creating logic to check for that extra February day in a given year. You can also use the .day attribute of a datetime object to output the correct day.

Much more common than either DST or Leap Year is what I call the “31 problem.” This is when you want to isolate date attributes but end up lagging a day behind because of the months that have 31 days. For instance, say you need to create a file string that is supposed to say “March 31”, but your datetime logic hasn’t accounted for the extra day in March, so your output becomes “April 31”, a date that doesn’t exist.

To learn how to solve this problem and for more in-depth analyses of date issues (including code snippets), I encourage you to read “Why Your Data Pipelines Will Fail On These 10 Days Every Year (And What To Do About It)”.
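As a rough illustration of the three date pitfalls above, here is a minimal Python sketch using only the standard library. It is not taken from the linked article; the function names (`utc_hour_for_local_run`, `last_day_of_month`, `previous_day_label`) are hypothetical, and the example assumes Python 3.9+ with time zone data available on the machine.

```python
import calendar
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library in Python 3.9+


def utc_hour_for_local_run(run_day: date, local_hour: int, tz_name: str) -> int:
    """UTC hour that matches a local wall-clock hour on run_day.

    Hypothetical helper: shows how the UTC offset of a "6 AM local" job
    shifts when the zone switches between standard and daylight time.
    """
    local_run = datetime(run_day.year, run_day.month, run_day.day,
                         local_hour, tzinfo=ZoneInfo(tz_name))
    return local_run.astimezone(timezone.utc).hour


def last_day_of_month(year: int, month: int) -> int:
    """Number of days in the month; February returns 29 in leap years."""
    return calendar.monthrange(year, month)[1]


def previous_day_label(run_day: date) -> str:
    """Label for the day before run_day, e.g. for a file name.

    Subtracting a timedelta moves the month and day together, so a run
    on April 1 yields "March 31" rather than the nonexistent "April 31".
    """
    previous = run_day - timedelta(days=1)
    return f"{previous:%B} {previous.day}"


if __name__ == "__main__":
    # Compare the same local hour on either side of the March 9, 2025 change.
    for zone in ("America/New_York", "America/Phoenix"):
        before = utc_hour_for_local_run(date(2025, 3, 8), 6, zone)
        after = utc_hour_for_local_run(date(2025, 3, 10), 6, zone)
        print(zone, before, after)

    print(calendar.isleap(2024), last_day_of_month(2024, 2))
    print(previous_day_label(date(2025, 4, 1)))
```

The design choice in each case is to let the standard library do the calendar arithmetic rather than hard-coding offsets or assembling month and day separately.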
Happy DST and thanks for ingesting,
-Zach Quinn
Reaching 20k+ readers on Medium and nearly 3k learners by email, I draw on my 4 years of experience as a Senior Data Engineer to demystify data science, cloud and programming concepts while sharing job hunt strategies so you can land and excel in data-driven roles. Subscribe for 500 words of actionable advice every Thursday.