[ETR #33] Avoid These DE Shortcuts


Extract. Transform. Read.

A newsletter from Pipeline

Hi past, present or future data professional!

For data engineering, a profession built on principles of automation, it can be counterintuitive to suggest that any optimizations or “shortcuts” could be negative.

But as someone who was once a “baby engineer,” I can tell you that a combination of temptation and overconfidence will inevitably drive you to say, “I could do without x development step.”

Doing so increases reputational risk (loss of credibility or trust) and, in a worst-case scenario, could even put your job at risk.

If you’re job searching or beginning your first role, there are 6 areas where I’d never even attempt to take “the easy route.”

  1. Testing only transformations, not load steps - It’s true that most of the “heavy lifting” of your pipeline occurs at the “T” of ETL (or ELT), but this doesn’t mean errors can’t occur when loading to your database. Take the time to create and load to a test table.
  2. Validating against expected volume, not metrics - Data engineers are primarily concerned with the shape of data, but it’s the content that matters. A row count that matches expectations says nothing about whether the values in those rows are right. If the output is suspect, it’s important to understand which components influence critical fields like revenue. Take the time to meet and collaborate with technical partners, like data analysts, who understand the “what” of the data.
  3. Waiting too long to alert stakeholders of “bad data” - Being responsible for pipelines that produce “bad data” is objectively not a good look. But you can mitigate stakeholder concern and emotional responses by sounding the alarm as soon as you notice something is “off.” Providing a concise explanation of the issue, steps to resolve and estimated resolution timelines will help cushion the bad news you deliver.
  4. Assuming “someone else will fix it” - It’s too easy to assume, when an alert comes in, that “someone else will deal with it.” If your pipelines include an alerting component and you don’t receive a message that someone is checking the issue, take the initiative and, most importantly, let your team know you’re on it.
  5. Not testing in production-adjacent environments - Repeat after me: Your local IDE is not production. While your code may run flawlessly locally, you need to remember that production environments have different configurations and dependency requirements. Work to create a virtual environment or container that mimics these conditions to decrease the chances of something not deploying or functioning correctly.
  6. Saying “yes” to everything - While it’s tempting to cement yourself as the “go-to” for your team as a new engineer, you need to avoid taking on too much grunt work. An abundance of grunt work increases your chances of becoming overwhelmed. It also means you’re not working on projects that will raise your visibility and make an impact within the org, both of which are necessary to get you noticed for raises, promotions and general kudos, all of which make this sometimes thankless job a little better.
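
To make shortcuts 1 and 2 concrete, here is a minimal sketch of loading to a test table and then checking a metric as well as the row count. It uses an in-memory SQLite database as a stand-in for your warehouse. The table name (orders_test), the column names, the helper functions and the expected values are all hypothetical and only there for illustration; your own checks would come from conversations with the analysts who know the data.

```python
import sqlite3


def load_rows(conn, table, rows):
    """Create the test table if needed and insert (order_id, revenue) rows.

    The table name is interpolated directly, so it is assumed to be a
    trusted, internally defined name (never user input).
    """
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "order_id INTEGER PRIMARY KEY, revenue REAL NOT NULL)"
    )
    # The connection context manager commits on success, rolls back on error.
    with conn:
        conn.executemany(
            f"INSERT INTO {table} (order_id, revenue) VALUES (?, ?)", rows
        )


def validate_load(conn, table, expected_rows, expected_revenue, tolerance=0.01):
    """Return a list of problems; an empty list means both checks passed."""
    row_count, total_revenue = conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM(revenue), 0) FROM {table}"
    ).fetchone()

    problems = []
    # Volume check: did the expected number of rows arrive?
    if row_count != expected_rows:
        problems.append(f"row count {row_count} != expected {expected_rows}")
    # Metric check: does a critical field still add up?
    if abs(total_revenue - expected_revenue) > tolerance:
        problems.append(
            f"revenue {total_revenue:.2f} != expected {expected_revenue:.2f}"
        )
    return problems


# Hypothetical transformed output, loaded to a test table first.
conn = sqlite3.connect(":memory:")
transformed = [(1, 19.99), (2, 5.00), (3, 42.50)]
load_rows(conn, "orders_test", transformed)
print(validate_load(conn, "orders_test", expected_rows=3, expected_revenue=67.49))
```

The point of the design is that the load itself gets exercised (the primary key and NOT NULL constraints will reject rows your transformation tests never saw), and that a matching row count alone is not treated as a pass.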

For an expansion on any of these areas, you can read the piece this was based on, “These 6 Data Engineering Shortcuts Will Burn You In Year 1,” published in Pipeline earlier this week.

Thanks for ingesting,

-Zach Quinn

Reaching 20k+ readers on Medium and nearly 3k learners by email, I draw on my 4 years of experience as a Senior Data Engineer to demystify data science, cloud and programming concepts while sharing job hunt strategies so you can land and excel in data-driven roles. Subscribe for 500 words of actionable advice every Thursday.