[ETR #33] Avoid These DE Shortcuts


Extract. Transform. Read.

A newsletter from Pipeline

Hi past, present or future data professional!

In data engineering, a profession built on the principle of automation, it can seem counterintuitive to suggest that any optimization or “shortcut” could be harmful.

But, as someone who was once a “baby engineer”, I can tell you that a combination of temptation and overconfidence will inevitably drive you to say “I could do without x development step.”

Doing so increases reputational risk (loss of credibility or trust) and, in a worst-case scenario, could even put your job at risk.

If you’re job searching or beginning your first role, there are 6 areas where I’d never even attempt to take “the easy route.”

  1. Testing only transformations, not load steps - It’s true that most of the “heavy lifting” in your pipeline happens at the “T” of ETL (or ELT), but that doesn’t mean errors can’t occur when you load to your database. Schema mismatches, type conversions and duplicate keys can all surface at this stage. Take the time to create a test table and load to it first.
  2. Validating against expected volume, not metrics - Data engineers are primarily concerned with the shape of data, but it’s the content that matters. A table can have exactly the right number of rows and still report the wrong revenue. If the output looks suspect, it’s important to understand which components influence critical fields like revenue. Take the time to meet and collaborate with technical partners, like data analysts, who understand the “what” of the data.
  3. Waiting too long to alert stakeholders of “bad data” - Being responsible for pipelines that produce “bad data” is objectively not a good look. But you can ease stakeholder concern and emotional responses by sounding the alarm as soon as you notice something is “off.” A concise explanation of the issue, the steps to resolve it and an estimated resolution timeline will help cushion the bad news you deliver.
  4. Assuming “someone else will fix it” - It’s too easy to assume, when an alert comes in, that “someone else will deal with it.” If your pipelines include an alerting component and you don’t receive a message that someone is checking the issue, take the initiative and, most importantly, let your team know you’re on it.
  5. Not testing in production-adjacent environments - Repeat after me: Your local IDE is not production. While your code may run flawlessly locally, production environments often have different configurations and dependencies. Work to create a virtual environment or container that mimics those conditions to reduce the chance that your code fails to deploy or behaves differently once it does.
  6. Saying “yes” to everything - While it’s tempting to cement yourself as the “go-to” for your team as a new engineer, you need to avoid taking on too much grunt work. An abundance of grunt work increases your chances of becoming overwhelmed. It also keeps you off the projects that raise your visibility and make an impact within the org, which is what gets you noticed for raises, promotions and general kudos. Those are what make this sometimes thankless job a little better.
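To make point 1 concrete, here is a minimal sketch of “create and load to a test table,” assuming a SQLite in-memory database stands in for your warehouse. The table name `orders_test`, its columns and the sample rows are all hypothetical. The check compares what actually landed in the table with what you sent, which is where load-stage problems (like a text value sneaking into a numeric column) show up even when the transformation looked fine.

```python
import sqlite3

# Hypothetical transformed output: (order_date, order_id, revenue).
rows = [
    ("2024-06-01", "order-1", 120.50),
    ("2024-06-01", "order-2", 75.00),
    ("2024-06-02", "order-3", 42.25),
]

def load_to_test_table(conn, rows, table="orders_test"):
    """Recreate a throwaway test table that mirrors the (hypothetical) target schema, then load into it."""
    # The table name is a trusted constant here, so f-string interpolation is acceptable.
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(
        f"""CREATE TABLE {table} (
            order_date TEXT NOT NULL,
            order_id   TEXT PRIMARY KEY,
            revenue    REAL NOT NULL
        )"""
    )
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    conn.commit()

def check_load(conn, rows, table="orders_test"):
    """Return a list of problems found by comparing the loaded table with the input rows."""
    problems = []
    loaded = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if loaded != len(rows):
        problems.append(f"row count mismatch: sent {len(rows)}, loaded {loaded}")
    # SQLite keeps un-convertible text in a REAL column as TEXT, so check storage types.
    bad_types = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE typeof(revenue) != 'real'"
    ).fetchone()[0]
    if bad_types:
        problems.append(f"{bad_types} revenue value(s) not stored as REAL")
    return problems

conn = sqlite3.connect(":memory:")
load_to_test_table(conn, rows)
print(check_load(conn, rows))  # → []

# A placeholder string that slipped through the transform step only shows up after the load.
bad_rows = rows + [("2024-06-03", "order-4", "N/A")]
load_to_test_table(conn, bad_rows)
print(check_load(conn, bad_rows))  # → ['1 revenue value(s) not stored as REAL']
```

Other databases are stricter than SQLite and would reject that row outright, which is exactly the kind of failure you want to hit in a test table rather than in the production one.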
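For point 2, a minimal sketch of validating a metric rather than just volume might look like the following. It assumes you have reference daily revenue totals agreed on with an analyst; the data, the `validate_metric` helper and the 1% relative tolerance are hypothetical. In the example, the row count is exactly what you would expect, yet one day’s revenue is off by a factor of 100, the kind of error a cents-versus-dollars mix-up could cause.

```python
from collections import defaultdict

def validate_metric(output_rows, reference_totals, tolerance=0.01):
    """Flag days whose summed revenue drifts from the reference by more than `tolerance` (relative)."""
    totals = defaultdict(float)
    for day, revenue in output_rows:
        totals[day] += revenue

    issues = []
    for day, expected in reference_totals.items():
        actual = totals.get(day, 0.0)
        if expected == 0:
            drift = 0.0 if actual == 0 else float("inf")
        else:
            drift = abs(actual - expected) / abs(expected)
        if drift > tolerance:
            issues.append((day, expected, actual))
    return issues

# Hypothetical pipeline output: three rows, as expected, but two are in cents.
output = [("2024-06-01", 12050.0), ("2024-06-01", 7500.0), ("2024-06-02", 42.25)]
# Hypothetical reference totals provided by an analyst.
reference = {"2024-06-01": 195.50, "2024-06-02": 42.25}

print(len(output) == 3)                   # volume check passes → True
print(validate_metric(output, reference))  # → [('2024-06-01', 195.5, 19550.0)]
```

The tolerance is a judgment call best made with the people who own the metric, which is one more reason to loop in analysts early.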
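A container is the more thorough fix for point 5, but a quick complementary check is to compare the package versions installed locally with the ones pinned in production. This sketch assumes you can copy those pins from your production image or lock file; the `PRODUCTION_PINS` values and helper names are hypothetical. It only uses the standard library’s `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical pins copied from the production image or lock file.
PRODUCTION_PINS = {"pandas": "2.1.4", "requests": "2.31.0"}

def installed_version(package):
    """Return the locally installed version of `package`, or None if it isn't installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

def diff_against_production(pins, lookup=installed_version):
    """Map each mismatched package to (production_version, local_version)."""
    mismatches = {}
    for package, wanted in pins.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches

if __name__ == "__main__":
    for package, (wanted, found) in diff_against_production(PRODUCTION_PINS).items():
        print(f"{package}: production has {wanted}, local has {found or 'nothing'}")
```

This won’t catch differences in configuration, credentials or OS-level libraries, which is why mirroring production in a container is still the safer habit.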

For more on any of these areas, read the piece this newsletter is based on, “These 6 Data Engineering Shortcuts Will Burn You In Year 1,” published in Pipeline earlier this week.

Thanks for ingesting,

-Zach Quinn

Reaching 20k+ readers on Medium and nearly 3k learners by email, I draw on my 4 years of experience as a Senior Data Engineer to demystify data science, cloud and programming concepts while sharing job hunt strategies so you can land and excel in data-driven roles. Subscribe for 500 words of actionable advice every Thursday.
