[ETR #33] Avoid These DE Shortcuts


Extract. Transform. Read.

A newsletter from Pipeline

Hi past, present or future data professional!

For data engineering, a profession built on principles of automation, it can be counterintuitive to suggest that any optimizations or “shortcuts” could be negative.

But, as someone who was once a “baby engineer”, I can tell you that a combination of temptation and overconfidence will inevitably drive you to say “I could do without x development step.”

Doing so increases reputational risk (loss of credibility or trust) and, in a worst-case scenario, could even put your job at risk.

If you’re job searching or beginning your first role, there are 6 areas where I’d never even attempt to take “the easy route.”

  1. Testing only transformations, not load steps - It’s true that most of the “heavy lifting” of your pipeline occurs at the “T” of ETL (or ELT), but this doesn’t mean errors can’t occur when loading to your database. Take the time to create and load to a test table.
  2. Validating against expected volume, not metrics - Data engineers are primarily concerned with the shape of data, but it’s the content that matters. If the output is suspect, it’s important to understand what components are influencing critical fields like revenue. Take the time to meet and collaborate with technical partners, like data analysts, who understand the “what” of the data.
  3. Waiting too long to alert stakeholders of “bad data” - Being responsible for pipelines that produce “bad data” is objectively not a good look. But you can mitigate stakeholder concern and emotional responses by sounding the alarm as soon as you notice something is “off.” Providing a concise explanation of the issue, steps to resolve and estimated resolution timelines will help cushion the bad news you deliver.
  4. Assuming “someone else will fix it” - It’s too easy to assume, when an alert comes in, that “someone else will deal with it.” If your pipelines include an alerting component and you don’t receive a message that someone is checking the issue, take the initiative and, most importantly, let your team know you’re on it.
  5. Not testing in production-adjacent environments - Repeat after me: Your local IDE is not production. While your code may run flawlessly locally, you need to remember that production environments have different configurations and dependency requirements. Work to create a virtual environment or container that mimics these conditions to decrease the chances of something not deploying or functioning correctly.
  6. Saying “yes” to everything - While it’s tempting to cement yourself as the “go-to” for your team as a new engineer, you need to avoid taking on too much grunt work. An abundance of grunt work increases your chances of becoming overwhelmed. It also means you’re not working on projects that will raise your visibility and make an impact within the org, both of which are necessary to get you noticed for raises, promotions and general kudos, all of which make this sometimes thankless job a little better.
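
To make points 1 and 2 concrete, here is a minimal sketch of loading to a test table and then checking both volume and a metric. It assumes an in-memory SQLite database standing in for your warehouse; the table name (orders_test), the columns, the sample rows and the expected totals are all hypothetical placeholders, not values from a real pipeline.

```python
import sqlite3

# Hypothetical transformed output; in practice this comes from your "T" step.
ROWS = [
    ("2024-01-01", "A-100", 19.99),
    ("2024-01-01", "A-101", 5.00),
    ("2024-01-02", "A-102", 42.50),
]


def load_to_test_table(conn, rows):
    """Load rows into a throwaway test table rather than the real target."""
    conn.execute("DROP TABLE IF EXISTS orders_test")
    conn.execute(
        "CREATE TABLE orders_test ("
        "order_date TEXT NOT NULL, "
        "order_id TEXT PRIMARY KEY, "
        "revenue REAL NOT NULL)"
    )
    # Constraint violations (NULLs, duplicate keys) surface here, at load time.
    conn.executemany("INSERT INTO orders_test VALUES (?, ?, ?)", rows)
    conn.commit()


def validate_load(conn, expected_rows, expected_revenue, tolerance=0.01):
    """Return a list of problems; an empty list means both checks passed."""
    row_count, total_revenue = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(revenue), 0) FROM orders_test"
    ).fetchone()
    problems = []
    # Volume check: the "shape" of the data.
    if row_count != expected_rows:
        problems.append(f"row count {row_count} != expected {expected_rows}")
    # Metric check: the "content" of the data, here total revenue.
    if abs(total_revenue - expected_revenue) > tolerance:
        problems.append(
            f"revenue {total_revenue:.2f} != expected {expected_revenue:.2f}"
        )
    return problems


conn = sqlite3.connect(":memory:")
load_to_test_table(conn, ROWS)
print(validate_load(conn, expected_rows=3, expected_revenue=67.49))
```

The expected revenue figure is the kind of number you would get from an analyst or a trusted report. A load can pass the row count check and still fail the metric check, which is the gap point 2 warns about.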

To go deeper on any of these areas, you can read the piece this was based on, “These 6 Data Engineering Shortcuts Will Burn You In Year 1,” published in Pipeline earlier this week.

Thanks for ingesting,

-Zach Quinn


Reaching 20k+ readers on Medium and nearly 3k learners by email, I draw on my 4 years of experience as a Senior Data Engineer to demystify data science, cloud and programming concepts while sharing job hunt strategies so you can land and excel in data-driven roles. Subscribe for 500 words of actionable advice every Thursday.
