[ETR #33] Avoid These DE Shortcuts


Extract. Transform. Read.

A newsletter from Pipeline

Hi past, present or future data professional!

For data engineering, a profession built on principles of automation, it can seem counterintuitive to suggest that certain optimizations or “shortcuts” could be harmful.

But, as someone who was once a “baby engineer,” I can tell you that a combination of temptation and overconfidence will inevitably drive you to say, “I could do without development step X.”

Doing so increases reputational risk (loss of credibility or trust) and, in a worst-case scenario, could even put your job in jeopardy.

If you’re job searching or beginning your first role, there are 6 areas where I’d never even attempt to take “the easy route.”

  1. Testing only transformations, not load steps - It’s true that most of the “heavy lifting” in your pipeline happens at the “T” of ETL (or ELT), but that doesn’t mean errors can’t occur when loading to your database. Take the time to create and load to a test table first (a minimal sketch follows this list).
  2. Validating against expected volume, not metrics - Data engineers are primarily concerned with the shape of data, but it’s the content that matters. If the output is suspect, it’s important to understand which components are influencing critical fields like revenue. Take the time to meet and collaborate with technical partners, like data analysts, who understand the “what” of the data (a content-level validation sketch also follows the list).
  3. Waiting too long to alert stakeholders of “bad data” - Being responsible for pipelines that produce “bad data” is objectively not a good look. But you can mitigate stakeholder concern and emotional responses by sounding the alarm as soon as you notice something is “off.” Providing a concise explanation of the issue, steps to resolve and an estimated resolution timeline will help cushion the bad news you deliver (a bare-bones alert sketch follows the list).
  4. Assuming “someone else will fix it” - It’s too easy to assume, when an alert comes in, that “someone else will deal with it.” If your pipelines include an alerting component and you don’t receive a message that someone is checking the issue, take the initiative and, most importantly, let your team know you’re on it.
  5. Not testing in production-adjacent environments - Repeat after me: Your local IDE is not production. Even if your code runs flawlessly locally, production environments have different configurations and dependency requirements. Create a virtual environment or container that mimics those conditions to decrease the chances of something failing to deploy or function correctly (a quick dependency-drift check is sketched after the list).
  6. Saying “yes” to everything - While it’s tempting to cement yourself as the “go-to” for your team as a new engineer, avoid taking on too much grunt work. Not only does it increase your chances of becoming overwhelmed, it also keeps you off the projects that raise your visibility and make an impact within the org, both of which are necessary to get noticed for raises, promotions and general kudos, all of which make this sometimes thankless job a little better.
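
To make the first shortcut concrete, here’s a minimal sketch of a “test table first” load. It assumes pandas and SQLAlchemy against a Postgres-style warehouse; the connection string and the table names revenue_daily_test and revenue_daily are hypothetical stand-ins for your own.

```python
import pandas as pd
from sqlalchemy import create_engine, text  # assumes SQLAlchemy + a DB driver are installed

# Hypothetical connection string and table names, for illustration only.
engine = create_engine("postgresql://user:password@host:5432/warehouse")

def load_with_test_step(df: pd.DataFrame) -> None:
    # 1. Load to a throwaway test table first.
    df.to_sql("revenue_daily_test", engine, if_exists="replace", index=False)

    # 2. Sanity-check the load before touching the production table.
    with engine.connect() as conn:
        loaded = conn.execute(text("SELECT COUNT(*) FROM revenue_daily_test")).scalar()
    assert loaded == len(df), f"Test load wrote {loaded} rows, expected {len(df)}"

    # 3. Only then append to the real table.
    df.to_sql("revenue_daily", engine, if_exists="append", index=False)
```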
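
For the second shortcut, remember that a volume check alone will pass even if every revenue value is null or zero. Below is a hedged sketch of content-level validation, assuming a pandas DataFrame with a revenue column; the thresholds are placeholders you’d agree on with your analyst partners.

```python
import pandas as pd

def validate_output(df: pd.DataFrame) -> list[str]:
    """Check the content of critical fields, not just the row volume."""
    issues = []

    # Volume check: necessary, but not sufficient on its own.
    if len(df) < 1_000:  # hypothetical expected daily volume
        issues.append(f"Low row count: {len(df)}")

    # Content checks on a critical field like revenue.
    if df["revenue"].isna().mean() > 0.01:  # hypothetical 1% null tolerance
        issues.append("More than 1% of revenue values are null")

    total = df["revenue"].sum()
    if not (0 < total < 5_000_000):  # hypothetical plausible range
        issues.append(f"Revenue total out of expected range: {total:,.2f}")

    return issues
```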
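
For the third shortcut, sounding the alarm can be as simple as posting a structured message the moment a check fails. Here’s a minimal sketch using a Slack incoming webhook via the requests library; the webhook URL is a placeholder and the message format is only a suggestion.

```python
import requests  # assumes the 'requests' library is installed

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."  # placeholder webhook URL

def alert_stakeholders(issue: str, fix_steps: str, eta: str) -> None:
    """Post a concise 'bad data' alert: what broke, how it's being fixed, and when."""
    message = (
        f":rotating_light: Data quality issue: {issue}\n"
        f"Steps to resolve: {fix_steps}\n"
        f"Estimated resolution: {eta}"
    )
    requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=10)
```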
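
And for the fifth shortcut, nothing replaces a container or virtual environment that mirrors production, but a quick pre-flight check can catch dependency drift before you deploy. This sketch uses Python’s standard importlib.metadata; the pinned versions are hypothetical and would come from your production requirements file.

```python
from importlib.metadata import version, PackageNotFoundError

# Hypothetical pins copied from the production requirements file.
PRODUCTION_PINS = {"pandas": "2.1.4", "sqlalchemy": "2.0.25", "requests": "2.31.0"}

def check_environment_drift() -> list[str]:
    """Report packages whose installed versions differ from production's pins."""
    mismatches = []
    for package, pinned in PRODUCTION_PINS.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            mismatches.append(f"{package} is missing (production pins {pinned})")
            continue
        if installed != pinned:
            mismatches.append(f"{package}: installed {installed}, production pins {pinned}")
    return mismatches
```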

For an expansion on any of these areas, you can read the piece this was based on, “These 6 Data Engineering Shortcuts Will Burn You In Year 1,” published in Pipeline earlier this week.

Thanks for ingesting,

-Zach Quinn

Pipeline To DE

Top data engineering writer on Medium & Senior Data Engineer in media; I use my skills as a former journalist to demystify data science/programming concepts so beginners to professionals can target, land and excel in data-driven roles.
