[ETR #19] 1 Question New Data Engineers Can't Ask


Extract. Transform. Read.

A newsletter from Pipeline

Hi past, present or future data professional!

I recently participated in a technical design meeting that was derailed by a single, fundamental question.

“Why?”

Although I had worked with the particular data source we were discussing for nearly two years, I had fallen into the common trap of going “on autopilot” and failing to question the initial need for the data. At this point, you would think asking “why” of years’ worth of work would be offensive.

Instead of making me or other team members defensive, the question led to a productive conversation about refining our approach to ingestion. It also inspired talk of how we can manage stakeholder expectations and gently encourage them to “do more with less.”

Fortunately, you don’t need to derail a meeting to leverage what I call a productive why. Asking occasional, tactful “whys” can position you as a critical thinker and thought leader (or at least an enthusiastic thought contributor) within your org. When appropriate, consider asking…

  • Why are we using x tool over y when y clearly offers a more streamlined integration with our data warehouse?
  • Why are we dedicating development resources to solving this issue when there isn’t a clear business outcome?
  • Why are stakeholders asking for a new data pipeline when this existing table provides nearly all of the dimensions they’re seeking?
  • Why are we paying for x service when we could feasibly build our own solution?

I realize you may not be in a professional role; nonetheless, I’ve found a lot of value can result from occasionally asking “why” even when you’re simply writing code.

For instance, I was a habitual user of Pandas’ .append() method. To my disappointment, Pandas deprecated DataFrame.append() in version 1.4 and removed it entirely in 2.0. I easily could have panicked and said, “Iterating and appending key values to an empty data frame is how I’ve always converted JSON to a data frame. What am I going to do?” But being forced to adapt to the change made me think about what prompted that habit initially.
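If you are in the same spot, here is a minimal sketch of two common ways to replace the loop-and-append habit. This is not necessarily the approach I landed on; the `records` list and both function names are hypothetical stand-ins for whatever JSON payload you are working with. The only Pandas calls used are `pd.DataFrame()` and `pd.concat()`.

```python
import pandas as pd

# Hypothetical JSON-like records, standing in for a parsed API response.
records = [
    {"id": 1, "name": "alpha", "score": 0.5},
    {"id": 2, "name": "beta", "score": 0.7},
    {"id": 3, "name": "gamma"},  # missing "score" key becomes NaN
]


def rows_via_concat(records):
    """Closest swap for the old loop: build small frames, concat once at the end."""
    frames = [pd.DataFrame([rec]) for rec in records]
    return pd.concat(frames, ignore_index=True)


def rows_via_constructor(records):
    """Skip the loop entirely: a list of dicts goes straight into the constructor."""
    return pd.DataFrame(records)


df_concat = rows_via_concat(records)
df_direct = rows_via_constructor(records)

print(df_direct)
```

Both versions return a three-row frame here. The second one is the answer to the “why”: if the data is already a list of dictionaries, the loop may never have been needed in the first place.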

To learn what that motivation was plus how a simple "why" nearly left me tongue-tied in an interview, read the latest on Pipeline.

And so you don’t have to question where those hyperlinks go, here they are as plain text.

Questions? zach@pipelinetode.com

Thanks for ingesting,

-Zach Quinn

Pipeline To DE

Top data engineering writer on Medium & Senior Data Engineer in media; I use my skills as a former journalist to demystify data science/programming concepts so beginners to professionals can target, land and excel in data-driven roles.
