[ETR #84] Why 80% Of Execs Say DE is Essential (But Won't Hire Juniors)

Hi fellow data professional!

On a recent holiday, a family member and I were strolling along a beach, talking about AI disruption (relaxing, I know).

He, an attorney, assured me his job was AI-proof and jokingly offered to hire me when AI takes my data engineering job. Ask executives at most companies, though, and they’d find several flaws in that argument.

Over 80% of technical executives, including Chief Data Officers and Chief AI Officers, consider data engineering to be an essential role within a data organization, according to a joint survey from MIT and Snowflake. Nearly half of Chief Information Officers (CIOs) said the same thing.

And now comes the bad news.

More than 75% of respondents felt data engineering workloads were getting heavier. While this fits the survey's finding that the role is a necessity, it also suggests that companies aren’t allocating the resources to “beef up” the teams responsible for data infrastructure.

As far as the workload goes, I can attest to this firsthand. I recently remarked to my boss that I can’t remember the last “slow” Q4 my team has had.

The tension lies in how AI has radically changed the job description without actually reducing the headcount requirement. We are seeing a fundamental pivot from traditional ETL (Extract, Transform, Load) toward a heavier focus on data governance and quality. While GenAI can automate the boilerplate code for a pipeline, it cannot "know" if the underlying data is ethically sourced, compliant, or high-quality enough to feed a Large Language Model.

In this new paradigm, the data engineer isn't just a plumber; they are the architect of the "data supply chain." The workload is increasing because engineers are now expected to manage the complex metadata and vector databases that make RAG (Retrieval-Augmented Generation) possible.
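To make the metadata point concrete, here is a toy sketch of why retrieval for RAG depends on more than vector similarity. Everything in it is an assumption for illustration: the `Chunk` fields, the `approved` governance flag, and the hand-made three-number "embeddings" are hypothetical stand-ins, not the API of any real vector database or embedding model.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    embedding: List[float]  # hand-made toy vector, not real model output
    source: str             # metadata: where the chunk came from
    approved: bool          # metadata: hypothetical governance sign-off


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding: List[float], chunks: List[Chunk], k: int = 2) -> List[Chunk]:
    """Return the k most similar chunks, considering only approved ones."""
    # Governance first: unapproved chunks never reach the ranking step.
    eligible = [c for c in chunks if c.approved]
    ranked = sorted(
        eligible,
        key=lambda c: cosine(query_embedding, c.embedding),
        reverse=True,
    )
    return ranked[:k]


CHUNKS = [
    Chunk("Refund policy: 30 days.", [0.9, 0.1, 0.0], "policy_wiki", True),
    Chunk("Draft refund policy: 90 days.", [0.95, 0.05, 0.0], "scratch_doc", False),
    Chunk("Office hours: 9 to 5.", [0.0, 0.2, 0.9], "policy_wiki", True),
]

# The unapproved draft is the closest vector match, but the metadata
# filter removes it before ranking.
results = retrieve([1.0, 0.0, 0.0], CHUNKS, k=1)
print(results[0].text)
```

In this toy data the unapproved draft is the nearest match to the query, so a similarity-only search would hand stale policy text to the model. Someone still has to decide what "approved" means and keep that flag accurate, which is exactly the work that lands on the data engineer.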

The industry is currently in a "productivity trap." Execs see AI generating code and assume they can do more with less. But as pipelines become more automated, they also become more abstract and harder to troubleshoot. Relying solely on a few "super engineers" creates a single point of failure.

In the near term, this means orgs may tighten their belts and rely on a corps of AI-powered super engineers to build and maintain pipelines. But they will soon find this is unsustainable. Teams simply need entry-level engineers who don’t just do “grunt work,” but who have room to learn and grow, replenishing the senior talent who will inevitably become overworked or hit a growth ceiling.

Execs may say the data engineering role is essential, but until I see more junior engineering positions posted, I won’t believe it.

When "super engineer" is the new standard, your portfolio can't just show that you can follow a tutorial; it has to show you can manage a system.

But I know the gap between a tutorial and a production system can feel like a massive "black box." So, I'm curious: What currently stops your progress when building independently?

What's Your Biggest "Brick Wall" When Building Projects?

Finding "real data"; I'm tired of Kaggle

The infrastructure; I struggle to get my code off the page and into the cloud

The why; I can code but I can't explain the architecture

Documentation; I don't know how to make my work "production ready"

Thanks for ingesting,

-Zach