Hi fellow data professional!

On a recent holiday, a family member and I were strolling along a beach, talking about AI disruption (relaxing, I know). He, an attorney, assured me his job was AI-proof and jokingly offered to hire me when AI takes my data engineering job.

Executives at most companies would find several flaws in that argument. Over 80% of technical executives, including Chief Data Officers and Chief AI Officers, consider data engineering to be an essential role within a data organization, according to a joint survey from MIT and Snowflake. Nearly half of Chief Information Officers (CIOs) said the same thing.

And now comes the bad news. More than 75% of respondents felt data engineering workloads were getting heavier. While this reflects the survey sentiment that the role is a necessity, it also suggests that companies aren’t necessarily allocating resources to “beef up” the teams responsible for data infrastructure. As far as the workload goes, I can attest to this firsthand. I actually remarked to my boss that I can’t remember the last “slow” Q4 we’ve had on my team.

The tension lies in how AI has radically changed the job description without actually reducing the headcount requirement. We are seeing a fundamental pivot from traditional ETL (Extract, Transform, Load) toward a heavier focus on data governance and quality. While GenAI can automate the boilerplate code for a pipeline, it cannot "know" if the underlying data is ethically sourced, compliant, or high-quality enough to feed a Large Language Model. In this new paradigm, data engineers aren't just plumbers; they are the architects of the "data supply chain." The workload is increasing because engineers are now expected to manage the complex metadata and vector databases that make RAG (Retrieval-Augmented Generation) possible.

The industry is currently in a "productivity trap." Execs see AI generating code and assume they can do more with less. But as pipelines become more automated, they also become more abstract and harder to troubleshoot. Relying solely on a few "super engineers" creates a single point of failure.

Near-term, the productivity trap means that orgs may be tightening their belts and relying on a corps of AI-powered super engineers to build and maintain pipelines. But they will soon find this is unsustainable. There simply need to be entry-level engineers who don’t just do “grunt work,” but who are available to learn and grow, helping to replenish senior talent who will inevitably become overworked or hit a growth ceiling. Execs may say the data engineering role is essential, but until I see more junior engineering positions posted, I won’t believe it.

When "super engineer" is the new standard, your portfolio can't just show that you can follow a tutorial; it has to show you can manage a system. But I know the gap between a tutorial and a production system can feel like a massive "black box." So, I'm curious: What currently stops your progress when building independently?
Thanks for ingesting,
-Zach

Medium | LinkedIn | Ebooks
Reaching 20k+ readers on Medium and over 3k learners by email, I draw on my 4 years of experience as a Senior Data Engineer to demystify data science, cloud and programming concepts while sharing job hunt strategies so you can land and excel in data-driven roles. Subscribe for 500 words of actionable advice every Thursday.
Hi fellow data professional! In a previous newsletter, I mentioned an idea that I wanted to explore more deeply. At the risk of double-quoting a la The Office’s Michael Scott quoting Wayne Gretzky (“You Miss 100% Of The Shots You Don’t Take - Wayne Gretzky - Michael Scott”), here is the idea. “To be marketable as a candidate, you don’t just want to show how you can go from A to B (requirements->pipeline). You need to go from A to C (requirements->pipeline->scale/support).” You might be asking...
Hi fellow data professional! Remember when the world ended? This month, 6 years ago, the world shut down and entered “unprecedented times.” Shortly after COVID-19 was designated a pandemic, I was unceremoniously furloughed from my day job at Disney World for 3-ish months. During COVID, while others quarantined, I was on the move. After quickly feeling isolated in our third-floor Central Florida apartment, my now-wife and I joined millions of other American 20-somethings who took a pandemic as...
Hi fellow data professional! I’ve broken my own data project rule. I’ve used the same data over and over again. For 3 years. It sounds boring, but that depth of exposure may actually be one of the few moats that slows encroaching AI. A little context: I support subscriptions, newsletters and growth for my employer. Spoiler alert: These areas are all basically the same thing. And they use basically the same three data sets. While I have opportunities to jump to other projects, this has been my...