[ETR #103] What's Preventing Analysts From Becoming Engineers


Hi fellow data professional!

Hardly a work day goes by without receiving a request from a data analyst. They range from the mundane “Can you add this column?” to the occasional emergency “The data didn’t load all weekend and the leadership call starts in 15 minutes!”

At the end of a jam-packed week, I received an unusual request: help with a Python script.

My teammate wanted to know:

  • Best practices
  • How to commit to GitHub
  • The best way to deploy it

They admitted the task was simple: converting a query to a CSV on a scheduled interval.

After all, any LLM can spit out Python code for that task in one attempt.

The difficult part? All the “other stuff.”

  • Version control
  • Upstream dependencies
  • Scheduling
  • Error handling
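Here's roughly what the "easy" part looks like. This is a minimal sketch of the query-to-CSV script, assuming a SQLite database; the teammate's actual database, query, and paths weren't shared, so the `query_to_csv` name and anything you pass to it are hypothetical. It does one defensive thing, writing to a temp file and renaming it so a failed run never leaves a half-written CSV, and nothing from the list above.

```python
import csv
import os
import sqlite3
import tempfile
from contextlib import closing


def query_to_csv(db_path: str, query: str, out_path: str) -> int:
    """Run `query` against a SQLite database and write the result to `out_path`.

    Returns the number of data rows written (excluding the header).
    """
    # Run the query first; if it fails, nothing is written at all.
    with closing(sqlite3.connect(db_path)) as conn:
        cursor = conn.execute(query)
        header = [col[0] for col in cursor.description]
        rows = cursor.fetchall()

    # Write to a temp file in the target directory, then swap it into place
    # with os.replace, so readers never see a half-written CSV.
    out_dir = os.path.dirname(os.path.abspath(out_path))
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(rows)
        os.replace(tmp_path, out_path)
    except BaseException:
        # Clean up the partial temp file before re-raising the error.
        os.remove(tmp_path)
        raise
    return len(rows)
```

That's the whole "coding" task. Versioning it, waiting on upstream data, scheduling it, and recovering from failures are where the real work is.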

This isn’t the first conversation I’ve had with a data analyst who wants to get more “data engineering experience.”

At least three close friends/family members revealed they’ve been “trying to learn Python” to make the leap from data analytics to data engineering. They inevitably ask me for course suggestions.

But more programming courses aren’t going to teach you the data engineering mindset you need to break into an oversaturated market and keep up with rapidly advancing technologies.

Take my teammate's request, for example. They wanted to automate a query, dump it to a CSV, and run it on a schedule.

Ask an LLM or follow a standard Python tutorial, and you'll likely be told to drop that script onto a server and schedule it with a Linux cron job. It looks simple, clean, and fast.

But in production, cron is a "fire and forget" tool. It has no awareness of the systems around it. If the upstream load is late or the database is down, cron doesn't care; it fires anyway, and the script either fails or exports stale or empty data.

If a network blip drops the connection, cron won't try again; the job quietly fails and waits until the next scheduled run. And when you scale to 500 tasks, checking 500 separate log files on a server becomes a nightmare.
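Retrying is exactly what cron leaves to you. Airflow's `retries` and `retry_delay` settings handle it declaratively; the sketch below shows roughly what that buys you in plain Python. It's an illustration, not Airflow's implementation: `run_with_retries` is a hypothetical helper, and it uses exponential backoff with jitter, whereas Airflow's default retry delay is fixed unless you opt into backoff.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    retries: int = 3,
    base_delay: float = 5.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `task`, retrying transient failures with exponential backoff.

    Makes at most `retries + 1` attempts (one try plus `retries` retries).
    Only exceptions in `retry_on` are retried; anything else, and the final
    transient failure, is re-raised so the failure is loud instead of silent.
    """
    for attempt in range(retries + 1):
        try:
            return task()
        except retry_on:
            if attempt == retries:
                raise
            # Backoff doubles each time (~5s, ~10s, ~20s) with +/-20% jitter.
            delay = base_delay * (2 ** attempt) * random.uniform(0.8, 1.2)
            sleep(delay)
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the helper testable without real waits. Limiting retries to transient errors matters too: retrying a genuine bug three times only delays the alert.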

Data engineers don't build "jobs." We orchestrate tasks.

Instead of guessing what time an upstream table will finish loading and blindly scheduling a script to run 30 minutes later, we use tools like Apache Airflow to declare strict dependency boundaries and automated error-recovery logic directly in our code.
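Stripped of Airflow, "wait until the data actually exists" is a polling loop with a deadline, which is the core idea behind a sensor. A minimal sketch, assuming you can express readiness as a Python check (for example, a hypothetical `partition_exists()` query against your warehouse); the `poke_interval` and `timeout` names echo Airflow's sensor settings, but the defaults here are arbitrary.

```python
import time
from typing import Callable


class SensorTimeout(Exception):
    """Raised when the upstream condition never became true in time."""


def wait_for(
    condition: Callable[[], bool],
    poke_interval: float = 60.0,
    timeout: float = 6 * 60 * 60,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until `condition()` returns True, checking every `poke_interval` seconds.

    Instead of assuming the upstream table is ready at a fixed time, check for
    it, and fail loudly once `timeout` seconds have passed.
    """
    deadline = clock() + timeout
    while not condition():
        if clock() >= deadline:
            raise SensorTimeout(f"condition not met within {timeout} seconds")
        sleep(poke_interval)
```

A real orchestrator adds more on top, such as Airflow's `mode='reschedule'` for sensors, which frees the worker slot between checks instead of sleeping on it.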

Look at the structural shift between how an analyst schedules a task...

# THE ANALYST WAY (Brittle, blind time-based execution)
# If the database is delayed, this script executes on stale data anyway.
30 2 * * * /usr/bin/python3 /pipelines/query_to_csv.py >> /var/log/cron_output.log 2>&1

Vs. how an engineer orchestrates a production-grade asset pipeline.

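# THE DATA ENGINEERING WAY (Defensive, orchestrated execution)
# Built with native dependency logic, automatic retries, and failure alerts.
# Assumes Airflow 2.4+ with the Google provider installed; the project, dataset,
# bucket, and email values below are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.google.cloud.sensors.bigquery import BigQueryTablePartitionExistenceSensor
from airflow.providers.google.cloud.transfers.bigquery_to_gcs import BigQueryToGCSOperator

with DAG(
    dag_id='automated_ledger_extraction',
    start_date=datetime(2024, 1, 1),            # Required so the scheduler can plan runs
    default_args={
        'retries': 3,                           # Automatically handles transient network blips
        'retry_delay': timedelta(minutes=5),    # Grace period before trying again
        'email': ['data-team@example.com'],     # Hypothetical alert recipient
        'email_on_failure': True,               # Alert once retries are exhausted (needs SMTP config)
    },
    schedule='@daily',                          # Use schedule_interval on Airflow < 2.4
    catchup=False,
) as dag:
    # Task 1: Explicitly wait until the run date's partition actually exists
    wait_for_upstream_data = BigQueryTablePartitionExistenceSensor(
        task_id='verify_source_table_readiness',
        project_id='my-project',                # Hypothetical GCP project
        dataset_id='finance',                   # Hypothetical dataset
        table_id='transaction_ledger',
        partition_id='{{ ds_nodash }}',         # Daily partition (YYYYMMDD) for the run's logical date
    )
    # Task 2: Export to CSV in Cloud Storage only after Task 1 succeeds.
    # This exports the table; to export a custom query, materialize it first
    # (e.g., with BigQueryInsertJobOperator) and export the result table.
    extract_clean_csv = BigQueryToGCSOperator(
        task_id='export_query_to_perimeter_storage',
        source_project_dataset_table='my-project.finance.transaction_ledger',
        destination_cloud_storage_uris=['gs://my-export-bucket/ledger/{{ ds }}/ledger-*.csv'],  # Hypothetical bucket
        export_format='CSV',
    )
    # The Dependency Chain: Task 2 is completely blocked if the data isn't ready
    wait_for_upstream_data >> extract_clean_csv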
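What `wait_for_upstream_data >> extract_clean_csv` buys you can be sketched without Airflow. The toy runner below is a hypothetical illustration, not Airflow's scheduler: it executes tasks in dependency order using Python's standard `graphlib` (3.9+) and marks anything downstream of a failure as `upstream_failed` (borrowing Airflow's state name) instead of running it on bad data.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(
    tasks: dict[str, Callable[[], None]],
    upstream: dict[str, set[str]],
) -> dict[str, str]:
    """Run tasks in dependency order; skip anything downstream of a failure.

    `upstream` maps a task name to the tasks it depends on, the same
    relationship `a >> b` declares in Airflow. Returns a final state per task.
    """
    graph = {name: set(upstream.get(name, set())) for name in tasks}
    unknown = set().union(*graph.values()) - set(graph)
    if unknown:
        raise ValueError(f"unknown upstream tasks: {sorted(unknown)}")

    states: dict[str, str] = {}
    # static_order() yields each task after all of its upstream tasks
    # (and raises graphlib.CycleError if the dependencies form a loop).
    for name in TopologicalSorter(graph).static_order():
        if any(states[dep] != "success" for dep in graph[name]):
            states[name] = "upstream_failed"  # Blocked: never runs on bad data
            continue
        try:
            tasks[name]()
            states[name] = "success"
        except Exception:
            states[name] = "failed"
    return states
```

If the sensor task raises, the export task ends up `upstream_failed` and never executes, which is the guarantee a bare cron entry can't give you.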

The code blocks above aren't just theoretical examples; they are a sneak peek into the exact architectural transformations covered inside the PipelineToDE Membership, which I am officially launching on Tuesday, June 16th.

The market is already oversaturated with people who can write a basic loop or have an LLM spit out a simple query. It does not need more “coders.”

It demands engineers who understand how to design production-grade cloud stacks, protect infrastructure perimeters, containerize development environments, and automate pipelines that survive a real production environment.

On 6/16, I’m opening the doors to an all-access membership designed to take you completely out of tutorial hell and the code-happy analyst mindset.

Becoming a member doesn't just give you the keys to the complete DA->DE Pathway Course; it unlocks an active, evolving engineering ecosystem packed with production blueprints, container templates, code reviews, and the frameworks you need to break into the field.

If you want access the moment the doors open on launch day, click here and I'll alert you first.


Thanks for ingesting,

-Zach Quinn

Medium | LinkedIn | Ebooks

Extract. Transform. Read.

Reaching 20k+ readers on Medium and over 3k learners by email, I draw on my 4 years of experience as a Senior Data Engineer to demystify data science, cloud and programming concepts while sharing job hunt strategies so you can land and excel in data-driven roles. Subscribe for 500 words of actionable advice every Thursday.
