[ETR #103] What's Preventing Analysts From Becoming Engineers

Hi fellow data professional!

Hardly a workday goes by when I don’t receive a request from a data analyst. They range from the mundane “Can you add this column?” to the occasional emergency “The data didn’t load all weekend and the leadership call starts in 15 minutes!”

At the end of a jam-packed week I received an unusual request: Help with a Python script.

My teammate wanted to know:

Best practices
How to commit to GitHub
What the best way to deploy is

They admitted the task was simple: converting a query to a CSV on a scheduled interval.

After all, any LLM can spit out Python code for that task in one attempt.
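
For a sense of how small the core task is, here is a minimal sketch of a query-to-CSV export. It is not my teammate's actual script: the function name, the orders table, and the in-memory SQLite database are stand-ins I'm assuming for illustration, since the real warehouse connection will differ.

```python
import csv
import sqlite3


def export_query_to_csv(conn, query, csv_path):
    """Run `query` on `conn` and write the result, with a header row, to `csv_path`.

    Returns the number of data rows written.
    """
    cursor = conn.execute(query)
    # cursor.description holds one tuple per result column; item 0 is the column name.
    headers = [column[0] for column in cursor.description]
    row_count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        for row in cursor:
            writer.writerow(row)
            row_count += 1
    return row_count


if __name__ == "__main__":
    # Assumption: in-memory SQLite stands in for the real database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.99), (2, 24.50)])
    rows = export_query_to_csv(conn, "SELECT id, amount FROM orders", "orders.csv")
    print(f"Wrote {rows} rows to orders.csv")
```

Notice what is missing: nothing here checks whether the source data is ready, retries a failed connection, or tells anyone when the export breaks.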

The difficult part? All the “other stuff.”

Version control
Upstream dependencies
Scheduling
Error handling

This isn’t the first conversation I’ve had with a data analyst who wants to get more “data engineering experience.”

At least three close friends/family members revealed they’ve been “trying to learn Python” to make the leap from data analytics to data engineering. They inevitably ask me for course suggestions.

But more programming courses aren’t going to teach you the data engineering mindset needed to break into an oversaturated market and keep up with rapidly advancing technologies.

Take my teammate's request, for example. They wanted to automate a query, dump it to a CSV, and run it on a schedule.

If you ask an LLM or look at a standard Python tutorial, it will tell you to drop that script onto a server and spin up a local Linux Cron job. It looks simple, clean, and fast.

But in production, Cron is a "fire and forget" tool. It has no systemic awareness. If the upstream load is late or fails, Cron doesn't care; it runs the script anyway, on stale or empty data.
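
One way to avoid exporting empty data, even before adopting an orchestrator, is to have the script confirm the upstream rows exist and return a non-zero status if they don't. The sketch below is only an illustration under assumptions: the load_date column and both function names are hypothetical, and SQLite again stands in for the real database.

```python
import sqlite3
import sys
from datetime import date


def upstream_is_ready(conn, table, run_date):
    """Return True if `table` has at least one row loaded for `run_date`."""
    # `table` is interpolated into the SQL, so it must come from trusted
    # config, never from user input. `load_date` is an assumed column name.
    query = f"SELECT COUNT(*) FROM {table} WHERE load_date = ?"
    (row_count,) = conn.execute(query, (run_date.isoformat(),)).fetchone()
    return row_count > 0


def run_export_if_ready(conn, run_date):
    """Return 0 if the export may proceed, 1 if the upstream data is missing."""
    if not upstream_is_ready(conn, "transaction_ledger", run_date):
        print(
            f"transaction_ledger has no rows for {run_date}; skipping export",
            file=sys.stderr,
        )
        return 1
    # ... the actual export would run here ...
    return 0
```

Passing the return value to sys.exit() gives the scheduler a non-zero exit code to react to. A guard like this only stops the bad run, though. Nothing waits for the data and tries again, which is the gap an orchestrator fills.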

If a network blip kills the API, Cron won't try again; it just quietly fails and waits until the next day. When you scale to 500 tasks, checking 500 separate log files on a server becomes a nightmare.
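
To make the missing retry behavior concrete, here is a rough sketch of what you would have to hand-roll in every cron script to get it. The helper name and its defaults (three retries, five minutes apart) are my own illustrative choices, not a standard API.

```python
import time


def run_with_retries(task, retries=3, retry_delay_seconds=300, sleep=time.sleep):
    """Call `task()`; if it raises, wait and try again up to `retries` more times.

    `sleep` is injectable so the wait can be stubbed out in tests.
    """
    attempt = 0
    while True:
        try:
            return task()
        except Exception as exc:  # broad on purpose for this sketch
            attempt += 1
            if attempt > retries:
                raise  # out of retries: let the failure surface
            print(f"Attempt {attempt} failed ({exc}); retrying in {retry_delay_seconds}s")
            sleep(retry_delay_seconds)
```

Calling run_with_retries(my_export) would re-run a hypothetical my_export function after a transient failure. Multiply that boilerplate by 500 scripts and the case for declaring retries once, in an orchestrator, gets a lot stronger.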

Data engineers don't build "jobs." We orchestrate tasks.

Instead of guessing what time an upstream table will finish loading and blindly scheduling a script to run 30 minutes later, we use tools like Apache Airflow to declare strict dependency boundaries and automated error-recovery logic directly in our code.

Look at the structural shift between how an analyst schedules a task...

# THE ANALYST WAY (Brittle, blind time-based execution)
# If the database is delayed, this script executes on stale data anyway.
30 2 * * * /usr/bin/python3 /pipelines/query_to_csv.py >> /var/log/cron_output.log 2>&1

Vs. how an engineer orchestrates a production-grade asset pipeline.

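# THE DATA ENGINEERING WAY (Defensive, orchestrated execution)
# Built with native dependency logic, automatic retries, and failure alerts.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.google.cloud.sensors.bigquery import (
    BigQueryTablePartitionExistenceSensor,
)
# Placeholder import: BulkDataExtractionOperator stands in for whatever export
# operator your team uses. It is not a built-in Airflow class, and the module
# path below is hypothetical.
from my_company.operators import BulkDataExtractionOperator

with DAG(
    dag_id='automated_ledger_extraction',
    default_args={
        'retries': 3,                           # Automatically handles transient network blips
        'retry_delay': timedelta(minutes=5),    # Grace period before trying again
        'email_on_failure': True,               # Alerting; also needs an 'email' recipient list
    },
    start_date=datetime(2024, 1, 1),            # Placeholder start date
    schedule_interval='@daily',                 # Newer Airflow releases call this `schedule`
    catchup=False,
) as dag:
    # Task 1: Explicitly wait until the data actually exists
    wait_for_upstream_data = BigQueryTablePartitionExistenceSensor(
        task_id='verify_source_table_readiness',
        project_id='my-project',                # Placeholder project
        dataset_id='finance',                   # Placeholder dataset
        table_id='transaction_ledger',
        partition_id='{{ ds_nodash }}',         # Assumes daily partitions keyed by run date
    )
    # Task 2: Execute the extraction only after Task 1 succeeds
    extract_clean_csv = BulkDataExtractionOperator(
        task_id='export_query_to_perimeter_storage',
    )
    # The Lineage Chain: Task 2 is completely blocked if the data isn't ready
    wait_for_upstream_data >> extract_clean_csv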

The code blocks above aren't just theoretical examples; they are a sneak peek into the exact architectural transformations covered inside the PipelineToDE Membership, which I am officially launching on Tuesday, June 16th.

The market is already oversaturated with people who can write a basic loop or have an LLM spit out a simple query. It does not need more “coders.”

It demands engineers who understand how to design production-grade cloud stacks, protect infrastructure perimeters, containerize development environments, and automate pipelines that survive a real production environment.

On 6/16, I’m opening the doors to an all-access membership designed to take you completely out of tutorial hell and the code-happy analyst mindset.

Becoming a member doesn't just give you the keys to the complete DA->DE Pathway Course; it unlocks an active, evolving engineering ecosystem packed with production blueprints, container templates, code reviews, and the frameworks you need to break into the field.

If you want to be first in line on launch day, click here and I’ll alert you as soon as the doors open.

Thanks for ingesting,

-Zach Quinn

Extract. Transform. Read.

Medium | LinkedIn | Ebooks
