Extract. Transform. Read.

A newsletter from Pipeline

Hi past, present or future data professional!

In 2024 I published roughly 75 stories, mostly about data engineering or technology. Given the pace of life and media, you most likely missed something that I hope you’ll find valuable and actionable.

In keeping with one of my core beliefs, that data-driven tools should both enrich you professionally and reduce personal problems, I picked stories out of that stack using data I ingested daily from the unofficial Medium API.

Instead of focusing on ambiguous metrics like views or read time, and to avoid noisy data from fans who clap multiple times, I counted only the individual accounts reacting to my work. From there, it was a simple matter of writing a quick query against the data stored in my BigQuery project.
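For the curious, here is a minimal sketch of that counting logic in plain Python. It is not my actual query; the event rows, story names and account IDs are made up for illustration, and in SQL the same idea would typically be a COUNT(DISTINCT ...) grouped by story.

```python
from collections import defaultdict

# Hypothetical reaction events: (story_title, account_id, claps).
# The schema and values are assumptions, not real Medium API output.
events = [
    ("Story A", "user_1", 50),
    ("Story A", "user_1", 10),  # same account clapping again
    ("Story A", "user_2", 1),
    ("Story B", "user_3", 5),
]


def rank_by_unique_reactors(rows):
    """Rank stories by the number of distinct accounts that reacted."""
    reactors = defaultdict(set)
    for story, account, _claps in rows:
        # A set ignores repeat reactions from the same account.
        reactors[story].add(account)
    counts = {story: len(accounts) for story, accounts in reactors.items()}
    # Most unique reactors first; ties broken alphabetically by title.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = rank_by_unique_reactors(events)
```

Clap totals are deliberately ignored, so one enthusiastic reader counts the same as any other reader.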

To respect your time, I’ll present the links as plain text with a one-sentence summary.

For a more fully rendered version of this list, along with some important framing, you can read the full story.

How I Reduced My Query’s Run Time From 30 Min. To 30 Sec. In 1 Hour

A shared responsibility of data engineers, regardless of organizational hierarchy, is to ensure that teammates and fellow data analysts, data scientists and data consumers are querying efficiently. This story, published in Learning SQL, the publication I co-edit, takes you from the moment I received a monstrous query to the point of optimization.

Pandas’ 2.0 Release Deprecated Your Favorite Method. What Now?

In 2023 Pandas deprecated one of its users’ favorite methods, one that I used in nearly every data pipeline; learn what was deprecated and what workarounds are available.

Why Your Data Pipelines Will Fail On These 10 Days Every Year — And What To Do About It

Building robust pipelines doesn’t end with assuring your code can run using production dependencies; it ends when you ensure your work functions within the constraints of Earth time.

SQL Developers: Take These 5 Create Table Steps To Improve Performance

In a piece for Learning SQL, I explain why table performance doesn’t begin with your query, but with the act of table creation itself.

How Not To Annoy Senior Developers — Sincerely, A Senior Data Engineer

After being promoted to a senior position, I wrote, from a senior’s perspective, about strategies for not annoying your senior data scientists, engineers or developers.

2023 In 12 Data Engineering Errors That Ultimately Advanced My Skills

Before I wrote a year-end wrap-up like this, I took an unconventional route: I reflected not on my successes as a developer and engineer in a given year, but on errors that resonated with me so much that I thought about them all year. These errors typically fall into broad categories based on the technology, i.e. Python, SQL and Airflow.

Not Getting Data Science Job Interviews? You Have A Visibility Problem

Data science job applicants are facing more competition than ever; learn how to create and publicize compelling work so you stand out the right way.

I’ve blogged about data science and data engineering concepts for as long as I’ve been a data engineer; somehow, 2025 marks 4 years of working in an engineering role in tech.

I’m continuously humbled by you, the reader, who takes time to read, engage and reach out.

The one-time journalist in me is thrilled to have had the opportunity to develop and engage with a loyal audience. I appreciate you reading in 2024 and look forward to our conversations in ‘25.

Happy New Year and thanks for ingesting,

-Zach Quinn

[ETR #29] ICYMI: Pipeline’s Top Stories Of ‘24