[ETR #14] Build Your Staging Tables Faster/Safer


Extract. Transform. Read.

A newsletter from Pipeline: Your Data Engineering Resource

Presented by Basejump AI

Use natural language prompts to chat with your database in Basejump’s intuitive interface, or embed it directly in your application. Book your demo here.

Hi past, present or future data professional!

One thing that makes my work day easier is when I’m Google-ing (as all software developers do) a problem and I come across the holy grail of solutions: A one-line implementation.

Like anything, however, a one-liner that is too complex can become a bad thing. Think: Chained Pandas expressions that become unreadable. Or cramming a multi-line query inside of a BigQuery client method.

My favorite one line (at least in recent memory) is a clause used with SQL’s ALTER TABLE statement: RENAME TO. You may find renaming a table as compelling as schema creation. But this simple clause can be especially useful in lieu of a more dangerous phrase: CREATE OR REPLACE.

The RENAME command allows you to rename a table without having to completely recreate its contents–and risk a SQL statement failing and losing some or all of your data.

Specifically, I use RENAME TO when I want to convert a copy table with some change, like an updated schema, to a production table. I do so using these steps:

  • Create/backfill a staging table I’ll ultimately convert to prod
  • Use ALTER TABLE `dataset.production_table` RENAME TO `production_table_original`
  • Use ALTER TABLE `dataset.staging_table` RENAME TO `production_table`
  • Double-check all partitions, clustering specifications and metadata descriptions are identical between the tables

The best part is that this is a true one-liner. No chains–or headaches–involved.

To save you a headache, here are this week’s links:

If you want to read more about this method, I cover the process in more detail here.

Questions? You know where to find me: zach@pipelinetode.com.

Until next time–thanks for ingesting,

-Zach Quinn

Extract. Transform. Read.

Reaching 20k+ readers on Medium and over 3k learners by email, I draw on my 4 years of experience as a Senior Data Engineer to demystify data science, cloud and programming concepts while sharing job hunt strategies so you can land and excel in data-driven roles. Subscribe for 500 words of actionable advice every Thursday.

Read more from Extract. Transform. Read.

Hi fellow data professional! Big news from my home base of Orlando: Disney hired a new CEO with a pay package of nearly $40 million. If you read beyond the headline you’ll see that his base salary is “only” 2.5 million with the possibility of up to a 250% target incentive and some $26-ish million in stock options. This is why you, the job seeker, need to think beyond base salary and look at TC. Total compensation. Thanks to labor transparency laws passed in hiring hubs like New York and...

Hi fellow data professional! Once, during a virtual interview, I had to nod politely as my interviewer apologized for coughing after their cigarette. Oh, and to make this situation even more cringe—they were driving. Some industries design stressful interview processes to psychologically test a candidate’s poise under pressure. Luckily (for the most part) the software engineering field is not included in this basket of high-stress tests. Sure, we are subjected to moderate stress in the form...

Hi fellow data professional! On a recent holiday, a family member and I were strolling along a beach, talking about AI disruption (relaxing, I know). He, an attorney, assured me his job was AI-proof and jokingly offered to hire me when AI takes my data engineering job. If you ask executives at most companies, they’d find several flaws in that argument. Over 80% of technical executives, including Chief Data Officers and Chief AI Officers, consider data engineering to be an essential role...