[ETR #62] I Got The Same Question In 12 Interviews


Extract. Transform. Read.

A newsletter from Pipeline

Hi past, present or future data professional!

I once participated in a remote job interview in which the interviewer was on the video call while driving... and smoking.

While that instance was among the most memorable interview experiences (for the wrong reasons), I’ve had just as many interviews that have blended together and faded into the recesses of my mind.

The common denominator, however, was one question that interviewers insisted on asking.

The answer you provide can make or break your interview.

The question I heard repeatedly, especially after I presented a project from my portfolio, was: “Where did you get your data?”

It’s an innocent question, but it’s a brilliant way for an interviewer to gauge your resourcefulness.

And while there’s no truly “wrong” answer, I quickly learned there’s a definite best one. The truth is, relying on perfectly clean, pre-packaged data from repositories like Kaggle is a trap. I’m not saying Kaggle is necessarily bad. I mean, I’ve used it myself for school projects. It just isn’t representative of most of the data sources you’ll encounter on the job.

As I got deeper into the field and understood employer expectations, I realized that real-world data is messy, incomplete, and rarely comes in a perfectly formatted CSV. Using a stock dataset doesn’t show a potential employer that you’re ready for the reality of the job; it just demonstrates your ability to call pandas’ read_csv.

When I started offering responses that showed my ability to source and manipulate data in a novel way, the interviews took a noticeable turn for the better.

Here’s what you should be saying:

  • “I scraped the data from a website and converted it to a dataframe.”
  • “I combined an existing dataset with data scraped from a Wikipedia table.”
  • “I accessed an API and built a pipeline to gather the information.”
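If “I accessed an API and built a pipeline” is the answer you want to give, the pipeline doesn’t need to be elaborate. Below is a minimal extract-transform-load sketch using only Python’s standard library. It assumes a hypothetical JSON API that returns each page of records under a "results" key plus a "next" URL (null on the last page); the field names id, name and value are placeholders as well, so adjust them to whatever the API you pick actually returns.

```python
import csv
import json
import urllib.request
from typing import Callable, Iterable, Iterator, Optional

# ASSUMPTION: a hypothetical API whose pages look like
#   {"results": [{"id": ..., "name": ..., "value": ...}, ...], "next": "<url>" or null}
Fetcher = Callable[[str], dict]
FIELDS = ["id", "name", "value"]  # placeholder field names


def http_get_json(url: str) -> dict:
    """Extract helper: download one page and parse it as JSON."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def extract(start_url: str, fetch: Fetcher = http_get_json) -> Iterator[dict]:
    """Yield raw records, following the "next" link until it is missing or null."""
    url: Optional[str] = start_url
    while url:
        page = fetch(url)
        yield from page.get("results", [])
        url = page.get("next")


def transform(record: dict) -> dict:
    """Keep only the fields we need and give each one a consistent type."""
    raw_value = record.get("value")
    return {
        "id": int(record["id"]),
        "name": str(record.get("name", "")).strip(),
        "value": None if raw_value in (None, "") else float(raw_value),
    }


def load(rows: Iterable[dict], path: str) -> int:
    """Write the cleaned rows to a CSV file and return how many were written."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)  # None values are written as empty cells
            count += 1
    return count


def run_pipeline(start_url: str, path: str, fetch: Fetcher = http_get_json) -> int:
    """Extract -> transform -> load, streaming one record at a time."""
    return load((transform(r) for r in extract(start_url, fetch)), path)
```

Because the fetcher is passed in as a parameter, you can test the pipeline offline with a fake fetch function, or swap in one that adds an API key header or pauses between requests to respect rate limits.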

These answers signal a crucial skill: you’re not just a data consumer; you’re an aggregator of information. You’re resourceful, and you’re not afraid of the mess that comes with mining real-world data.

Creating your own unique dataset (even a small, niche one) demonstrates 3 things to a hiring manager:

  • You’re comfortable converting messy data into something usable
  • You are willing to deviate from "stock" datasets and approach problems with creativity
  • You have a genuine passion for the field and are invested in the craft of the role
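What does converting messy data into something usable look like in practice? Scraped tables tend to arrive as strings full of thousands separators, footnote markers like [1] and placeholders for missing values. A small cleaning pass in plain Python might look like the sketch below; the set of missing-value placeholders and the example column names are assumptions you would tailor to your own source.

```python
import re
from typing import Dict, List, Optional, Set

# Wikipedia-style footnote markers such as [1], [a] or [citation needed]
FOOTNOTE = re.compile(r"\[[^\]]*\]")
# ASSUMPTION: placeholders your source uses for "no data"; extend as needed
MISSING = {"", "-", "?", "n/a", "na", "none", "unknown"}


def strip_text(raw: str) -> str:
    """Remove footnote markers and collapse runs of whitespace."""
    return " ".join(FOOTNOTE.sub("", raw).split())


def clean_number(raw: str) -> Optional[float]:
    """Turn a cell like ' 1,234[2] ' into 1234.0; return None for missing or unparseable values."""
    text = strip_text(raw)
    if text.lower() in MISSING:
        return None
    # Drop separators and symbols; note '12.5%' becomes 12.5, not 0.125
    text = text.replace(",", "").replace("$", "").replace("%", "")
    try:
        return float(text)
    except ValueError:
        return None  # unparseable: treat as missing rather than crash the whole run


def clean_rows(rows: List[Dict[str, str]], numeric: Set[str]) -> List[Dict[str, object]]:
    """Clean every header and cell; coerce the columns named in `numeric` to floats."""
    cleaned = []
    for row in rows:
        out: Dict[str, object] = {}
        for key, value in row.items():
            key = strip_text(key)
            out[key] = clean_number(value) if key in numeric else strip_text(value)
        cleaned.append(out)
    return cleaned
```

Returning None for values that won’t parse keeps one bad cell from killing the run; in a real project you might also log those cells so you can decide how to handle them.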

As a bonus, if you can find a dataset that’s relevant to the industry you’re applying to, you’ll also prove that you have relevant domain knowledge, which is truly a rarity among technically inclined candidates.

So, before your next interview, take a look at your portfolio. If it’s full of projects built on perfectly clean data, consider spending some time on a new end-to-end project that starts with messier data.

You don’t have to build a custom data warehouse from scratch. In fact, even a simple project that scrapes a Wikipedia table with pandas shows effort that goes beyond downloading and reading a CSV.
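For the Wikipedia route, pandas has pandas.read_html, which returns a list of DataFrames, one per table on the page (it needs lxml, or BeautifulSoup4 plus html5lib, installed). To show roughly what that call does without extra dependencies, here is a rough stand-in built on the standard library’s html.parser. It assumes a simple table whose first row is the header, with no nested tables and no rowspan/colspan cells; real Wikipedia tables sometimes have those, and read_html is the more robust choice there.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>, for each <table>."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: List[List[List[str]]] = []  # tables -> rows -> cell texts
        self._row: Optional[List[str]] = None    # cells of the row being read
        self._cell: Optional[List[str]] = None   # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join fragments (e.g. text split by <sup> or <a>) and normalize whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None and self.tables:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_records(html: str, index: int = 0) -> List[Dict[str, str]]:
    """Return table number `index` as a list of dicts keyed by its first (header) row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.tables[index]
    return [dict(zip(header, row)) for row in body]
```

In a real project you would download the page first, for example with urllib.request.urlopen(url).read().decode("utf-8"), and pass that string in. Footnote markers such as [1] stay in the cell text, so plan on a cleaning step before analysis.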

In the end, the best source of data is yourself.


Thanks for ingesting,

-Zach Quinn


Reaching 20k+ readers on Medium and nearly 3k learners by email, I draw on my 4 years of experience as a Senior Data Engineer to demystify data science, cloud and programming concepts while sharing job hunt strategies so you can land and excel in data-driven roles. Subscribe for 500 words of actionable advice every Thursday.
