[ETR #35] AI Can Train You In DE


Extract. Transform. Read.

A newsletter from Pipeline

Hi past, present or future data professional!

Software engineers can package anything, including buzzwords. “Learn new, industry-relevant skills” was compressed to “upskilling.”

And while I’m a proponent of continuous learning, especially when it helps you avoid stagnation, at the end of the day, upskilling is a lot of work. Without proper structure or a mandate from a school or employer, it’s difficult to remain engaged, no matter how interesting the content.

My Udemy cart with 5 unfinished courses can attest to that.

So when I wanted to brush up on the ever-relevant PySpark, I booked some time with Professor LLM. LLMs like Google’s Gemini (my choice, to stay compatible with my GCP tech stack) make incredible teachers because, like a real tutor, they can engage in a dialogue and adjust to your learning style on the fly.

As a former tutor, I appreciate that it explains concepts as it provides results. This is the educational equivalent of “showing your work.”

To get the most out of your chatbot study sesh, I recommend:

  • Providing experience-level context and a desired trajectory: “I have 2 years’ experience with SQL, but I’d like to learn more about CTEs as they relate to query optimization”
  • Prompting the LLM to explain concepts as they relate to a specific role: “Show me examples of how a data engineer might use this skill/tool to build a data pipeline”
  • Repeating subject matter for confirmation: “Let me make sure I have this right: certain Python versions are no longer compatible with Pandas?”
  • Code correction/optimization: “How might I make this code more concise?”
  • Demonstrating errors: “What are the common errors associated with this method? How might I troubleshoot? How might I handle the errors?”
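To make the code-correction prompt concrete, here's the kind of before/after exchange an LLM tutor typically produces. The function names and sample data are my own illustration, not from any particular chat:

```python
# "Before": a verbose version a learner might paste into the chat.
def squares_of_evens_verbose(numbers):
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n ** 2)
    return result

# "After": the tighter rewrite an LLM will usually suggest,
# using a list comprehension instead of an explicit loop.
def squares_of_evens(numbers):
    return [n ** 2 for n in numbers if n % 2 == 0]

print(squares_of_evens_verbose([1, 2, 3, 4]))  # [4, 16]
print(squares_of_evens([1, 2, 3, 4]))          # [4, 16]
```

The real value isn't the shorter code; it's asking the follow-up "why is this version better?" so the tutor explains the idiom, not just the answer.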

Lost in the AI hype is the power of being able to streamline simple tasks. I did this recently by using function calls to automate the conversion of Google Docs to markdown files I can render as blog posts.
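The pattern behind that automation is simple: the model decides *when* to convert, while a plain Python function does the deterministic work of emitting markdown. As a minimal sketch of such a tool function (the function name and the simplified document structure here are hypothetical stand-ins, not the real Google Docs API response):

```python
# Hypothetical tool function an LLM could invoke via function calling.
# "doc" is a simplified stand-in for a parsed document: a list of
# styled text blocks, not the actual Google Docs API schema.
def doc_to_markdown(doc: dict) -> str:
    """Convert a dict of styled text blocks into a markdown string."""
    lines = []
    for block in doc["blocks"]:
        style, text = block["style"], block["text"]
        if style == "title":
            lines.append(f"# {text}")
        elif style == "heading":
            lines.append(f"## {text}")
        else:
            lines.append(text)
        lines.append("")  # blank line between blocks
    return "\n".join(lines).rstrip() + "\n"

doc = {"blocks": [
    {"style": "title", "text": "My Post"},
    {"style": "paragraph", "text": "Hello, readers."},
]}
print(doc_to_markdown(doc))
```

Registering a function like this as a "tool" lets the model hand off the conversion instead of hallucinating markdown itself, which is exactly the streamlining-simple-tasks win.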

Answering a string of questions as a “teacher” isn’t as revolutionary as an AI creating “original” movies or podcasts.

But by optimizing your learning, you’re creating an abundance of something far more valuable: Time.

Thanks for ingesting,

-Zach Quinn

Pipeline To DE

Top data engineering writer on Medium & Senior Data Engineer in media; I use my skills as a former journalist to demystify data science/programming concepts so beginners to professionals can target, land and excel in data-driven roles.
