Extract. Transform. Read.

A newsletter from Pipeline: Your Data Engineering Resource

Hi past, present or future data professional!

Data engineering can be dangerous. OK, not, like, physically, but because they build and maintain data infrastructure, data engineers are given a surprising amount of access and responsibility. Every commit, table alteration and deletion must be made with care.

It took 2 years, but I finally learned a shortcut to make developing SQL staging tables less risky and more efficient.

Even seemingly minor mistakes like joining on the wrong key can result in losing days or months of valuable data, which can equal hundreds of thousands or millions of dollars in revenue visibility.

Outside of code mistakes, not paying attention to logistical factors like vendor contracts and API usage can not only result in downtime; in a worst-case scenario, it can lead to an all-out blackout.

If the stakes sound ominous, I’d suggest examining the root of your hesitation so you can work more confidently and efficiently; it may even be the code itself.

There is a happy medium between freely building data pipelines and using the appropriate guard rails. As long as you take your time and don’t commit code directly to the main branch, you can do data engineering safely and avoid bursting your pipelines.

For those who are anti-virus minded, here are this week’s links as plain text:
P.S. Want to learn how to go from code to automated pipeline? Take advantage of my 100% free email course: Deploy Google Cloud Functions In 5 Days.

Thanks for ingesting,
-Zach
Reaching 20k+ readers on Medium and over 3k learners by email, I draw on my 4 years of experience as a Senior Data Engineer to demystify data science, cloud and programming concepts while sharing job hunt strategies so you can land and excel in data-driven roles. Subscribe for 500 words of actionable advice every Thursday.