Extract. Transform. Read.
A newsletter from Pipeline

Hi past, present or future data professional!

As difficult as data engineering can be, 95% of the time there is a structure to data that originates from external streams, APIs and vendor file deliveries. Useful context is provided via documentation and stakeholder requirements. And specific libraries and SDKs exist to help speed up the pipeline build process.

But what about the other 5% of the time when requirements might be structured, but your data isn’t?

Unstructured data comes in many forms, including incomprehensible metadata from IoT devices; I have the most experience with textual data, so I can speak to how I recommend approaching this class of data. Since I nearly always work with structured data at work, I’ll be speaking from my experience scraping web data, parsing text files and reading PDFs.
Finally, if you’re working with a particular type of data, understand what libraries are available to reduce the manual parsing that will be required.

And remember, the only shape you don’t want your data in is (0,0).

Thanks for ingesting,
-Zach Quinn
Reaching 20k+ readers on Medium and over 3k learners by email, I draw on my 4 years of experience as a Senior Data Engineer to demystify data science, cloud and programming concepts while sharing job hunt strategies so you can land and excel in data-driven roles. Subscribe for 500 words of actionable advice every Thursday.