Extract. Transform. Read.
A newsletter from Pipeline

Hi past, present or future data professional!

As difficult as data engineering can be, 95% of the time there is a structure to data that originates from external streams, APIs and vendor file deliveries. Useful context is provided via documentation and stakeholder requirements. And specific libraries and SDKs exist to help speed up the pipeline build process.

But what about the other 5% of the time when requirements might be structured, but your data isn’t?

Unstructured data comes in many forms, including incomprehensible metadata from IoT devices. I have the most experience with textual data, so I can speak to how I recommend approaching this class of data. Since I nearly always work with structured data at work, I’ll be speaking from my experience scraping web data, parsing text files and reading PDFs.
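To make the text-file case a little more concrete, here is a minimal Python sketch of pulling structured records out of free text with a regular expression, then checking that the result isn't empty. Everything in it is an assumption for illustration: the sample lines, the pattern, and the helper names `parse_lines` and `shape` are hypothetical and not taken from any real pipeline.

```python
import re

# Hypothetical free-text lines; the format is assumed purely for illustration.
RAW_TEXT = """\
Order 1001 shipped to Boston on 2024-03-01
random note with no useful fields
Order 1002 shipped to Denver on 2024-03-04
"""

# Named groups become the column names of each parsed record.
ORDER_PATTERN = re.compile(
    r"Order (?P<order_id>\d+) shipped to (?P<city>\w+) on (?P<ship_date>\d{4}-\d{2}-\d{2})"
)


def parse_lines(text):
    """Return one dict per line that matches the pattern; skip everything else."""
    records = []
    for line in text.splitlines():
        match = ORDER_PATTERN.search(line)
        if match:
            records.append(match.groupdict())
    return records


def shape(records):
    """Return (rows, columns), loosely mirroring a DataFrame's .shape."""
    rows = len(records)
    columns = len(records[0]) if records else 0
    return rows, columns


records = parse_lines(RAW_TEXT)

# Fail loudly if nothing was extracted instead of passing empty data downstream.
if shape(records) == (0, 0):
    raise ValueError("Parsing produced no rows; check the pattern against the raw text")

print(shape(records))
```

The design choice worth noting is the explicit empty-result check: with unstructured input, a pattern that silently matches nothing is a more likely failure than an exception, so it helps to treat an empty result as an error.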
Finally, if you’re working with a particular type of data, learn which libraries are available to reduce the manual parsing required. And remember, the only shape you don’t want your data in is (0,0).

Thanks for ingesting,

-Zach Quinn
Reaching 20k+ readers on Medium and over 3k learners by email, I draw on my 4 years of experience as a Senior Data Engineer to demystify data science, cloud and programming concepts while sharing job hunt strategies so you can land and excel in data-driven roles. Subscribe for 500 words of actionable advice every Thursday.
Hi fellow data professional! This is the 100th week I’m reaching out into the void of the Internet to connect with you in order to democratize data engineering career knowledge. In the golden age of cable TV, shows would celebrate the 100th episode milestone by airing extended content like a 1-hour special for a sitcom that would typically consume 30 minutes. But I know your time is valuable so I’m going to do the opposite and make this a shorter newsletter than normal. Since I live what I...
Hi fellow data professional! For my health I try not to argue with my wife, but when she told me her networking plan I had to push back. Some context: she’s exploring career paths within the multinational corp she works for and wanted to meet with a friend of a family member. The catch? She felt weird about leveraging a personal connection and wanted to reach out cold. This is the wrong approach. Western culture demonizes nepotism but, truthfully, sometimes a connection is so painfully...
Hi fellow data professional! It’s baseball season in the U.S., a game defined by the "on-deck" lineup. Before a player takes a big swing at the plate, they are already there, weighted bat in hand, timing the pitcher (who has to move a bit faster now thanks to the pitch clock), fully prepared for their moment. They don’t start looking for their helmet only after the umpire calls them up. In your early career perhaps you’re considering "taking a big swing" by applying for that dream role at a...