Extract. Transform. Read. A newsletter from Pipeline

Hi past, present or future data professional!

As difficult as data engineering can be, 95% of the time there is a structure to data that originates from external streams, APIs and vendor file deliveries. Documentation and stakeholder requirements provide useful context, and specific libraries and SDKs exist to help speed up the pipeline build process.

But what about the other 5% of the time, when requirements might be structured but your data isn’t?

Unstructured data comes in many forms, including incomprehensible metadata from IoT devices. I have the most experience with textual data, so I can speak to how I recommend approaching this class of data. Since I nearly always work with structured data at work, I’ll be speaking from my experience scraping web data, parsing text files and reading PDFs.
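To make "giving structure to textual data" concrete, here is a minimal sketch using only Python's standard library. Everything in it is an assumption for illustration: the sample text, the `ORDER_PATTERN` regex and the `parse_orders` helper are hypothetical, and real scraped pages, text files or PDF extracts will need their own patterns.

```python
import re

# Hypothetical raw text, standing in for a scraped page or a text file.
RAW_TEXT = """\
Order #1001 shipped to Baltimore on 2024-11-28
random footer text with no order
Order #1002 shipped to Orlando on 2024-11-29
"""

# Assumed pattern for this made-up format; real data needs its own pattern.
ORDER_PATTERN = re.compile(
    r"Order #(?P<order_id>\d+) shipped to (?P<city>[A-Za-z ]+) "
    r"on (?P<date>\d{4}-\d{2}-\d{2})"
)


def parse_orders(text: str) -> list[dict[str, str]]:
    """Return one dict per line that matches ORDER_PATTERN; skip the rest."""
    records = []
    for line in text.splitlines():
        match = ORDER_PATTERN.search(line)
        if match:
            # Named groups become the column names of the structured record.
            records.append(match.groupdict())
    return records


records = parse_orders(RAW_TEXT)

# Fail loudly if nothing was parsed, rather than passing empty data downstream.
if not records:
    raise ValueError("No records parsed; check the pattern against the raw text.")
```

Named groups keep the mapping from raw text to column names in one place, and the emptiness check catches the case where a format change silently makes the pattern stop matching.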
Finally, if you’re working with a particular type of data, understand what libraries are available to reduce the manual parsing that will be required.

And remember, the only shape you don’t want your data in is (0,0).

Thanks for ingesting,

-Zach Quinn
Reaching 20k+ readers on Medium and nearly 3k learners by email, I draw on my 4 years of experience as a Senior Data Engineer to demystify data science, cloud and programming concepts while sharing job hunt strategies so you can land and excel in data-driven roles. Subscribe for 500 words of actionable advice every Thursday.