Extract. Transform. Read.
A newsletter from Pipeline

Hi past, present or future data professional!

As difficult as data engineering can be, 95% of the time there is a structure to data that originates from external streams, APIs and vendor file deliveries. Useful context is provided via documentation and stakeholder requirements. And specific libraries and SDKs exist to help speed up the pipeline build process.

But what about the other 5% of the time, when requirements might be structured but your data isn’t?

Unstructured data comes in many forms, including incomprehensible metadata from IoT devices. I have the most experience with textual data, so I can speak to how I recommend approaching this class of data. Since I nearly always work with structured data at my job, I’ll be speaking from my experience scraping web data, parsing text files and reading PDFs.
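To make the textual case concrete, here is a minimal sketch of turning free-form text into rows. It is an illustration, not a method from this issue: the sample text, the `ORDER_PATTERN` regex and the `parse_orders` and `shape` helpers are all hypothetical, and real sources usually need several patterns plus cleanup.

```python
import re

# Hypothetical raw text, standing in for a scraped page or a text file.
RAW_TEXT = """
Order 1001 shipped on 2024-01-15 to Denver
-- page 1 of 2 --
Order 1002 shipped on 2024-01-17 to Austin
Note: customer called about a late delivery
"""

# Assumed line pattern; named groups become the column names.
ORDER_PATTERN = re.compile(
    r"Order (?P<order_id>\d+) shipped on "
    r"(?P<ship_date>\d{4}-\d{2}-\d{2}) to (?P<city>\w+)"
)


def parse_orders(text):
    """Return one dict per line that matches ORDER_PATTERN; skip the rest."""
    rows = []
    for line in text.splitlines():
        match = ORDER_PATTERN.search(line)
        if match:
            rows.append(match.groupdict())
    return rows


def shape(rows):
    """Mimic a (rows, columns) shape tuple for a list of dicts."""
    n_cols = len(rows[0]) if rows else 0
    return (len(rows), n_cols)


rows = parse_orders(RAW_TEXT)

# Fail loudly if the pattern matched nothing, instead of loading an empty table.
if shape(rows) == (0, 0):
    raise ValueError("Nothing parsed; check the pattern against the raw text.")
```

The empty-result check is the part I would keep even in a throwaway script: with unstructured input, a pattern that silently matches nothing is the most common failure.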
Finally, if you’re working with a particular type of data, understand what libraries are available to reduce the manual parsing that will be required.

And remember, the only shape you don’t want your data in is (0,0).

Thanks for ingesting,

-Zach Quinn
Reaching 20k+ readers on Medium and over 3k learners by email, I draw on my 4 years of experience as a Senior Data Engineer to demystify data science, cloud and programming concepts while sharing job hunt strategies so you can land and excel in data-driven roles. Subscribe for 500 words of actionable advice every Thursday.