Ten Years of Building Open Source Standards: From Parquet to Arrow to OpenLineage
This is the blog version of my talk of the same title. Since it's a bit long, I have decided to publish it in three separate chapters.
Introduction
Over the last decade, I have been lucky enough to contribute to a few successful open source projects in the data ecosystem.
In this post, I will share the story of how these projects came to be and what made their success possible. I will describe the ideation process and early growth of the Apache Parquet columnar format and show how that led to the creation of its in-memory alter ego, Apache Arrow. I will end by showing how this experience enabled the success of OpenLineage, an LF AI & Data project that brings observability to the data ecosystem. Along the way, I will talk about the key elements that catalyzed their growth, from project focus to governance and community.