Applied Researcher, Mosaic | Databricks
Apache Spark PMC member and long-time committer. Created the ZXing barcode library (33.9k stars) and the Oryx project. Co-author of Advanced Analytics with Spark (O'Reilly) and Mahout in Action. Former VP of Apache Mahout, Director of Data Science at Cloudera, and senior engineer at Google.
Biography
Sean Owen is a veteran open-source engineer and applied researcher currently working in Mosaic Applied Research at Databricks. He is an Apache Spark PMC member and long-time committer with deep contributions to Spark's MLlib machine-learning library. Before Databricks he was Director of Data Science (EMEA) at Cloudera, which he joined through the acqui-hire of his startup Myrrix Ltd, a company that commercialized real-time recommender systems on Hadoop. Earlier in his career he was a senior engineer at Google, where he helped build and launch Mobile Web search and created ZXing ('Zebra Crossing'), the ubiquitous open-source barcode scanning library for Java and Android (33.9k stars). Owen also served as VP and primary committer of Apache Mahout, authoring its Taste recommender framework. He co-authored two books, 'Advanced Analytics with Spark' (O'Reilly) and 'Mahout in Action' (Manning), as well as the MLlib paper in JMLR. He holds a BA in Computer Science from Harvard University and an MBA from London Business School, where he was selected as a Kauffman Fellow. He received the 2022 ACM SIGMOD Systems Award as part of the Apache Spark community.
ZXing: Open-source multi-format 1D/2D barcode image processing library for Java and Android. Primary author. 33.9k GitHub stars.
Apache Spark: Long-time committer and PMC member with deep contributions to MLlib, Spark's distributed machine learning library.
Apache Mahout: VP and primary committer. Authored the Taste collaborative-filtering recommender framework.
Oryx: Lambda architecture on Apache Spark and Apache Kafka for real-time large-scale machine learning. 1.8k stars (archived).
Advanced Analytics with Spark: Co-authored the definitive O'Reilly book on practical large-scale analytics with Spark, covering classification, clustering, collaborative filtering, anomaly detection, genomics, and more.
Mahout in Action: Co-authored with Robin Anil, Ted Dunning, and Ellen Friedman. Practical guide to Apache Mahout's machine learning algorithms.
MLlib paper: Co-authored the foundational JMLR paper describing MLlib's design, algorithms, and distributed optimization foundations.
Myrrix: Founded the startup, which commercialized real-time recommender systems on Hadoop. It was acqui-hired by Cloudera in 2013, and its technology evolved into the open-source Oryx project.
Big data was a name for this phenomenon. We suddenly went from a data-scarce world to one where you could collect as much data as you cared to.
Data is one of the remaining differentiators in this new era of big data and data analytics.
Applied research @ Mosaic @ Databricks. Formerly Principal DS/ML Specialist @ Databricks. Apache Spark PMC / committer. Primary author, zxing and Oryx project.
Research generated March 19, 2026