Sean Owen

Biography

Sean Owen is a veteran open-source engineer and applied researcher currently working in Mosaic Applied Research at Databricks. He is an Apache Spark PMC member and long-time committer with deep contributions to MLlib, Spark's machine-learning library. Before Databricks he was Director of Data Science (EMEA) at Cloudera, which he joined through the acqui-hire of his startup Myrrix Ltd, a company that commercialized real-time recommender systems on Hadoop. Earlier in his career he was a senior engineer at Google, where he helped build and launch Mobile Web search and created ZXing ('Zebra Crossing'), the widely used open-source barcode scanning library for Java and Android (33.9k GitHub stars). Owen also served as VP and primary committer of Apache Mahout, authoring its Taste recommender framework. He co-authored two books, 'Advanced Analytics with Spark' (O'Reilly) and 'Mahout in Action' (Manning), as well as the MLlib paper in JMLR. He holds a BA in Computer Science from Harvard University and an MBA from London Business School, where he was selected as a Kauffman Fellow. He received the 2022 ACM SIGMOD Systems Award as part of the Apache Spark community.

Apache Spark, MLlib / Distributed Machine Learning, Recommender Systems, Apache Mahout, Barcode Recognition (ZXing), Lambda Architecture, Real-Time Streaming Analytics, AutoML on Databricks, Big Data Engineering, Open-Source Software

Timeline

2025

2025-01 · Research

Transitioned to Applied Research at Mosaic (Databricks), focusing on foundation model research

2024

2024-01 · Research

Recent Apache Spark commit: corrected DateTimeFormat.withZone usage in UIUtils (SPARK-46611)

2024-06 · Research

Speaker at Databricks Data + AI Summit 2024

2022

2022-01 · Research

Published 'Advanced Analytics with PySpark' (O'Reilly, 2nd edition) adding Akash Tandon as co-author

2022-06 · Research

Received the 2022 ACM SIGMOD Systems Award as part of the Apache Spark community

2018

2018-01 · Research

Joined Databricks as Principal Product Specialist for Data Science and Machine Learning

2015

2015-01 · Research

Published 'Advanced Analytics with Spark' (O'Reilly) with Sandy Ryza, Uri Laserson, and Josh Wills

2015-05 · Research

Co-authored 'MLlib: Machine Learning in Apache Spark' paper (JMLR) with Xiangrui Meng, Matei Zaharia et al.

2014

2014-01 · Research

Launched Oryx project -- lambda architecture on Apache Spark and Kafka for real-time large-scale machine learning (1.8k stars, now archived)

2013

2013-01 · Research

Myrrix acqui-hired by Cloudera; became Director of Data Science (EMEA), based in London

2012

2012-01 · Research

Published 'Mahout in Action' (Manning) with Robin Anil, Ted Dunning, and Ellen Friedman

2012-01 · Research

Founded Myrrix Ltd to commercialize real-time recommender systems on Apache Hadoop

2009

2009-01 · Research

Became a primary committer and VP of Apache Mahout; authored the Taste recommender framework

2008

2008-01 · Research

Created ZXing ('Zebra Crossing'), an open-source barcode scanning library for Java and Android that now has 33.9k GitHub stars

2008-01 · Research

Started MBA at London Business School (graduated 2010); selected as a Kauffman Fellow

2000

2000-01 · Research

Joined Google as a software engineer; helped build and launch Mobile Web search

1996

1996-01 · Research

Enrolled at Harvard University to study Computer Science (BA)

Key Contributions

ZXing (Zebra Crossing)

Open-source multi-format 1D/2D barcode image processing library for Java and Android. Primary author. 33.9k GitHub stars.

Apache Spark MLlib

Long-time committer and PMC member of Apache Spark with deep contributions to MLlib, Spark's distributed machine learning library.

Apache Mahout -- Taste Recommender Framework

VP and primary committer of Apache Mahout. Authored the Taste collaborative-filtering recommender framework.

Oryx Project

Lambda architecture on Apache Spark and Apache Kafka for real-time large-scale machine learning. 1.8k stars (archived).

Advanced Analytics with Spark / PySpark (O'Reilly)

Co-authored the O'Reilly book on practical large-scale analytics with Spark, covering classification, clustering, collaborative filtering, anomaly detection, genomics, and more.

Mahout in Action (Manning)

Co-authored with Robin Anil, Ted Dunning, and Ellen Friedman. Practical guide to Apache Mahout's machine learning algorithms.

MLlib: Machine Learning in Apache Spark (JMLR Paper)

Co-authored the JMLR paper describing MLlib's design, algorithms, and distributed optimization foundations.

Myrrix Ltd

Founded a startup commercializing real-time recommender systems on Hadoop. The company was acqui-hired by Cloudera in 2013, and its technology evolved into the open-source Oryx project.

Notable Quotes

“Big data was a name for this phenomenon. We suddenly went from a data scarce world to one where you could collect as much data as you cared to.”

All Things Data Podcast (University of St. Thomas)

“Data is one of the remaining differentiators in this new era of big data and data analytics.”

All Things Data Podcast (University of St. Thomas)

“Applied research @ Mosaic @ Databricks. Formerly Principal DS/ML Specialist @ Databricks. Apache Spark PMC / committer. Primary author, zxing and Oryx project.”

GitHub profile bio

Sources

srowen (Sean Owen) -- GitHub Profile
Sean Owen -- LinkedIn
Sean Owen -- O'Reilly Author Page
Sean Owen -- Data Science Festival Speaker Bio
Sean Owen -- QCon London 2014 Speaker Page
ZXing Barcode Scanning Library (33.9k stars)
Oryx Project -- Lambda Architecture on Spark + Kafka
MLlib: Machine Learning in Apache Spark (arXiv / JMLR)
2022 SIGMOD Systems Award for Apache Spark
Cloudera Acqui-hires Myrrix (BigDataWire)
Advanced Analytics with PySpark (O'Reilly)
Mahout in Action (Manning / Amazon)
GreyBeards Podcast #123 with Sean Owen
All Things Data Podcast -- Sean Owen of Cloudera
Kauffman Fellowship -- Owen and Carrillo (London Business School)
Sean Owen -- Databricks Data + AI Summit 2024

Research generated March 19, 2026
