The Art of Clustering: The Good, The Bad and The Beautiful by Seth Levine, Contentsquare

Video by Open Data Science and AI Conference via YouTube

Clustering is one of the most widely used and most frequently misapplied techniques in machine learning. While it promises to uncover hidden structure in data, practitioners often encounter unstable results, arbitrary groupings, and clusters that are difficult to interpret or act on. These challenges are especially pronounced with high-dimensional data such as text embeddings and behavioral signals.
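
One commonly cited reason high-dimensional data is hard to cluster is distance concentration: as the number of dimensions grows, a point's nearest and farthest neighbours end up at nearly the same distance, so distance-based groupings carry less signal. The sketch below is an illustration added here, not material from the talk. It assumes uniformly random points rather than real embeddings, and the function name `relative_contrast` is hypothetical.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of the distances from one random query point
    to n_points uniform random points in the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, p) for p in points]
    return (max(dists) - min(dists)) / min(dists)


# The contrast shrinks as dimensionality grows: "near" and "far" blur together.
for dim in (2, 10, 100, 1000):
    print(f"dim={dim:>4}  relative contrast={relative_contrast(dim):.3f}")
```

Real embeddings are not uniformly distributed, so treat this as an intuition pump rather than a measurement of any particular dataset.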

This talk presents a practical, end-to-end approach to clustering using modern, open-source tools built for real-world data. We begin by examining why common approaches like k-means often fall short when their assumptions about cluster shape, density, and separability do not hold. We then introduce a more robust workflow: using UMAP to reveal structure in high-dimensional data, applying HDBSCAN to identify clusters without having to fix the number of clusters in advance or force every point into a cluster, and using datamapplot to make the results interpretable and actionable.
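
The cluster-shape assumption is easy to see on a toy example. The stdlib-only sketch below builds two concentric rings and compares plain k-means with DBSCAN, the fixed-radius density-based algorithm that HDBSCAN generalizes. DBSCAN stands in here only so the example runs without third-party packages; this is not the UMAP, HDBSCAN and datamapplot workflow from the talk. The ring data, `eps`, `min_pts` and all helper names are assumptions chosen for illustration.

```python
import math
import random
from itertools import combinations

NOISE = -1  # label for points that belong to no cluster


def make_rings(n_per_ring=100, radii=(1.0, 4.0), jitter=0.05, seed=0):
    """Evenly spaced points on concentric circles with small radial noise."""
    rng = random.Random(seed)
    points, truth = [], []
    for ring, radius in enumerate(radii):
        for i in range(n_per_ring):
            angle = 2 * math.pi * i / n_per_ring
            r = radius + rng.gauss(0.0, jitter)
            points.append((r * math.cos(angle), r * math.sin(angle)))
            truth.append(ring)
    return points, truth


def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN with O(n^2) neighbour search."""
    n = len(points)
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # core point: keep expanding
                queue.extend(neighbours[j])
        cluster += 1
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs different)."""
    pairs = list(combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs) / len(pairs)


points, truth = make_rings()
km = kmeans(points, k=2)
db = dbscan(points, eps=0.8, min_pts=4)
print("k-means Rand index vs. true rings:", round(rand_index(km, truth), 3))
print("DBSCAN  Rand index vs. true rings:", round(rand_index(db, truth), 3))
```

Because k-means assigns each point to its nearest centroid, two k-means clusters are always separated by a straight line, so it cannot recover one ring nested inside another. The density-based pass instead links points through chains of close neighbours and recovers both rings. DBSCAN still needs a hand-picked `eps`; removing that single fixed radius is one of the motivations for HDBSCAN.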

Through examples drawn from real reviews and other data types, we show how clustering can surface meaningful themes and patterns that drive product and business decisions. We also highlight common pitfalls, such as over-interpreting clusters, misconfiguring parameters, and ignoring noise, and give practical guidance on how to avoid them.
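
Two of those pitfalls, ignoring noise and over-interpreting clusters, are partly a reporting problem. A minimal sketch, assuming HDBSCAN-style output where the label -1 marks noise: report the noise share alongside the clusters, and attach a crude top-terms summary to each cluster so a reader can sanity-check what it contains before naming it. The reviews, labels, stop-word list and `summarize_clusters` function are hypothetical.

```python
import re
from collections import Counter, defaultdict

NOISE = -1  # HDBSCAN convention: -1 means "not assigned to any cluster"
STOPWORDS = {"the", "a", "and", "is", "was", "it", "of", "to"}


def summarize_clusters(texts, labels, top_n=3):
    """Return (noise_share, {label: (size, top_terms)}), keeping noise separate."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    if not labels:
        raise ValueError("need at least one point")
    noise_share = labels.count(NOISE) / len(labels)
    sizes = Counter()
    terms = defaultdict(Counter)
    for text, label in zip(texts, labels):
        if label == NOISE:
            continue  # report noise as a share; do not fold it into a cluster
        sizes[label] += 1
        words = re.findall(r"[a-z']+", text.lower())
        terms[label].update(w for w in words if w not in STOPWORDS)
    summary = {
        label: (sizes[label], [w for w, _ in terms[label].most_common(top_n)])
        for label in sorted(sizes)
    }
    return noise_share, summary


reviews = [
    "Shipping was slow and the box arrived late",
    "Late delivery, slow shipping",
    "Battery drains fast",
    "Battery life is poor and drains overnight",
    "Love the color",
]
labels = [0, 0, 1, 1, NOISE]  # hypothetical clustering output

noise_share, summary = summarize_clusters(reviews, labels)
print(f"noise: {noise_share:.0%} of points")
for label, (size, top_terms) in summary.items():
    print(f"cluster {label}: {size} reviews, top terms {top_terms}")
```

Term counts are only a first check; a cluster whose top terms look incoherent is a prompt to revisit parameters, not a theme to report.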

Attendees will leave with a clear mental model for when clustering works, when it does not, and how to apply a modern workflow that turns messy, high-dimensional data into useful and explainable insights.
