Understanding Expectation Maximization and Soft Clustering

Software Engineering & Travel Journal

Hands-on

For each data point you have, this powerful algorithm will give you a vector of probabilities (in the EM literature, these values are usually called the responsibilities). Each probability corresponds to one of the clusters you are trying to assign the data point to. Based on that vector, you can assign the point to the cluster it most probably belongs to.
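To make that concrete, here is a minimal sketch of the E-step for one point, assuming a toy mixture of two 1-D Gaussian clusters (the cluster parameters below are made up for illustration):

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def responsibilities(x, clusters):
    """Probability of each cluster given x (the vector described above).

    `clusters` is a list of (weight, mean, std) tuples; weights sum to 1.
    """
    unnormalized = [w * gaussian_pdf(x, m, s) for w, m, s in clusters]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Two hypothetical clusters: A centered at 0, B centered at 5
clusters = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
print(responsibilities(1.0, clusters))  # point near A: almost all mass on A
print(responsibilities(2.5, clusters))  # halfway between: exactly 50/50 here
```

Each returned vector sums to 1, so it can be read directly as "how much this point belongs to each cluster."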

It is a soft clustering algorithm, since it won't tell you definitively which point belongs to which cluster. Instead, it will tell you:

Hey, given the options, I think this point has a better chance of belonging to cluster A than to cluster B.

What if a data point has a 50% probability of belonging to each of two clusters?

Or what if the point has an equal, or nearly equal, probability of belonging to any of the clusters?

Well, that's exactly why it's a soft technique. That decision is up to you, and that's a good thing. You get to decide whether the point belongs to cluster A or B (or any other letter of the alphabet).

Probabilities in the 40% to 60% range for any given cluster reflect an elegant reality: the data isn't black and white all the time; it has its shades of grey.

About The Author

Thiago Ricieri

I'm a software engineer at Pluto TV, working on the iOS, Android, and Roku platforms. I'm personally interested in machine learning, algorithms, data science, and the tech industry. I travel a lot and take good pictures.

