This week I spent some time reviewing the K-Means clustering algorithm for unsupervised learning.

Imagine you have a data set and you would like to know whether there are clusters that describe the relationship between the data points. If you plot them on a chart, you will most likely see with your own eyes how they divide into groups. But how do you make the computer notice that?

That’s what K-Means clustering does: it is an algorithm that lets you define those groups without knowing in advance what the groups actually are. The machine decides that for you.

## Easy way to get started

An easy way to get started with K-Means is to follow this process:

- Choose the number of clusters you want to extract from the data;
- Place that many centroids at random positions;
- Assign each point to the centroid closest to it;
- Move each centroid to the average position (the mean) of all the points in its cluster;
- Repeat the last two steps until the centroids stop moving and the clusters no longer change.
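
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `k_means`, its parameters, and the toy data are all made up for the example, and it assumes the initial centroids are drawn from the data points themselves (one of several ways to "place them randomly").

```python
import numpy as np

def k_means(points, k, max_iter=100, seed=0):
    """Plain K-Means on an (n_samples, n_features) float array (illustrative helper)."""
    rng = np.random.default_rng(seed)
    # Steps 1-2: pick k distinct data points as the initial centroids (an assumption).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: distance from every point to every centroid, then take the nearest.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 4: move each centroid to the mean of its points.
        # A centroid that ends up with no points stays where it was.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Toy data: two well-separated blobs of 50 points each.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

centroids, labels = k_means(points, k=2)
print(centroids)
```

With blobs this far apart, the two centroids should land near the blob centers, and every point in a blob should share the same label.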

The problem with this approach is that if you place the centroids randomly, you might get different clusters each time you run the process. It’s similar to a hill-climbing problem: the result depends on the starting point, and the algorithm can settle into a local optimum. You need to run the algorithm several times from different starting points to find the clusters it produces most consistently.
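
To see the dependence on the starting point, here is a small sketch that runs scikit-learn's `KMeans` five times, each with a single random start, and records the inertia (the sum of squared distances from each point to its nearest centroid). The toy data and the seed values are made up for illustration. Note that it keeps the run with the lowest inertia, which is a common selection rule but not the same thing as picking the most frequent result.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: four blobs of 40 points each.
rng = np.random.default_rng(0)
centers = [(0, 0), (0, 6), (6, 0), (6, 6)]
points = np.vstack([rng.normal(loc=c, scale=1.0, size=(40, 2)) for c in centers])

inertias = []
for seed in range(5):
    # init="random" with n_init=1: one run from one random starting position.
    model = KMeans(n_clusters=4, init="random", n_init=1, random_state=seed)
    model.fit(points)
    inertias.append(model.inertia_)

# Lower inertia means the points sit closer to their centroids.
best_seed = int(np.argmin(inertias))
print(inertias)
print("best seed:", best_seed)
```

If the printed inertias differ between seeds, those runs converged to different clusterings.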

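Sklearn has a `KMeans` estimator that makes this process very easy. You just have to set the main parameters, `n_clusters` (the number of clusters), `n_init` (how many times to restart from different centroids) and `max_iter` (the iteration limit for each run), and you are ready to go.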
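
As a rough sketch of what that looks like in practice, the snippet below fits `KMeans` on made-up data with two blobs. The parameter values (`n_init=10`, `max_iter=300`, `random_state=0`) are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two blobs of 50 points each.
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
])

# n_init=10 restarts the algorithm ten times and keeps the lowest-inertia run.
model = KMeans(n_clusters=2, n_init=10, max_iter=300, random_state=0)
labels = model.fit_predict(points)  # cluster index for every training point

print(model.cluster_centers_)  # final centroid positions

# Assign previously unseen points to the nearest learned centroid.
new_labels = model.predict([[0.2, -0.1], [4.8, 5.3]])
print(new_labels)
```

The cluster indices themselves are arbitrary: which blob ends up as cluster 0 or cluster 1 can change between runs, so only the grouping is meaningful.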

## About The Author

Thiago Ricieri

I'm a software engineer at Pluto TV, working on iOS, Android and Roku platforms. I'm personally interested in machine learning, algorithms, data science and the tech industry. I travel a lot and take good pictures.