What is K-MEANS clustering algorithm :
K-MEANS clustering is one of the simplest and popular unsupervised machine learning algorithms.
As a part of the unsupervised learning method, clustering attempts to identify a relationship between n-observations( data points) without being trained by the response variable.
With the intent of obtaining data points under the same class as identical as possible, and the data points in a separate class as dissimilar as possible.
Basically, in the process of clustering, one can identify which observations are alike and classify them significantly in that manner. Keeping this perspective in mind, k-means clustering is the most straightforward and frequently practised clustering method to categorize a dataset into a bunch of k classes (groups).
How Does the K-means clustering algorithm work?
k-means clustering tries to group similar kinds of items in form of clusters. It finds the similarity between the items and groups them into the clusters. K-means clustering algorithm works in three steps. Let’s see what are these three steps.
- Select the k values.
- Initialize the centroids.
- Select the group and find the average.
Let us understand the above steps with the help of the figure because a good picture is better than the thousands of words.
We will understand each figure one by one.
- Figure 1 shows the representation of data of two different items. the first item has shown in blue color and the second item has shown in red color. Here I am choosing the value of K randomly as 2. There are different methods by which we can choose the right k values.
- In figure 2, Join the two selected points. Now to find out centroid, we will draw a perpendicular line to that line. The points will move to their centroid. If you will notice there, then you will see that some of the red points are now moved to the blue points. Now, these points belong to the group of blue color items.
- The same process will continue in figure 3. we will join the two points and draw a perpendicular line to that and find out the centroid. Now the two points will move to its centroid and again some of the red points get converted to blue points.
- The same process is happening in figure 4. This process will be continued until and unless we get two completely different clusters of these groups.
K-MEANS clustering algorithm use case :
Let’s consider the data on drug-related crimes in Canada. The data consists of crimes due to various drugs that include, Heroin, Cocaine to prescription drugs, especially by underage people. The crimes resulted due to these substance abuse can be brought down by starting de-addiction centres in areas most afflicted by this kind of crime. With the available data, different objectives can be set. They are:
- Classify the crimes based on the abuse substance to detect prominent cause.
- Classify the crimes based on age groups.
- Analyze the data to determine what kinds of de-addiction centre is required.
- Find out how many de-addiction centres need to be setup to reduce drug related crime rate.
The K-means algorithm can be used to determine any of the above scenarios by analyzing the available data.
#vimaldaga #righteducation #educationredefine #rightmentor #worldrecordholder #linuxworld #makingindiafutureready #righeducation