K-Means Clustering: A Real Use Case in the Security Domain

Jul 19, 2021

Crime is one of the most common social problems today, affecting a country's economic growth and its citizens' quality of life. It damages a country's international reputation and places a financial burden on the government, for example through the cost of hiring additional police forces. It is therefore important to understand crime patterns and to take measures that reduce the crime rate.

This article discusses a very popular unsupervised learning algorithm: K-means clustering.

The K-means algorithm is used to separate data into different groups, or clusters.

K-means clustering is a method of cluster analysis that aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean.
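The "nearest mean" idea can be sketched in a few lines of NumPy. This is a minimal illustration of the standard assign-then-update loop, not the implementation used later in this article: scikit-learn's KMeans adds smarter initialization and other refinements. The function names and the toy points are made up for the example.

```python
import numpy as np


def nearest_center(points, centers):
    """Index of the closest center (squared Euclidean distance) for each point."""
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Alternate between assigning points to the nearest mean and updating the means."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations picked at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_center(points, centers)
        # Each new center is the mean of its assigned points (unchanged if it has none).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # the means stopped moving, so assignments are stable
        centers = new_centers
    return nearest_center(points, centers), centers


# Toy data (invented for illustration): two well-separated groups of points.
toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centers = kmeans_sketch(toy, k=2)
print(labels)
print(centers)
```

The first three points end up sharing one label and the last three the other. Which group is called 0 and which is called 1 depends on the random starting centers, so the label numbers themselves carry no meaning.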

The K-means algorithm can be applied to numerical, continuous data with a small number of dimensions.

We use the USArrests.csv dataset to understand the K-means algorithm.

1. First, import the required libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans

2. Read the CSV file:

df = pd.read_csv("USArrests.csv")

3. View the first rows:

df.head()

4. Look at the dataset from a different angle using a bar chart:

f, ax = plt.subplots(figsize=(16, 10))

stats = df.sort_values("Total", ascending=False)

sns.set_color_codes("pastel")

sns.barplot(x="Total", y="State", data=stats,
            label="Total", color="g")

sns.barplot(x="Assault", y="State", data=stats,
            label="Assault", color="b")

sns.barplot(x="Rape", y="State", data=stats,
            label="Rape", color="y")

sns.barplot(x="Murder", y="State", data=stats,
            label="Murder", color="r")

# Add a legend and an informative axis label
ax.legend(ncol=2, loc="lower right", frameon=True)
ax.set(xlim=(0, 400), ylabel="State",
       xlabel="Number of arrests for each crime");

5. Finding out the optimal number of clusters

We start by using all four variables, excluding the Total column.

X = df[['Murder', 'Assault', 'Rape', 'UrbanPop']]

from sklearn.preprocessing import StandardScaler

# Standardize the features so that no single variable dominates the distances
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Fit K-means for k = 1..19 and record the inertia (within-cluster sum of squares)
cluster_range = range(1, 20)
cluster_errors = []

for num_clusters in cluster_range:
    clusters = KMeans(num_clusters)
    clusters.fit(X_scaled)
    cluster_errors.append(clusters.inertia_)

clusters_df = pd.DataFrame({"num_clusters": cluster_range, "cluster_errors": cluster_errors})

clusters_df[0:10]
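Inertia values like these are usually read from an "elbow" plot: inertia keeps falling as clusters are added, and a reasonable k is the point after which the drop levels off. The sketch below shows that plot on synthetic data. The blobs from `make_blobs` are only a stand-in for the scaled arrests data, and `n_init=10` and `random_state=0` are choices made here for repeatability, not settings taken from this article.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Stand-in data (an assumption for this sketch): 50 rows, 4 features, 4 groups.
X_demo, _ = make_blobs(n_samples=50, n_features=4, centers=4, random_state=0)
X_demo_scaled = StandardScaler().fit_transform(X_demo)

k_values = list(range(1, 10))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_demo_scaled)
    inertias.append(model.inertia_)  # within-cluster sum of squared distances

# Look for the "elbow": the k after which extra clusters reduce inertia only slightly.
plt.figure(figsize=(8, 4))
plt.plot(k_values, inertias, marker="o")
plt.xlabel("Number of clusters (k)")
plt.ylabel("Inertia")
plt.title("Elbow plot")
```

With the article's data, the same plotting calls could be given `cluster_range` and `cluster_errors` instead of `k_values` and `inertias`.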

6. Analysing the data: Murder & Assault

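# Assumption: Crime_clusters holds the labels of a 4-cluster K-means fit on X_scaled
# (the step that creates this column is not shown in the article).
df['Crime_clusters'] = KMeans(4).fit_predict(X_scaled)

sns.lmplot(x='Murder', y='Assault', data=df,
           hue='Crime_clusters', fit_reg=False, height=6);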

7. Now studying the input points using a graph:

The plot marks each cluster center with an "x". Each data point is assigned to the cluster whose center lies nearest to it.

# The same plot with Matplotlib, also showing the cluster centers as red "x" marks.
data = X
clusters = KMeans(4)
clusters.fit(data)
centers = np.array(clusters.cluster_centers_)
plt.figure(figsize=(7, 7))
# Color each point by its cluster label (nipy_spectral replaces the removed "spectral" colormap)
plt.scatter(data.iloc[:, 0], data.iloc[:, 1], c=[plt.cm.nipy_spectral(i / 5) for i in clusters.labels_])
plt.scatter(centers[:, 0], centers[:, 1], marker="x", color='r');
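Because every point is assigned to its nearest center, the centers themselves summarize what each cluster looks like. When the model is fitted on standardized features, as in step 5, the centers come out in standardized units, and `StandardScaler.inverse_transform` maps them back to the original units. The sketch below uses invented numbers, not rows from USArrests.csv, and only borrows two of the article's column names.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Invented values for illustration only; they are not rows from USArrests.csv.
demo = pd.DataFrame({
    "Murder":  [1.0, 2.0, 1.5, 12.0, 13.0, 12.5],
    "Assault": [50.0, 60.0, 55.0, 250.0, 260.0, 255.0],
})

demo_scaler = StandardScaler()
demo_scaled = demo_scaler.fit_transform(demo)

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(demo_scaled)

# Centers are in standardized units; convert them back to the original units.
centers_original = pd.DataFrame(
    demo_scaler.inverse_transform(model.cluster_centers_),
    columns=demo.columns,
)

# Sort so the output does not depend on which cluster happens to be labeled 0.
sorted_centers = centers_original.sort_values("Murder").reset_index(drop=True)
print(sorted_centers)
```

Each row of `sorted_centers` is the per-variable average of one cluster, which makes it easier to say which cluster corresponds to higher or lower rates.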

8. Correlation analysis:

The correlation table confirms what the graphs suggest. Murder and assault are the most strongly correlated pair, whereas the size of the urban population is not strongly correlated with any of the crime variables.

variables_correlation = df[['Murder', 'Assault', 'Rape', 'UrbanPop']]
variables_correlation.corr()
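By default, `DataFrame.corr()` computes the Pearson correlation coefficient for every pair of columns. As a rough illustration of what one entry in that table means, the sketch below computes the coefficient by hand for two invented columns and compares it with the pandas result. The column names `x` and `y` and their values are made up for the example.

```python
import numpy as np
import pandas as pd

# Invented numbers for illustration; not taken from USArrests.csv.
demo = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.1, 3.9, 6.2, 8.1, 9.8],
})

# Pearson r: sum of products of the centered columns,
# divided by the square root of the product of their sums of squares.
dx = demo["x"] - demo["x"].mean()
dy = demo["y"] - demo["y"].mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# The same value as it appears in the correlation table.
r_pandas = demo.corr().loc["x", "y"]
print(r_manual, r_pandas)
```

Values close to 1 or -1 mean the two columns move together almost linearly, and values near 0 mean there is little linear relationship.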

9. Sort the data by total arrests to compare the four clusters.

stats = df.sort_values("Total", ascending=True)
df_total= pd.DataFrame(stats)

df_total.head(30)
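To see which states fall into each cluster, grouping by the cluster column can be more direct than scanning a sorted table. The sketch below assumes a `Crime_clusters` column like the one used in step 6. The state names are placeholders and the values are invented.

```python
import pandas as pd

# Placeholder states and invented values; not rows from USArrests.csv.
demo = pd.DataFrame({
    "State": ["State A", "State B", "State C", "State D", "State E"],
    "Murder": [2.0, 11.0, 3.0, 12.0, 7.0],
    "Assault": [60.0, 250.0, 70.0, 270.0, 150.0],
    "Crime_clusters": [0, 1, 0, 1, 2],
})

# Average of each variable per cluster, to see what each label stands for.
cluster_profile = demo.groupby("Crime_clusters")[["Murder", "Assault"]].mean()

# States belonging to each cluster.
states_by_cluster = demo.groupby("Crime_clusters")["State"].apply(list)

print(cluster_profile)
print(states_by_cluster)
```

The cluster numbers are arbitrary identifiers assigned by K-means, so a label of 0 does not by itself mean "lowest crime". The per-cluster averages in `cluster_profile` are what show which group has higher or lower rates.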

10. Conclusion

From the above experiment on the crime dataset, we were able to group the US states by crime rate. Based on the result, we can state the following:

Crime rate cluster 0: Missouri, Tennessee, Texas, Georgia, Delaware…

Crime rate cluster 1: Maine, South Dakota, Hawaii, Minnesota…

Crime rate cluster 2: Alabama, Delaware…

Crime rate cluster 3: Nebraska, Montana…

So we can conclude that Missouri has a lower crime rate than Alabama…

Thank you for reading this article.