kmeans sklearn - Python

Last modified: 2023-12-03 14:43:40.176000    Author: Mango

Introduction to K-Means Clustering in Python using Scikit-Learn

K-Means clustering is one of the most popular unsupervised learning algorithms. It partitions an unlabeled dataset into K groups (clusters) so that each data point belongs to the cluster with the nearest mean, or centroid. In this tutorial, we will introduce K-Means clustering using the Scikit-Learn library in Python.
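
Formally, K-Means starts from data points x_1, ..., x_n and a chosen number of clusters K. It searches for a partition of the points into clusters C_1, ..., C_K that minimizes the within-cluster sum of squared distances. Each centroid μ_k is the mean of the points in its cluster:

```latex
\min_{C_1,\dots,C_K} \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

Scikit-Learn reports the value of this objective for a fitted model as the inertia_ attribute. The usual iterative algorithm is only guaranteed to reach a local minimum, so the result can depend on the initial centroids.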

Installation of Scikit-Learn

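Before we start, we need to install the Scikit-Learn library. This tutorial also uses Matplotlib for plotting, so you can install both with pip. NumPy is installed automatically as a dependency of scikit-learn:

pip install scikit-learn matplotlib

In a Jupyter notebook, prefix the command with an exclamation mark, as in !pip install scikit-learn matplotlib.

Importing the Required Libraries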

Let's start by importing the required libraries:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

Here, we import NumPy for numerical arrays, Matplotlib's pyplot module for plotting, and the KMeans class from the sklearn.cluster module.

Generating Random Data

Let's generate some random data to demonstrate the K-Means clustering method.

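# Fix the random seed so the results are reproducible
np.random.seed(42)
# Start with 100 points drawn uniformly from the square [-2, 0] x [-2, 0]
X = -2 * np.random.rand(100, 2)
# Replace the last 50 points with points drawn from [1, 3] x [1, 3]
X1 = 1 + 2 * np.random.rand(50, 2)
X[50:100, :] = X1
plt.scatter(X[:, 0], X[:, 1], s=50)
plt.show()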

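The code above creates two well-separated blobs of 50 points each. The first 50 rows of X lie in the square [-2, 0] x [-2, 0], and the last 50 rows (overwritten by X1) lie in [1, 3] x [1, 3].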
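
You don't have to build the blobs by hand. Scikit-Learn ships a helper, sklearn.datasets.make_blobs, that samples isotropic Gaussian blobs and also returns the true group of each point. The sketch below is one possible setup and is not part of the original tutorial. We chose the centers (-1, -1) and (2, 2) to match the middles of the two squares above, and cluster_std=0.5 is likewise an arbitrary choice. The points are Gaussian rather than uniform, so they will not stay exactly inside the squares.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Two Gaussian blobs; with an explicit list of centers, the 100 samples
# are split evenly between them (50 each)
X, y_true = make_blobs(
    n_samples=100,
    centers=[[-1.0, -1.0], [2.0, 2.0]],  # our choice, not from the tutorial
    cluster_std=0.5,                     # arbitrary spread for illustration
    random_state=0,                      # reproducible sampling
)

print(X.shape)              # (100, 2)
print(np.bincount(y_true))  # [50 50]
```

Because y_true records the group that generated each point, data produced this way also lets you check how closely the clusters found by K-Means match the ground truth.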

![Kmeans Sklearn Python-01.png](attachment:Kmeans Sklearn Python-01.png)

Implementing K-Means Clustering

Now, let's apply K-Means clustering to group these data points into two clusters.

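# n_init=10: run the algorithm 10 times from different initial centroids and keep the best result
# random_state=0: make the centroid initialization reproducible
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)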

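Here, we create a KMeans estimator with n_clusters=2, so the algorithm looks for two clusters. Calling fit runs the algorithm on X. Starting from initial centroids, it repeatedly assigns each point to its nearest centroid and moves each centroid to the mean of its assigned points, until the centroids stabilize. After fitting, the cluster label of each point is stored in kmeans.labels_ and the centroid coordinates in kmeans.cluster_centers_.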
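
To make the fitting step less of a black box, here is a minimal NumPy sketch of Lloyd's algorithm, the classic alternating assign/update procedure behind K-Means. It rests on our own simplifying assumptions: random initial centroids picked from the data, a single run, and a fixed iteration cap. It is not scikit-learn's implementation, which by default uses k-means++ initialization and optimized code. The function names lloyd_kmeans and nearest_centroid and their parameters are hypothetical.

```python
import numpy as np


def nearest_centroid(X, centers):
    """Return, for every row of X, the index of its closest centroid."""
    # Squared Euclidean distances, shape (n_points, n_clusters)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def lloyd_kmeans(X, k, max_iter=100, seed=0):
    """Hypothetical minimal K-Means (Lloyd's algorithm); returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Initialization: k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        labels = nearest_centroid(X, centers)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stop when the centroids no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment so the labels match the returned centroids
    return nearest_centroid(X, centers), centers


# Same two-square construction as the tutorial, using NumPy's Generator API
rng = np.random.default_rng(0)
X = -2 * rng.random((100, 2))
X[50:, :] = 1 + 2 * rng.random((50, 2))

labels, centers = lloyd_kmeans(X, k=2)
print(centers)
print(labels)
```

The two groups here are far apart, so this sketch separates them cleanly. On harder data, a single run from random starting points can get stuck in a poor local minimum. Scikit-Learn's k-means++ initialization and multiple restarts (n_init) are meant to reduce that risk.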

Visualizing the Clusters

We can visualize the two clusters by using the following code:

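# Color each point by the cluster label assigned by K-Means
plt.scatter(X[:, 0], X[:, 1], s=50, c=kmeans.labels_)
# Mark the learned centroids with large red stars
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=200, marker='*', c='red')
plt.show()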

The code above colors each point by its cluster label and marks the centroid of each cluster with a red star.
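
The plot is not the only way to inspect the result; the fitted attributes can also be read directly. The sketch below rebuilds similar two-square data with NumPy's Generator API, so the exact points differ from the tutorial's. It reads labels_, cluster_centers_ and inertia_ from the fitted model and checks that inertia_ equals the within-cluster sum of squared distances computed by hand. It then calls predict to assign new points to the nearest learned centroid. The two query points in new_points are arbitrary examples.

```python
import numpy as np
from sklearn.cluster import KMeans

# Recreate two-square data similar to the tutorial's (seeded for reproducibility)
rng = np.random.default_rng(0)
X = -2 * rng.random((100, 2))
X[50:, :] = 1 + 2 * rng.random((50, 2))

# n_init=10 restarts, random_state=0 for a reproducible initialization
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Centroid coordinates: one row per cluster
print(kmeans.cluster_centers_)
# Cluster index of each training point (the values used as colors in the plot)
print(kmeans.labels_[:5], kmeans.labels_[-5:])

# inertia_ is the sum of squared distances from each point to its own centroid
wcss = ((X - kmeans.cluster_centers_[kmeans.labels_]) ** 2).sum()
print(np.isclose(wcss, kmeans.inertia_))

# predict assigns new points to the nearest learned centroid
new_points = np.array([[-1.0, -1.0], [2.0, 2.0]])  # arbitrary query points
pred = kmeans.predict(new_points)
print(pred)
```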

![Kmeans Sklearn Python-02.png](attachment:Kmeans Sklearn Python-02.png)

Conclusion

In this tutorial, we introduced the K-Means clustering method using the Scikit-Learn library in Python. We generated random data points and grouped them into two clusters with the K-Means algorithm. Finally, we visualized the clusters and their centroids with the Matplotlib library.