Difference between K-Means and DBScan Clustering

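Clustering is an unsupervised machine learning technique that groups the data points of a dataset into clusters based on the similarity of the information available about them. Data points in the same cluster are similar to one another in some respect, while data points in different clusters are dissimilar.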

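K-means and DBScan (Density-Based Spatial Clustering of Applications with Noise) are two of the most popular clustering algorithms in unsupervised machine learning.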

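1. K-Means Clustering:
K-means is a centroid-based, or partition-based, clustering algorithm. It partitions all the points in the sample space into K groups of similar points. Similarity is usually measured with Euclidean distance.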

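In its standard form (Lloyd's algorithm), the algorithm works as follows: choose K initial centroids, for example by picking K data points at random; assign each data point to its nearest centroid; recompute each centroid as the mean of the points assigned to it; and repeat the assignment and update steps until the assignments stop changing or a maximum number of iterations is reached.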
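
As a rough illustration, the K-means iteration can be sketched in plain Python. This is a minimal sketch of the standard Lloyd's algorithm, not an optimized implementation. The function name `kmeans`, the `max_iter` and `seed` parameters, the random choice of initial centroids, and the sample points are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Pick k distinct data points as the initial centroids (an assumed strategy).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical sample data: two small groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The result depends on the initial centroids, which is why practical implementations typically run several initializations and keep the best one.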

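2. DBScan Clustering:
DBScan is a density-based clustering algorithm. Its key idea is that, for each point of a cluster, the neighborhood of a given radius (R) must contain at least a minimum number of points (M). The algorithm has proved very effective at detecting outliers and handling noise.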

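The algorithm works as follows: a point is a core point if at least M points, usually counting the point itself, lie within radius R of it. Starting from an unvisited core point, a new cluster is formed and grown by adding every point that lies within R of a core point already in the cluster. Points added this way that are not core points themselves become border points, and points that cannot be reached from any core point are labelled as noise.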
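
The density-based procedure can be sketched in plain Python as well. This is a minimal, unoptimized sketch of the usual DBScan formulation (every neighborhood query scans all points). The names `dbscan`, `region_query`, and `NOISE`, the convention that a point counts as its own neighbor, and the sample data are assumptions made for the example.

```python
import math

NOISE = -1


def region_query(points, i, radius):
    """Indices of all points within `radius` of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= radius]


def dbscan(points, radius, min_points):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, radius)
        if len(neighbors) < min_points:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
            if labels[j] is not None:
                continue  # already assigned, nothing more to do
            labels[j] = cluster
            j_neighbors = region_query(points, j, radius)
            if len(j_neighbors) >= min_points:
                queue.extend(j_neighbors)  # j is a core point too: keep expanding
    return labels


# Hypothetical sample data: two dense groups and one isolated point.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1),
          (4.5, 15.0)]
labels = dbscan(points, radius=0.5, min_points=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Note that the number of clusters is never passed in: it falls out of R and M, and the isolated point is reported as noise instead of being forced into a cluster.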

There are some notable differences between K-means and DBScan.

S.No.	K-means Clustering	DBScan Clustering
1.	Clusters formed are more or less spherical or convex in shape and tend to be similar in size.	Clusters formed can be arbitrary in shape and may differ in size.
2.	K-means clustering is sensitive to the number of clusters specified.	Number of clusters need not be specified.
3.	K-means clustering is more efficient for large datasets.	DBScan clustering cannot efficiently handle high-dimensional datasets.
4.	K-means Clustering does not work well with outliers and noisy datasets.	DBScan clustering efficiently handles outliers and noisy datasets.
5.	In anomaly detection, K-means causes problems because anomalous points are assigned to the same clusters as “normal” data points.	DBScan, on the other hand, locates regions of high density that are separated from one another by regions of low density, so points in low-density regions can be flagged as anomalies.
6.	It requires one parameter: the number of clusters (K).	It requires two parameters: radius (R) and minimum points (M). R is the radius of the neighborhood around a point; if that neighborhood contains enough points, it is considered a dense area. M is the minimum number of data points a neighborhood must contain to count as dense and so form part of a cluster.
7.	Varying densities of the data points have comparatively little effect on the K-means clustering algorithm.	DBScan clustering does not work very well for sparse datasets or for data points with varying density, because a single choice of R and M cannot suit every region.
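
Rows 4 and 5 can be made concrete with a small sketch. Because a K-means centroid is the mean of its cluster, a single distant point shifts it noticeably, whereas a density check of the kind DBScan uses finds that the same point has too few neighbors. The helper names `centroid` and `count_neighbors`, the sample points, and the values R = 3 and M = 3 are assumptions chosen for the example.

```python
import math


def centroid(points):
    """Mean of a list of equal-length tuples, as in the K-means update step."""
    return tuple(sum(x) / len(points) for x in zip(*points))


def count_neighbors(points, p, radius):
    """Number of points within `radius` of p (p itself included if present)."""
    return sum(1 for q in points if math.dist(p, q) <= radius)


cluster = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (3.0, 3.0)]
outlier = (22.0, 22.0)
data = cluster + [outlier]

# K-means view: the outlier drags the mean away from the dense group.
print(centroid(cluster))  # (2.0, 2.0)
print(centroid(data))     # (6.0, 6.0)

# DBScan view (hypothetical R = 3, M = 3): the outlier has too few neighbors.
print(count_neighbors(data, outlier, radius=3.0))     # 1
print(count_neighbors(data, (1.0, 1.0), radius=3.0))  # 4
```

With M = 3, the outlier is not a core point and is not within R of any other point, so a DBScan-style pass would label it as noise, while the four grouped points each have enough neighbors to form a cluster.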