Clustering is an unsupervised machine learning technique that groups the data points of a dataset into clusters based on the similarity of the information available for those points. Data points belonging to the same cluster are similar to each other in some way, while data points belonging to different clusters are dissimilar.
K-means and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) are two of the most popular clustering algorithms in unsupervised machine learning.
1. K-Means Clustering:
K-means is a centroid-based (partition-based) clustering algorithm. The algorithm partitions all points in the sample space into K groups of similar points. Similarity is usually measured by Euclidean distance.
The algorithm is as follows:
- K centroids are placed at random, one for each cluster.
- The distance from each point to each centroid is computed.
- Each data point is assigned to its nearest centroid, forming a cluster.
- The positions of the K centroids are recomputed.
- The previous three steps are repeated until the centroids no longer move (convergence).
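The steps above can be sketched in a few lines of NumPy; this is a minimal illustration, not a production implementation (function and variable names are our own):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-means following the steps above."""
    rng = np.random.default_rng(seed)
    # Step 1: place the K centroids at randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: Euclidean distance from every point to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Step 3: assign each point to its nearest centroid.
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return labels, centroids

# Two well-separated groups of points:
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
labels, centroids = kmeans(X, k=2)
```

Note that the result depends on the random initial placement of the centroids, which is why practical implementations restart the algorithm several times and keep the best run.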
2. DBSCAN Clustering:
DBSCAN is a density-based clustering algorithm. The key idea of the algorithm is that, for every point of a cluster, the neighborhood within a given radius (R) must contain at least a minimum number of points (M). In practice, the algorithm has proven very effective at detecting outliers and handling noise.
The algorithm is as follows:
- Determine the type of each point. Every data point in our dataset may be one of the following:
  - Core point: a data point is a CORE point if its neighborhood (i.e. the area within the specified radius R) contains at least M points.
  - Border point: a data point is classified as a BORDER point if:
    - its neighborhood contains fewer than M data points, and
    - it is reachable from some core point, i.e. it lies within distance R of a core point.
  - Outlier: an outlier is a point that is not a core point and is not close enough to any core point.
- Outliers are discarded.
- Core points that are neighbors are connected and placed in the same cluster.
- Each border point is assigned to the cluster of a core point in its neighborhood.
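The point-classification step above can be sketched directly from the three rules; this is an illustrative NumPy fragment (the function name and demo data are our own), not a full DBSCAN implementation:

```python
import numpy as np

def classify_points(X, R, M):
    """Label every point CORE, BORDER, or OUTLIER per the rules above."""
    # Pairwise Euclidean distance matrix.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Neighborhood size (a point counts as its own neighbor here).
    counts = (d <= R).sum(axis=1)
    core = counts >= M
    # BORDER: not a core point, but within R of at least one core point.
    border = ~core & (d[:, core] <= R).any(axis=1)
    labels = np.where(core, "CORE", np.where(border, "BORDER", "OUTLIER"))
    return labels

# A dense group, one point hanging off its edge, and a far-away point.
X = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0],
              [0.6, 0.0], [20.0, 20.0]])
labels = classify_points(X, R=0.5, M=3)
```

After this classification, the remaining work is only to connect neighboring core points into clusters and attach the border points to them.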
There are some notable differences between K-means and DBSCAN:
| S.No. | K-means Clustering | DBSCAN Clustering |
|---|---|---|
| 1. | Clusters formed are more or less spherical or convex in shape and must be of comparable size. | Clusters formed are arbitrary in shape and need not be of comparable size. |
| 2. | K-means clustering is sensitive to the number of clusters specified. | The number of clusters need not be specified. |
| 3. | K-means clustering is more efficient for large datasets. | DBSCAN clustering cannot efficiently handle high-dimensional datasets. |
| 4. | K-means clustering does not work well with outliers and noisy datasets. | DBSCAN clustering handles outliers and noisy datasets efficiently. |
| 5. | In the domain of anomaly detection, this algorithm causes problems, as anomalous points are assigned to the same cluster as "normal" data points. | DBSCAN, on the other hand, locates regions of high density that are separated from one another by regions of low density. |
| 6. | It requires one parameter: the number of clusters (K). | It requires two parameters: a radius (R) and a minimum number of points (M). R is a chosen radius such that a neighborhood containing enough points within it counts as a dense region; M is the minimum number of data points a neighborhood must contain to be defined as a cluster. |
| 7. | Varying densities of the data points do not affect the K-means clustering algorithm. | DBSCAN clustering does not work very well for sparse datasets or for data points with varying densities. |
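The contrast in rows 1 and 4 is easy to see on a toy dataset. The sketch below assumes scikit-learn is available and uses its `KMeans` and `DBSCAN` implementations on the classic two-moons shape, which is non-convex:

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons

# Two interleaving half-moons: clusters that are not spherical.
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# K-means splits the plane with a straight boundary between 2 centroids.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN follows the density, recovering each moon as one
# arbitrarily-shaped cluster; any noise points are labelled -1.
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
```

Plotting `km_labels` against `db_labels` makes the difference concrete: K-means cuts across both moons, while DBSCAN assigns each moon to its own cluster.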