Difference between K-Means and Hierarchical Clustering
k-means is a method of cluster analysis that uses a pre-specified number of clusters. It requires advance knowledge of "K".
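As a minimal sketch of why K has to be known in advance, the basic k-means loop (Lloyd's algorithm) can be written in plain Python. The function names `kmeans`, `sq_dist` and `mean_point`, the `seed` parameter and the sample points are illustrative assumptions, not part of any particular library.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points, k, iterations=100, seed=0):
    """Hypothetical helper: basic k-means on a list of numeric tuples."""
    rng = random.Random(seed)
    # K must be supplied up front: it fixes how many centroids exist.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            mean_point(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

Changing `seed` changes the starting centroids, which is the source of the run-to-run variation that k-means is known for on less clearly separated data.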
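Hierarchical clustering, also known as hierarchical cluster analysis (HCA), is also a method of cluster analysis. It seeks to build a hierarchy of clusters without fixing the number of clusters in advance.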
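To illustrate how a hierarchy can be built without fixing the number of clusters, here is a minimal agglomerative sketch in plain Python. It assumes single linkage (the distance between two clusters is the distance between their closest members), which is only one of several possible linkage rules; the names `agglomerate` and `sq_dist` and the sample points are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(points, n_clusters=1):
    """Hypothetical helper: single-linkage agglomerative clustering.

    Starts with one cluster per point and repeatedly merges the two
    closest clusters until n_clusters remain. Returns the remaining
    clusters and the list of merges, which describes the tree.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest single-link distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(sq_dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Record the merge with its (non-squared) distance.
        merges.append((clusters[i], clusters[j], d ** 0.5))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
full_tree = agglomerate(points)[1]          # n - 1 merges in total
two_clusters = agglomerate(points, 2)[0]    # stop early at two clusters
print(len(full_tree), two_clusters)
```

The merge list plays the role of the dendrogram: stopping the loop earlier, or cutting the merge list at a chosen distance, yields any desired number of clusters. There is no random step, so repeated runs on the same data give the same result.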
The main differences between k-means and hierarchical clustering are:

k-means clustering: Using a pre-specified number of clusters, the method assigns each record to a cluster so as to find mutually exclusive, spherical clusters based on distance.
Hierarchical clustering: Hierarchical methods can be either divisive or agglomerative.

k-means clustering: Requires advance knowledge of K, i.e. the number of clusters into which the data should be divided.
Hierarchical clustering: One can stop at any number of clusters, choosing an appropriate number by interpreting the dendrogram.

k-means clustering: One can use the median or the mean as a cluster centre to represent each cluster.
Hierarchical clustering: Agglomerative methods begin with n clusters and sequentially combine similar clusters until only one cluster is obtained.

k-means clustering: The methods used are normally less computationally intensive and are suited to very large datasets.
Hierarchical clustering: Divisive methods work in the opposite direction, beginning with one cluster that includes all the records and successively splitting it. Hierarchical methods are especially useful when the goal is to arrange the clusters into a natural hierarchy.

k-means clustering: Since one starts with a random choice of clusters, the results produced by running the algorithm several times may differ.
Hierarchical clustering: Results are reproducible.

k-means clustering: Simply a division of the set of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset.
Hierarchical clustering: A set of nested clusters that are arranged as a tree.

k-means clustering: Found to work well when the structure of the clusters is hyperspherical (like a circle in 2D or a sphere in 3D).
Hierarchical clustering: Does not work as well as k-means when the shape of the clusters is hyperspherical.

k-means clustering: Advantages: 1. Convergence is guaranteed. 2. Can be adapted to clusters of different sizes and shapes.
Hierarchical clustering: Advantages: 1. Ease of handling any form of similarity or distance. 2. Consequently, applicability to any attribute type.

k-means clustering: Disadvantages: 1. The value of K is difficult to predict. 2. Does not work well with global clusters.
Hierarchical clustering: Disadvantage: 1. Requires the computation and storage of an n×n distance matrix. For very large datasets, this can be expensive and slow.
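To get a feel for the cost of the n×n distance matrix, a rough back-of-the-envelope estimate can be computed as below. The function name `distance_matrix_bytes` is hypothetical, and the default of 8 bytes per entry assumes double-precision floats; the `condensed` option assumes a symmetric distance, so that only the n(n-1)/2 pairs above the diagonal are stored.

```python
def distance_matrix_bytes(n, bytes_per_value=8, condensed=False):
    """Approximate storage for pairwise distances between n records.

    Assumes one value of bytes_per_value bytes per stored distance.
    With condensed=True only the n * (n - 1) / 2 distinct pairs are
    counted instead of the full n * n matrix.
    """
    entries = n * (n - 1) // 2 if condensed else n * n
    return entries * bytes_per_value


for n in (1_000, 100_000):
    full_gb = distance_matrix_bytes(n) / 1e9
    half_gb = distance_matrix_bytes(n, condensed=True) / 1e9
    print(f"n={n}: full {full_gb:.3f} GB, condensed {half_gb:.3f} GB")
```

Under these assumptions, 100,000 records already need about 80 GB for the full matrix, while the storage for k-means grows only with the number of records and centroids.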