Principal Component Analysis (PCA): PCA is an unsupervised linear technique for dimensionality reduction and data visualisation of high-dimensional data. Gaining insight from high-dimensional data is difficult, and processing it is computationally expensive. The main idea behind the technique is to reduce the dimensionality of highly correlated data by transforming the original set of vectors into a new set called principal components.
PCA tries to preserve the global structure of the data: when d-dimensional data is reduced to d'-dimensional data, it maps the data set as a whole, so local structure (individual clusters) may be lost. Applications of the technique include noise filtering, feature extraction, stock market prediction and gene data analysis.
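As a sketch of the idea above, PCA can be computed directly from the eigendecomposition of the covariance matrix. The data below is synthetic and illustrative only (pure NumPy, not a production implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 3-D data: the third column is a noisy mix of the first two.
X = rng.normal(size=(200, 2))
X = np.column_stack([X, X @ [0.7, 0.3] + 0.05 * rng.normal(size=200)])

# Centre the data, then diagonalise the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending order
order = np.argsort(eigvals)[::-1]         # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the top 2 principal components (the transformed vector set).
X2 = Xc @ eigvecs[:, :2]
print(X2.shape)                  # (200, 2)
print(eigvals / eigvals.sum())   # fraction of variance per component
```

Because the third column is almost a linear combination of the first two, nearly all of the variance is captured by the first two components, which is exactly why projecting away the third dimension loses little information.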
t-distributed Stochastic Neighbour Embedding (t-SNE): t-SNE is also an unsupervised technique, but for non-linear dimensionality reduction and data visualisation. The mathematics behind t-SNE is quite involved, but the idea is simple: embed points from a higher dimension into a lower one while trying to preserve each point's neighbourhood.
Unlike PCA, it tries to preserve the local structure of the data by minimising the Kullback-Leibler divergence (KL divergence) between the two neighbourhood distributions with respect to the locations of the points in the map. Applications of the technique include computer security research, music analysis, cancer research, bioinformatics and biomedical signal processing.
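A minimal sketch of the above using scikit-learn's `TSNE` (assumed to be installed); the digits dataset, the sample size and the `perplexity` value are illustrative choices, not part of the original text:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X, y = X[:200], y[:200]   # keep it small: naive t-SNE scales poorly with n

# perplexity controls the effective neighbourhood size and must be < n_samples;
# fit_transform minimises the KL divergence between the high-dimensional and
# low-dimensional neighbourhood distributions.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X2 = tsne.fit_transform(X)
print(X2.shape)   # (200, 2)
```

Note that, unlike PCA, `TSNE` has no `transform` method for new points: the embedding is learned for this particular data set, which reflects its non-deterministic, optimisation-based nature listed in the table below.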
Table of differences between PCA and t-SNE
S.NO. | PCA | t-SNE |
---|---|---|
1. | It is a linear dimensionality reduction technique. | It is a non-linear dimensionality reduction technique. |
2. | It tries to preserve the global structure of the data. | It tries to preserve the local structure (clusters) of the data. |
3. | It does not perform as well as t-SNE for visualisation. | It is one of the best dimensionality reduction techniques for visualisation. |
4. | It does not involve hyperparameters. | It involves hyperparameters such as perplexity, learning rate and number of steps. |
5. | It is highly affected by outliers. | It can handle outliers. |
6. | PCA is a deterministic algorithm. | It is a non-deterministic or randomised algorithm. |
7. | It works by rotating the vectors to preserve variance. | It works by minimising the distances between neighbouring points, modelling neighbourhoods with Gaussian and Student-t distributions. |
8. | We can decide how much variance to preserve using the eigenvalues. | We cannot choose how much variance to preserve; instead, we preserve neighbourhood distances via the hyperparameters. |
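Row 8's point about eigenvalues can be sketched as follows: pick the smallest d' whose leading eigenvalues retain a target fraction of the total variance. The 5-D data and the 95% threshold are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
# 5-D data: two high-variance directions, three near-noise directions.
X = rng.normal(size=(500, 5)) * np.array([10.0, 5.0, 0.1, 0.1, 0.1])

Xc = X - X.mean(axis=0)
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
ratio = np.cumsum(eigvals) / eigvals.sum()  # cumulative variance explained

# Smallest d' whose principal components retain at least 95% of the variance.
d = int(np.searchsorted(ratio, 0.95) + 1)
print(d)   # 2
```

t-SNE offers no such knob: the target dimensionality is fixed up front (usually 2 or 3 for plotting), and what is tuned instead are perplexity, learning rate and the number of optimisation steps.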