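1. Dimensionality Reduction:
It is a technique for obtaining a reduced or compressed representation of the original data. It has two components:
- Feature selection: the process of removing irrelevant or redundant features.
- Feature extraction: the process of transforming the data into features suitable for modeling.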
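The two components can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy dataset, the feature names, the helper functions `select_features` and `extract_bmi`, and the choice of a body-mass-index style ratio as the extracted feature. Real pipelines typically use more general selection and extraction techniques.

```python
from statistics import pvariance

# Toy data: one list per feature (column); all values are invented.
data = {
    "height_cm": [170.0, 160.0, 180.0, 175.0],
    "weight_kg": [65.0, 55.0, 80.0, 72.0],
    "height_copy": [170.0, 160.0, 180.0, 175.0],  # redundant: duplicates height_cm
    "unit_flag": [1.0, 1.0, 1.0, 1.0],            # irrelevant: never varies
}


def select_features(columns):
    """Feature selection: keep columns that vary and are not exact duplicates."""
    kept = {}
    for name, values in columns.items():
        if pvariance(values) == 0:
            continue  # a constant column carries no information
        if any(values == other for other in kept.values()):
            continue  # identical to a column that was already kept
        kept[name] = values
    return kept


def extract_bmi(columns):
    """Feature extraction: derive one new feature from two raw ones."""
    return [
        w / (h / 100.0) ** 2
        for h, w in zip(columns["height_cm"], columns["weight_kg"])
    ]


selected = select_features(data)   # 4 features reduced to 2
bmi = extract_bmi(selected)        # 2 features combined into 1

print(list(selected))
print([round(v, 1) for v in bmi])
```

Selection keeps a subset of the original columns unchanged, while extraction replaces them with a new, derived column; both reduce the number of dimensions a model has to handle.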
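2. Numerosity Reduction:
It is a data reduction technique that reduces data volume by choosing a suitable, smaller form of data representation. These techniques can be parametric or non-parametric. In parametric methods, a model is used to estimate the data, so typically only the model parameters need to be stored instead of the actual data. Non-parametric methods for storing a reduced representation of the data include histograms, clustering, and sampling.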
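As a rough sketch of the parametric and non-parametric approaches, the example below reduces 1,000 points in three ways. The synthetic data, the helper names `fit_line` and `equal_width_histogram`, the bucket width of 50, and the sample size of 50 are all assumptions chosen for illustration; a straight-line fit stands in for whatever model suits the data.

```python
import random
from collections import Counter

# Invented data: 1,000 (x, y) points lying near the line y = 2x + 1.
random.seed(0)
xs = [i / 10 for i in range(1000)]
ys = [2 * x + 1 + random.uniform(-0.5, 0.5) for x in xs]


def fit_line(x, y):
    """Parametric reduction: least-squares slope and intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def equal_width_histogram(values, width):
    """Non-parametric reduction: count values per bucket of the given width."""
    return Counter(int(v // width) * width for v in values)


# Parametric: store 2 numbers instead of 1,000 points.
slope, intercept = fit_line(xs, ys)

# Non-parametric: store a few (bucket start, count) pairs.
histogram = equal_width_histogram(ys, 50)

# Non-parametric: keep a random subset of 50 points.
sample = random.sample(list(zip(xs, ys)), 50)

print(round(slope, 2), round(intercept, 2))
print(sorted(histogram.items()))
print(len(sample))
```

In the parametric case only `slope` and `intercept` need to be stored, and individual values are re-estimated from the model when needed. The histogram and the sample keep a compact summary of the data itself, at the cost of some detail.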
Difference between Dimensionality Reduction and Numerosity Reduction:
Dimensionality Reduction | Numerosity Reduction |
---|---|
In dimensionality reduction, data encoding or data transformations are applied to obtain a reduced or compressed form of the original data. | In numerosity reduction, data volume is reduced by choosing suitable alternative forms of data representation. |
It can be used to remove irrelevant or redundant attributes. | It simply represents the original data in a smaller form. |
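In this method, some data that is considered irrelevant can be lost. | In this method, the data is replaced by a smaller representation; the aim is to preserve its information, although approximate methods such as sampling can lose detail. |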
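Methods for dimensionality reduction include feature selection and feature extraction techniques, for example principal component analysis (PCA). | Methods for numerosity reduction include parametric models, for example regression, and non-parametric techniques such as histograms, clustering, and sampling. |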
The components of dimensionality reduction are feature selection and feature extraction. | It has no components, only methods that reduce data volume. |
It reduces misleading data and can improve model accuracy. | It aims to preserve the integrity of the data while reducing its volume. |