Difference Between Data Cleaning and Data Processing

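Data processing: Data processing is the collection and manipulation of data to put it to a required use. It converts data from a given form into a more useful and desirable one, making it more meaningful and informative. With machine learning algorithms, mathematical modeling, and statistical knowledge, the entire process can be automated. This may sound simple, but for very large organizations such as Twitter and Facebook, and for administrative bodies such as parliaments, UNESCO, and health-sector organizations, the whole process has to be carried out in a very structured manner.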
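As a minimal sketch of converting data from a given form into a more useful one, the Python snippet below parses raw text records and summarizes them using only the standard library. The record layout, the department names, and the `summarize` helper are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw input: one "department,response_time_ms" record per line.
RAW_RECORDS = """\
health,120
health,180
education,90
education,110
education,100
"""

def summarize(raw: str) -> dict[str, dict[str, float]]:
    """Group raw records by department and compute a count and a mean."""
    groups: dict[str, list[float]] = defaultdict(list)
    for line in raw.splitlines():
        department, value = line.split(",")
        groups[department].append(float(value))
    return {
        name: {"count": len(values), "mean": mean(values)}
        for name, values in groups.items()
    }

summary = summarize(RAW_RECORDS)
print(summary)
```

The raw lines carry little meaning on their own, while the per-department summary can be read and compared directly.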

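Data cleaning: Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. It is an essential part of machine learning and plays a significant role in building a model. Data cleaning is one of those things that everyone does but no one really talks about. It is certainly not the most glamorous part of machine learning, and there are no hidden tricks or secrets to uncover. However, proper data cleaning can make or break your project.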

Figure: Steps involved in data cleaning
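The kinds of problems that data cleaning targets (badly formatted, duplicate, incomplete, and incorrect values) can be sketched in a few lines of Python. The sample rows, the `clean` helper, and the deliberately simple email check are assumptions made for this illustration, not a complete validation scheme.

```python
# Hypothetical raw rows: (name, email) pairs with typical quality problems.
raw_rows = [
    ("Alice", "ALICE@example.com "),  # badly formatted: case and trailing space
    ("Bob", "bob@example.com"),
    ("Alice", "alice@example.com"),   # duplicate once normalized
    ("Carol", ""),                    # incomplete: missing email
    ("Dave", "dave-at-example.com"),  # incorrect: not an email address
]

def clean(rows):
    """Return normalized, de-duplicated rows, dropping unusable ones."""
    seen = set()
    cleaned = []
    for name, email in rows:
        name = name.strip()
        email = email.strip().lower()  # fix formatting
        if not name or not email:      # drop incomplete rows
            continue
        if email.count("@") != 1:      # drop clearly incorrect emails
            continue
        if (name, email) in seen:      # drop duplicates
            continue
        seen.add((name, email))
        cleaned.append((name, email))
    return cleaned

cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```

Normalizing before checking for duplicates matters here: the first and third rows only become identical after the case and whitespace are fixed.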

Data Processing vs. Data Cleaning

Sr. No.	Data Processing	Data Cleaning
1	Data processing is done after data cleaning.	Data cleaning is done before data processing.
2	Data processing requires adequate hardware, such as RAM and graphics processing units (GPUs), to process the data.	Data cleaning does not require specialized hardware.
3	Data processing uses frameworks such as Hadoop and Pig.	Data cleaning involves tasks such as removing noisy data; no special frameworks are used.
4	Data processing is more difficult than data cleaning.	Data cleaning is easier than data processing.
5	Examples: loading student data into a Hadoop cluster (data storage) and retrieving (processing) the records with marks below 60 percent; calculating percentages.	Examples: finding invalid data, such as a student's age that falls outside the valid range or a percentage that is greater than 100; checking whether any marks were left blank and, if so, verifying and filling in the correct values.
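The student examples in the last row of the table can be sketched in plain Python, with cleaning run first and processing second. The records, the assumed valid age range, and the helper names are hypothetical. An in-memory list stands in for the Hadoop cluster, and problem records are only flagged for verification rather than corrected automatically.

```python
# Hypothetical student records; None marks a value that was never entered.
students = [
    {"name": "Asha", "age": 16, "marks": 450, "max_marks": 500},
    {"name": "Ben", "age": 160, "marks": 300, "max_marks": 500},   # age out of range
    {"name": "Chen", "age": 17, "marks": None, "max_marks": 500},  # marks missing
    {"name": "Dina", "age": 15, "marks": 620, "max_marks": 500},   # above 100 percent
    {"name": "Eli", "age": 16, "marks": 250, "max_marks": 500},
]

AGE_RANGE = range(5, 26)  # assumed valid ages: 5 to 25 inclusive

def find_problems(record):
    """Data cleaning: list the quality problems found in one record."""
    issues = []
    if record["age"] not in AGE_RANGE:
        issues.append("age out of range")
    if record["marks"] is None:
        issues.append("marks missing")
    elif record["marks"] > record["max_marks"]:
        issues.append("percentage above 100")
    return issues

def percentage(record):
    """Data processing: convert raw marks into a percentage."""
    return 100 * record["marks"] / record["max_marks"]

# Step 1 (cleaning): separate usable records from those needing verification.
flagged = {s["name"]: find_problems(s) for s in students if find_problems(s)}
valid = [s for s in students if not find_problems(s)]

# Step 2 (processing): retrieve students scoring below 60 percent.
below_60 = [s["name"] for s in valid if percentage(s) < 60]

print(flagged)
print(below_60)
```

Running the checks first keeps the missing and out-of-range values from reaching the percentage calculation, which is why cleaning precedes processing.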