📜 Identifying and Removing Duplicate Data in R

📅 Last modified: 2022-05-13 01:55:09.650000 🧑 Author: Mango

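A dataset can contain duplicate values. To keep it accurate and free of redundancy, duplicate rows need to be identified and removed. In this article, we will see how to identify and remove duplicate data in R: we first check whether the data contains duplicates and, if it does, remove them.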

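Data in use: the student_result data frame of student names and subject marks that is created in the examples below.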

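Identifying duplicate data

To identify duplicates, we use the duplicated() function. It returns a logical vector with one element per row, where TRUE marks a row that repeats an earlier row. Summing that vector gives the number of duplicate rows.

Syntax: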

duplicated(dataframe)
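
To make the return value concrete, here is a minimal sketch on a plain vector rather than a data frame. The vector x and the variable names are made up for illustration; fromLast is a standard argument of duplicated() that reverses the scan direction.

```r
# Illustrative vector with repeated values (not part of the article's data).
x <- c(1, 2, 2, 3, 1)

# TRUE marks an element that has already appeared earlier in the vector;
# the first occurrence of each value is never flagged.
dup_flags <- duplicated(x)
print(dup_flags)      # FALSE FALSE  TRUE FALSE  TRUE

# With fromLast = TRUE the scan starts at the end, so the last
# occurrence of each value is the one left unflagged.
dup_flags_rev <- duplicated(x, fromLast = TRUE)
print(dup_flags_rev)  #  TRUE  TRUE FALSE FALSE FALSE
```

The same logic applies row by row when a data frame is passed in.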

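Approach:

Create a data frame
Pass it to the duplicated() function
The function returns a logical (Boolean) vector flagging the repeated rows
Apply sum() to that vector to count them

Example:

R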

# Creating a sample data frame of students 
# and their marks in respective subjects.
student_result=data.frame(name=c("Ram","Geeta","John","Paul",
                                 "Cassie","Geeta","Paul"),
                          maths=c(7,8,8,9,10,8,9),
                          science=c(5,7,6,8,9,7,8),
                          history=c(7,7,7,7,7,7,7))
  
# Printing data
student_result
# Logical vector: TRUE marks a row that repeats an earlier row
duplicated(student_result)
# Number of repeated rows
sum(duplicated(student_result))

Output:

> duplicated(student_result)

[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE

> sum(duplicated(student_result))

[1] 2
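
Beyond counting, it is often useful to see which rows are involved. The following is a minimal sketch, assuming the same student_result data; the names repeated_rows, in_dup_group and all_copies are illustrative only.

```r
# Same sample data as in the article's example.
student_result <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)

# Rows that repeat an earlier row (here the second Geeta and Paul rows).
repeated_rows <- student_result[duplicated(student_result), ]
print(repeated_rows)

# Every row belonging to a duplicated group, first copies included:
# combine a forward scan with a backward scan.
in_dup_group <- duplicated(student_result) |
  duplicated(student_result, fromLast = TRUE)
all_copies <- student_result[in_dup_group, ]
print(all_copies)
```

Inspecting all copies side by side can help confirm that the rows really are redundant before anything is deleted.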

Removing duplicate data

Approach:

Create a data frame
Select the unique rows
Retrieve those rows
Display the result
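
The steps above can be sketched in base R with logical indexing, reusing duplicated() from the previous section. This is only one possible way to do it; the names keep and deduplicated are illustrative.

```r
# Step 1: create the data frame (same sample data as in the article).
student_result <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)

# Step 2: select the unique rows, i.e. those not flagged as repeats.
keep <- !duplicated(student_result)

# Step 3: retrieve those rows.
deduplicated <- student_result[keep, ]

# Step 4: display the result.
print(deduplicated)
```

The first occurrence of each row is kept and later copies are dropped, so the original row order is preserved.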

Method 1: Using unique()

We use unique() to get the rows of the data with duplicates removed; only the first occurrence of each row is kept.

Syntax:

unique(dataframe)

Example:

R

# Creating a sample data frame of students 
# and their marks in respective subjects.
student_result=data.frame(name=c("Ram","Geeta","John","Paul",
                                 "Cassie","Geeta","Paul"),
                          maths=c(7,8,8,9,10,8,9),
                          science=c(5,7,6,8,9,7,8),
                          history=c(7,7,7,7,7,7,7))
  
# Printing data
student_result
unique(student_result)

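Output: the repeated Geeta and Paul rows are removed, leaving five rows (Ram, Geeta, John, Paul, Cassie).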

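Method 2: Using distinct()

distinct() is provided by the dplyr package. Install dplyr (it is also included in the tidyverse) and load it with library(dplyr) before calling distinct(). We use distinct() to get the rows of the data that have distinct values.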

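Syntax:

distinct(dataframe, ..., .keep_all = FALSE)

Parameters:

dataframe: the data in use
...: optional columns used to decide uniqueness; if omitted, all columns are used
.keep_all: if TRUE, keep all columns in the result, not only those listed in ...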
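
To illustrate what .keep_all changes, here is a small sketch that calls distinct() on one column with and without it. It assumes dplyr is installed; the names maths_only and maths_all are made up for this illustration.

```r
# distinct() comes from dplyr, which must be installed beforehand.
library(dplyr)

# Same sample data as in the article.
student_result <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)

# Default (.keep_all = FALSE): only the listed column is returned.
maths_only <- distinct(student_result, maths)
print(maths_only)

# .keep_all = TRUE: the first row for each distinct maths value,
# with all of its columns.
maths_all <- distinct(student_result, maths, .keep_all = TRUE)
print(maths_all)
```

Both results have one row per distinct maths value; they differ only in how many columns are returned.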

Example:

R

# distinct() comes from dplyr, so load it first
library(dplyr)

# Creating a sample data frame of students and 
# their marks in respective subjects.
student_result=data.frame(name=c("Ram","Geeta","John","Paul",
                                 "Cassie","Geeta","Paul"),
                          maths=c(7,8,8,9,10,8,9),
                          science=c(5,7,6,8,9,7,8),
                          history=c(7,7,7,7,7,7,7))
  
# Printing data
student_result
distinct(student_result)

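Output: as with unique(), the repeated Geeta and Paul rows are removed, leaving five rows (Ram, Geeta, John, Paul, Cassie).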

Example 2: Printing unique rows based on the maths column

R

# distinct() comes from dplyr, so load it first
library(dplyr)

# Creating a sample data frame of students and
# their marks in respective subjects.
student_result=data.frame(name=c("Ram","Geeta","John","Paul",
                                 "Cassie","Geeta","Paul"),
                          maths=c(7,8,8,9,10,8,9),
                          science=c(5,7,6,8,9,7,8),
                          history=c(7,7,7,7,7,7,7))
  
# Printing data
student_result
distinct(student_result,maths,.keep_all = TRUE)

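Output: one row is kept for each distinct maths value (7, 8, 9, 10), namely Ram, Geeta, Paul and Cassie. John is dropped because his maths mark of 8 already appeared in Geeta's row.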
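
For comparison, a similar column-based deduplication can be sketched in base R without dplyr by applying duplicated() to the single column. This is an alternative added for illustration, not the article's method, and the name first_per_maths is made up.

```r
# Same sample data as in the article.
student_result <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)

# Keep the first row for each distinct maths value; all columns are retained.
first_per_maths <- student_result[!duplicated(student_result$maths), ]
print(first_per_maths)
```

Unlike distinct(), this keeps the original row names (1, 2, 4, 5), which shows which rows of the input survived.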