How to Remove Duplicate Rows from an R DataFrame?

Last modified: 2022-05-13 01:55:42.331000             Author: Mango

This article discusses how to remove duplicate rows from a data frame in the R programming language.

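Dataset in use: a data frame with the columns names, id and subjects. It has six rows, and the last two repeat the first two. Every example below creates it at the start.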

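Method 1: Using distinct()

This function is available in the dplyr package and returns the unique rows of a data frame. It can remove rows that are duplicated across the entire data frame, or return the unique values of specific columns.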

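Syntax:

distinct(dataframe)

For a specific column:

distinct(dataframe, column_name)

Example: R program to remove duplicate rows using the distinct() function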

R
# load the package
library(dplyr)
 
# create dataframe
data=data.frame(names=c("manoj","bobby","sravan",
                        "deepu","manoj","bobby") ,
                id=c(1,2,3,4,1,2),
                subjects=c("java","python","php",
                           "html","java","python"))
 
 
# remove all duplicate rows
print(distinct(data))
 
# unique values of the subjects column (only that column is returned)
print(distinct(data, subjects))
 
# unique values of the names column (only that column is returned)
print(distinct(data, names))


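Output (of the three distinct() calls above):

   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html
  subjects
1     java
2   python
3      php
4     html
   names
1  manoj
2  bobby
3 sravan
4  deepu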
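When a column is passed to distinct(), only that column comes back. If the other columns are needed as well, dplyr's distinct() accepts a `.keep_all` argument. The following is a minimal sketch, assuming dplyr is installed; the variable names `subjects_only` and `first_per_subject` are illustrative and not part of the original example.

```r
# Sketch: de-duplicate on one column while keeping the other columns.
library(dplyr)

data <- data.frame(names = c("manoj", "bobby", "sravan",
                             "deepu", "manoj", "bobby"),
                   id = c(1, 2, 3, 4, 1, 2),
                   subjects = c("java", "python", "php",
                                "html", "java", "python"))

# Default behaviour: only the subjects column is returned
subjects_only <- distinct(data, subjects)

# .keep_all = TRUE keeps every column, using the first row
# found for each distinct value of subjects
first_per_subject <- distinct(data, subjects, .keep_all = TRUE)

print(subjects_only)
print(first_per_subject)
```

With this data set the second call returns the same four rows as distinct(data), because the repeated subjects belong to rows that are duplicated in full.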

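Method 2: Using duplicated()

This function returns a logical vector that marks every element already seen earlier as a duplicate. To keep only the unique rows, negate the result with the ! operator before using it as a row index.

Syntax: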

data[!duplicated(data$column_name), ]

where,

  • data is the input data frame
  • column_name is the column whose duplicate values are removed
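To make the role of the ! operator concrete, the sketch below applies duplicated() to a plain character vector and inspects the logical result. The names `subjects` and `is_dup` are chosen for illustration only.

```r
# Sketch: what duplicated() returns and why it is negated.
subjects <- c("java", "python", "php", "html", "java", "python")

# TRUE marks an element that already appeared earlier in the vector;
# here only the last two elements are TRUE
is_dup <- duplicated(subjects)
print(is_dup)

# Negating the vector selects the first occurrence of each value
print(subjects[!is_dup])
```

The same logical vector, used as a row index on a data frame, is what the syntax above relies on.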

Example: R program to remove duplicate rows using the duplicated() function

R

# create dataframe
data=data.frame(names=c("manoj","bobby","sravan",
                        "deepu","manoj","bobby") ,
                id=c(1,2,3,4,1,2),
                subjects=c("java","python","php",
                           "html","java","python"))
 
 
# keep the first row for each distinct value in the subjects column
print(data[!duplicated(data$subjects), ])
 
# keep the first row for each distinct value in the names column
print(data[!duplicated(data$names), ])
 
# keep the first row for each distinct value in the id column
print(data[!duplicated(data$id), ])

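Output (each of the three calls prints the same four rows):

   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html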
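duplicated() is not limited to a single column. As a hedged sketch of two common variations, the code below checks a combination of columns and uses the `fromLast` argument of base R's duplicated() to keep the last occurrence instead of the first. The result names `by_two_cols` and `keep_last` are illustrative.

```r
# Sketch: duplicated() on several columns, and keeping the last occurrence.
data <- data.frame(names = c("manoj", "bobby", "sravan",
                             "deepu", "manoj", "bobby"),
                   id = c(1, 2, 3, 4, 1, 2),
                   subjects = c("java", "python", "php",
                                "html", "java", "python"))

# A row is dropped only if both names and id repeat an earlier row
by_two_cols <- data[!duplicated(data[, c("names", "id")]), ]
print(by_two_cols)

# fromLast = TRUE scans from the end, so the last row of each
# group of repeated subjects is the one that is kept
keep_last <- data[!duplicated(data$subjects, fromLast = TRUE), ]
print(keep_last)
```

The original row names are preserved in both results, which makes it easy to see which rows survived.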

Method 3: Using unique()

This function returns the unique rows of a data frame.

Syntax:

unique(dataframe)

For a specific column:

Syntax:

unique(dataframe$column_name)

Example: R program to get the unique values of each column using the unique() function

R

# create dataframe
data=data.frame(names=c("manoj","bobby","sravan",
                        "deepu","manoj","bobby") ,
                id=c(1,2,3,4,1,2),
                subjects=c("java","python","php",
                           "html","java","python"))
 
 
# unique values of the subjects column (returned as a vector)
print(unique(data$subjects))
 
# unique values of the names column (returned as a vector)
print(unique(data$names))
 
# unique values of the id column (returned as a vector)
print(unique(data$id))


Output:

[1] "java"   "python" "php"    "html"  
[1] "manoj"  "bobby"  "sravan" "deepu"  
[1] 1 2 3 4
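As the output shows, unique() applied to `data$column_name` returns a plain vector rather than rows of the data frame. The sketch below contrasts that with single-bracket selection, which keeps the result as a data frame. The names `subject_values`, `subject_rows` and `name_id_pairs` are illustrative.

```r
# Sketch: vector result versus data frame result from unique().
data <- data.frame(names = c("manoj", "bobby", "sravan",
                             "deepu", "manoj", "bobby"),
                   id = c(1, 2, 3, 4, 1, 2),
                   subjects = c("java", "python", "php",
                                "html", "java", "python"))

# $ extracts the column, so the result is a vector of values
subject_values <- unique(data$subjects)

# Single brackets keep a one-column data frame
subject_rows <- unique(data["subjects"])

# The same pattern works for a combination of columns
name_id_pairs <- unique(data[c("names", "id")])

print(subject_values)
print(subject_rows)
print(name_id_pairs)
```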

Example: R program to apply the unique() function to the entire data frame

R

# create dataframe
data=data.frame(names=c("manoj","bobby","sravan",
                        "deepu","manoj","bobby") ,
                id=c(1,2,3,4,1,2),
                subjects=c("java","python","php",
                           "html","java","python"))
 
 
# remove duplicate rows in entire dataframe
print(unique(data))

Output:

   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html
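For whole-row de-duplication, the base R approaches in this article should select the same rows. The following sketch compares unique() with duplicated() on the full data frame and counts how many rows were dropped. The names `u`, `d` and `n_removed` are illustrative.

```r
# Sketch: compare whole-row de-duplication with unique() and duplicated().
data <- data.frame(names = c("manoj", "bobby", "sravan",
                             "deepu", "manoj", "bobby"),
                   id = c(1, 2, 3, 4, 1, 2),
                   subjects = c("java", "python", "php",
                                "html", "java", "python"))

u <- unique(data)               # unique rows
d <- data[!duplicated(data), ]  # rows not flagged as duplicates

# Compare the values in the two results, ignoring attributes
print(all.equal(u, d, check.attributes = FALSE))

# Number of duplicate rows that were removed
n_removed <- nrow(data) - nrow(u)
print(n_removed)
```

Which form to use is mostly a matter of taste: unique() is shorter, while the duplicated() form makes it easy to switch to a subset of columns later.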