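How to Remove Duplicate Rows from a DataFrame in R?
This article discusses how to remove duplicate rows from a data frame in the R programming language.
Dataset in use: the data frame created at the top of each example below, with three columns (names, id, subjects) and six rows, of which the last two repeat the first two.
Method 1: Using distinct()
This function is available in the dplyr package and returns the unique rows of a data frame. It can remove rows that are duplicated across the entire data frame, or deduplicate on specific columns only. In the latter case only the named columns are returned unless .keep_all = TRUE is set.
Syntax:
distinct(dataframe)
distinct(dataframe, column1, column2, ..., column_n)
Example: R program to remove duplicate rows using the distinct() function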
R
# load the package
library(dplyr)
# create dataframe
data=data.frame(names=c("manoj","bobby","sravan",
"deepu","manoj","bobby") ,
id=c(1,2,3,4,1,2),
subjects=c("java","python","php",
"html","java","python"))
# remove all duplicate rows
print(distinct(data))
# keep the distinct values of the subjects column
print(distinct(data,subjects))
# keep the distinct values of the names column
print(distinct(data,names))
Output:
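When column names are passed to distinct(), the result contains only those columns. The following is a minimal sketch, assuming the goal is to deduplicate on one column while keeping the rest of each row; it relies on dplyr's .keep_all argument. The data frame repeats the example above, and the names all_unique and by_subject are illustrative only.

```r
library(dplyr)

# same example data as in the article
data <- data.frame(
  names = c("manoj", "bobby", "sravan", "deepu", "manoj", "bobby"),
  id = c(1, 2, 3, 4, 1, 2),
  subjects = c("java", "python", "php", "html", "java", "python")
)

# drop rows that are duplicated across all columns
all_unique <- distinct(data)

# deduplicate on subjects only, keeping every column of the first matching row
by_subject <- distinct(data, subjects, .keep_all = TRUE)

print(all_unique)
print(by_subject)
```

Without .keep_all = TRUE, the second call would return a one-column data frame holding just the distinct subjects.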
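Method 2: Using duplicated()
This function returns a logical vector that flags every element duplicating an earlier one. To keep only the unique rows, place the ! operator in front of the call.
Syntax: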
data[!duplicated(data$column_name), ]
where,
- data is the input data frame
- column_name is the column whose duplicate values decide which rows are dropped
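To make the role of the ! operator concrete, the minimal sketch below prints the logical vector that duplicated() returns and then negates it. The vector x is a made-up illustration, not part of the article's dataset.

```r
# hypothetical input vector, chosen only for illustration
x <- c("java", "python", "php", "html", "java", "python")

# TRUE marks an element that repeats an earlier one
dup_flags <- duplicated(x)
print(dup_flags)

# negating the flags selects the first occurrence of each value
first_seen <- x[!dup_flags]
print(first_seen)
```

The same negated vector, used as a row index on a data frame, keeps the first row for each distinct value.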
Example: R program to remove duplicate rows using the duplicated() function
R
# create dataframe
data=data.frame(names=c("manoj","bobby","sravan",
"deepu","manoj","bobby") ,
id=c(1,2,3,4,1,2),
subjects=c("java","python","php",
"html","java","python"))
# remove duplicate rows in subjects column
print(data[!duplicated(data$subjects), ])
# remove duplicate rows in names column
print(data[!duplicated(data$names), ])
# remove duplicate rows in id column
print(data[!duplicated(data$id), ])
Output:
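duplicated() is not limited to a single column. As a sketch under the assumption that rows should be compared on all columns, on a pair of columns, or with the last occurrence kept instead of the first, the calls below use only base R (including the fromLast argument). The result names are illustrative.

```r
# same example data as in the article
data <- data.frame(
  names = c("manoj", "bobby", "sravan", "deepu", "manoj", "bobby"),
  id = c(1, 2, 3, 4, 1, 2),
  subjects = c("java", "python", "php", "html", "java", "python")
)

# compare whole rows: drop rows duplicated across every column
unique_rows <- data[!duplicated(data), ]

# compare only the combination of names and subjects
unique_pairs <- data[!duplicated(data[, c("names", "subjects")]), ]

# keep the last occurrence of each name instead of the first
keep_last <- data[!duplicated(data$names, fromLast = TRUE), ]

print(unique_rows)
print(unique_pairs)
print(keep_last)
```

Keeping the last occurrence can be useful when later rows are assumed to hold the more recent record.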
Method 3: Using unique()
This function returns the unique rows of a data frame.
Syntax:
unique(dataframe)
For a specific column:
Syntax:
unique(dataframe$column_name)
Example: R program to get unique values using the unique() function
R
# create dataframe
data=data.frame(names=c("manoj","bobby","sravan",
"deepu","manoj","bobby") ,
id=c(1,2,3,4,1,2),
subjects=c("java","python","php",
"html","java","python"))
# unique values of the subjects column (returns a vector, not rows)
print(unique(data$subjects))
# unique values of the names column
print(unique(data$names))
# unique values of the id column
print(unique(data$id))
Output:
[1] "java" "python" "php" "html"
[1] "manoj" "bobby" "sravan" "deepu"
[1] 1 2 3 4
Example: R program applying the unique() function to the entire data frame
R
# create dataframe
data=data.frame(names=c("manoj","bobby","sravan",
"deepu","manoj","bobby") ,
id=c(1,2,3,4,1,2),
subjects=c("java","python","php",
"html","java","python"))
# remove duplicate rows in entire dataframe
print(unique(data))
Output:
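As a quick sanity check, the sketch below counts how many rows unique() drops and compares its result with the duplicated() approach from Method 2. It uses only base R; the names deduped, removed, and same_result are illustrative.

```r
# same example data as in the article
data <- data.frame(
  names = c("manoj", "bobby", "sravan", "deepu", "manoj", "bobby"),
  id = c(1, 2, 3, 4, 1, 2),
  subjects = c("java", "python", "php", "html", "java", "python")
)

# remove rows duplicated across all columns
deduped <- unique(data)

# number of rows that were dropped
removed <- nrow(data) - nrow(deduped)

# compare with the duplicated() approach on the whole data frame
same_result <- identical(deduped, data[!duplicated(data), ])

print(removed)
print(same_result)
```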