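Remove duplicate rows based on multiple columns using dplyr in R
In this article, we will learn how to remove duplicate rows based on multiple columns using dplyr in the R programming language.
Data frame in use: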
lang value usage
1 Java 21 21
2 C 21 21
3 Python 3 0
4 GO 5 99
5 RUST 180 44
6 Javascript 9 48
7 Cpp 12 53
8 Java 21 21
9 Julia 6 6
10 Typescript 0 8
11 Python 3 0
12 GO 6 6
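Remove duplicate rows based on a single column
The distinct() function can be used to filter out duplicate rows. We only need to pass our R object (the data frame) and the column name as arguments to distinct().
Note: We pass the argument .keep_all = TRUE because it defaults to FALSE. With FALSE, distinct() returns only the specified column(s) with their distinct values. Since we want all columns, we set it to TRUE so that the other columns are returned along with the current one. When several rows share the same value, the first of those rows is kept.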
Syntax: distinct(df, column_name, .keep_all= TRUE)
Parameters:
df: dataframe object
column_name: column name based on which duplicate rows will be removed
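Example: R program to remove duplicate rows based on a single column
R
library(dplyr)
# Sample data: 12 rows, some of them repeated in whole or in part
df <- data.frame(
  lang  = c('Java', 'C', 'Python', 'GO', 'RUST', 'Javascript',
            'Cpp', 'Java', 'Julia', 'Typescript', 'Python', 'GO'),
  value = c(21, 21, 3, 5, 180, 9, 12, 21, 6, 0, 3, 6),
  usage = c(21, 21, 0, 99, 44, 48, 53, 21, 6, 8, 0, 6)
)

# Keep the first row for each distinct lang, with all columns
distinct(df, lang, .keep_all = TRUE)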
Output:
lang value usage
1 Java 21 21
2 C 21 21
3 Python 3 0
4 GO 5 99
5 RUST 180 44
6 Javascript 9 48
7 Cpp 12 53
8 Julia 6 6
9 Typescript 0 8
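The note on .keep_all is easiest to see by running both forms side by side. The sketch below uses the same df and shows that the default (.keep_all = FALSE) returns only the named column, while .keep_all = TRUE keeps every column, taken from the first row of each lang. The object names only_lang and all_cols are illustrative only.

```r
library(dplyr)

df <- data.frame(
  lang  = c('Java', 'C', 'Python', 'GO', 'RUST', 'Javascript',
            'Cpp', 'Java', 'Julia', 'Typescript', 'Python', 'GO'),
  value = c(21, 21, 3, 5, 180, 9, 12, 21, 6, 0, 3, 6),
  usage = c(21, 21, 0, 99, 44, 48, 53, 21, 6, 8, 0, 6)
)

# Default .keep_all = FALSE: only the lang column is returned
only_lang <- distinct(df, lang)

# .keep_all = TRUE: all columns are returned, from the first row of each lang
all_cols <- distinct(df, lang, .keep_all = TRUE)

names(only_lang)  # "lang"
names(all_cols)   # "lang" "value" "usage"
nrow(only_lang)   # 9
```

In both cases the number of rows is the same (one per distinct lang); only the set of columns differs.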
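Remove duplicate rows based on multiple columns
We can remove duplicates based on the value and usage columns by passing these column names as arguments to the distinct() function. A row is then treated as a duplicate only when its values in all of the listed columns match those of an earlier row.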
Syntax: distinct(df, col1,col2, .keep_all= TRUE)
Parameters:
df: dataframe object
col1, col2: names of the columns based on which duplicate rows will be removed
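Example: R program to remove duplicate rows based on multiple columns
R
library(dplyr)
# Sample data: 12 rows, some of them repeated in whole or in part
df <- data.frame(
  lang  = c('Java', 'C', 'Python', 'GO', 'RUST', 'Javascript',
            'Cpp', 'Java', 'Julia', 'Typescript', 'Python', 'GO'),
  value = c(21, 21, 3, 5, 180, 9, 12, 21, 6, 0, 3, 6),
  usage = c(21, 21, 0, 99, 44, 48, 53, 21, 6, 8, 0, 6)
)

# Keep the first row for each distinct (value, usage) pair
distinct(df, value, usage, .keep_all = TRUE)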
Output:
lang value usage
1 Java 21 21
2 Python 3 0
3 GO 5 99
4 RUST 180 44
5 Javascript 9 48
6 Cpp 12 53
7 Julia 6 6
8 Typescript 0 8
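If the key columns are only known at run time, for example as a character vector, one option is to convert the names into symbols and splice them into distinct(). This is a sketch that assumes the rlang package (installed as a dependency of dplyr) and its syms() function with the !!! splice operator; key_cols and result are hypothetical names. As in the output above, Java (row 1) is kept rather than C (row 2), because distinct() keeps the first row of each (value, usage) pair.

```r
library(dplyr)

df <- data.frame(
  lang  = c('Java', 'C', 'Python', 'GO', 'RUST', 'Javascript',
            'Cpp', 'Java', 'Julia', 'Typescript', 'Python', 'GO'),
  value = c(21, 21, 3, 5, 180, 9, 12, 21, 6, 0, 3, 6),
  usage = c(21, 21, 0, 99, 44, 48, 53, 21, 6, 8, 0, 6)
)

# Hypothetical character vector naming the columns to de-duplicate on
key_cols <- c('value', 'usage')

# rlang::syms() turns the strings into symbols; !!! splices them into distinct()
result <- distinct(df, !!!rlang::syms(key_cols), .keep_all = TRUE)

print(result)
```

The call is equivalent to writing distinct(df, value, usage, .keep_all = TRUE) by hand.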
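Remove all duplicate rows from a data frame
In this case, we simply pass the entire data frame as the argument to the distinct() function. It then compares rows across all variables/columns and removes every row that is identical to an earlier one.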
Syntax: distinct(df)
Parameters:
df: dataframe object
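Example: R program to remove all duplicate rows from a data frame
R
library(dplyr)
# Sample data: 12 rows, some of them repeated in whole or in part
df <- data.frame(
  lang  = c('Java', 'C', 'Python', 'GO', 'RUST', 'Javascript',
            'Cpp', 'Java', 'Julia', 'Typescript', 'Python', 'GO'),
  value = c(21, 21, 3, 5, 180, 9, 12, 21, 6, 0, 3, 6),
  usage = c(21, 21, 0, 99, 44, 48, 53, 21, 6, 8, 0, 6)
)

# Drop rows that are identical in every column to an earlier row
distinct(df)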
Output:
lang value usage
1 Java 21 21
2 C 21 21
3 Python 3 0
4 GO 5 99
5 RUST 180 44
6 Javascript 9 48
7 Cpp 12 53
8 Julia 6 6
9 Typescript 0 8
10 GO 6 6
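Before dropping rows, it can help to see which ones count as duplicates. Base R's duplicated() also accepts a whole data frame and flags each row that repeats an earlier row in every column, so the sketch below lists the rows that distinct(df) removes. The names dup_flags and dropped are illustrative only.

```r
library(dplyr)

df <- data.frame(
  lang  = c('Java', 'C', 'Python', 'GO', 'RUST', 'Javascript',
            'Cpp', 'Java', 'Julia', 'Typescript', 'Python', 'GO'),
  value = c(21, 21, 3, 5, 180, 9, 12, 21, 6, 0, 3, 6),
  usage = c(21, 21, 0, 99, 44, 48, 53, 21, 6, 8, 0, 6)
)

# TRUE for each row identical (in all columns) to an earlier row
dup_flags <- duplicated(df)

# Rows that distinct(df) discards: row 8 (Java 21 21) and row 11 (Python 3 0)
dropped <- df[dup_flags, ]
print(dropped)

# distinct(df) keeps the remaining 12 - 2 = 10 rows
nrow(distinct(df))
```

The two GO rows are not flagged because they differ in value and usage.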
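Using the duplicated() function
In this approach, we use duplicated() to remove the duplicate rows. The column names/variables to check are passed inside duplicated(), which flags every row whose values in those columns repeat an earlier row.
Note: We use the NOT (!) operator because duplicated() returns TRUE for the duplicate rows. Negating it gives TRUE for the rows we want to keep, so filter() removes the duplicates.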
Syntax:
df %>%
filter(!duplicated(cbind(col1, col2, ...)))
Parameters:
col1,col2: Pass the names of columns based on which you want to remove duplicated values
cbind(): binds the given columns together into a matrix, so that duplicated() can compare several columns at once
duplicated(): returns a logical vector that is TRUE for each row that repeats an earlier row
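To make the role of the ! operator concrete, the sketch below computes the logical vector that duplicated() returns for the value/usage pairs and then negates it, which is what filter() receives. It uses base R only; is_dup and keep are illustrative names.

```r
df <- data.frame(
  lang  = c('Java', 'C', 'Python', 'GO', 'RUST', 'Javascript',
            'Cpp', 'Java', 'Julia', 'Typescript', 'Python', 'GO'),
  value = c(21, 21, 3, 5, 180, 9, 12, 21, 6, 0, 3, 6),
  usage = c(21, 21, 0, 99, 44, 48, 53, 21, 6, 8, 0, 6)
)

# cbind() stacks the two columns into a 12 x 2 matrix;
# duplicated() then compares whole rows of that matrix
is_dup <- duplicated(cbind(df$value, df$usage))
print(is_dup)  # TRUE at positions 2, 8, 11 and 12

# Negating the vector marks the rows to keep (first occurrences)
keep <- !is_dup
print(df[keep, ])  # 8 rows, the same ones filter() keeps below
```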
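Example: R program to remove duplicates using duplicated()
R
library(dplyr)
# Sample data: 12 rows, some of them repeated in whole or in part
df <- data.frame(
  lang  = c('Java', 'C', 'Python', 'GO', 'RUST', 'Javascript',
            'Cpp', 'Java', 'Julia', 'Typescript', 'Python', 'GO'),
  value = c(21, 21, 3, 5, 180, 9, 12, 21, 6, 0, 3, 6),
  usage = c(21, 21, 0, 99, 44, 48, 53, 21, 6, 8, 0, 6)
)

# Keep each row whose (value, usage) pair has not appeared earlier
df %>%
  filter(!duplicated(cbind(value, usage)))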
Output:
lang value usage
1 Java 21 21
2 Python 3 0
3 GO 5 99
4 RUST 180 44
5 Javascript 9 48
6 Cpp 12 53
7 Julia 6 6
8 Typescript 0 8
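Like distinct(), this approach keeps the first occurrence of each combination. Base R's duplicated() also has a fromLast argument; setting it to TRUE checks for repeats further down instead, so the last occurrence of each (value, usage) pair is kept. A minimal sketch on the same data, where last_kept is an illustrative name:

```r
library(dplyr)

df <- data.frame(
  lang  = c('Java', 'C', 'Python', 'GO', 'RUST', 'Javascript',
            'Cpp', 'Java', 'Julia', 'Typescript', 'Python', 'GO'),
  value = c(21, 21, 3, 5, 180, 9, 12, 21, 6, 0, 3, 6),
  usage = c(21, 21, 0, 99, 44, 48, 53, 21, 6, 8, 0, 6)
)

# fromLast = TRUE flags a row if the same (value, usage) pair appears LATER,
# so the last row of each pair survives
last_kept <- df %>%
  filter(!duplicated(cbind(value, usage), fromLast = TRUE))

print(last_kept)
```

Here row 8 (Java 21 21) is kept instead of rows 1 and 2, Python 3 0 comes from row 11, and GO 6 6 (row 12) replaces Julia 6 6 (row 9), which shares the same pair.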