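How to Find the Difference Between Two Data Frames in R?
In this article, we discuss how to find the differences between two data frames (datasets) in the R programming language by comparing their columns.
Method 1: Using the intersect() function
The intersect() function in R returns the elements common to two vectors. Applied to the column names of two data frames, it gives the columns that both data frames share.
Syntax:
intersect(names(data_short), names(data_long))
Example: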
R
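# The names "1", "2", "3" are not syntactically valid, so data.frame()
# converts them to X1, X2, X3 (check.names = TRUE is the default).
first <- data.frame(
  "1" = c('0.44', '0.554', '0.67', '0.64'),
  "2" = c('0.124', '0.22', '0.82', '0.994'),
  "3" = c('0.82', '1.22', '0.73', '1.23')
)
# Second data frame: the same three columns plus two extra ones (d, e)
second <- data.frame(
  "1" = runif(4),
  "2" = runif(4),
  "3" = runif(4),
  "d" = runif(4),
  "e" = runif(4)
)
# Keep only the columns of `second` whose names also appear in `first`
second[intersect(names(first), names(second))]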
Output:
           X1        X2        X3
1 0.562627228 0.9391250 0.6437934
2 0.003867576 0.7131200 0.9313777
3 0.129852760 0.2657934 0.9291285
4 0.325867139 0.2367633 0.1211350
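Because runif() produces different values on every run, the idea is easier to verify with fixed data. The following is a minimal sketch, not part of the original example; the data frames df_short and df_long and their column names are hypothetical and chosen only for illustration.

```r
# Hypothetical example data; the column names a, b, c are illustrative only.
df_short <- data.frame(a = 1:3, b = c(0.1, 0.2, 0.3))
df_long  <- data.frame(a = 4:6, b = c(0.4, 0.5, 0.6), c = letters[1:3])

# intersect() compares the two name vectors and returns the shared names
common <- intersect(names(df_short), names(df_long))
print(common)

# The shared names can then be used to subset either data frame
shared_from_long  <- df_long[common]
shared_from_short <- df_short[common]
print(shared_from_long)
```

Note that intersect() here compares column names only; it says nothing about whether the values stored in those columns match.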
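Method 2: Using setdiff()
Unlike intersect(), setdiff() returns the elements of its first argument that are not present in its second. Applied to column names, it shows which columns of the second data frame are missing from the first.
Syntax:
setdiff(names(dataframe2), names(dataframe1))
Example:
R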
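# First data frame: three columns (stored as X1, X2, X3)
first <- data.frame(
  "1" = c('0.44', '0.554', '0.67', '0.64'),
  "2" = c('0.124', '0.22', '0.82', '0.994'),
  "3" = c('0.82', '1.22', '0.73', '1.23')
)
# Second data frame: the same three columns plus two extra ones (d, e)
second <- data.frame(
  "1" = runif(4),
  "2" = runif(4),
  "3" = runif(4),
  "d" = runif(4),
  "e" = runif(4)
)
# Keep only the columns of `second` whose names do not appear in `first`
second[setdiff(names(second), names(first))]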
Output:
          d          e
1 0.7899783 0.04363003
2 0.9167861 0.39865991
3 0.3314494 0.13963663
4 0.7005957 0.73401069
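One point worth illustrating is that setdiff() is not symmetric: swapping the arguments answers a different question. The sketch below uses hypothetical data frames (df_one, df_two) with made-up column names to show both directions.

```r
# Hypothetical example data; names are illustrative only.
df_one <- data.frame(id = 1:2, x = c(10, 20))
df_two <- data.frame(id = 1:2, y = c("p", "q"), z = c(TRUE, FALSE))

# Columns present in df_two but missing from df_one
only_in_two <- setdiff(names(df_two), names(df_one))

# Columns present in df_one but missing from df_two
only_in_one <- setdiff(names(df_one), names(df_two))

print(only_in_two)
print(only_in_one)

# Subset each data frame to its unique columns
df_two[only_in_two]
df_one[only_in_one]
```

Running both directions gives a complete picture of how the column sets of the two data frames differ.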
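Method 3: Using colnames() and dplyr
Here we use select() from the dplyr package to keep only the columns of the second data frame whose names do not appear in the first, which gives the column-level difference between the two data frames.
Example:
R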
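library("dplyr")
# First data frame: three columns (stored as X1, X2, X3)
first <- data.frame(
  "1" = c('0.44', '0.554', '0.67', '0.64'),
  "2" = c('0.124', '0.22', '0.82', '0.994'),
  "3" = c('0.82', '1.22', '0.73', '1.23')
)
# Second data frame: the same three columns plus two extra ones (d, e)
second <- data.frame(
  "1" = runif(4),
  "2" = runif(4),
  "3" = runif(4),
  "d" = runif(4),
  "e" = runif(4)
)
# %in% flags the columns of `second` that also exist in `first`;
# negating it and passing the positions to select() keeps the rest
second %>% select(which(!(colnames(second) %in% colnames(first))))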
Output:
          d          e
1 0.7899783 0.04363003
2 0.9167861 0.39865991
3 0.3314494 0.13963663
4 0.7005957 0.73401069
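The same selection can also be written with a tidyselect helper instead of positional indices. This is a sketch of an alternative, not the approach used in the example above; it assumes dplyr 1.0.0 or later, where any_of() is available, and the data frames df_one and df_two are hypothetical.

```r
library(dplyr)

# Hypothetical example data; names are illustrative only.
df_one <- data.frame(id = 1:2, x = c(10, 20))
df_two <- data.frame(id = 1:2, y = c("p", "q"), z = c(TRUE, FALSE))

# Drop every column of df_two whose name also occurs in df_one.
# any_of() ignores names that are absent from df_two (such as "x").
extra_cols <- df_two %>% select(!any_of(names(df_one)))
print(extra_cols)

# Check that this matches the base R setdiff() approach
same_as_base <- identical(
  names(extra_cols),
  setdiff(names(df_two), names(df_one))
)
print(same_as_base)
```

Selecting by name rather than by position tends to read more clearly and keeps working if the column order changes.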