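Find rows which are not in another data frame in R
Finding the rows that are present in one data frame but absent from another is known as a set difference. This article looks at different ways to do it.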
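Before turning to the packages, the idea can be sketched in base R alone. The helper name rows_not_in below is hypothetical (it is not part of any package or of the methods in this article), and the sketch assumes both data frames have the same columns in the same order. It compares rows through pasted string keys, so values are compared by their printed form.

```r
# Hypothetical helper: rows of x that have no exact match in y.
rows_not_in <- function(x, y) {
  # Assumption: both data frames share the same column names, in the same order.
  stopifnot(identical(names(x), names(y)))
  # Build one string key per row; "\r" is an arbitrary separator
  # that is assumed not to occur in the data itself.
  key_x <- do.call(paste, c(x, sep = "\r"))
  key_y <- do.call(paste, c(y, sep = "\r"))
  # Keep only the rows of x whose key never appears in y.
  x[!key_x %in% key_y, , drop = FALSE]
}

df1 <- data.frame(a = 1:5, b = letters[1:5])
df2 <- data.frame(a = 1:3, b = letters[1:3])
res <- rows_not_in(df1, df2)
print(res)  # the rows with a = 4 and a = 5
```

Unlike the methods that follow, this sketch keeps duplicate rows of df1 and preserves the original row names.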
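Method 1: Using sqldf()
In this method, we simply pass a SQL query that computes the set difference to sqldf() from the sqldf package.
Syntax:
sqldf("sql query")
Our query will be sqldf('SELECT * FROM df1 EXCEPT SELECT * FROM df2'). It excludes every row of df1 that also appears in df2 and returns only the rows that exist solely in df1.
Example 1: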
R
require(sqldf)
df1 <- data.frame(a = 1:5, b=letters[1:5])
df2 <- data.frame(a = 1:3, b=letters[1:3])
print("df1 is ")
print(df1)
print("df2 is ")
print(df2)
res <- sqldf('SELECT * FROM df1 EXCEPT SELECT * FROM df2')
print("rows from df1 which are not in df2")
print(res)
Example 2:
R
require(sqldf)
df1 <- data.frame(name = c("kapil","sachin","rahul"), age=c(23,22,26))
df2 <- data.frame(name = c("kapil"), age = c(23))
print("df1 is ")
print(df1)
print("df2 is ")
print(df2)
res <- sqldf('SELECT * FROM df1 EXCEPT SELECT * FROM df2')
print("rows from df1 which are not in df2")
print(res)
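Method 2: Using setdiff()
The setdiff() function from the dplyr package finds the set difference of two data frames row by row. Base R also has a setdiff(), but it treats a data frame as a list of columns and so compares columns rather than rows; load dplyr first to get the row-wise behaviour.
Syntax:
setdiff(df1, df2)
It returns the rows of df1 that are not present in df2.
Example 1:
R
library(dplyr)
df1 <- data.frame(a = 1:5, b=letters[1:5], c= c(1,3,5,7,9))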
df2 <- data.frame(a = 1:5, b=letters[1:5], c = c(2,4,6,8,10))
print("df1 is ")
print(df1)
print("df2 is ")
print(df2)
res <-setdiff(df1, df2)
print("rows from df1 which are not in df2")
print(res)
Example 2:
R
library(dplyr)
df1 <- data.frame(name = c("kapil","sachin","rahul"), age=c(23,22,26))
df2 <- data.frame(name = c("kapil","rahul", "sachin"), age = c(23, 22, 26))
print("df1 is ")
print(df1)
print("df2 is ")
print(df2)
res <- setdiff(df1, df2)
print("rows from df1 which are not in df2")
print(res)
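Set difference is not symmetric, so the order of the arguments matters. The short sketch below illustrates this with dplyr's setdiff(), assuming the dplyr package is installed; the data frames are made up for the illustration.

```r
library(dplyr)  # assumed to be installed

# Two illustrative data frames that overlap on rows a = 3, 4, 5.
df1 <- data.frame(a = 1:5, b = letters[1:5])
df2 <- data.frame(a = 3:7, b = letters[3:7])

# Rows that appear in df1 but not in df2.
only_in_df1 <- dplyr::setdiff(df1, df2)
# Swapping the arguments answers the opposite question.
only_in_df2 <- dplyr::setdiff(df2, df1)

print(only_in_df1)  # rows with a = 1 and a = 2
print(only_in_df2)  # rows with a = 6 and a = 7
```

Calling the function as dplyr::setdiff() makes it explicit that the row-wise data frame version is intended rather than base R's setdiff().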