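How to LEFT ANTI Join Under Some Matching Condition in Pandas
A LEFT ANTI join is the opposite of a semi join. Instead of returning the intersection, it returns the rows of the left table that have no matching key in the right table. Like a semi join, it returns only the left table's columns, not the right table's.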
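Method 1: Using merge() with an indicator and isin()
Using the DataFrames we create, we perform a left join with merge(..., indicator=True). This labels each row as 'both' or 'left_only'. We then use isin() to select the rows of the left DataFrame whose key was labeled 'left_only'.
Syntax: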
DataFrame.isin(values)
Parameters:
- values: iterable, Series, DataFrame or dict
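Returns:
DataFrame of booleans showing whether each element is contained in values (Series.isin(), used in the examples below, returns a boolean Series)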
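To show what isin() returns, here is a minimal sketch on a small Series. The values are made up for illustration. isin() yields a boolean mask of the same length as the Series, and ~ inverts that mask. This mask is the building block of both methods below.

```python
import pandas as pd

# Hypothetical sample keys, for illustration only
cities = pd.Series(["new york", "chicago", "mumbai"])
lookup = ["chicago", "new york", "orlando"]

# isin() returns a boolean Series aligned with `cities`
mask = cities.isin(lookup)
print(mask.tolist())     # [True, True, False]

# ~ flips the mask: True where the value is NOT in `lookup`
print((~mask).tolist())  # [False, False, True]
```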
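Example:
In the code below, we merge with indicator=True and use the resulting '_merge' column to find the rows labeled 'left_only'. Their city values are stored in df. Finally, we use isin() to retrieve the rows of the first DataFrame, df1, whose city appears only there. The result is the anti join of the two DataFrames.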
Python3
# importing packages
import pandas as pd
# anti-join
# creating dataframes using pd.DataFrame() method.
df1 = pd.DataFrame({
"city": ["new york", "chicago", "orlando", 'mumbai'],
"temperature": [21, 14, 35, 30],
"humidity": [65, 68, 75, 75],
})
df2 = pd.DataFrame({
"city": ["chicago", "new york", "orlando"],
"humidity": [67, 60, 70]
})
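# left join; indicator=True adds a '_merge' column ('both' or 'left_only')
df3 = df1.merge(df2, on='city', how='left', indicator=True)
# cities that exist only in df1
df = df3.loc[df3['_merge'] == 'left_only', 'city']
# select those rows from the original df1 (keeps its columns and index)
d = df1[df1['city'].isin(df)]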
print(d)
Output:
     city  temperature  humidity
3  mumbai           30        75
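The left merge in Method 1 also brings in df2's humidity column. Pandas therefore renames the overlapping columns to humidity_x and humidity_y, its default suffixes, which is one reason the code maps the 'left_only' keys back onto df1 with isin(). The sketch below shows an alternative, under the assumption that only the city key matters for the match. It merges against df2's key column alone, deduplicated, and then filters on '_merge'. One caveat: merge() returns a fresh RangeIndex, so a non-default index on df1 would not be preserved.

```python
import pandas as pd

df1 = pd.DataFrame({
    "city": ["new york", "chicago", "orlando", "mumbai"],
    "temperature": [21, 14, 35, 30],
    "humidity": [65, 68, 75, 75],
})
df2 = pd.DataFrame({
    "city": ["chicago", "new york", "orlando"],
    "humidity": [67, 60, 70],
})

# Merge only against df2's deduplicated key column, so no
# humidity_x/humidity_y suffixes appear and df1 rows are never duplicated
keys = df2[["city"]].drop_duplicates()
merged = df1.merge(keys, on="city", how="left", indicator=True)

# Keep rows found only in df1, then drop the helper column
anti = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(anti)
```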
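Method 2: Using the inverse of a semi join
We can apply the ~ (logical NOT) operator to the boolean mask of a semi join. The negated mask selects the left rows that have no match, which gives an anti join.
Semi join: like an inner join, a semi join returns the intersection, but it returns only the columns of the left table, not the right. It also produces no duplicate rows. Each left row appears at most once, even if its key matches several rows in the right table.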
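The sketch below makes the difference between an inner join and a semi join concrete. The tables left and right are hypothetical, and right deliberately contains a duplicate key. The inner merge repeats the matching left row and adds right's columns. The isin() semi join keeps each left row at most once with left columns only, and negating its mask gives the anti join.

```python
import pandas as pd

left = pd.DataFrame({"city": ["new york", "chicago", "mumbai"],
                     "temperature": [21, 14, 30]})
# Hypothetical right table in which "chicago" appears twice
right = pd.DataFrame({"city": ["chicago", "chicago", "new york"],
                      "humidity": [67, 66, 60]})

# Inner join: "chicago" matches twice, so its left row is repeated,
# and the right table's humidity column is added
inner = left.merge(right, on="city")
print(len(inner))  # 3

# Semi join: filter left by membership; left columns only, no duplicates
semi = left[left["city"].isin(right["city"])]
print(semi)

# Anti join: negate the same mask
anti = left[~left["city"].isin(right["city"])]
print(anti)
```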
Syntax:
df1[~df1['column_name'].isin(df2['column_name'])]
where,
- df1 is the first dataframe
- df2 is the second dataframe
- column_name is the key column present in both dataframes
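The isin() pattern above matches on a single column. When the matching condition spans several columns, one option is a merge with indicator=True on a list of keys. The helper anti_join below is a hypothetical name, and the sample data is made up. Treat it as a minimal sketch, assuming the key columns have compatible dtypes in both frames.

```python
import pandas as pd

def anti_join(left: pd.DataFrame, right: pd.DataFrame, on: list) -> pd.DataFrame:
    """Hypothetical helper: rows of `left` with no match in `right` on all `on` columns."""
    # Use only right's key columns, deduplicated, so the merge adds no
    # extra columns and cannot duplicate rows of `left`
    keys = right[on].drop_duplicates()
    merged = left.merge(keys, on=on, how="left", indicator=True)
    # "left_only" marks left rows whose key combination is absent from `right`
    return merged.loc[merged["_merge"] == "left_only", left.columns]

df1 = pd.DataFrame({"city": ["new york", "chicago", "chicago", "mumbai"],
                    "month": ["jan", "jan", "feb", "jan"],
                    "temperature": [21, 14, 2, 30]})
df2 = pd.DataFrame({"city": ["chicago", "new york"],
                    "month": ["jan", "jan"]})

result = anti_join(df1, df2, on=["city", "month"])
print(result)
```

Here ("chicago", "feb") survives even though "chicago" appears in df2, because the match requires both columns to agree.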
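Example:
In this example we merge df1 and df2 on 'city'. By default merge() performs an inner join, so df3 contains only the cities present in both DataFrames. We then exclude from df1 every row whose city appears in df3 and print the resulting DataFrame.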
Python3
# importing packages
import pandas as pd
# inverse of semi-join:
# creating dataframes using pd.DataFrame() method.
df1 = pd.DataFrame({
"city": ["new york", "chicago", "orlando", 'mumbai'],
"temperature": [21, 14, 35, 30],
"humidity": [65, 68, 75, 75],
})
df2 = pd.DataFrame({
"city": ["chicago", "new york", "orlando"],
"humidity": [67, 60, 70]
})
# inner join (the default): df3 holds only cities present in both frames
df3 = df1.merge(df2, on='city')
# keep the df1 rows whose city is NOT among the matched cities
df = df1[~df1['city'].isin(df3['city'])]
print(df)
Output:
     city  temperature  humidity
3  mumbai           30        75
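After an inner merge on 'city', df3['city'] holds exactly the cities that df1 and df2 share. Testing df1's cities directly against df2['city'] therefore excludes the same rows without building the intermediate merge. The sketch below reuses the sample data from the example above and checks that both filters agree.

```python
import pandas as pd

df1 = pd.DataFrame({
    "city": ["new york", "chicago", "orlando", "mumbai"],
    "temperature": [21, 14, 35, 30],
    "humidity": [65, 68, 75, 75],
})
df2 = pd.DataFrame({
    "city": ["chicago", "new york", "orlando"],
    "humidity": [67, 60, 70],
})

# Anti join via an intermediate inner merge (as in the example above)
via_merge = df1[~df1["city"].isin(df1.merge(df2, on="city")["city"])]

# Same filter without the merge: test membership against df2 directly
direct = df1[~df1["city"].isin(df2["city"])]

print(direct.equals(via_merge))  # True
print(direct)
```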