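How to use the "NOT IN" filter in Pandas?
This article discusses the NOT IN filter in Pandas. NOT IN is a membership test: it checks whether a value is absent from a given collection of values, returning True if the value is not present and False otherwise. Pandas has no dedicated NOT IN operator, so the filter is built by negating the result of isin() with the ~ operator.
Let's create a sample dataframe with three rows and the columns name, subject1, and marks: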
Python3
# import pandas module
import pandas as pd

# create dataframe
data1 = pd.DataFrame({'name': ['sravan', 'harsha', 'jyothika'],
                      'subject1': ['python', 'R', 'php'],
                      'marks': [96, 89, 90]}, index=[0, 1, 2])

# display (print so that it also shows when run as a script)
print(data1)
Output:
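Method 1: Using the NOT IN filter on a single column
The isin() method checks, for each value in a column, whether it appears in a given list. Negating that result with ~ keeps only the rows whose value in that column is not in the list.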
Syntax: dataframe[~dataframe[column_name].isin(list)]
where
- dataframe is the input dataframe
- column_name is the column that is filtered
- list is the list of values to be removed in that column
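To make the syntax above easier to follow, the expression can be split into its three steps. This is a minimal sketch that reuses the article's sample data; the variable names excluded, in_mask, and not_in_mask are chosen for illustration only.

```python
import pandas as pd

data1 = pd.DataFrame({'name': ['sravan', 'harsha', 'jyothika'],
                      'subject1': ['python', 'R', 'php'],
                      'marks': [96, 89, 90]})

# values to exclude (illustrative name, not from the article)
excluded = ['harsha', 'jyothika']

# Step 1: isin() marks the rows whose name IS in the list
in_mask = data1['name'].isin(excluded)

# Step 2: ~ flips the boolean mask, which gives "NOT IN"
not_in_mask = ~in_mask

# Step 3: boolean indexing keeps only the rows where the mask is True
result = data1[not_in_mask]

print(in_mask.tolist())         # [False, True, True]
print(not_in_mask.tolist())     # [True, False, False]
print(result['name'].tolist())  # ['sravan']
```

Keeping the mask in its own variable can also help when the same condition needs to be reused or combined with other conditions.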
Python3
# import pandas module
import pandas as pd

# create dataframe
data1 = pd.DataFrame({'name': ['sravan', 'harsha', 'jyothika'],
                      'subject1': ['python', 'R', 'php'],
                      'marks': [96, 89, 90]}, index=[0, 1, 2])

# names to exclude
list1 = ['harsha', 'jyothika']
# filter on the name column
print(data1[~data1['name'].isin(list1)])
print("============")

# subjects to exclude
list2 = ['R']
# filter on the subject1 column
print(data1[~data1['subject1'].isin(list2)])
print("============")

# marks to exclude
list3 = [96, 89]
# filter on the marks column
print(data1[~data1['marks'].isin(list3)])
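Output: the first filter keeps only the sravan row, the second keeps the sravan and jyothika rows, and the third keeps only the jyothika row (marks 90).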
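As an alternative spelling of the same single-column filter, DataFrame.query() accepts a literal not in expression. The sketch below is an addition to the article's method, not a replacement for it; the variable name excluded is illustrative, and the @ prefix tells query() to look it up as a local Python variable.

```python
import pandas as pd

data1 = pd.DataFrame({'name': ['sravan', 'harsha', 'jyothika'],
                      'subject1': ['python', 'R', 'php'],
                      'marks': [96, 89, 90]})

# subjects to exclude (illustrative name, not from the article)
excluded = ['R']

# query() understands "not in"; @excluded refers to the local variable
via_query = data1.query('subject1 not in @excluded')

# the same filter written with ~ and isin(), as in Method 1
via_isin = data1[~data1['subject1'].isin(excluded)]

print(via_query['name'].tolist())  # ['sravan', 'jyothika']
print(via_query.equals(via_isin))  # True
```

Which form to prefer is largely a matter of taste: the isin() form composes easily with other boolean masks, while the query() form reads closer to SQL.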
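Method 2: Using the NOT IN filter on multiple columns
To filter on several columns at once, select them with a comma-separated list of column names inside double brackets ([[...]]), call isin(), and combine the per-column results with any(axis=1). any(axis=1) is True for a row when at least one of the selected columns contains a listed value, so negating it keeps only the rows in which none of those columns match.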
Syntax: dataframe[~dataframe[[columns]].isin(list).any(axis=1)]
Python3
# import pandas module
import pandas as pd

# create dataframe
data1 = pd.DataFrame({'name': ['sravan', 'harsha', 'jyothika'],
                      'subject1': ['python', 'R', 'php'],
                      'marks': [96, 89, 90]}, index=[0, 1, 2])

# consider a list
list1 = ['harsha', 'jyothika', 96]
# filter in name and marks column
print(data1[~data1[['name', 'marks']].isin(list1).any(axis=1)])
print("============")

# consider a list
list2 = ['R', 'sravan']
# filter in name and subject1 column
print(data1[~data1[['subject1', 'name']].isin(list2).any(axis=1)])
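Output: the first filter returns an empty dataframe, because every row contains either a listed name or the mark 96. The second filter keeps only the jyothika row.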
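The choice of any(axis=1) matters, and the sketch below contrasts it with all(axis=1) on the article's sample data. It also shows the dict form of DataFrame.isin(), which checks each column against its own list of values. The variable names are illustrative, not part of the original example.

```python
import pandas as pd

data1 = pd.DataFrame({'name': ['sravan', 'harsha', 'jyothika'],
                      'subject1': ['python', 'R', 'php'],
                      'marks': [96, 89, 90]})

values = ['R', 'sravan']
hits = data1[['subject1', 'name']].isin(values)

# any(axis=1): drop a row if AT LEAST ONE selected column matches
none_match = data1[~hits.any(axis=1)]

# all(axis=1): drop a row only if EVERY selected column matches
not_all_match = data1[~hits.all(axis=1)]

print(none_match['name'].tolist())     # ['jyothika']
print(not_all_match['name'].tolist())  # ['sravan', 'harsha', 'jyothika']

# dict form: each column is tested against its own list of values;
# columns that are not named in the dict are treated as non-matching
per_column = data1.isin({'name': ['harsha'], 'marks': [96]})
per_column_result = data1[~per_column.any(axis=1)]

print(per_column_result['name'].tolist())  # ['jyothika']
```

The dict form avoids mixing strings and numbers in one list, which makes it clearer which value is meant for which column.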
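Method 3: Using NumPy with the NOT IN filter
This works like Method 1, except that the membership test is done with numpy.isin(), which returns a boolean NumPy array rather than a pandas Series.
Syntax: dataframe[~numpy.isin(dataframe['column'], list)]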
Python3
# import numpy and pandas modules
import numpy as np
import pandas as pd

# create dataframe
data1 = pd.DataFrame({'name': ['sravan', 'harsha', 'jyothika'],
                      'subject1': ['python', 'R', 'php'],
                      'marks': [96, 89, 90]}, index=[0, 1, 2])

# consider a list (96 never matches a name, so it has no effect here)
list1 = ['harsha', 'jyothika', 96]
# filter on the name column
print(data1[~np.isin(data1['name'], list1)])
Output: only the sravan row remains.
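numpy.isin() also has an invert parameter, which can stand in for the ~ operator. The following sketch, again on the article's sample data and with illustrative variable names, checks that both spellings produce the same mask.

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame({'name': ['sravan', 'harsha', 'jyothika'],
                      'subject1': ['python', 'R', 'php'],
                      'marks': [96, 89, 90]})

# names to exclude (illustrative name, not from the article)
excluded = ['harsha', 'jyothika']

# convert the column to a NumPy array before testing membership
names = data1['name'].to_numpy()

# negate the mask with ~, as in the syntax above
mask_negated = ~np.isin(names, excluded)

# or let NumPy invert the result directly
mask_inverted = np.isin(names, excluded, invert=True)

print(mask_inverted.tolist())                      # [True, False, False]
print(data1[mask_inverted]['name'].tolist())       # ['sravan']
print(np.array_equal(mask_negated, mask_inverted)) # True
```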