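Pandas – Find unique values from multiple columns
Prerequisite: Pandas
In this article, we discuss several ways to get the unique values from multiple columns of a Pandas DataFrame.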
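Method 1: Using the pandas unique() and concat() methods
A pandas Series (that is, a column) has a unique() method that returns only the distinct values of that column, in order of first appearance. The first line of output shows just the unique first names. We can extend this idea with pandas concat(): concatenate all the required columns into a single Series, then take the unique values of the result.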
Python3
import pandas as pd
# Create a sample DataFrame.
df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                   'Age': [26, 25, 25, 27, 28, 30]})
# Unique values in a single Series/column
print(f"Unique FN: {df['FirstName'].unique()}")
# Extend the idea from one column to several:
# concatenate the columns, then deduplicate the result
combined = pd.concat([df['FirstName'], df['LastName'], df['Age']])
print(f"Unique Values from 3 Columns:{combined.unique()}")
Output:
Unique FN: ['Arun' 'Navneet' 'Shilpa' 'Prateek' 'Pyare']
Unique Values from 3 Columns:['Arun' 'Navneet' 'Shilpa' 'Prateek' 'Pyare' 'Singh' 'Yadav' 'Shukla'
 'Lal' 'Mishra' 26 25 27 28 30]
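Method 1 can be wrapped in a small helper so that it works for any list of columns. The following is a minimal sketch, not part of the original article: the function name unique_across and its parameters are hypothetical, chosen only for illustration.

```python
import pandas as pd


def unique_across(df, columns):
    """Return the distinct values found in any of the given columns.

    Hypothetical helper: stacks the columns into one Series with
    pd.concat, then calls Series.unique(), which keeps values in
    order of first appearance.
    """
    stacked = pd.concat([df[col] for col in columns], ignore_index=True)
    return stacked.unique()


df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                   'Age': [26, 25, 25, 27, 28, 30]})

names = unique_across(df, ['FirstName', 'LastName'])
print(names)
```

Passing ['FirstName', 'LastName', 'Age'] instead should give the same values as the three-column example above, since the helper performs the same concat-then-unique steps.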
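Method 2: Using the numpy.unique() method
np.unique() returns the unique values of the array passed to it as an argument. The result is sorted.
Note: This method has a limitation. Because np.unique() sorts the values, it cannot handle an array that mixes str and numeric columns. If you need to combine columns of different data types, use Method 1.
Python3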
import pandas as pd
import numpy as np
# Create a sample DataFrame.
df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                   'Age': [26, 25, 25, 27, 28, 30]})
# Both columns hold strings, so the values can be sorted and deduplicated
print(np.unique(df[['LastName', 'FirstName']].values))
# The line below raises an error: Age is numeric and LastName is str,
# and np.unique cannot sort a mix of the two
# print(np.unique(df[['LastName', 'Age']].values))
Output:
['Arun' 'Lal' 'Mishra' 'Navneet' 'Prateek' 'Pyare' 'Shilpa' 'Shukla'
 'Singh' 'Yadav']
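The limitation noted for Method 2 can be seen directly, and one possible way around it is to cast every column to str before calling np.unique(). This is only a sketch of a workaround that the original article does not propose: the cast turns the ages into text, so use it only when a string result is acceptable. The variable names mixed_error and as_text are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                   'Age': [26, 25, 25, 27, 28, 30]})

# Mixing str and int: np.unique has to sort the values, and Python
# cannot order a str against an int, so a TypeError is expected here.
mixed_error = None
try:
    np.unique(df[['LastName', 'Age']].values)
except TypeError as exc:
    mixed_error = str(exc)
print(f"Mixed-type call failed: {mixed_error}")

# Assumed workaround: cast everything to str first, so all values
# are comparable. The ages come back as strings, not numbers.
as_text = np.unique(df[['LastName', 'Age']].astype(str).values)
print(as_text)
```

If the numeric values must stay numeric, Method 1 remains the simpler choice.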
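Method 3: Using sets in Python
A set contains only unique values, so we convert each Series to a set object and then take the union of those sets. Unlike Method 2, this works for any combination of data types, as long as the values are hashable.
Python3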
import pandas as pd
# Create a sample DataFrame.
df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                   'Age': [26, 25, 25, 27, 28, 30]})
# Convert each pandas Series to a set, then
# take the set union (|)
print(set(df.FirstName) | set(df.LastName) | set(df.Age))
Output (sets are unordered, so the element order may vary):
{'Singh', 'Pyare', 'Mishra', 27, 'Navneet', 'Arun', 'Lal', 'Shukla', 30, 25, 26, 'Yadav', 28, 'Shilpa', 'Prateek'}
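The set approach also generalizes to an arbitrary list of columns. The sketch below is an illustration rather than the article's own code: the helper name unique_set is hypothetical, and it relies on set().union() accepting any number of iterables. Sorting with key=str is just one assumed way to get a repeatable print order when strings and numbers are mixed.

```python
import pandas as pd


def unique_set(df, columns):
    """Hypothetical helper: union of the values in the given columns."""
    # set().union(...) accepts several iterables; each Series
    # iterates over its values.
    return set().union(*(df[col] for col in columns))


df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                   'Age': [26, 25, 25, 27, 28, 30]})

values = unique_set(df, ['FirstName', 'LastName', 'Age'])

# Sets have no defined order; sort by the string form of each value
# to print them in a stable order.
ordered = sorted(values, key=str)
print(ordered)
```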