Pandas – Find Unique Values from Multiple Columns

Last modified: 2022-05-13 01:55:39.162000    Author: Mango

Prerequisite: pandas

In this article, we discuss several ways to get the unique values from multiple columns of a pandas DataFrame.

Method 1: Using the pandas unique() and concat() methods

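A pandas Series (that is, a single column) has a unique() method that returns only the distinct values in that column. The first print statement in the code that follows shows only the unique first names. We can extend this idea with the pandas concat() function: concatenate all the required columns into a single column, then find the unique values of the result.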

Python3
import pandas as pd
import numpy as np
 
# Creating a custom dataframe.
df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                    
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                    
                   'Age': [26, 25, 25, 27, 28, 30]})
 
# To get unique values in 1 series/column
print(f"Unique FN: {df['FirstName'].unique()}")
 
# Extending the idea from 1 column to multiple columns:
# stack the columns into one Series, then take its unique values
combined = pd.concat([df['FirstName'], df['LastName'], df['Age']])
print(f"Unique Values from 3 Columns: {combined.unique()}")
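
The same concat-then-unique idea can be wrapped so that it works for any list of columns. The following is a minimal sketch, not part of pandas itself: the helper name `unique_across` and its parameters are hypothetical and introduced only for illustration.

```python
import pandas as pd

# Same sample data as in the example above.
df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                   'Age': [26, 25, 25, 27, 28, 30]})


def unique_across(frame, columns):
    """Hypothetical helper: unique values across the given columns,
    kept in order of first appearance."""
    # Stack the selected columns end to end into one Series.
    combined = pd.concat([frame[col] for col in columns], ignore_index=True)
    # Series.unique() drops repeats without sorting.
    return combined.unique()


result = unique_across(df, ['FirstName', 'LastName', 'Age'])
print(list(result))
```

Because unique() does not sort, strings and numbers can sit in the same result without any comparison between them.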



Method 2: Using the numpy.unique() method

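np.unique() returns the sorted unique values of the array passed to it as an argument. Passing it the values of several DataFrame columns therefore gives the unique values across all of those columns.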

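Note: This method has a limitation: str and numeric columns cannot be combined, because np.unique() sorts the values and Python cannot compare a str with an int. If you need to combine columns of different data types, use Method 1.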

Python3
import pandas as pd
import numpy as np
 
# Creating a custom dataframe.
df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                    
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                    
                   'Age': [26, 25, 25, 27, 28, 30]})
 
print(np.unique(df[['LastName', 'FirstName']].values))
 
# Will throw error as Age is numerical datatype
# and LastName is str
# print(np.unique(df[['LastName','Age']].values))
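
The mixed-type limitation described in the note can be observed directly. The sketch below catches the error and then shows one possible workaround, which is an assumption of this example rather than something the original method prescribes: casting every column to str first. The variable names `mixed`, `sort_failed`, and `as_text` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample data as in the example above.
df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                   'Age': [26, 25, 25, 27, 28, 30]})

mixed = df[['LastName', 'Age']]

# np.unique() sorts its input, and str cannot be ordered against int,
# so the mixed object array raises TypeError.
try:
    np.unique(mixed.values)
    sort_failed = False
except TypeError:
    sort_failed = True
print(f"np.unique failed on mixed types: {sort_failed}")

# Assumed workaround: convert everything to str before calling np.unique().
# The ages come back as strings, not integers.
as_text = np.unique(mixed.astype(str).values)
print(list(as_text))
```

The cast loses the numeric type of Age, so this is only suitable when a string representation of the values is acceptable; otherwise Method 1 avoids the problem entirely.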


Method 3: Using sets in Python

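A set can contain only unique values, so we convert each Series to a set object and then take the union of those sets. Unlike Method 2, this works for any combination of data types.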

Python3
import pandas as pd
import numpy as np
 
 
# Creating a custom dataframe.
df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                    
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                    
                   'Age': [26, 25, 25, 27, 28, 30]})
 
# Convert each pandas Series to a set, then
# take the set union (|)
print(set(df.FirstName) | set(df.LastName) | set(df.Age))
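
When the number of columns is not fixed, chaining `|` by hand becomes awkward. A minimal sketch of the same set-union idea over a list of column names follows; the `columns` list and the variable `unique_values` are assumptions made for illustration.

```python
import pandas as pd

# Same sample data as in the example above.
df = pd.DataFrame({'FirstName': ['Arun', 'Navneet', 'Shilpa',
                                 'Prateek', 'Pyare', 'Prateek'],
                   'LastName': ['Singh', 'Yadav', 'Yadav', 'Shukla',
                                'Lal', 'Mishra'],
                   'Age': [26, 25, 25, 27, 28, 30]})

columns = ['FirstName', 'LastName', 'Age']

# set().union() accepts any number of iterables, so each column
# (a Series) can be passed in directly.
unique_values = set().union(*(df[col] for col in columns))

# Sets are unordered; sorting by the string form gives a stable display.
print(sorted(unique_values, key=str))
```

A set does not preserve the order in which values appeared, so use Method 1 if first-appearance order matters.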
