如何计算 Pandas 数据框列的不同值？

让我们看看如何计算 Pandas 数据框列的不同值？

考虑下面给出的表格结构，它必须创建为 Dataframe。列是身高、体重和年龄。 8 个学生的记录形成行。

	height	weight	age
Steve	165	63.5	20
Ria	165	64	22
Nivi	164	63.5	22
Jane	158	54	21
Kate	167	63.5	23
Lucy	160	62	22
Ram	158	64	20
Niki	165	64	21

第一步是为上述表格创建数据框。看看下面的代码片段。

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# show the Dataframe
df

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# variable to hold the count
cnt = 0
  
# list to hold visited values
visited = []
  
# loop for counting the unique
# values in height
for i in range(0, len(df['height'])):
    
    if df['height'][i] not in visited: 
        
        visited.append(df['height'][i])
          
        cnt += 1
  
print("No.of.unique values :",
      cnt)
  
print("unique values :",
      visited)

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# counting unique values
n = len(pd.unique(df['height']))
  
print("No.of.unique values :", 
      n)

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# check the values of 
# each row for each column
n = df.nunique(axis=0)
  
print("No.of.unique values in each column :\n",
      n)

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# count no. of unique 
# values in height column
n = df.height.nunique()
  
print("No.of.unique values in height column :",
      n)

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
  
# getting the list of unique values
li = list(df.height.value_counts())
  
# print the unique value counts
print("No.of.unique values :",
      len(li))

输出：

数据框

方法一：使用for循环。

Dataframe 已经创建，可以使用for 循环进行硬编码并计算特定列中唯一值的数量。例如在上表中，如果希望计算列height中唯一值的数量。这个想法是使用一个变量cnt来存储计数和一个访问过的具有先前访问过的值的列表。然后 for 循环遍历 'height' 列，对于每个值，它检查是否已在访问列表中访问了相同的值。如果之前没有访问过该值，则计数加 1。

下面是实现：

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# variable to hold the count
cnt = 0
  
# list to hold visited values
visited = []
  
# loop for counting the unique
# values in height
for i in range(0, len(df['height'])):
    
    if df['height'][i] not in visited: 
        
        visited.append(df['height'][i])
          
        cnt += 1
  
print("No.of.unique values :",
      cnt)
  
print("unique values :",
      visited)

输出：

No.of.unique values : 5
unique values : [165, 164, 158, 167, 160]

但是当 Dataframe 的大小增加并包含数千行和列时，这种方法效率不高。为了提高效率，下面列出了三种可用的方法：

pandas.unique()
Dataframe.nunique()
Series.value_counts()

方法2：使用unique()。

unique 方法将一维数组或 Series 作为输入，并返回其中的唯一项列表。返回值是一个 NumPy 数组，其中的内容基于传递的输入。如果提供索引作为输入，则返回值也将是唯一值的索引。

Syntax: pandas.unique(Series)

编程需要懂一点英语

例子：

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# counting unique values
n = len(pd.unique(df['height']))
  
print("No.of.unique values :", 
      n)

输出：

No.of.unique values : 5

方法3：使用 Dataframe.nunique() 。

此方法返回指定轴中唯一值的计数。语法是：

Syntax: Dataframe.nunique (axis=0/1, dropna=True/False)

编程需要懂一点英语

例子：

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# check the values of 
# each row for each column
n = df.nunique(axis=0)
  
print("No.of.unique values in each column :\n",
      n)

输出：

No.of.unique values in each column :
height    5
weight    4
age       4
dtype: int64

要获取指定列中唯一值的数量：

Syntax: Dataframe.col_name.nunique()

编程需要懂一点英语

例子：

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# count no. of unique 
# values in height column
n = df.height.nunique()
  
print("No.of.unique values in height column :",
      n)

输出：

No.of.unique values in height column : 5

方法3：使用 Series.value_counts() 。

此方法返回指定列中所有唯一值的计数。

Syntax: Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)

编程需要懂一点英语

例子：

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
  
# getting the list of unique values
li = list(df.height.value_counts())
  
# print the unique value counts
print("No.of.unique values :",
      len(li))

输出：

No.of.unique values : 5