How to count the distinct values of a Pandas DataFrame column?

Last modified: 2022-05-13 01:55:45.279000             Author: Mango

Let's see how to count the distinct values of a Pandas DataFrame column.

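Consider the table below, which we will create as a DataFrame. The columns are height, weight and age, and each of the 8 rows holds the record of one student.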

        height  weight  age
Steve      165    63.5   20
Ria        165    64     22
Nivi       164    63.5   22
Jane       158    54     21
Kate       167    63.5   23
Lucy       160    62     22
Ram        158    64     20
Niki       165    64     21

The first step is to create a DataFrame for the table above, as in the code snippet below.

Python3
# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# show the DataFrame (print() is needed when running as a script)
print(df)


Output:

(The DataFrame, identical to the table above.)

Method 1: Using a for loop.

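Once the DataFrame has been created, you can count the unique values in a particular column with a hand-written for loop. For example, suppose you want to count the unique values in the height column of the table above. The idea is to use a variable cnt to hold the count and a list visited to hold the values already seen. The for loop walks through the 'height' column and, for each value, checks whether it is already in visited. If it is not, the value is appended to visited and cnt is incremented by 1.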

Below is the implementation:

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# variable to hold the count
cnt = 0
  
# list to hold visited values
visited = []
  
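# loop over the values of the height column and count
# each value the first time it is seen; .tolist() gives
# plain Python ints and avoids df['height'][i], which
# treats i as a position on a string-labelled index
# (deprecated in recent pandas versions)
for value in df['height'].tolist():

    if value not in visited:

        visited.append(value)

        cnt += 1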
  
print("No.of.unique values :",
      cnt)
  
print("unique values :",
      visited)

Output:

No.of.unique values : 5
unique values : [165, 164, 158, 167, 160]
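The loop above tests `value not in visited` against a list, so every check scans all the values seen so far. A minimal sketch, assuming the same height values: replacing the list with a set makes each membership test constant time on average, while a second list keeps the first-appearance order for display. The helper name `count_distinct_loop` is hypothetical, and the code is still plain Python, so it should not be expected to match pandas' vectorized methods on large data.

```python
import pandas as pd

# Same 'height' values as in the table above
heights = pd.Series([165, 165, 164, 158, 167, 160, 158, 165],
                    index=['Steve', 'Ria', 'Nivi', 'Jane',
                           'Kate', 'Lucy', 'Ram', 'Niki'])


def count_distinct_loop(values):
    """Hypothetical helper: count distinct values with a loop.

    A set answers "seen before?" in O(1) on average; the list
    only records the order in which values first appear.
    """
    seen = set()
    order = []
    for value in values:
        if value not in seen:
            seen.add(value)
            order.append(value)
    return len(order), order


count, uniques = count_distinct_loop(heights.tolist())
print("No.of.unique values :", count)
print("unique values :", uniques)
```

This prints the same count (5) and the same list of values as the list-based loop.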

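However, this approach becomes inefficient once the DataFrame grows to thousands of rows and columns. More efficient alternatives are the three methods listed below: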

  • pandas.unique()
  • DataFrame.nunique()
  • Series.value_counts()

Method 2: Using pandas.unique().

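pandas.unique() takes a one-dimensional array-like, such as a Series, and returns the unique values it contains in order of first appearance (the result is not sorted). For most inputs the return value is a NumPy array; if an Index is passed, the return value is an Index of the unique values instead. The number of distinct values is therefore the length of the result.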

Example:

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# counting unique values
n = len(pd.unique(df['height']))
  
print("No.of.unique values :", 
      n)

Output:

No.of.unique values : 5

Method 3: Using DataFrame.nunique()

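This method returns the number of unique values along the specified axis. The syntax is:

DataFrame.nunique(axis=0, dropna=True)

With axis=0 (the default) it counts the unique values in each column, and with axis=1 in each row. With dropna=True (the default), missing values (NaN) are not counted.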

Example:

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# count the unique values in
# each column (axis=0)
n = df.nunique(axis=0)
  
print("No.of.unique values in each column :\n",
      n)

Output:

No.of.unique values in each column :
height    5
weight    4
age       4
dtype: int64
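DataFrame.nunique() also accepts axis and dropna parameters (signature `DataFrame.nunique(axis=0, dropna=True)`). The sketch below rebuilds the example DataFrame and, as an assumption for illustration only, blanks out Ram's weight in a copy (`df2`) to show how `dropna=False` changes the count.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'height': [165, 165, 164, 158, 167, 160, 158, 165],
    'weight': [63.5, 64, 63.5, 54, 63.5, 62, 64, 64],
    'age':    [20, 22, 22, 21, 23, 22, 20, 21]},
    index=['Steve', 'Ria', 'Nivi', 'Jane',
           'Kate', 'Lucy', 'Ram', 'Niki'])

# axis=1: number of distinct values across the columns of each row
per_row = df.nunique(axis=1)
print(per_row)

# Hypothetical missing value, only to demonstrate dropna
df2 = df.copy()
df2.loc['Ram', 'weight'] = np.nan

print(df2.nunique())              # weight -> 4 (NaN ignored)
print(df2.nunique(dropna=False))  # weight -> 5 (NaN counted)
```

In this data every row holds three different numbers, so `per_row` is 3 for each student; counting per row is more informative when the columns share the same kind of values.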

To get the number of unique values in a single column, call nunique() on that column (a Series):

Example:

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
# count no. of unique 
# values in height column
n = df.height.nunique()
  
print("No.of.unique values in height column :",
      n)

Output:

No.of.unique values in height column : 5

Method 4: Using Series.value_counts()

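This method returns a Series with one entry per unique value in the column: the index holds the unique values and the values hold how often each occurs, sorted by frequency in descending order. The length of this result is therefore the number of distinct values (missing values are excluded by default).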

Example:

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
    
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
    
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
    
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
  
  
# value_counts() returns one count per distinct value,
# so the length of its result is the number of distinct values
li = list(df.height.value_counts())
  
# print the number of distinct values
print("No.of.unique values :",
      len(li))

Output:

No.of.unique values : 5
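Because value_counts() returns the per-value counts and not just their number, it can answer "how many of each" as well as "how many distinct". A small sketch on the height values from the table above; the tie order among values that occur equally often is not relied on here, so the result is compared as a dictionary.

```python
import pandas as pd

heights = pd.Series([165, 165, 164, 158, 167, 160, 158, 165],
                    name='height')

# One entry per distinct value; the entries are occurrence counts
counts = heights.value_counts()
print(counts.to_dict())

# The length of the result equals the number of distinct values
print(len(counts), heights.nunique())  # 5 5

# The index of the result holds the distinct values themselves
print(sorted(counts.index.tolist()))   # [158, 160, 164, 165, 167]
```

For a plain count, nunique() is the more direct call; value_counts() is the better fit when the frequencies themselves are also needed.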