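How to Count Distinct Values of a Pandas DataFrame Column?
Let's see how to count the distinct values in a column of a Pandas DataFrame.
Consider the table below, which we will build as a DataFrame. The columns are height, weight, and age, and each of the 8 rows is one student's record.
        height  weight  age
Steve   165     63.5    20
Ria     165     64      22
Nivi    164     63.5    22
Jane    158     54      21
Kate    167     63.5    23
Lucy    160     62      22
Ram     158     64      20
Niki    165     64      21
The first step is to create a DataFrame for the table above. See the code snippet below.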
Python3
# import library
import pandas as pd
# create a Dataframe
df = pd.DataFrame({
    'height': [165, 165, 164,
               158, 167, 160,
               158, 165],
    'weight': [63.5, 64, 63.5,
               54, 63.5, 62,
               64, 64],
    'age': [20, 22, 22,
            21, 23, 22,
            20, 21]},
    index=['Steve', 'Ria', 'Nivi',
           'Jane', 'Kate', 'Lucy',
           'Ram', 'Niki'])
# show the Dataframe (print so it also displays outside a notebook)
print(df)
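Method 1: Using a for loop.
With the DataFrame created, you can count the unique values in a particular column by hand with a for loop. Suppose we want the number of unique values in the height column of the table above. The idea is to keep a counter, cnt, and a list, visited, holding the values seen so far. The loop walks through the 'height' column and, for each value, checks whether it is already in visited. If it is not, the value is appended to visited and the count is incremented by 1.
Below is the implementation: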
Python3
# import library
import pandas as pd
# create a Dataframe
df = pd.DataFrame({
    'height': [165, 165, 164,
               158, 167, 160,
               158, 165],
    'weight': [63.5, 64, 63.5,
               54, 63.5, 62,
               64, 64],
    'age': [20, 22, 22,
            21, 23, 22,
            20, 21]},
    index=['Steve', 'Ria', 'Nivi',
           'Jane', 'Kate', 'Lucy',
           'Ram', 'Niki'])
# variable to hold the count
cnt = 0
# list to hold visited values
visited = []
# loop for counting the unique values in height;
# tolist() yields plain Python ints and avoids integer
# lookups on a DataFrame whose index holds names
for value in df['height'].tolist():
    if value not in visited:
        visited.append(value)
        cnt += 1
print("No.of.unique values :",
      cnt)
print("unique values :",
      visited)
Output:
No.of.unique values : 5
unique values : [165, 164, 158, 167, 160]
However, this approach becomes inefficient once the DataFrame grows to thousands of rows and columns. Three more efficient methods are available:
- pandas.unique()
- DataFrame.nunique()
- Series.value_counts()
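Part of the loop's cost comes from the `not in visited` test, which scans a Python list and so slows down as more distinct values accumulate. A minimal sketch of the same idea with a set, whose membership checks take roughly constant time on average, is shown here. `count_unique` is an illustrative helper name, not a pandas function, and the built-in methods listed above remain the usual choice for large data.

```python
import pandas as pd


def count_unique(values):
    """Count distinct items in one pass (illustrative helper, not a pandas API)."""
    seen = set()
    for value in values:
        # adding an item that is already present leaves the set unchanged
        seen.add(value)
    return len(seen)


# the height column from the student table
heights = pd.Series([165, 165, 164, 158, 167, 160, 158, 165])

unique_count = count_unique(heights.tolist())
print("No. of unique values:", unique_count)
```

Unlike the `visited` list, a set does not remember the order in which values were first seen, so this variant suits counting better than listing.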
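Method 2: Using unique().
The unique function takes a one-dimensional array or a Series as input and returns its unique values in order of first appearance. For a Series like the ones in this example, the return value is a NumPy array. If an Index is passed as input, the return value is an Index of the unique values. Taking len() of the result gives the number of unique values.
Syntax: pandas.unique(Series)
Example: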
Python3
# import library
import pandas as pd
# create a Dataframe
df = pd.DataFrame({
    'height': [165, 165, 164,
               158, 167, 160,
               158, 165],
    'weight': [63.5, 64, 63.5,
               54, 63.5, 62,
               64, 64],
    'age': [20, 22, 22,
            21, 23, 22,
            20, 21]},
    index=['Steve', 'Ria', 'Nivi',
           'Jane', 'Kate', 'Lucy',
           'Ram', 'Niki'])
# counting unique values
n = len(pd.unique(df['height']))
print("No.of.unique values :",
      n)
Output:
No.of.unique values : 5
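One detail worth knowing when counting with `pd.unique()`: a missing value (NaN) is kept in the result as a value of its own, so `len()` counts it. The sketch below uses a small made-up Series (an assumption for illustration; the student table has no missing values) to show the difference between counting with and without NaN.

```python
import numpy as np
import pandas as pd

# made-up data with one missing entry, for illustration only
heights = pd.Series([165, 165, np.nan, 158, 167])

# NaN appears once in the array of unique values
unique_values = pd.unique(heights)
count_with_nan = len(unique_values)

# drop missing entries first to count only real values
count_without_nan = len(pd.unique(heights.dropna()))

print("Counting NaN as a value:", count_with_nan)
print("Ignoring NaN:", count_without_nan)
```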
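Method 3: Using DataFrame.nunique().
This method returns the number of unique values along the specified axis: axis=0 counts within each column, axis=1 within each row. With dropna=True (the default), missing values are not counted. The syntax is:
Syntax: DataFrame.nunique(axis=0/1, dropna=True/False)
Example: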
Python3
# import library
import pandas as pd
# create a Dataframe
df = pd.DataFrame({
    'height': [165, 165, 164,
               158, 167, 160,
               158, 165],
    'weight': [63.5, 64, 63.5,
               54, 63.5, 62,
               64, 64],
    'age': [20, 22, 22,
            21, 23, 22,
            20, 21]},
    index=['Steve', 'Ria', 'Nivi',
           'Jane', 'Kate', 'Lucy',
           'Ram', 'Niki'])
# count the unique values
# in each column (axis=0)
n = df.nunique(axis=0)
print("No.of.unique values in each column :\n",
      n)
Output:
No.of.unique values in each column :
 height    5
weight    4
age       4
dtype: int64
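The `axis` and `dropna` parameters from the syntax line can be illustrated with a small made-up DataFrame that, unlike the student table, contains a missing value. This is a sketch under that assumption; the data and variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# made-up data with one missing height, for illustration only
df = pd.DataFrame({
    'height': [165, 165, np.nan, 158],
    'age': [20, 22, 22, 20]},
    index=['A', 'B', 'C', 'D'])

# default: count per column, missing values ignored
per_column = df.nunique()

# dropna=False counts NaN as one more distinct value
per_column_with_nan = df.nunique(dropna=False)

# axis=1 counts distinct values within each row instead
per_row = df.nunique(axis=1)

print(per_column)
print(per_column_with_nan)
print(per_row)
```

Here the height column has 2 distinct values by default and 3 when NaN is counted, while row C has only 1 distinct non-missing value.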
To get the number of unique values in a specific column:
Syntax: DataFrame.col_name.nunique()
Example:
Python3
# import library
import pandas as pd
# create a Dataframe
df = pd.DataFrame({
    'height': [165, 165, 164,
               158, 167, 160,
               158, 165],
    'weight': [63.5, 64, 63.5,
               54, 63.5, 62,
               64, 64],
    'age': [20, 22, 22,
            21, 23, 22,
            20, 21]},
    index=['Steve', 'Ria', 'Nivi',
           'Jane', 'Kate', 'Lucy',
           'Ram', 'Niki'])
# count no. of unique
# values in height column
n = df.height.nunique()
print("No.of.unique values in height column :",
      n)
Output:
No.of.unique values in height column : 5
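Method 4: Using Series.value_counts().
This method returns a Series with one entry per unique value in the column, holding how many times that value occurs. The number of entries is therefore the number of unique values. With dropna=True (the default), missing values are excluded.
Syntax: Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
Example: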
Python3
# import library
import pandas as pd
# create a Dataframe
df = pd.DataFrame({
    'height': [165, 165, 164,
               158, 167, 160,
               158, 165],
    'weight': [63.5, 64, 63.5,
               54, 63.5, 62,
               64, 64],
    'age': [20, 22, 22,
            21, 23, 22,
            20, 21]},
    index=['Steve', 'Ria', 'Nivi',
           'Jane', 'Kate', 'Lucy',
           'Ram', 'Niki'])
# one entry per unique value, holding its frequency
counts = df.height.value_counts()
# the number of entries is the number of unique values
print("No.of.unique values :",
      len(counts))
Output:
No.of.unique values : 5
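Because `value_counts()` reports how often each value occurs, and not only how many distinct values there are, it can also answer questions such as which height is most common. A short sketch on the height column follows; the variable names are illustrative.

```python
import pandas as pd

# the height column from the student table
heights = pd.Series([165, 165, 164, 158, 167, 160, 158, 165],
                    name='height')

# frequency of each distinct height, most frequent first
counts = heights.value_counts()

# label of the largest count, i.e. the most common height
most_common = counts.idxmax()

# normalize=True returns proportions instead of raw counts
shares = heights.value_counts(normalize=True)

print(counts)
print("Most common height:", most_common)
print("Share of rows with that height:", shares.loc[most_common])
```

In this data 165 occurs three times out of eight rows, so its share is 0.375.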