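Absolute and Relative Frequency in Pandas
Frequency is the number of times an outcome occurs in a given sample. It can be expressed in two different ways.
1. Absolute frequency:
It is the number of observations in a particular category. It is always a non-negative integer, that is, it takes discrete values.
Example:
The following data record whether each student in a class passed or failed a Mathematics exam:
P, P, F, P, F, P, P, F, F, P, P, P
where P = Passed and F = Failed.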
Solution:
From the given data we can say that,
There are 8 students who passed the exam
There are 4 students who failed the exam
Implementation in Python:
Let us encode the results of the 12 students in the two categories Pass (P) and Fail (F) as 1 and 0, respectively.
P, P, F, P, F, P, P, F, F, P, P, P
1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1
import pandas as pd
data = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1]
# Wrap the list in a pandas Series;
# .value_counts() counts the number of
# occurrences of each distinct value
counts = pd.Series(data).value_counts()
print(counts)
Output:
1    8
0    4
dtype: int64
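The numeric encoding is not required for counting. As a minimal sketch, assuming the results are kept as the original labels in a list (the names `results` and `abs_freq` are illustrative and not part of the original example), `value_counts()` can tally the strings directly:

```python
import pandas as pd

# Original pass/fail labels from the example (P = Passed, F = Failed)
results = ["P", "P", "F", "P", "F", "P", "P", "F", "F", "P", "P", "P"]

# value_counts() returns a Series indexed by the distinct labels,
# sorted by count in descending order by default
abs_freq = pd.Series(results).value_counts()

print(abs_freq["P"])   # number of students who passed
print(abs_freq["F"])   # number of students who failed
print(abs_freq.sum())  # total number of observations
```

Indexing the result by label keeps the link between each count and its category explicit, which can be easier to read than remembering which of 1 and 0 stood for a pass.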
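2. Relative frequency:
It is the fraction of observations in a given dataset that fall into a particular category. It is a floating-point value and can also be expressed as a percentage. Consider again the example of the students who passed and failed the Mathematics exam. Then,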
relative frequency of passed students = 8 / ( 8 + 4 ) ≈ 0.667 = 66.7 %
relative frequency of failed students = 4 / ( 8 + 4 ) ≈ 0.333 = 33.3 %
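The two calculations above follow one general pattern. Writing $n_i$ for the absolute frequency of category $i$ and $N$ for the total number of observations (symbols introduced here for illustration only), the relative frequency $f_i$ can be written as:

```latex
f_i = \frac{n_i}{N}, \qquad N = \sum_i n_i, \qquad \sum_i f_i = 1
```

For the exam data, $n_P = 8$, $n_F = 4$ and $N = 12$, so $f_P + f_F = 8/12 + 4/12 = 1$.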
import pandas as pd
data = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1]
# Wrap the list in a pandas Series;
# .value_counts() counts the number of
# occurrences of each distinct value
counts = pd.Series(data).value_counts()
# Divide each count by the total number of observations
print(counts / len(data))
Output:
1    0.666667
0    0.333333
dtype: float64
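Dividing by `len(data)` is one way to obtain the fractions. As an alternative sketch, `value_counts()` also accepts `normalize=True`, which returns the relative frequencies directly; multiplying by 100 then gives percentages. The names `rel_freq` and `pct` are illustrative and not part of the original example.

```python
import pandas as pd

# Same encoded results as above: 1 = Passed, 0 = Failed
data = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1]

# normalize=True makes value_counts() return fractions instead of counts
rel_freq = pd.Series(data).value_counts(normalize=True)

# Express the fractions as percentages, rounded to one decimal place
pct = (rel_freq * 100).round(1)

print(rel_freq)
print(pct)
```

Because the fractions are computed from the same counts, they sum to 1 (up to floating-point rounding), which makes for a quick sanity check on the result.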