How to calculate and plot a Cumulative Distribution Function with Matplotlib in Python?

Prerequisites: Matplotlib

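Matplotlib is a plotting library for Python that works on top of NumPy, the numerical mathematics library. The cumulative distribution function (CDF) of a real-valued random variable X, evaluated at x, also called simply the distribution function of X, is the probability that X takes a value less than or equal to x.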
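In symbols, the definition and its discrete and continuous forms can be written as follows. The symbols p (probability mass function) and f (probability density function) are notation introduced here.

```latex
F_X(x) = P(X \le x), \qquad
F_X(x) = \sum_{t \le x} p(t) \ \text{(discrete)}, \qquad
F_X(x) = \int_{-\infty}^{x} f(t)\,dt \ \text{(continuous)}
```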

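Properties of the CDF:

Every cumulative distribution function F(x) is non-decreasing.
If x is greater than or equal to the largest value X can take, then F(x) = 1 (in general, F(x) approaches 1 as x grows and 0 as x decreases).
The CDF takes values between 0 and 1.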
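The properties above can be checked numerically on an empirical CDF built by sorting a sample, as in Method 2 below. This is a minimal sketch. The seed and the sample size are arbitrary assumptions, not part of the original article.

```python
import numpy as np

# Draw a sample (seed and size are arbitrary) and build its empirical CDF by sorting.
rng = np.random.default_rng(0)
data = rng.standard_normal(200)
x = np.sort(data)
cdf = np.arange(1, len(x) + 1) / len(x)

# Property 1: the CDF never decreases from one point to the next.
is_non_decreasing = bool(np.all(np.diff(cdf) >= 0))

# Property 2: every value lies in [0, 1].
in_unit_interval = bool(cdf.min() >= 0 and cdf.max() <= 1)

# At the largest observed value, the empirical CDF is exactly 1.
reaches_one = bool(cdf[-1] == 1.0)

print(is_non_decreasing, in_unit_interval, reaches_one)  # True True True
```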

Method 1: Using a histogram

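The CDF can be computed from the PDF (probability density function, or the probability mass function for a discrete variable). Every point of the random variable contributes cumulatively, so F(x) is the running total of the probability of all values up to x.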
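For a continuous variable, "accumulating the PDF" means integrating it. The sketch below assumes a standard normal density as the example and approximates the integral with a trapezoid running sum. It then compares the result with the closed-form normal CDF. The helper names normal_pdf and normal_cdf are hypothetical, and the cutoff of -8 standing in for minus infinity is an assumption.

```python
import math
import numpy as np

def normal_pdf(t):
    """Standard normal density, used here only as an example distribution."""
    return np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

def normal_cdf(x):
    """Closed-form standard normal CDF: (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Grid starting deep in the left tail; -8 stands in for minus infinity.
t = np.linspace(-8.0, 2.0, 10_001)
f = normal_pdf(t)

# Accumulate the area under f slice by slice (trapezoid rule),
# so F[k] approximates the integral of f from -8 up to t[k].
dt = t[1] - t[0]
F = np.concatenate(([0.0], np.cumsum((f[1:] + f[:-1]) / 2 * dt)))

print(F[-1], normal_cdf(2.0))  # both approximately 0.977
```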

Example:

Drawing 2 balls, each of which can be either red or blue, gives the following set of possible combinations:

{RR, RB, BR, BB}

X -> the number of red balls (the random variable); t -> a particular value of X.

P(X = t) -> t = 0 : 1 / 4 [BB]

t = 1 : 2 / 4 [RB, BR]

t = 2 : 1 / 4 [RR]

CDF:

F(x) = P(X <= x), where P(t) below is shorthand for P(X = t)

x = 0 : P(0) -> 1 / 4

x = 1 : P(0) + P(1) -> 3 / 4

x = 2 : P(0) + P(1) + P(2) -> 1
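The ball example can be reproduced by enumerating the outcomes. This is a minimal sketch using exact fractions. Note that Fraction reduces 2 / 4 to 1/2.

```python
from fractions import Fraction
from itertools import product

# Enumerate every ordered draw of 2 balls, each red ("R") or blue ("B").
outcomes = ["".join(p) for p in product("RB", repeat=2)]  # RR, RB, BR, BB

# PMF: P(X = t), where X counts red balls; each outcome has probability 1/4.
pmf = {
    t: Fraction(sum(o.count("R") == t for o in outcomes), len(outcomes))
    for t in range(3)
}

# CDF: F(x) = P(X <= x), built as a running sum of the PMF.
cdf = {}
running = Fraction(0)
for t in range(3):
    running += pmf[t]
    cdf[t] = running

print({t: str(p) for t, p in pmf.items()})  # {0: '1/4', 1: '1/2', 2: '1/4'}
print({t: str(p) for t, p in cdf.items()})  # {0: '1/4', 1: '3/4', 2: '1'}
```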


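Approach

Import the modules
Declare the number of data points
Initialize random values
Build a histogram from the above data
Get the histogram data (counts per bin and bin edges)
Find the PDF from the histogram data
Calculate the CDF
Plot the CDF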

Example:

Python3

# defining the libraries
import numpy as np
import matplotlib.pyplot as plt
# In a Jupyter notebook you can also add: %matplotlib inline

# No of data points
N = 500

# initializing random values from a standard normal distribution
data = np.random.randn(N)

# getting data of the histogram: counts per bin and the bin edges
count, bins_count = np.histogram(data, bins=10)

# finding the PDF of the histogram using count values
# (the fraction of samples falling in each bin)
pdf = count / count.sum()

# using numpy np.cumsum to calculate the CDF
# We can also find using the PDF values by looping and adding
cdf = np.cumsum(pdf)

# plotting PDF and CDF against the right edge of each bin
plt.plot(bins_count[1:], pdf, color="red", label="PDF")
plt.plot(bins_count[1:], cdf, label="CDF")
plt.legend()
plt.show()

Output:

PDF and CDF computed from the histogram:

Plotted CDF:

(Figure: CDF plot)
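The code comment in Method 1 notes that the CDF can also be found by looping over the PDF values and adding them. Below is a minimal sketch of that loop, checked against np.cumsum. It assumes a seeded generator (np.random.default_rng(42), not part of the original code) so the run is reproducible.

```python
import numpy as np

# Same histogram-based PDF as in Method 1, with a fixed seed for reproducibility.
rng = np.random.default_rng(42)
data = rng.standard_normal(500)
count, bins_count = np.histogram(data, bins=10)
pdf = count / count.sum()

# CDF by explicit looping: each bin adds its probability to a running total.
cdf_loop = []
total = 0.0
for p in pdf:
    total += p
    cdf_loop.append(total)
cdf_loop = np.array(cdf_loop)

# CDF with np.cumsum, as in the Method 1 code.
cdf_cumsum = np.cumsum(pdf)

print(np.allclose(cdf_loop, cdf_cumsum))  # True
```

np.cumsum is preferred in practice because it runs in compiled code, but both give the same running totals, ending at 1.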

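Method 2: Sorting the data

This method shows how to compute and plot the CDF from sorted data. We first sort the data in ascending order. The CDF at the k-th smallest value is then k / N, the fraction of data points less than or equal to it.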
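Why sorting works can be checked on a tiny, hand-checkable sample. This is a minimal sketch. The data values are hypothetical and chosen to be distinct.

```python
import numpy as np

# A tiny hypothetical sample with distinct values, small enough to check by hand.
data = np.array([3.0, 1.0, 4.0, 2.0, 5.0])
x = np.sort(data)                # sorted: 1, 2, 3, 4, 5
n = len(x)

# The k-th smallest value (k = 1..n) has k of the n points at or below it,
# so the empirical CDF at that value is k / n.
y = np.arange(1, n + 1) / n      # 0.2, 0.4, 0.6, 0.8, 1.0

# Cross-check with the definition F(v) = P(X <= v), counted directly.
direct = np.array([np.mean(data <= v) for v in x])
print(np.allclose(y, direct))    # True
```

With repeated values, the two differ at every copy of a tied value except the last. At the last copy, k / n gives the correct height, and the earlier copies lie on the vertical jump of the step.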

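Approach

Import the modules
Declare the number of data points
Create the data
Sort the data in ascending order
Compute the CDF values
Plot the CDF
Display the plot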

Example:

Python3

# defining the libraries
import numpy as np
import matplotlib.pyplot as plt
# In a Jupyter notebook you can also add: %matplotlib inline

# No of data points used
N = 500

# normal distribution
data = np.random.randn(N)

# sort the data in ascending order
x = np.sort(data)

# get the cdf values of y: the k-th smallest point has k/N of the data
# at or below it, so y runs from 1/N up to exactly 1
y = np.arange(1, N + 1) / N

# plotting
plt.xlabel('x')
plt.ylabel('F(x)')

plt.title('CDF using sorting the data')

plt.plot(x, y, marker='o')
plt.show()

Output:
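The sorted array can also be used to evaluate the empirical CDF at any point, not only at the data points. This is a minimal sketch that uses np.searchsorted as the lookup. That technique is swapped in here and is not used in the original code. The ecdf helper name and the seed are hypothetical.

```python
import numpy as np

# Build the sorted sample once (fixed seed assumed for reproducibility).
rng = np.random.default_rng(1)
x = np.sort(rng.standard_normal(500))
n = len(x)

def ecdf(points):
    """Hypothetical helper: F(t) = (number of samples <= t) / n for each t."""
    # side="right" counts samples equal to t as being <= t.
    return np.searchsorted(x, points, side="right") / n

queries = np.array([-10.0, 0.0, 10.0])
print(ecdf(queries))  # first value 0.0, last value 1.0, middle near 0.5
```

Each lookup is a binary search on the sorted data, so many queries stay cheap once the sort is done.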