SciPy - Statistics - Mango Docs

📅 Last modified: 2020-11-05 04:35:36 🧑 Author: Mango

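All of the statistics functions are located in the sub-package scipy.stats, and a fairly complete listing of them can be obtained with the info(stats) function (in recent SciPy releases, help(stats) or the module docstring serves the same purpose). A list of the available random variables can also be obtained from the docstring of the stats sub-package. The module contains a large number of probability distributions as well as a growing library of statistical functions.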

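Each univariate distribution is an instance of its own subclass of one of the base classes described in the following table.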

Sr. No.	Class & Description
1	rv_continuous A generic continuous random variable class meant for subclassing
2	rv_discrete A generic discrete random variable class meant for subclassing
3	rv_histogram Generates a distribution given by a histogram
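
As a rough illustration of what "meant for subclassing" means, the sketch below defines a custom continuous distribution by overriding only the _pdf method. The class name linear_gen and the density f(x) = 2x on [0, 1] are hypothetical choices made for this example and are not part of SciPy.

```python
import numpy as np
from scipy.stats import rv_continuous


class linear_gen(rv_continuous):
    """Hypothetical distribution with density f(x) = 2x on [0, 1]."""

    def _pdf(self, x):
        # Only the density is supplied; the generic machinery derives the rest.
        return 2.0 * x


# a and b set the lower and upper bounds of the support.
linear = linear_gen(a=0.0, b=1.0, name="linear")

cdf_half = linear.cdf(0.5)   # integral of 2x from 0 to 0.5
mean = linear.mean()         # integral of x * 2x from 0 to 1
print(cdf_half, mean)
```

Because only _pdf is given, the CDF and the mean are obtained by numerical integration, so they are approximate (close to 0.25 and 2/3 here) and slower than for the built-in distributions.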

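Normal Continuous Random Variable

A random variable X that can take any value in a continuous range is a continuous random variable. For the normal distribution, the location (loc) keyword specifies the mean and the scale (scale) keyword specifies the standard deviation.

As an instance of the rv_continuous class, the norm object inherits a collection of generic methods from it and completes them with details specific to this particular distribution.

To compute the CDF at several points, we can pass a list or a NumPy array. Consider the following example.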

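from scipy.stats import norm
import numpy as np

# Evaluate the standard normal CDF at several points at once
print(norm.cdf(np.array([1, -1., 0, 1, 3, 4, -2, 6])))

The above program produces output like the following.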

array([ 0.84134475, 0.15865525, 0.5 , 0.84134475, 0.9986501 ,
0.99996833, 0.02275013, 1. ])
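
The loc and scale keywords described above can be passed to the same cdf call. The sketch below, which assumes a mean of 5 and a standard deviation of 2 purely for illustration, shows three equivalent ways of evaluating the CDF of a shifted and scaled normal distribution.

```python
import numpy as np
from scipy.stats import norm

x = np.array([3.0, 5.0, 7.0])

# Shift and scale passed directly as keyword arguments.
shifted = norm.cdf(x, loc=5, scale=2)

# Equivalent: standardize first, then use the standard normal CDF.
standardized = norm.cdf((x - 5) / 2)

# A "frozen" distribution stores loc and scale for repeated use.
dist = norm(loc=5, scale=2)
frozen = dist.cdf(x)

print(shifted)
```

The three results agree, and the values match the CDF of the standard normal at -1, 0, and 1 shown in the output above.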

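To find the median of a distribution, we can use the Percent Point Function (PPF), which is the inverse of the CDF. Consider the following example.

from scipy.stats import norm

# The median is the point where the CDF equals 0.5
print(norm.ppf(0.5))

The above program will generate the following output.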

0.0
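
To make the inverse relationship between the PPF and the CDF concrete, the following sketch maps a few probabilities to quantiles and back again. The probabilities 0.025 and 0.975 are chosen here only because they give the familiar quantiles near -1.96 and 1.96.

```python
import numpy as np
from scipy.stats import norm

probs = np.array([0.025, 0.5, 0.975])
quantiles = norm.ppf(probs)        # approximately [-1.96, 0.0, 1.96]
round_trip = norm.cdf(quantiles)   # recovers probs up to rounding

print(quantiles)
print(round_trip)
```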

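To generate a sequence of random variates, we use the size keyword argument, as shown in the following example.

from scipy.stats import norm

# Draw five samples from the standard normal distribution
print(norm.rvs(size=5))

The above program produces output like the following.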

array([ 0.20929928, -1.91049255, 0.41264672, -0.7135557 , -0.03833048])

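The above output is not reproducible. To generate the same random numbers on every run, fix the seed, for example with np.random.seed or with the random_state keyword argument of rvs.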
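
One way to make the draw repeatable is the random_state keyword argument of rvs. In the sketch below, the seed value 42 is an arbitrary choice.

```python
from scipy.stats import norm

# The same integer seed yields the same sequence of variates.
first = norm.rvs(size=5, random_state=42)
second = norm.rvs(size=5, random_state=42)

print(first)
print((first == second).all())  # True
```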

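Uniform Distribution

A uniform distribution is available through the uniform object. With the loc and scale keywords, the distribution is uniform on the interval [loc, loc + scale]. Consider the following example.

from scipy.stats import uniform

# CDF of the uniform distribution on [1, 5]
print(uniform.cdf([0, 1, 2, 3, 4, 5], loc=1, scale=4))

The above program produces output like the following.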

array([ 0. , 0. , 0.25, 0.5 , 0.75, 1. ])
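
The same loc and scale values can be used to inspect other properties of the distribution. The following sketch freezes the parameters from the example above and checks the density and the mean.

```python
from scipy.stats import uniform

loc, scale = 1, 4          # support is [1, 5]
dist = uniform(loc=loc, scale=scale)

density = dist.pdf(3)      # 1 / scale inside the support
outside = dist.pdf(0)      # 0 outside the support
mean = dist.mean()         # loc + scale / 2
print(density, outside, mean)
```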

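Building a Discrete Distribution

Let us generate a random sample and compare the observed frequencies with the probabilities.

Binomial Distribution

As an instance of the rv_discrete class, the binom object inherits a collection of generic methods from it and completes them with details specific to this particular distribution. Consider the following example.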

from scipy.stats import binom

# CDF of a binomial distribution with n = 3 trials and success probability p = 0.5
print(binom.cdf([0, 1, 2, 3], n=3, p=0.5))

The above program produces output like the following.

[0.125 0.5   0.875 1.   ]
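
The comparison of observed frequencies with probabilities mentioned at the start of this section can be sketched as follows. The parameters n = 10 and p = 0.3, the sample size of 10000, and the seed 0 are assumptions made for this illustration.

```python
import numpy as np
from scipy.stats import binom

n, p = 10, 0.3
sample = binom.rvs(n, p, size=10000, random_state=0)

# Relative frequency of each outcome 0..n in the sample.
counts = np.bincount(sample, minlength=n + 1)
observed = counts / counts.sum()

# Theoretical probability of each outcome from the PMF.
expected = binom.pmf(np.arange(n + 1), n, p)

for k in range(n + 1):
    print(k, round(observed[k], 4), round(expected[k], 4))
```

With a sample of this size, the observed frequencies should lie close to the PMF values, and the gap generally shrinks as the sample grows.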

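Descriptive Statistics

Basic statistics such as the minimum, maximum, mean, and variance take a NumPy array as input and return the respective results. The following table describes some of the basic statistical functions available in the scipy.stats package.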

Sr. No.	Function & Description
1	describe() Computes several descriptive statistics of the passed array
2	gmean() Computes geometric mean along the specified axis
3	hmean() Calculates the harmonic mean along the specified axis
4	kurtosis() Computes the kurtosis
5	mode() Returns the modal value
6	skew() Computes the sample skewness of the data
7	f_oneway() Performs a 1-way ANOVA
8	iqr() Computes the interquartile range of the data along the specified axis
9	zscore() Calculates the z score of each value in the sample, relative to the sample mean and standard deviation
10	sem() Calculates the standard error of the mean (or standard error of measurement) of the values in the input array

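Several of these functions have a similar version in scipy.stats.mstats that works on masked arrays. The following example computes a few basic statistics directly on a NumPy array.

import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
# Maximum, minimum, mean, and (population) variance of the array
print(x.max(), x.min(), x.mean(), x.var())

The above program will generate the following output.

9 1 5.0 6.666666666666667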
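
A few of the functions from the table can be tried on the same data. The sketch below uses describe(), gmean(), and hmean(); the small list [1, 2, 4] is an arbitrary input picked so that the results are easy to check by hand.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# describe() bundles several summary statistics in one result object.
summary = stats.describe(x)
print(summary.nobs, summary.minmax, summary.mean, summary.variance)

# Geometric and harmonic means of a small positive sample.
geometric = stats.gmean([1, 2, 4])   # cube root of 1 * 2 * 4
harmonic = stats.hmean([1, 2, 4])    # 3 / (1/1 + 1/2 + 1/4)
print(geometric, harmonic)
```

Note that describe() reports the sample variance (divisor n - 1), which is 7.5 for this array, whereas x.var() uses the divisor n by default and gives about 6.67.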

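T-test

Let us see how the T-test is used in SciPy.

ttest_1samp

Calculates the T-test for the mean of one group of scores. This is a two-sided test for the null hypothesis that the expected value (mean) of a sample of independent observations 'a' is equal to the given population mean, popmean. Consider the following example.

from scipy import stats

# Two columns of 50 observations each, drawn from N(5, 10^2)
rvs = stats.norm.rvs(loc=5, scale=10, size=(50, 2))
# Test each column against the population mean 5.0
print(stats.ttest_1samp(rvs, 5.0))

The above program produces output like the following. The exact values differ from run to run because the sample is random.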

Ttest_1sampResult(statistic = array([-1.40184894, 2.70158009]),
pvalue = array([ 0.16726344, 0.00945234]))
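
To show what ttest_1samp computes, the sketch below reproduces the statistic and the two-sided p-value by hand for a single sample, using the standard one-sample formula t = (mean - popmean) / (s / sqrt(n)). The seed 1 is an arbitrary choice that only makes the run repeatable.

```python
import numpy as np
from scipy import stats

sample = stats.norm.rvs(loc=5, scale=10, size=50, random_state=1)
popmean = 5.0

n = sample.size
# Sample standard deviation uses ddof=1 (divisor n - 1).
t_manual = (sample.mean() - popmean) / (sample.std(ddof=1) / np.sqrt(n))
# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

result = stats.ttest_1samp(sample, popmean)
print(t_manual, result.statistic)
print(p_manual, result.pvalue)
```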

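Comparing Two Samples

In the following examples, there are two samples, which can come either from the same or from different distributions, and we want to test whether these samples have the same statistical properties.

ttest_ind: Calculates the T-test for the means of two independent samples of scores. This is a two-sided test for the null hypothesis that two independent samples have identical average (expected) values. By default, this test assumes that the populations have identical variances.

We can use this test if we observe two independent samples from the same or from different populations. Consider the following example.

from scipy import stats

# Two independent samples drawn from the same distribution, N(5, 10^2)
rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)
rvs2 = stats.norm.rvs(loc=5, scale=10, size=500)
print(stats.ttest_ind(rvs1, rvs2))

The above program produces output like the following. The exact values differ from run to run because the samples are random.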

Ttest_indResult(statistic = -0.67406312233650278, pvalue = 0.50042727502272966)

You can run the same test with a new array of the same length but a different mean: use a different value for loc and repeat the test.
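
The variation suggested above might look like the following sketch. The second mean of 10, the name rvs3, and the seeds are assumptions made for this illustration. The call with equal_var=False drops the equal-variance assumption and performs Welch's t-test instead.

```python
from scipy import stats

# Same length and scale, but different means (5 versus 10).
rvs1 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=0)
rvs3 = stats.norm.rvs(loc=10, scale=10, size=500, random_state=1)

# Default: assumes the two populations have equal variances.
result = stats.ttest_ind(rvs1, rvs3)
# Welch's t-test: does not assume equal variances.
welch = stats.ttest_ind(rvs1, rvs3, equal_var=False)

print(result.pvalue, welch.pvalue)
```

With means this far apart relative to the standard error, both p-values should be very small, so the null hypothesis of equal means would be rejected at the usual significance levels.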