Python统计 |方差()
统计模块提供了非常强大的工具,可以用来计算任何与统计相关的东西。 Variation()就是这样一个函数。此函数有助于计算数据样本的方差(样本是填充数据的子集)。
仅当需要计算样本的方差时才应使用variance()函数。还有另一个称为 pvariance() 的函数,用于计算整个总体的方差。
在纯统计中,方差是变量与其均值的平方偏差。基本上,它从平均值或中值测量一组随机数据的分布。较低的方差值表明数据聚集在一起并且没有广泛分散,而较高的值表明给定集合中的数据与平均值相比分散得更多。
方差是科学中的重要工具,其中数据的统计分析很常见。它是给定数据集的标准差的平方,也称为分布的第二中心矩。它通常表示为在纯统计中。
方差由以下公式计算:
It’s calculated by mean of square minus square of mean
Syntax : variance( [data], xbar )
Parameters :
[data] : An iterable with real valued numbers.
xbar (Optional) : Takes actual mean of data-set as value.
Returnype : Returns the actual variance of the values passed as parameter.
Exceptions :
StatisticsError is raised for data-set less than 2-values passed as parameter.
Throws impossible values when the value provided as xbar doesn’t match actual mean of the data-set.
代码#1:
Python3
# Python code to demonstrate the working of
# variance() function of Statistics Module
# Importing Statistics module
import statistics
# Creating a sample of data
sample = [2.74, 1.23, 2.63, 2.22, 3, 1.98]
# Prints variance of the sample set
# Function will automatically calculate
# it's mean and set it as xbar
print("Variance of sample set is % s"
%(statistics.variance(sample)))
Python3
# Python code to demonstrate variance()
# function on varying range of data-types
# importing statistics module
from statistics import variance
# importing fractions as parameter values
from fractions import Fraction as fr
# tuple of a set of positive integers
# numbers are spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)
# tuple of a set of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)
# tuple of a set of positive and negative numbers
# data-points are spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
# tuple of a set of fractional numbers
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4),
fr(5, 6), fr(7, 8))
# tuple of a set of floating point values
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)
# Print the variance of each samples
print("Variance of Sample1 is % s " %(variance(sample1)))
print("Variance of Sample2 is % s " %(variance(sample2)))
print("Variance of Sample3 is % s " %(variance(sample3)))
print("Variance of Sample4 is % s " %(variance(sample4)))
print("Variance of Sample5 is % s " %(variance(sample5)))
Python3
# Python code to demonstrate
# the use of xbar parameter
# Importing statistics module
import statistics
# creating a sample list
sample = (1, 1.3, 1.2, 1.9, 2.5, 2.2)
# calculating the mean of sample set
m = statistics.mean(sample)
# calculating the variance of sample set
print("Variance of Sample set is % s"
%(statistics.variance(sample, xbar = m)))
Python3
# Python code to demonstrate the error caused
# when garbage value of xbar is entered
# Importing statistics module
import statistics
# creating a sample list
sample = (1, 1.3, 1.2, 1.9, 2.5, 2.2)
# calculating the mean of sample set
m = statistics.mean(sample)
# Actual value of mean after calculation
# comes out to 1.6833333333333333
# But to demonstrate xbar error let's enter
# -100 as the value for xbar parameter
print(statistics.variance(sample, xbar = -100))
Python3
# Python code to demonstrate StatisticsError
# importing Statistics module
import statistics
# creating an empty data-srt
sample = []
# will raise Statistics Error
print(statistics.variance(sample))
输出 :
Variance of sample set is 0.40924
代码#2:在一系列数据类型上演示variance()
Python3
# Python code to demonstrate variance()
# function on varying range of data-types
# importing statistics module
from statistics import variance
# importing fractions as parameter values
from fractions import Fraction as fr
# tuple of a set of positive integers
# numbers are spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)
# tuple of a set of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)
# tuple of a set of positive and negative numbers
# data-points are spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
# tuple of a set of fractional numbers
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4),
fr(5, 6), fr(7, 8))
# tuple of a set of floating point values
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)
# Print the variance of each samples
print("Variance of Sample1 is % s " %(variance(sample1)))
print("Variance of Sample2 is % s " %(variance(sample2)))
print("Variance of Sample3 is % s " %(variance(sample3)))
print("Variance of Sample4 is % s " %(variance(sample4)))
print("Variance of Sample5 is % s " %(variance(sample5)))
输出 :
Variance of Sample 1 is 15.80952380952381
Variance of Sample 2 is 3.5
Variance of Sample 3 is 61.125
Variance of Sample 4 is 1/45
Variance of Sample 5 is 0.17613000000000006
代码#3:演示 xbar 参数的使用
Python3
# Python code to demonstrate
# the use of xbar parameter
# Importing statistics module
import statistics
# creating a sample list
sample = (1, 1.3, 1.2, 1.9, 2.5, 2.2)
# calculating the mean of sample set
m = statistics.mean(sample)
# calculating the variance of sample set
print("Variance of Sample set is % s"
%(statistics.variance(sample, xbar = m)))
输出 :
Variance of Sample set is 0.3656666666666667
代码 #4 :当xbar的值与平均值/平均值不同时显示错误
Python3
# Python code to demonstrate the error caused
# when garbage value of xbar is entered
# Importing statistics module
import statistics
# creating a sample list
sample = (1, 1.3, 1.2, 1.9, 2.5, 2.2)
# calculating the mean of sample set
m = statistics.mean(sample)
# Actual value of mean after calculation
# comes out to 1.6833333333333333
# But to demonstrate xbar error let's enter
# -100 as the value for xbar parameter
print(statistics.variance(sample, xbar = -100))
输出 :
0.3656666666663053
注意:它的精度与代码#3 中的输出不同代码 #4:展示 StatisticsError
Python3
# Python code to demonstrate StatisticsError
# importing Statistics module
import statistics
# creating an empty data-srt
sample = []
# will raise Statistics Error
print(statistics.variance(sample))
输出 :
Traceback (most recent call last):
File "/home/64bf6d80f158b65d2b75c894d03a7779.py", line 10, in
print(statistics.variance(sample))
File "/usr/lib/python3.5/statistics.py", line 555, in variance
raise StatisticsError('variance requires at least two data points')
statistics.StatisticsError: variance requires at least two data points
应用:
方差是统计和处理大量数据的一个非常重要的工具。就像,当全知均值未知(样本均值)时,方差被用作有偏估计量。真实世界的观察,例如一家公司全天所有股票的涨跌值,不可能是所有可能的观察结果。因此,方差是从一组有限的数据中计算出来的,尽管在考虑到整个人口的情况下计算时它不会匹配,但它仍然会给用户一个足以消除其他计算的估计。