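Standard deviation and variance are two of the most commonly used measures of how the values in a set are spread out. The standard deviation (σ) of a set of numbers describes how widely those numbers are scattered around their mean, and it is obtained by taking the square root of the variance. The variance of a set of numbers measures how far, on average, each value in the set lies from the mean. In other words, it equals the average of the squared differences between the values and their mean.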
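Standard deviation and variance for ungrouped data
The variance of ungrouped data is calculated as follows:
- Calculate the mean of the given values.
- Calculate the difference between each value and the mean. This difference is also called the deviation from the mean.
- Square each of the differences obtained in step 2 and sum all of the squared values.
- Divide the resulting sum by the number of values.
The formula for the variance is:
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
where x̄ is the mean, x_i is the i-th value, and n is the number of values in the set.
To calculate the standard deviation (σ), we first calculate the variance using the steps above and then take its square root:
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$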
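The four steps can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `population_variance` and `population_std` and the sample list `data` are assumptions made for this example, and the sum is divided by n, as in the steps above.

```python
import math


def population_variance(values):
    """Return the variance of `values`, dividing by n (the number of values)."""
    n = len(values)
    if n == 0:
        raise ValueError("variance requires at least one value")
    mean = sum(values) / n                         # step 1: the mean
    deviations = [x - mean for x in values]        # step 2: deviation of each value
    squared_sum = sum(d * d for d in deviations)   # step 3: sum of the squares
    return squared_sum / n                         # step 4: divide by the number of values


def population_std(values):
    """Return the standard deviation: the square root of the variance."""
    return math.sqrt(population_variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]    # illustrative data, not taken from the text
print(population_variance(data))   # 4.0
print(population_std(data))        # 2.0
```

Here the mean is 5, the squared deviations sum to 32, and 32 / 8 = 4, so the standard deviation is 2.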
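Measures of dispersion: range, standard deviation and variance
Statistical dispersion is the extent to which a set of values is spread out. Variance, standard deviation and range (the difference between the largest and smallest values in a data set) are all examples of measures of dispersion. The larger the range, standard deviation and variance, the greater the dispersion of the values.
Sample problems for range, variance and standard deviation
The following examples illustrate these three concepts. We assume two sets of random numbers: Set1 = {1, 3, 7, 9, 11, 15} and Set2 = {10, 20, 33, 67, 82}.
Example 1: This example shows how to calculate the range of a data set.
Solution: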
- The range is the difference between the highest value and the lowest value for a given set of values.
- In Set1, the largest value is 15 and the smallest value is 1. Therefore, the range of Set1 is 15 – 1 = 14.
- In Set2, the largest value is 82 and the smallest value is 10. Therefore, the range of Set2 is 82 – 10 = 72.
We conclude that Set2 has a higher dispersion because it has a higher range.
Example 2: This example shows how to calculate the variance of a data set.
Solution:
- To calculate the variance of Set1, we first have to calculate the mean:
M1 = (1 + 3 + 7 + 9 + 11 + 15) / 6 = 23/3 ≈ 7.7
- The absolute deviations of the values 1, 3, 7, 9, 11, 15 from the mean are, respectively: 6.7, 4.7, 0.7, 1.3, 3.3, 7.3.
V1 = ((6.7)^2 + (4.7)^2 + (0.7)^2 + (1.3)^2 + (3.3)^2 + (7.3)^2) / 6 = 133.34 / 6 ≈ 22.2
- To calculate the variance of Set2, we first have to calculate the mean:
M2 = (10 + 20 + 33 + 67 + 82) / 5 = 42.4
- The absolute deviations of the values 10, 20, 33, 67, 82 from the mean are, respectively: 32.4, 22.4, 9.4, 24.6, 39.6.
V2 = ((32.4)^2 + (22.4)^2 + (9.4)^2 + (24.6)^2 + (39.6)^2) / 5 = 3813.2 / 5 = 762.64
- We conclude that Set2 has a higher dispersion because it has a higher variance.
Example 3: This example shows how to calculate the standard deviation.
Solution:
- From the values of V1 and V2 obtained in the previous example, we calculate:
σ1 = √(22.2) ≈ 4.7
σ2 = √(762.64) ≈ 27.6
- We conclude that Set2 has a higher dispersion because it has a higher standard deviation.
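The three examples can be cross-checked with Python's standard `statistics` module. This is a sketch for verification only: the helper name `summarize` is an assumption of this example, and `pvariance` and `pstdev` are the population versions, which divide by n as the worked examples do.

```python
import statistics

set1 = [1, 3, 7, 9, 11, 15]
set2 = [10, 20, 33, 67, 82]


def summarize(values):
    """Return the range, population variance and population standard deviation."""
    return {
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),  # divides by n
        "std_dev": statistics.pstdev(values),      # square root of pvariance
    }


for name, values in (("Set1", set1), ("Set2", set2)):
    s = summarize(values)
    print(f"{name}: range={s['range']}, "
          f"variance={s['variance']:.2f}, std_dev={s['std_dev']:.2f}")
```

Because the code works with unrounded deviations, its results can differ from the hand calculation in the last displayed digit.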
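Range and mean deviation for grouped data
Grouped data come in two types. The first is the continuous frequency distribution, in which values are grouped into intervals and each interval is associated with a frequency. The second is the discrete frequency distribution, in which each individual value is associated with a frequency.
Range
- To calculate the range of a continuous frequency distribution, we take the difference between the upper limit of the highest interval and the lower limit of the lowest interval. Assuming the lowest interval is (a – f) and the highest interval is (v – z):
$$\text{Range} = z - a$$
- For a discrete frequency distribution, we simply calculate the difference between the largest value (L) and the smallest value (S):
$$\text{Range} = L - S$$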
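Mean
- To calculate the mean of a continuous frequency distribution, we take the value at the centre of each interval and multiply it by the frequency of that interval. We then sum these products and divide the sum by the total number of values (the sum of all frequencies). The following formula is used, where xi is the mid-point of interval i and fi is the frequency of interval i:
$$\bar{x} = \frac{\sum_i f_i x_i}{\sum_i f_i}$$
- The mean of a discrete frequency distribution is calculated in the same way, with one difference: the distribution has discrete values rather than intervals. So instead of taking the centre of an interval, we multiply each discrete value by its frequency, sum these products and divide by the total frequency. The same formula is used, but in this case xi is discrete value i and fi is the frequency of discrete value i.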
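As a minimal sketch of the grouped mean, the code below handles both cases with one weighted-mean function. The names `grouped_mean` and `midpoints`, and the sample distributions, are hypothetical and chosen only for illustration.

```python
def midpoints(intervals):
    """Return the centre value of each (lower, upper) interval."""
    return [(lower + upper) / 2 for lower, upper in intervals]


def grouped_mean(values, frequencies):
    """Frequency-weighted mean: sum(f_i * x_i) / sum(f_i)."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, frequencies)) / total


# Continuous distribution (hypothetical data): use the interval mid-points.
intervals = [(0, 10), (10, 20), (20, 30)]
interval_freqs = [2, 5, 3]
print(grouped_mean(midpoints(intervals), interval_freqs))  # 16.0

# Discrete distribution (hypothetical data): use the values directly.
print(grouped_mean([2, 4, 6], [1, 2, 1]))  # 4.0
```

The only difference between the two cases is the preprocessing step: a continuous distribution is first reduced to its mid-points, after which the same formula applies.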
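Mean deviation
- To calculate the mean deviation of a continuous frequency distribution, we compute the absolute difference between the mid-point of each interval and the mean. We then multiply each difference by the frequency of its interval and sum the results. Finally, we divide the sum by the total number of values (the total frequency). The following formula is used, where xi is the mid-point of interval i and fi is the frequency of interval i:
$$\text{MD} = \frac{\sum_i f_i \lvert x_i - \bar{x} \rvert}{\sum_i f_i}$$
- The mean deviation of a discrete frequency distribution is calculated in the same way as for a continuous frequency distribution, except that instead of the mid-point of an interval we take each discrete value. We compute the absolute difference between that value and the mean, multiply the difference by the frequency of the value, sum these products and divide by the total frequency. The same formula is used, but in this case xi is discrete value i and fi is the frequency of discrete value i.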
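The mean deviation can be sketched in the same spirit. The function name `mean_deviation` and the sample data are assumptions of this example. For a continuous distribution, `values` would hold the interval mid-points.

```python
def mean_deviation(values, frequencies):
    """Frequency-weighted mean absolute deviation from the mean."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("total frequency must be positive")
    # Weighted mean of the values (mid-points or discrete values).
    mean = sum(f * x for x, f in zip(values, frequencies)) / total
    # Weight each absolute deviation by its frequency, then average.
    weighted = sum(f * abs(x - mean) for x, f in zip(values, frequencies))
    return weighted / total


# Hypothetical mid-points 5, 15, 25 with frequencies 2, 5, 3 (mean = 16).
print(mean_deviation([5, 15, 25], [2, 5, 3]))  # 5.4
```

With a mean of 16, the weighted absolute deviations are 2·11 + 5·1 + 3·9 = 54, and 54 / 10 = 5.4.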
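Calculating the mean, median and mode
The mean, median and mode each tell us which value can represent a data set, and each does so in a different way. These three measures of central tendency are explained below:
- To calculate the mean, we divide the sum of the values by the number of values.
- The median is the number at the centre of the data set when the set is arranged in ascending or descending order. In a data set with n values:
If n is odd, we calculate (n + 1) / 2. Taking the index of the first value to be 1, the second to be 2, and so on, the value at the resulting index is the median.
If n is even, we add the values at indices (n / 2) and (n / 2 + 1) and divide the sum by 2 to get their average. This value is the median of the set.
- The mode is the most frequent number in a set of values.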
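The index rules for the median translate directly into code once the 1-based indices are shifted to Python's 0-based indexing. The sketch below uses hypothetical function names (`median`, `modes`) and sample data. `modes` returns a list because a set can have more than one mode.

```python
from collections import Counter


def median(values):
    """Median via the odd/even index rule on the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    if n % 2 == 1:
        # 1-based index (n + 1) / 2 corresponds to 0-based index n // 2.
        return ordered[n // 2]
    # 1-based indices n/2 and n/2 + 1 correspond to 0-based n//2 - 1 and n//2.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def modes(values):
    """Return every value that occurs with the highest frequency, sorted."""
    if not values:
        raise ValueError("mode requires at least one value")
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


sample = [4, 1, 3, 3, 8]       # hypothetical data
print(median(sample))          # 3
print(median([4, 1, 3, 8]))    # 3.5
print(modes(sample))           # [3]
```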
Sample problems
Problem 1: Given the set of ungrouped values {7, 8, 3, 6, 7, 8, 9, 7, 5, -2}, calculate the mean, median and mode of the set.
Solution:
Mean:
We first sum the values: sum = 7 + 8 + 3 + 6 + 7 + 8 + 9 + 7 + 5 + (-2) = 58
We have n = 10 values in the set. Therefore, we divide the sum by 10.
Mean = 58/10 = 5.8
Median:
We first arrange the values in ascending order:
-2, 3, 5, 6, 7, 7, 7, 8, 8, 9
The number of values here is n = 10, which is even. Therefore, we take the value at index n/2 = 10/2 = 5, which is 7, and the value at index n/2 + 1 = 10/2 + 1 = 6, which is also 7. Their average is (7 + 7)/2 = 7.
Median = 7
Mode:
We can see that the number 7 is repeated 3 times in the set, 8 is repeated twice, and the rest of the values are repeated once. Therefore, the most frequent value is 7.
Mode = 7
Problem 2: Given the set of ungrouped values {1, 4, 9, 9, 6, 30, 21, 6, 1}, calculate the mean, median and mode of the set.
Solution:
Mean:
Sum = 1 + 4 + 9 + 9 + 6 + 30 + 21 + 6 +1 = 87
Mean = 87/9 = 9.7
Median:
The values in ascending order: 1, 1, 4, 6, 6, 9, 9, 21, 30
The number of values here is n = 9, which is odd. Therefore, we take the value at index (n + 1)/2 = (9 + 1)/2 = 5, which is 6.
Median = 6
Mode:
The numbers 1, 6, and 9 each appear twice in the set, while the rest of the values appear only once. Therefore, the set has more than one mode. It is trimodal, meaning that it has three modes.
Mode = 1, 6, 9
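Both sets of ungrouped values can be checked with the standard `statistics` module. This is a verification sketch that assumes Python 3.8 or later, where `statistics.multimode` is available. `problem1` holds the ten values that are summed in the first worked solution.

```python
import statistics

problem1 = [7, 8, 3, 6, 7, 8, 9, 7, 5, -2]   # the ten values summed in the first solution
problem2 = [1, 4, 9, 9, 6, 30, 21, 6, 1]

for name, values in (("Problem 1", problem1), ("Problem 2", problem2)):
    mean = statistics.mean(values)
    median = statistics.median(values)
    # multimode returns every most-frequent value, so ties are not hidden.
    modes = sorted(statistics.multimode(values))
    print(f"{name}: mean={mean:.1f}, median={median}, modes={modes}")
```

`statistics.mode` would return only a single value, so `multimode` is the better fit whenever a set may have several modes.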
Problem 3: Given a set of grouped data with a continuous frequency distribution:
Interval (class) | Frequency |
---|---|
2-4 | 3 |
4-6 | 4 |
6-8 | 2 |
Calculate the range, mean and mean deviation.
Solution:
Range:
The lowest value in the lowest interval = 2, and the highest value in the highest interval = 8
Range = 8 – 2 = 6
Mean:
Centre values for each interval (respectively): 3, 5, 7
Sum of each centre value multiplied by its frequency = 3*3 + 5*4 + 7*2 = 43
Mean = 43/(3 + 4 + 2) = 4.8
Mean Deviation:
Absolute difference between each mid-point and the mean (respectively): |3 – 4.8| = 1.8, |5 – 4.8| = 0.2, |7 – 4.8| = 2.2
Sum of differences multiplied by the frequencies: 1.8*3 + 0.2*4 + 2.2*2 = 10.6
Mean Deviation = 10.6 / 9 = 1.2
Problem 4: Given a set of grouped data with a discrete frequency distribution:
Value (class) | Frequency |
---|---|
1 | 3 |
5 | 4 |
7 | 2 |
Calculate the range, mean and mean deviation.
Solution:
Range:
The lowest value = 1, and the highest value = 7
Range = 7 – 1= 6
Mean:
Sum of each discrete value multiplied by its frequency = 1*3 + 5*4 + 7*2 = 37
Mean = 37/(3 + 4 + 2) = 4.1
Mean Deviation:
Absolute difference between each value and the mean (respectively): |1 – 4.1| = 3.1, |5 – 4.1| = 0.9, |7 – 4.1| = 2.9
Sum of differences multiplied by the frequencies: 3.1*3 + 0.9*4 + 2.9*2 = 18.7
Mean Deviation = 18.7 / 9 = 2.1
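A discrete frequency table is just a compact way of writing a list with repeated values, so one way to check a grouped calculation is to expand the table and treat it as ungrouped data. The sketch below does this for the table in this problem. The variable names are assumptions of this example, and `statistics.fmean` requires Python 3.8 or later.

```python
from collections import Counter
import statistics

# Frequency table from this problem: value -> frequency.
table = Counter({1: 3, 5: 4, 7: 2})

# Expand the table into the raw values it represents.
raw = sorted(table.elements())   # [1, 1, 1, 5, 5, 5, 5, 7, 7]

value_range = max(raw) - min(raw)
mean = statistics.fmean(raw)
mean_dev = sum(abs(x - mean) for x in raw) / len(raw)

print(f"range={value_range}, mean={mean:.2f}, mean deviation={mean_dev:.2f}")
```

The code uses the unrounded mean throughout, so the figures it prints can be compared with a hand calculation that rounds at each step.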