置信区间 - 芒果文档

📌 相关文章

📜 置信区间

📅 最后修改于: 2021-05-04 18:41:29 🧑 作者: Mango

先决条件： t检验，z检验

简而言之，置信区间是我们可以确定真实值存在的范围。选择间隔的置信度级别将确定置信区间将包含真实参数值的概率。此值范围通常用于处理基于人群的数据，以一定的置信度来提取特定的有价值的信息，因此称为“置信区间”。

图1.显示了置信区间的总体外观。

图1：置信区间图

置信度：

置信水平描述了与采样方法相关的不确定性。

假设我们使用相同的采样方法(例如样本均值)为每个样本计算不同的间隔估计。一些区间估计将包括真实的总体参数，而另一些则不会。

90％的置信度意味着我们期望90％的区间估计包括总体参数。 95％的置信度意味着95％的间隔将包括总体参数。

例如，假设您正在调查特定城市中男性的平均身高。为此，您设置了95％的置信度，并发现95％的置信区间为(168,182)。这意味着，如果一遍又一遍地重复此操作，则95％的时间男人的身高会落在168厘米至182厘米之间。

构建置信区间：

构造置信区间涉及4个步骤。

Step 1: Identify the sample problem. Choose the statistic (like sample mean, etc) that 
    you will use to estimate population parameter.

Step 2: Select a confidence level. (Usually, it is 90%, 95% or 99%)

Step 3: Find the margin of error. (Usually given) If not given, use the following formula:-
    Margin of error = Critical value * Standard deviation 

Step 4: Specify the confidence interval. The uncertainty is denoted by the confidence level. 
    And the range of the confidence interval is defined by Eq-1.

等式1

where, 
Sample_Statistic --> Can be any kind of statistic. (eg. sample mean)
Margin_of_Error  --> generally, its (± 2.5)

计算置信区间

CI的计算需要两个统计参数。

平均值(μ)-算术平均值是数字的平均值。它定义为n个数字的总和除以直到n的数字计数。 (式2)

$\mu=\frac{1+2+3+\ldots+n}{n} \quad {.. Eq 2}$

标准偏差(σ) – 它是数字分布程度的度量。它定义为每个数字与均值之差的平方的总和。 (式3)

$\sigma=\sqrt{\sum \frac{\left(x_{i}-\mu\right)^{2}}{n}} \quad {... Eq 3}$

a)使用t分布

当样本量n <30时，我们使用t分布。

考虑以下示例。随机抽取10名UFC战斗机样本，并测量其重量。发现平均重量为240kg。构建平均重量的95％置信区间估计值样品标准偏差为25 kg。为所有UFC战斗机的真实平均体重找到一个样本的置信区间。

Step 1 - Subtract 1 from your sample size.[Eq-4] 
     This gives the degrees of freedom (df), required in Step-3.

$d f=n-1 \quad {... Eq 4}$

where, 
df = degree of freedom
n = sample size

使用等式4，我们得到df = 10 – 1 = 9。

Step 2 - Subtract the confidence interval from 1, then divide by two.
 [Eq-5]
     This gives the significance level (α), required in Step-3.

$\alpha=\frac{1-C L}{2} \quad {... Eq 5}$

α = Significance level
CL = Confidence Level

使用等式5，我们得到α=(1 – .95)/ 2 = 0.025

Step 3 - Use the values of α and df in the t-distribution table and find the value of t.

(df)/(α)	0.1	0.05	0.025	. .
∞	1.282	1.645	1.960	. .
1	3.078	6.314	12.706	. .
2	1.886	2.920	4.303	. .
:	:	:	:	. .
8	1.397	1.860	2.306	. .
9	1.383	1.833	2.262	. .

使用t分布表中的df和α值，我们得出t = 2.262。

Step 4 - Use the t-value obtained in step 3 in the formula given for Confidence Interval 
      with t-distribution. [Eq-6]

$\mu \pm t\left(\frac{\sigma}{\sqrt{n}}\right) \quad {...Eq6}$

where,
μ = mean
t = chosen t-value from the table above
σ = the standard deviation
n = number of observations

因此，将值放在等式6中，我们得到

$\begin{array}{l} \Rightarrow 240 \pm(2.262)^{*}(25 / \sqrt{10}) \\ \Rightarrow 240 \pm 17.883 \\ \Rightarrow(240-17.883,240+17.883) \\ \Rightarrow(222.117,257.883) \end{array}$

where,
Lower Limit = 222.117
Upper Limit = 257.883

因此，我们有95％的信心UFC战斗机的真实平均重量在222.117至257.883之间。

b)使用z分布

当样本量n> 30时，我们使用z分布。当标准偏差已知时，Z检验会更有用。

考虑以下示例。随机抽取50名成年女性，并测量其RBC计数。样本平均值为4.63，RBC计数的标准偏差为0.54。为成年女性的真实平均RBC计数构建95％的置信区间估计。

Step 1 - Find the mean. [Eq-2] (If not already given)
Step 2 - Find the standard deviation. [Eq-3] (If not already given)
Step 3 - Determine the z-value for the specified confidence interval. 
     (some common values in the table given below)

Confidence Interval	z-value
90%	1.645
95%	1.960
99%	2.576

Step 4 - Use the z-value obtained in step 3 in the formula given for Confidence Interval 
      with z-distribution. [Eq-7]

$\mu \pm z\left(\frac{\sigma}{\sqrt{n}}\right) \quad {.....Eq7}$

where,
μ = mean
z = chosen z-value from the table above
σ = the standard deviation
n = number of observations

将值放在等式7中，我们得到

$\begin{array}{l} \Rightarrow4.63 \pm(1.960)^{*}(0.54 / \sqrt{50}) \\ \Rightarrow 4.63 \pm 0.149 \\ \Rightarrow(4.63-0.149,4.63+0.149) \\ \Rightarrow(4.480,4.780) \end{array}$

where,
Lower Limit = 4.480
Upper Limit = 4.780

因此，我们有95％的信心，成年女性的真实平均RBC计数在4.480和4.780之间。

置信区间是统计的基本概念之一。它告诉您有关数据的声明。基于存在的数据，可以使用各种采样方法，例如均值，中位数等。人们还可以确定何时使用哪种分布以获得最佳结果。如有任何疑问/疑问，请在下面评论。