How to Find a Confidence Interval in R

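A confidence interval expresses how much uncertainty surrounds a sample statistic. More precisely, it is a range of values, computed from the sample, that is constructed to contain the population parameter with confidence level 1 – α. For the mean, the confidence interval is given by: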

x̄ ± t_{α/2, n – 1} · S_x̄

Here,

x̄: the sample mean

t_{α/2, n – 1}: the critical value that leaves an area of α/2 in each tail of a t-distribution with n – 1 degrees of freedom

S_x̄ = s / √n: the standard error of the mean, where s is the sample standard deviation and n is the sample size
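To make the meaning of the confidence level 1 – α more concrete, here is a minimal simulation sketch. It draws many samples from a normal population and counts how often the interval from the formula above contains the true mean. The population mean, standard deviation, sample size, number of repetitions, and seed are all illustrative assumptions, not values from this article.

```r
# Sketch: empirical coverage of the t-interval (all settings are assumptions)
set.seed(42)        # arbitrary seed, only for reproducibility
true_mean <- 10     # hypothetical population mean
n <- 30             # hypothetical sample size
alpha <- 0.05       # 1 - alpha = 0.95 confidence level
n_sims <- 2000      # number of simulated samples

covers <- replicate(n_sims, {
  x <- rnorm(n, mean = true_mean, sd = 2)
  # Margin of error: critical t value times the standard error of the mean
  margin <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
  (mean(x) - margin <= true_mean) && (true_mean <= mean(x) + margin)
})

# Proportion of intervals that contain the true mean
coverage <- mean(covers)
print(coverage)
```

Under these assumptions the printed proportion should land close to 0.95, which is what a 95% confidence level promises in the long run.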


Determining a confidence interval in R:

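First, we need some sample data. R ships with built-in datasets; in this article we use the iris dataset for illustration. It records the sepal length, sepal width, petal length, and petal width, in centimeters, of 50 flowers from each of three iris species (150 flowers in total). The species are:

setosa
versicolor
virginica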

R

# Printing the contents of iris inbuilt dataset
print(iris)
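Printing all 150 rows is verbose, so a more compact way to inspect the data may be useful. The sketch below uses only base R functions and checks the description above: four numeric measurements plus a Species column, with 50 flowers per species.

```r
# Compact overview of the built-in iris dataset
str(iris)        # column names, types, and a preview of the values

# Count the flowers recorded for each species
species_counts <- table(iris$Species)
print(species_counts)

# Show only the first five rows instead of the whole data frame
print(head(iris, 5))
```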

Method 1: Computing the interval with base R

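In this method, we find the confidence interval step by step, using the mathematical formula together with R functions. Follow the steps below to determine a confidence interval in R.

Step 1: Compute the mean. The first step is to find the mean of the given sample data.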

R

# R program to determine the mean
 
# Calculate the mean of the Sepal.Length
mean_value <- mean(iris$Sepal.Length)

Step 2: Compute the standard error of the mean.

To compute the standard error of the mean (S_x̄), we need the standard deviation (s) and the size of the sample (n).

R

# Compute the size
n <- length(iris$Sepal.Length)
 
# Find the standard deviation
standard_deviation <- sd(iris$Sepal.Length)
 
# Find the standard error
standard_error <- standard_deviation / sqrt(n)

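Step 3: Determine the t-score associated with the confidence level.

In this step we compute the t-score (critical value) for the chosen confidence level. We need a probability of exactly α / 2 in each of the lower and upper tails. R provides the qt() function, which returns quantiles of the t-distribution and makes this easy. The syntax is:

Syntax:

qt(p, df, lower.tail = TRUE)

Parameters:

p: a probability (or vector of probabilities) for which the quantile is wanted

df: the degrees of freedom

lower.tail: if TRUE (the default), p is the probability P[X ≤ x]; if FALSE, it is P[X > x]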

R

alpha <- 0.05
degrees_of_freedom <- n - 1
# Upper-tail critical value: leaves alpha / 2 in the upper tail
t_score <- qt(p = alpha / 2, df = degrees_of_freedom, lower.tail = FALSE)
print(t_score)
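Because the t-distribution is symmetric, the same critical value can be obtained in several equivalent ways. The sketch below illustrates this and compares the result with the corresponding normal quantile from qnorm(). The value df = 149 is an assumption that matches the 150 iris observations used in this article.

```r
alpha <- 0.05
df <- 149   # assumed: n - 1 for the 150 iris observations

# Upper-tail critical value, requested directly from the upper tail
t_upper <- qt(alpha / 2, df = df, lower.tail = FALSE)

# The same value, requested as the (1 - alpha / 2) quantile
t_alt <- qt(1 - alpha / 2, df = df)

# The lower-tail critical value is its mirror image
t_lower <- qt(alpha / 2, df = df)

# Normal-distribution counterpart, for comparison
z_score <- qnorm(1 - alpha / 2)

print(c(t_upper = t_upper, t_alt = t_alt, t_lower = t_lower, z_score = z_score))
```

The t-score is slightly larger than the normal quantile, which reflects the extra uncertainty from estimating the standard deviation from the sample; the gap shrinks as the degrees of freedom grow.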

Step 4: Compute the margin of error and form the confidence interval.

The margin of error is given by:

t_{α/2, n – 1} · S_x̄

It can be computed as:

R

margin_error <- t_score * standard_error

The confidence interval is the mean plus or minus the margin of error. It can be computed as:

R

# Calculate the lower bound 
lower_bound <- mean_value - margin_error
 
# Calculate the upper bound
upper_bound <- mean_value + margin_error

Putting all the steps together

Example:

R

# R program to find the confidence interval
 
# Calculate the mean of the sample data
mean_value <- mean(iris$Sepal.Length)
 
# Compute the size
n <- length(iris$Sepal.Length)
 
# Find the standard deviation
standard_deviation <- sd(iris$Sepal.Length)
 
# Find the standard error
standard_error <- standard_deviation / sqrt(n)
 
# Find the t-score for a 95% confidence level
alpha <- 0.05
degrees_of_freedom <- n - 1
t_score <- qt(p = alpha / 2, df = degrees_of_freedom, lower.tail = FALSE)
 
# Find the margin of error
margin_error <- t_score * standard_error
 
# Calculating lower bound and upper bound
lower_bound <- mean_value - margin_error
upper_bound <- mean_value + margin_error
 
# Print the confidence interval
print(c(lower_bound,upper_bound))
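As a sanity check on the manual calculation, the result can be compared with base R's t.test() function, which reports a confidence interval for the mean in its conf.int component. This cross-check is not part of the step-by-step method above; it is offered only as a sketch for verifying the arithmetic.

```r
x <- iris$Sepal.Length
n_obs <- length(x)

# Manual interval: mean -/+ critical t value times the standard error
manual_ci <- mean(x) + c(-1, 1) * qt(0.975, df = n_obs - 1) * sd(x) / sqrt(n_obs)

# Interval reported by the one-sample t-test
builtin_ci <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

print(manual_ci)
print(builtin_ci)

# TRUE when both approaches agree up to numerical tolerance
print(isTRUE(all.equal(manual_ci, builtin_ci)))
```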


Method 2: Computing the confidence interval with confint()

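We can also compute the confidence interval with R's built-in functions. The steps are:

Step 1: Compute the mean and standard error.

R provides the lm() function, which fits a linear model to the data in a data frame. With the intercept-only formula Sepal.Length ~ 1, the fitted intercept is the sample mean and its standard error is the standard error of the mean, which are the two quantities needed for the confidence interval. The syntax is:

Syntax:

lm(formula, data)

Parameters:

formula: the formula describing the linear model to fit

data: the data frame that contains the variables used in the formula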

R

# Calculate the mean and standard error
l_model <- lm(Sepal.Length ~ 1, iris)
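To see why an intercept-only model is enough, it may help to look at the coefficient table of the fitted model. The sketch below refits the model so that it runs on its own, then compares the intercept and its standard error with the values computed by hand; the variable names coef_table and manual_se are illustrative.

```r
# Refit the intercept-only model so this block is self-contained
l_model <- lm(Sepal.Length ~ 1, data = iris)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(l_model))
print(coef_table)

# Hand-computed standard error of the mean, for comparison
manual_se <- sd(iris$Sepal.Length) / sqrt(nrow(iris))

# Both comparisons should print TRUE
print(isTRUE(all.equal(coef_table["(Intercept)", "Estimate"], mean(iris$Sepal.Length))))
print(isTRUE(all.equal(coef_table["(Intercept)", "Std. Error"], manual_se)))
```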

Step 2: Find the confidence interval.

To find the confidence interval, R provides the confint() function, which computes confidence intervals for one or more parameters of a fitted model. The syntax is:

Syntax:

confint(object, parm, level = 0.95, …)

Parameters:

object: a fitted model object

parm: the parameters for which confidence intervals are wanted, given either as a vector of numbers or a vector of names; if omitted, all parameters are used

level: the confidence level

…: additional arguments passed to methods

R

# Find the confidence interval
 
confint(l_model, level = 0.95)

Putting all the steps together

Example:

R

# R program to find the confidence interval
 
# Calculate the mean and standard error
model <- lm(Sepal.Length ~ 1, iris)
 
# Find the confidence interval
confint(model, level=0.95)
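The level argument makes it easy to see how the interval changes with the confidence level. The following sketch computes the interval at three levels; the chosen levels and the helper names conf_levels, intervals, and widths are illustrative assumptions.

```r
model <- lm(Sepal.Length ~ 1, data = iris)

# Confidence levels to compare (illustrative choice)
conf_levels <- c(0.90, 0.95, 0.99)

# One row per level, with the lower and upper bound of the interval
intervals <- t(vapply(
  conf_levels,
  function(lv) as.numeric(confint(model, level = lv)),
  numeric(2)
))
dimnames(intervals) <- list(paste0(conf_levels * 100, "%"), c("lower", "upper"))
print(intervals)

# Interval width at each level
widths <- intervals[, "upper"] - intervals[, "lower"]
print(widths)
```

A higher confidence level gives a wider interval: being more certain of capturing the true mean requires covering a larger range of values.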
