ANOVA Test in R Programming

Last modified: 2022-05-13 01:54:28.817000 · Author: Mango

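ANOVA, also known as Analysis of Variance, is used in R programming to study the relationship between a categorical variable and a continuous variable. It is a hypothesis test that compares population means by analysing the variance between groups and within groups.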

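R – ANOVA Test

An ANOVA test involves setting up:

  • Null hypothesis: all population means are equal.
  • Alternative hypothesis: at least one population mean differs from the others.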
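
The two hypotheses and the test statistic can be written compactly. The notation below is an assumption introduced for illustration: k groups, n_i observations in group i, N observations in total, group means \bar{y}_i and grand mean \bar{y}.

```latex
H_0:\ \mu_1 = \mu_2 = \dots = \mu_k
\qquad
H_1:\ \mu_i \neq \mu_j \ \text{for at least one pair } (i, j)

F = \frac{\mathrm{MS}_{\text{between}}}{\mathrm{MS}_{\text{within}}}
  = \frac{\sum_{i=1}^{k} n_i \,(\bar{y}_i - \bar{y})^2 \,/\, (k - 1)}
         {\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \,/\, (N - k)}
```

A large F means the group means vary more than the spread within the groups would explain, which is evidence against the null hypothesis.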

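There are two types of ANOVA test:

  • One-way ANOVA: it considers one categorical grouping variable (factor).
  • Two-way ANOVA: it considers two categorical grouping variables (factors).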
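
As a rough sketch of how the two types differ in R, the model formula passed to aov() lists one factor or two. The example assumes the built-in mtcars data set with disp as the response, and the object names one_way and two_way are illustrative only.

```r
# One-way: a single categorical factor (gear)
one_way <- aov(disp ~ factor(gear), data = mtcars)

# Two-way: two categorical factors (gear and am), main effects only
two_way <- aov(disp ~ factor(gear) + factor(am), data = mtcars)

# The term labels show which factors each model contains
attr(terms(one_way), "term.labels")
attr(terms(two_way), "term.labels")
```

Replacing `+` with `*` in the two-way formula would also request the interaction between the two factors.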

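Dataset

The examples use the mtcars (Motor Trend Car Road Tests) data set, which contains 32 car models and 11 attributes. The data set ships with R in the datasets package, so it is available without installing anything.

The code below also installs and loads the dplyr package, although neither aov() nor mtcars depends on it.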
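
Before running a test, it can help to check the data. This is a small sketch that inspects only the three columns used in this article (disp, gear and am).

```r
# mtcars is available in a fresh R session through the datasets package
data(mtcars)

# 32 rows (cars) and 11 columns (attributes)
dim(mtcars)

# Structure of the columns used in the ANOVA examples
str(mtcars[, c("disp", "gear", "am")])

# Number of cars for each gear value
table(mtcars$gear)
```

Both gear and am are stored as numbers, which is why the examples wrap them in factor() before passing them to aov().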

Performing a One-Way ANOVA Test in R

A one-way ANOVA test is performed on the mtcars data set between disp, a continuous attribute, and gear, a categorical attribute.

R
# Optional: install and load dplyr
# (mtcars and aov() ship with base R, so dplyr is not required here)
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
library(dplyr)

# Box plot: variation in disp within and between the gear groups
boxplot(disp ~ factor(gear), data = mtcars,
        xlab = "gear", ylab = "disp")

# Step 1: Set up the null and alternative hypotheses
# H0: mu1 = mu2 = mu3 (mean displacement is the same
# for every gear value)
# H1: not all means are equal

# Step 2: Calculate the test statistic using the aov function
mtcars_aov <- aov(disp ~ factor(gear), data = mtcars)
summary(mtcars_aov)

# Step 3: Choose the significance level, alpha = 0.05
# (the critical F value is qf(1 - alpha, df1, df2))

# Step 4: Compare the F statistic with the critical value,
# or equivalently the p-value with alpha.
# If p < alpha, reject the null hypothesis


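Output:

The box plot shows the distribution of displacement for each gear value. Here the categorical variable is gear, converted with the factor function, and the continuous variable is disp.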
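
The numbers behind the box plot can be tabulated directly. This is a minimal sketch using base R only, and the variable names are illustrative.

```r
# Group sizes, means and medians of disp for each gear value
group_sizes   <- table(mtcars$gear)
group_means   <- tapply(mtcars$disp, mtcars$gear, mean)
group_medians <- tapply(mtcars$disp, mtcars$gear, median)

# One row per gear value
round(cbind(n = group_sizes,
            mean = group_means,
            median = group_medians), 1)
```

The thick line inside each box of a box plot is the group median, not the mean, so the two columns are worth comparing.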

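The summary shows that the gear attribute is highly significant for displacement (indicated by the three stars). The p-value is less than 0.05, so we reject the null hypothesis: mean displacement differs between the gear groups.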
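
Steps 3 and 4 can also be carried out in code rather than read off the printed summary. The sketch below assumes a significance level of 0.05 and uses illustrative variable names; it extracts the F statistic and p-value from the ANOVA table and compares them with the critical value from qf().

```r
# Fit the one-way model and extract the ANOVA table
fit <- aov(disp ~ factor(gear), data = mtcars)
tab <- summary(fit)[[1]]

f_stat  <- tab[["F value"]][1]   # F statistic for gear
p_value <- tab[["Pr(>F)"]][1]    # p-value for gear
df1     <- tab[["Df"]][1]        # between-group degrees of freedom
df2     <- tab[["Df"]][2]        # residual degrees of freedom

# Critical F value at the chosen significance level
alpha  <- 0.05
f_crit <- qf(1 - alpha, df1, df2)

# Both comparisons lead to the same decision
reject_h0 <- f_stat > f_crit
c(F = f_stat, F_critical = f_crit, p = p_value)
reject_h0
```

Comparing F with the critical value and comparing p with alpha are two views of the same rule, so they always agree.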

Performing a Two-Way ANOVA Test in R

A two-way ANOVA test is performed on the mtcars data set between disp, a continuous attribute, and two categorical attributes: gear and am.

R

# Optional: install and load dplyr
# (mtcars and aov() ship with base R, so dplyr is not required here)
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
library(dplyr)

# Box plots: variation in disp within and between the gear groups,
# drawn separately for automatic (am == 0) and manual (am == 1) cars
boxplot(disp ~ gear, data = mtcars, subset = (am == 0),
        xlab = "gear", ylab = "disp", main = "Automatic")
boxplot(disp ~ gear, data = mtcars, subset = (am == 1),
        xlab = "gear", ylab = "disp", main = "Manual")

# Step 1: Set up the null and alternative hypotheses
# H0: mean displacement is the same for every gear value
# (and likewise for both transmission types)
# H1: not all means are equal

# Step 2: Calculate the test statistics using the aov function
mtcars_aov2 <- aov(disp ~ factor(gear) * factor(am), data = mtcars)
summary(mtcars_aov2)

# Step 3: Choose the significance level, alpha = 0.05
# (the critical F value is qf(1 - alpha, df1, df2))

# Step 4: Compare each F statistic with its critical value,
# or equivalently each p-value with alpha.
# If p < alpha, reject the null hypothesis for that factor

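Output:

The box plots show the distribution of displacement for each gear value, separately for automatic and manual cars. Here the categorical variables are gear and am, both converted with the factor function, and the continuous variable is disp.

The summary shows that the gear attribute is highly significant for displacement (indicated by the three stars), while the am attribute is not. The p-value for gear is less than 0.05, so gear has a significant effect on displacement. The p-value for am is greater than 0.05, so there is no evidence that am affects displacement once gear is accounted for.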
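
The per-factor conclusions can be checked programmatically. This is a sketch with illustrative variable names, assuming a significance level of 0.05. It also tabulates how many cars fall into each gear/am combination, because some combinations do not occur in mtcars (for example, there are no 3-gear manual cars), so the interaction term may not appear in the summary.

```r
# Fit the two-way model and extract the ANOVA table
fit2 <- aov(disp ~ factor(gear) * factor(am), data = mtcars)
tab2 <- summary(fit2)[[1]]

# p-value per term; row names are padded with spaces, so trim them
p_values <- setNames(tab2[["Pr(>F)"]], trimws(rownames(tab2)))

# TRUE where a term is significant at the 0.05 level
# (the Residuals row has no p-value and is dropped)
p_values[!is.na(p_values)] < 0.05

# Number of cars in each gear/am combination
cell_counts <- table(gear = mtcars$gear, am = mtcars$am)
cell_counts
```

Because aov() uses sequential sums of squares, the test for am is adjusted for gear, and the result can change if the order of the factors in the formula is swapped.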

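Results

The box plots and summaries show significant results:

  • Displacement is strongly related to the number of gears: mean displacement differs between the gear groups (p < 0.05).
  • At the 0.05 significance level, displacement is related to the number of gears but not to the transmission mode (am).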