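ANOVA Test in R Programming
ANOVA (Analysis of Variance) is used in R to study the relationship between a categorical variable and a continuous variable. It is a hypothesis test about population means: it checks whether several group means are equal by comparing the variance between groups with the variance within groups.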
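R – ANOVA Test
An ANOVA test sets up two hypotheses:
- Null hypothesis: all population means are equal.
- Alternative hypothesis: at least one population mean differs from the others.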
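The test decides between these hypotheses by comparing variability between groups with variability within groups. The one-way case uses k groups, with n_i observations and mean ȳ_i in group i, grand mean ȳ, and N observations in total (this notation is introduced here for illustration). Its F statistic is commonly written as:

```latex
F = \frac{\mathrm{MS}_{\text{between}}}{\mathrm{MS}_{\text{within}}}
  = \frac{SS_B / (k - 1)}{SS_W / (N - k)},
\qquad
SS_B = \sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}\right)^2,
\qquad
SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2
```

Under the null hypothesis, F follows an F distribution with k - 1 and N - k degrees of freedom. This assumes independent, normally distributed observations with equal group variances. Large values of F count as evidence against the null hypothesis.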
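There are two types of ANOVA test:
- One-way ANOVA: considers one categorical grouping variable.
- Two-way ANOVA: considers two categorical grouping variables.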
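Dataset
The examples use the mtcars (Motor Trend Car Road Tests) dataset, which contains 32 car models and 11 variables. It ships with base R in the datasets package, so it is available without loading dplyr.
The examples install and load the dplyr package, but this step is optional: aov() (package stats) and boxplot() (package graphics) are part of base R.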
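The following snippet inspects mtcars before running any test. It shows that the dataset is available in a plain R session and how many cars fall into each gear and transmission group. It uses only base R functions.

```r
# mtcars ships with base R (package "datasets"); no library() call is needed
print(dim(mtcars))                           # 32 11: car models x variables
print(head(mtcars[, c("disp", "gear", "am")]))
print(table(mtcars$gear))                    # cars with 3, 4 and 5 forward gears
print(table(mtcars$am))                      # 0 = automatic, 1 = manual
```

The group sizes are unequal, with 15, 12 and 5 cars for 3, 4 and 5 gears. aov() handles this without any special setup.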
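Performing a One-Way ANOVA Test in R
The one-way example runs an ANOVA test on the mtcars dataset. The response is disp (displacement in cubic inches, a continuous variable) and the grouping factor is gear (number of forward gears, a categorical variable).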
R
# Installing the package (optional: aov() and boxplot() are in base R)
install.packages("dplyr")
# Loading the package
library(dplyr)
# Visualise the spread of disp within and between gear groups
boxplot(disp ~ factor(gear), data = mtcars,
        xlab = "gear", ylab = "disp")
# Step 1: Set up the null and alternative hypotheses
# H0: mu3 = mu4 = mu5 (mean displacement is the same
#     for cars with 3, 4 and 5 gears)
# H1: not all means are equal
# Step 2: Calculate the test statistic with aov()
mtcars_aov <- aov(disp ~ factor(gear), data = mtcars)
summary(mtcars_aov)
# Step 3: Choose the significance level and find the F critical value
alpha  <- 0.05
f_crit <- qf(1 - alpha, df1 = 2, df2 = 29)  # k - 1 = 2, N - k = 32 - 3 = 29
# Step 4: Compare the test statistic with the F critical value:
# if F > f_crit (equivalently p < alpha), reject the null hypothesis
Output:
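The box plot shows the distribution of disp for each gear level; the thick line in each box is the median, not the mean. Here the categorical variable is gear, converted with factor(), and the continuous variable is disp.
The summary shows that gear has a highly significant effect on displacement, as the three stars indicate. Its p-value is below 0.05, so mean displacement differs significantly across gear levels. Displacement is therefore associated with gear, and we reject the null hypothesis.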
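The sketch below recomputes the F value and p-value in the summary by hand from the between-group and within-group sums of squares, then checks them against aov(). It uses base R only. The variable names are chosen for illustration.

```r
# Hand computation of the one-way ANOVA F statistic for disp ~ gear,
# compared with the values reported by aov(). Base R only.
y <- mtcars$disp
g <- factor(mtcars$gear)

k <- nlevels(g)   # number of groups (gear levels 3, 4, 5)
N <- length(y)    # number of cars

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)    # mean disp per gear level
group_sizes <- tapply(y, g, length)  # cars per gear level

# Between-group sum of squares: spread of group means around the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
# Within-group sum of squares: spread of each car around its own group mean
ss_within  <- sum((y - ave(y, g))^2)

f_manual <- (ss_between / (k - 1)) / (ss_within / (N - k))
p_manual <- pf(f_manual, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

# The same quantities taken from the aov() summary table
fit   <- aov(disp ~ factor(gear), data = mtcars)
tab   <- summary(fit)[[1]]
f_aov <- tab[["F value"]][1]
p_aov <- tab[["Pr(>F)"]][1]

cat("F by hand:", f_manual, " F from aov():", f_aov, "\n")
cat("p by hand:", p_manual, " p from aov():", p_aov, "\n")
```

The hand-computed p-value comes from the upper tail of the F distribution with k - 1 and N - k degrees of freedom, matching how the summary table reports Pr(>F).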
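Performing a Two-Way ANOVA Test in R
The two-way example runs an ANOVA test on the mtcars dataset. The response is disp (continuous), and there are two categorical factors: gear and am (transmission type, 0 = automatic, 1 = manual).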
R
# Installing the package (optional: aov() and boxplot() are in base R)
install.packages("dplyr")
# Loading the package
library(dplyr)
# Visualise disp by gear separately for each transmission type
boxplot(disp ~ gear, data = mtcars, subset = am == 0,
        xlab = "gear", ylab = "disp", main = "Automatic")
boxplot(disp ~ gear, data = mtcars, subset = am == 1,
        xlab = "gear", ylab = "disp", main = "Manual")
# Step 1: Set up the null and alternative hypotheses
# H0 (gear): mean displacement is the same for every gear level
# H0 (am):   mean displacement is the same for automatic and manual cars
# H0 (gear:am): no interaction (testable only if every gear/am combination occurs)
# H1: for each term, not all means are equal
# Step 2: Calculate the test statistics with aov()
# factor(gear) * factor(am) expands to both main effects plus their interaction
mtcars_aov2 <- aov(disp ~ factor(gear) * factor(am), data = mtcars)
summary(mtcars_aov2)
# Step 3: Choose the significance level
alpha <- 0.05
# Step 4: For each term, reject its null hypothesis if p < alpha
Output:
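The box plots show the distribution of disp for each gear level, separately for automatic and manual cars; the thick line in each box is the median. Here the categorical variables are gear and am (converted with factor() in the model), and the continuous variable is disp.
The summary shows that gear has a highly significant effect on displacement, as the three stars indicate, while am does not. The p-value for gear is below 0.05, so displacement differs significantly across gear levels. The p-value for am is above 0.05, so once gear is accounted for, transmission type has no significant effect on displacement.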
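The discussion above covers only the gear and am terms. A cross-tabulation of the two factors, sketched below, helps explain the design. In mtcars, no 3-gear car has a manual transmission and no 5-gear car is automatic, so two of the six gear/am cells are empty. Only four cells are filled, so fitting the two main effects uses up all the model's degrees of freedom. The gear:am interaction in `disp ~ factor(gear) * factor(am)` therefore cannot be tested on its own. Also note that summary() on an aov fit reports sequential (Type I) sums of squares, so the am row measures am's contribution after gear has been fitted.

```r
# Count cars in each gear x transmission cell (am: 0 = automatic, 1 = manual)
cells <- table(gear = mtcars$gear, am = mtcars$am)
print(cells)

# Locate the gear/am combinations that never occur in mtcars
empty <- which(cells == 0, arr.ind = TRUE)
print(empty)
```

It is a good habit to run a check like this before a factorial ANOVA. Empty or very small cells limit which effects the data can actually estimate.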
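Results
The box plots and ANOVA summaries lead to the following conclusions:
- Displacement is strongly associated with the number of gears: mean displacement differs significantly across gear levels (p < 0.05).
- In the two-way model, displacement remains significantly associated with gear but not with transmission type (am), whose p-value is above 0.05.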