How to Calculate Point Estimates in R?

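Point estimation is a technique for finding an estimate, or approximation, of a population parameter from a sample drawn from that population. This article computes point estimates for the following two parameters: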

Measured parameter	Population parameter	Point estimate
Proportion	π	p
Mean	μ	x̄

This article focuses on how to calculate point estimates in the R programming language.

Point estimate of the population proportion

The point estimate of the population proportion can be calculated with the following formula:

Formula: p′ = x / n

Here,

x : Signifies the number of successes
n : Signifies the sample size.
p′ is the point estimate of population proportion
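
As a small illustration of the formula above, the sketch below wraps p′ = x / n in a function. The name `proportion_estimate` and the counts passed to it are assumptions made for this example, not part of the original article.

```r
# Hypothetical helper (not from the original article): point estimate of a
# population proportion from a count of successes x and a sample size n.
proportion_estimate <- function(x, n) {
  # basic sanity checks: non-empty sample, successes between 0 and n
  stopifnot(n > 0, x >= 0, x <= n)
  x / n
}

# Illustrative counts: 13 successes in a sample of 20
p_hat <- proportion_estimate(13, 20)
print(p_hat)
```

The input checks are optional, but they catch the most common mistake, which is passing a success count larger than the sample size.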


Example:

Suppose we want to estimate the proportion of students who attended class on a given day. The sample consists of 20 observations.

R

# define data
data <- c('Present', 'Absent', 'Absent', 'Absent',
          'Absent', 'Absent', 'Present', 'Present', 
          'Absent', 'Present',
          'Present', 'Present', 'Present', 'Present', 
          'Present', 'Present', 'Absent', 'Present', 
          'Present', 'Present')
  
# find total sample size
n <- length(data)
  
# find number who are present
k <- sum(data == 'Present') 
  
# find sample proportion
p <- k/n
  
# print
print(paste("Sample proportion of students who are present", p))

Output:

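Example:

Note that we can also calculate a 95% confidence interval for the population proportion, based on the normal approximation, with the following code: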

R

# define data
data <- c('Present', 'Absent', 'Absent', 'Absent',
          'Absent', 'Absent', 'Present', 'Present', 
          'Absent', 'Present',
          'Present', 'Present', 'Present', 'Present',
          'Present', 'Present', 'Absent', 'Present',
          'Present', 'Present')
  
# find total sample size
total <- length(data)
  
# find number who are present
favourable <- sum(data == 'Present') 
  
# find sample proportion
ans <- favourable/total
  
# calculate margin of error
margin <- qnorm(0.975)*sqrt(ans*(1-ans)/total)
  
# calculate lower and upper bounds of 
# confidence interval
low <- ans - margin
print(low)
  
high <- ans + margin
print(high)

Output:

Therefore, the 95% confidence interval for the population proportion is approximately [0.441, 0.859].
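
The interval computed above is the normal-approximation (Wald) interval, p′ ± z · sqrt(p′(1 − p′) / n). The sketch below packages the same calculation as a reusable function. The name `wald_ci` and its `conf` argument are assumptions introduced for illustration; the counts (13 present out of 20) come from the attendance sample.

```r
# Hypothetical helper: normal-approximation (Wald) confidence interval
# for a proportion, given x successes out of n observations.
wald_ci <- function(x, n, conf = 0.95) {
  stopifnot(n > 0, x >= 0, x <= n, conf > 0, conf < 1)
  p_hat <- x / n                                # point estimate
  z <- qnorm(1 - (1 - conf) / 2)                # two-sided critical value
  margin <- z * sqrt(p_hat * (1 - p_hat) / n)   # margin of error
  c(lower = p_hat - margin, upper = p_hat + margin)
}

# 13 of the 20 students in the sample were present
ci <- wald_ci(13, 20)
print(round(ci, 3))
```

For small samples, or proportions close to 0 or 1, the Wald interval can be inaccurate. Base R's `prop.test()` and `binom.test()` use different methods and will generally report somewhat different bounds.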

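Point estimate of the population mean

The point estimate of the population mean can be calculated with the mean() function in R. The syntax is as follows: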

Syntax: mean(x, trim = 0, na.rm = FALSE, …)

Here,

x: The input vector
trim: The fraction (0 to 0.5) of observations to drop from each end of the sorted vector before the mean is computed
na.rm: Whether missing values (NA) are removed from the input vector before the mean is computed
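
To make the effect of these arguments concrete, here is a minimal sketch on a made-up vector containing one outlier and one missing value. The data and variable names are illustrative assumptions, not taken from the article.

```r
# Illustrative vector (made up for this sketch): one outlier, one missing value
x <- c(1, 2, 3, 4, 100, NA)

# Default call: the missing value propagates, so the result is NA
m_default <- mean(x)

# na.rm = TRUE drops the NA, leaving the mean of 1, 2, 3, 4, 100
m_narm <- mean(x, na.rm = TRUE)

# trim = 0.2 additionally drops 20% of the remaining observations from each
# end of the sorted vector (here: the values 1 and 100)
m_trim <- mean(x, trim = 0.2, na.rm = TRUE)

print(c(m_default, m_narm, m_trim))
```

The trimmed mean is much less affected by the outlier than the plain mean, which is the usual reason to set `trim`.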


Example:

Suppose we want to estimate the population mean height of the students in a class. The sample consists of 20 observations.

R

#define data
data <- c(170, 180, 165, 170, 165, 
          175, 160, 162, 156, 159, 
          160, 167, 168, 174, 180, 
          167, 169, 180, 190, 195)
  
#calculate sample mean
ans <- mean(data, na.rm = TRUE)
  
#print the mean height
print(paste("The sample mean is", ans))

Output:

Therefore, the sample mean height is 170.6 cm.

Example:

Note that we can also calculate a 95% confidence interval for the population mean with the following code:

R

# define data
data <- c(170, 180, 165, 170, 165, 175, 
          160, 162, 156, 159, 160, 167,
          168, 174, 180, 167, 169, 180,
          190, 195)
  
# Total number of students
total <- length(data)
  
# Point estimate of the mean and sample standard deviation
sample_mean <- mean(data, na.rm = TRUE)
s <- sd(data)
  
# calculate margin of error using the t distribution
margin <- qt(0.975, df = total - 1) * s / sqrt(total)
  
# calculate lower and upper bounds of 
# confidence interval
low <- sample_mean - margin
print(low)
  
high <- sample_mean + margin
print(high)

Output:

Therefore, the 95% confidence interval for the population mean is approximately [165.783, 175.417].
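
As a cross-check, base R's `t.test()` reports the same t-based interval in its `conf.int` component. The sketch below recomputes the interval by hand and compares it with the built-in result. The variable names (`heights`, `manual_ci`, `builtin_ci`) are assumptions chosen for this example; the data is the height sample used above.

```r
# Height sample from the example above
heights <- c(170, 180, 165, 170, 165, 175,
             160, 162, 156, 159, 160, 167,
             168, 174, 180, 167, 169, 180,
             190, 195)

# Manual t-based interval: mean +/- t * s / sqrt(n)
n <- length(heights)
margin <- qt(0.975, df = n - 1) * sd(heights) / sqrt(n)
manual_ci <- c(mean(heights) - margin, mean(heights) + margin)

# Built-in one-sample t-test; conf.int holds the lower and upper bounds
builtin_ci <- as.numeric(t.test(heights, conf.level = 0.95)$conf.int)

print(manual_ci)
print(builtin_ci)
```

Using `t.test()` directly is shorter and avoids mistakes in the degrees of freedom, while the manual version makes each step of the calculation visible.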