How to Calculate Studentized Residuals in Python?

Last modified: 2022-05-13 01:54:29.253000             Author: Mango


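A studentized residual is the quotient obtained by dividing a residual by an estimate of its standard deviation. Studentized residuals are a key tool for detecting outliers: in practice, any observation in a dataset whose studentized residual exceeds 3 in absolute value is commonly treated as a potential outlier.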
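To make the definition concrete, here is a minimal NumPy-only sketch for a simple linear regression with an intercept. It uses the standard formulas: the internally studentized residual is r_i = e_i / (s * sqrt(1 - h_ii)), where e_i is the raw residual, s^2 the residual variance estimate and h_ii the leverage (the diagonal of the hat matrix). The externally studentized residual replaces s with s_(i), the estimate computed with observation i left out. The helper name `studentized_residuals` is hypothetical and not part of any library.

```python
import numpy as np


def studentized_residuals(x, y):
    """Return (internal, external) studentized residuals for y = b0 + b1 * x.

    Hypothetical helper for illustration: fits ordinary least squares with an intercept.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])      # design matrix: intercept + predictor
    n, p = X.shape

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficients
    e = y - X @ beta                                # raw residuals
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverages h_ii from the hat matrix
    sse = e @ e

    s2 = sse / (n - p)                              # residual variance using all observations
    internal = e / np.sqrt(s2 * (1 - h))

    # Leave-one-out variance s_(i)^2 via the identity SSE_(i) = SSE - e_i^2 / (1 - h_ii),
    # so the model never has to be refitted n times
    s2_loo = (sse - e**2 / (1 - h)) / (n - p - 1)
    external = e / np.sqrt(s2_loo * (1 - h))
    return internal, external


# The Benchmark/Score data used in this tutorial: Benchmark is the predictor, Score the response
benchmark = [27, 28, 18, 18, 29, 30, 25, 25, 24, 29]
score = [80, 95, 80, 78, 84, 96, 86, 75, 97, 89]

internal, external = studentized_residuals(benchmark, score)
print(np.round(external, 6))
print("Flagged as potential outliers:", np.flatnonzero(np.abs(external) > 3))
```

One design point worth knowing: an internally studentized residual can never exceed sqrt(n - p) in absolute value (about 2.83 for 10 observations and 2 parameters), so the |r| > 3 rule only makes sense for the external version, which is the kind statsmodels' outlier_test() reports.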

The following Python libraries should already be installed on your system:

  • pandas
  • numpy
  • statsmodels
  • matplotlib

You can install these packages by running the following command in a terminal:

pip3 install pandas numpy statsmodels matplotlib
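To confirm that the installation worked, a small sketch can query the installed versions with the standard library's importlib.metadata (Python 3.8+). The package names are the ones from the pip command; the `installed` dictionary is just an illustrative name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip3 install
installed = {}
for package in ("pandas", "numpy", "statsmodels", "matplotlib"):
    try:
        installed[package] = version(package)
    except PackageNotFoundError:
        installed[package] = None  # not installed in this environment
    print(f"{package}: {installed[package] or 'not installed'}")
```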

Steps to Calculate Studentized Residuals in Python

Step 1: Import the libraries.

We need to import the libraries installed above into our program.

Python3
# Importing necessary packages
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt


Step 2: Create a DataFrame.

First, we need to create a DataFrame, which we can do with the pandas package. The snippet is as follows:

Python3

# Creating dataframe
dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84,
                                    96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30,
                                        25, 25, 24, 29]})

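Step 3: Build a simple linear regression model.

Now we need to build a simple linear regression model for the dataset we created. To fit it, we can use the ols() function from the statsmodels package (statsmodels.formula.api). In the formula 'Score ~ Benchmark', Score is the response variable and Benchmark is the predictor.

Example: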

Python3

# Building simple linear regression model
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()


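Step 4: Produce the studentized residuals.

To produce a DataFrame containing the studentized residual of every observation in the dataset, we can use the outlier_test() method of the fitted model.

Syntax: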

Python3

# Producing studentized residuals
stud_res = simple_regression_model.outlier_test()
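outlier_test() is built on the fitted model's influence measures. As a sketch, the residuals can also be read directly from get_influence(): resid_studentized_external holds the externally studentized (leave-one-out) residuals that outlier_test() reports in its student_resid column, while resid_studentized_internal uses the variance estimate from the full fit.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84, 96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30, 25, 25, 24, 29]})
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()

# Influence measures: leverage, Cook's distance, studentized residuals, ...
influence = simple_regression_model.get_influence()
internal = influence.resid_studentized_internal   # residual / (s * sqrt(1 - h_ii))
external = influence.resid_studentized_external   # uses s_(i), with observation i left out

stud_res = simple_regression_model.outlier_test()

# outlier_test() reports the externally studentized residuals
print(np.allclose(stud_res['student_resid'].to_numpy(), external))
print(pd.DataFrame({'internal': internal, 'external': external}).round(6))
```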


Below is the complete implementation.

Python3

# Python program to calculate studentized residuals

# Importing necessary packages
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt

# Creating dataframe
dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84,
                                    96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30,
                                        25, 25, 24, 29]})

# Building simple linear regression model
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()

# Producing studentized residuals
result = simple_regression_model.outlier_test()

print(result)

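Output:

The output is a DataFrame containing:

  • the studentized residuals (column student_resid)
  • the unadjusted p-values of the studentized residuals (column unadj_p)
  • the Bonferroni-corrected p-values of the studentized residuals (column bonf(p))

We can see that the studentized residual of the first observation in the dataset is -1.121201, that of the second observation is 0.954871, and so on.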
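The two p-value columns can be reproduced from student_resid, which makes their meaning concrete. In this sketch (assuming outlier_test()'s default method='bonf'), unadj_p is a two-sided t-test p-value with df_resid - 1 degrees of freedom, and bonf(p) multiplies it by the number of observations, capped at 1. The sketch then applies both the |r| > 3 rule of thumb from the introduction and a Bonferroni cut-off; the 0.05 level is an illustrative choice, not something the tutorial prescribes.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.formula.api import ols

dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84, 96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30, 25, 25, 24, 29]})
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()
result = simple_regression_model.outlier_test()

r = result['student_resid'].to_numpy()
n = len(r)

# Two-sided p-value from a t distribution with df_resid - 1 degrees of freedom
unadj_p = 2 * stats.t.sf(np.abs(r), simple_regression_model.df_resid - 1)
# Bonferroni correction: multiply by the number of tests, cap at 1
bonf_p = np.minimum(unadj_p * n, 1.0)

print(np.allclose(unadj_p, result['unadj_p']), np.allclose(bonf_p, result['bonf(p)']))

# Rule of thumb: |studentized residual| > 3
print(result[result['student_resid'].abs() > 3])
# Multiple-testing-aware alternative: Bonferroni-corrected p-value below 0.05
print(result[result['bonf(p)'] < 0.05])
```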

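Visualization:

Now let's move on to visualizing the studentized residuals. With the help of matplotlib, we can plot the predictor variable values against the corresponding studentized residuals.

Example: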

Python3

# Python program to draw the plot
# of studentized residuals

# Importing necessary packages
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt

# Creating dataframe
dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84,
                                    96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30,
                                        25, 25, 24, 29]})

# Building simple linear regression model
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()

# Producing studentized residuals
result = simple_regression_model.outlier_test()

# Defining predictor variable values (Benchmark) and
# studentized residuals
x = dataframe['Benchmark']
y = result['student_resid']

# Creating a scatterplot of predictor variable
# vs studentized residuals
plt.scatter(x, y)
plt.axhline(y=0, color='black', linestyle='--')
plt.xlabel('Benchmark')
plt.ylabel('Studentized Residuals')

# Save the plot
plt.savefig("Plot.png")

Output:

Plot.png:
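To read the plot against the |r| > 3 rule of thumb from the introduction, a variant can draw reference lines at -3 and +3 and colour any points beyond them. This sketch assumes a headless environment (hence the Agg backend) and writes to a hypothetical file name, Plot_thresholds.png.

```python
import matplotlib
matplotlib.use('Agg')                 # render without a display
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.formula.api import ols

dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84, 96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30, 25, 25, 24, 29]})
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()
result = simple_regression_model.outlier_test()

x = dataframe['Benchmark']            # predictor variable
y = result['student_resid']           # externally studentized residuals

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.axhline(0, color='black', linestyle='--')
# Rule-of-thumb outlier thresholds at -3 and +3
for level in (-3, 3):
    ax.axhline(level, color='red', linestyle=':')
# Re-draw any points beyond the thresholds in red (may be none)
beyond = y.abs() > 3
ax.scatter(x[beyond], y[beyond], color='red')
ax.set_xlabel('Benchmark')
ax.set_ylabel('Studentized Residuals')
fig.savefig('Plot_thresholds.png')
plt.close(fig)
```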