📅  Last modified: 2023-12-03 15:34:03.958000             🧑  Author: Mango
In statistics, R-squared (the coefficient of determination) measures the proportion of the variance in a dependent variable that is explained by one or more independent variables in a regression model.
In this article, we will explore how to calculate R-squared in Python, using different approaches and libraries.
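import numpy as np
def r_squared(y_true, y_pred):
    # Residual sum of squares: variation the predictions fail to explain
    residual = np.sum((y_true - y_pred) ** 2)
    # Total sum of squares: variation of y_true around its mean
    total = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1 - (residual / total)
    return r2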
This function takes two arrays, y_true and y_pred, which hold the true and predicted values of the dependent variable, respectively. It computes the residual sum of squares (the squared differences between the true and predicted values) and the total sum of squares (the squared deviations of the true values from their mean, which is proportional to their variance). Finally, it returns R-squared as 1 minus the ratio of the residual sum of squares to the total sum of squares.
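As a quick sanity check on the formula, the sketch below evaluates the same function on three boundary cases. The data values and the variable names perfect, baseline, and worse are illustrative assumptions, not part of the original example.

```python
import numpy as np

def r_squared(y_true, y_pred):
    # Residual sum of squares: variation the predictions fail to explain
    residual = np.sum((y_true - y_pred) ** 2)
    # Total sum of squares: variation of y_true around its own mean
    total = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - (residual / total)

# Illustrative values (an assumption for this sketch)
y_true = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

# Perfect predictions leave no residual, so R-squared is 1
perfect = r_squared(y_true, y_true.copy())

# Always predicting the mean explains nothing, so R-squared is 0
baseline = r_squared(y_true, np.full_like(y_true, y_true.mean()))

# Predictions that are worse than the mean give a negative R-squared
worse = r_squared(y_true, y_true[::-1])

print(perfect, baseline, worse)
```

The third case shows that R-squared computed this way is not bounded below by 0: it only stays between 0 and 1 when the predictions do at least as well as the mean of y_true.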
For example, suppose we have the following data for the dependent variable y and the independent variable x:
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 4, 6])
We can calculate the predicted values of y using a linear regression model as follows:
from sklearn.linear_model import LinearRegression
reg = LinearRegression().fit(x.reshape(-1, 1), y)
y_pred = reg.predict(x.reshape(-1, 1))
Then we can calculate the R-squared using our r_squared function:
r2 = r_squared(y, y_pred)
print(r2)
Output (the trailing digits may differ slightly because of floating-point rounding):
0.81
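A NumPy-only cross-check of this example may be useful. For a simple linear regression with an intercept, fitted by ordinary least squares, R-squared equals the squared Pearson correlation between x and y. The sketch below uses np.polyfit as a stand-in for the scikit-learn model (a swap made here for illustration) and compares the two quantities.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 4, 6])

# Fit y = slope * x + intercept by ordinary least squares
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept

# R-squared from the residual and total sums of squares
r2 = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - np.mean(y)) ** 2)

# Squared Pearson correlation between x and y
r2_from_corr = np.corrcoef(x, y)[0, 1] ** 2

print(round(r2, 6), round(r2_from_corr, 6))  # both round to 0.81
```

For this data the fitted line is y = 0.9x + 1.3, the residual sum of squares is 1.9, and the total sum of squares is 10, so R-squared is 1 - 1.9 / 10 = 0.81.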
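import statsmodels.api as sm
def r_squared(y_true, y_pred):
    # Sum of squared residuals between predictions and true values
    ssr = np.sum((y_pred - y_true) ** 2)
    # Total sum of squares of the true values around their mean
    sst = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1 - (ssr / sst)
    return r2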
This function takes the same arguments and performs the same calculation as the previous one. The difference is in where the predictions come from: here the statsmodels library is used to fit the linear regression model and obtain the predicted values of y.
For example, using the same data as before, we can calculate the R-squared as follows:
# Add a column of ones so the model estimates an intercept
X = sm.add_constant(x)
model = sm.OLS(y, X).fit()
y_pred = model.predict(X)
r2 = r_squared(y, y_pred)
print(r2)
Output (the trailing digits may differ slightly because of floating-point rounding):
0.81
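The call to sm.add_constant matters because sm.OLS does not add an intercept on its own. The sketch below illustrates the effect with NumPy only, using np.linalg.lstsq as a stand-in for the OLS fit (an assumption made for illustration, not a description of statsmodels internals). The names X_with_const and X_no_const are hypothetical.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

def r_squared(y_true, y_pred):
    ssr = np.sum((y_pred - y_true) ** 2)
    sst = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - (ssr / sst)

# Design matrix with a leading column of ones (the intercept) next to x
X_with_const = np.column_stack([np.ones_like(x), x])
# Design matrix without the constant: the line is forced through the origin
X_no_const = x.reshape(-1, 1)

# Least-squares coefficients for each design matrix
coef_with, *_ = np.linalg.lstsq(X_with_const, y, rcond=None)
coef_no, *_ = np.linalg.lstsq(X_no_const, y, rcond=None)

r2_with = r_squared(y, X_with_const @ coef_with)
r2_no = r_squared(y, X_no_const @ coef_no)

print(round(r2_with, 4), round(r2_no, 4))
```

With this data, the fit that includes the constant column gives the higher R-squared of the two, since forcing the line through the origin leaves a larger residual sum of squares.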
In this article, we have shown how to calculate R-squared in Python with a small NumPy function, using predictions from models fitted with scikit-learn and statsmodels. There are other libraries and methods for calculating R-squared, but these are among the most commonly used. R-squared is a useful tool for evaluating the goodness of fit of a regression model and for understanding how much of the variance in the dependent variable is explained by the independent variables.