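How to implement gradient descent in Python to find a local minimum?
Gradient descent is an iterative algorithm that minimizes a function by searching for its optimal parameters. It can be applied to functions of any dimension, i.e. 1-D, 2-D, 3-D. In this article, we first find the global minimum of a parabolic function (a curve in the 2-D plane), and then implement gradient descent in Python to find the optimal parameters of a linear regression equation (1-D, with a single input variable). Before diving into the implementation, let us identify what the gradient descent algorithm needs: a cost function to minimize, a number of iterations, a learning rate that determines the step size taken toward the minimum at each iteration, the partial derivatives of the cost with respect to the weight and the bias (used to update the parameters at each iteration), and a prediction function.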
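So far we have seen the parameters that gradient descent requires. Now let us map these parameters onto the algorithm and work through an example to understand it better. Consider the parabola y = 4x^2. By inspecting the equation, we can see that the function is smallest at x = 0, i.e. at the point x = 0, y = 0. So x = 0 is the local minimum of y = 4x^2 (and, because this parabola has only one minimum, also its global minimum). Now let us look at the gradient descent algorithm and how applying it leads to this local minimum: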
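Gradient descent algorithm
To find a local minimum, take steps proportional to the negative of the gradient of the function at the current point (moving away from the gradient). Gradient ascent is the mirror-image procedure: it takes steps proportional to the positive of the gradient (moving toward the gradient) in order to approach a local maximum of the function.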
repeat until convergence
{
w = w - (learning_rate * (dJ/dw))
b = b - (learning_rate * (dJ/db))
}
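The sign of the step is what separates descent from ascent. As a minimal sketch, the snippet below takes one step in each direction on the parabola y = 4x^2 from the example, using its derivative 8x. The function names `f` and `gradient` and the starting point are illustrative choices, not part of the original article.

```python
def f(x):
    # The example parabola y = 4x^2
    return 4 * x ** 2


def gradient(x):
    # Derivative of 4x^2 with respect to x
    return 8 * x


x = 3.0
learning_rate = 0.01

# Descent: step against the gradient, which lowers the function value
x_descent = x - learning_rate * gradient(x)
# Ascent: step along the gradient, which raises the function value
x_ascent = x + learning_rate * gradient(x)

print("f at start:        ", f(x))
print("f after descent:   ", f(x_descent))
print("f after ascent:    ", f(x_ascent))
```

Running it should show the function value dropping after the descent step and rising after the ascent step.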
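Step 1: Initialize all the necessary parameters and derive the gradient function of the parabola 4x^2. The derivative of x^2 is 2x, so the derivative of 4x^2 is 8x.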
x0 = 3 (random initialization of x)
learning_rate = 0.01 (to determine the step size while moving towards local minima)
gradient = 8 * x (the gradient function, evaluated at the current x)
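Step 2: Let us perform 3 iterations of gradient descent:
For each iteration, keep updating the value of x according to the gradient descent formula.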
Iteration 1:
x1 = x0 - (learning_rate * gradient)
x1 = 3 - (0.01 * (8 * 3))
x1 = 3 - 0.24
x1 = 2.76
Iteration 2:
x2 = x1 - (learning_rate * gradient)
x2 = 2.76 - (0.01 * (8 * 2.76))
x2 = 2.76 - 0.2208
x2 = 2.5392
Iteration 3:
x3 = x2 - (learning_rate * gradient)
x3 = 2.5392 - (0.01 * (8 * 2.5392))
x3 = 2.5392 - 0.203136
x3 = 2.336064
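The three hand-computed iterations can be reproduced with a short loop. This is only a sketch of the arithmetic above; the variable name `history` is introduced here for illustration.

```python
x = 3.0               # x0, the initial value
learning_rate = 0.01
history = [x]         # keeps x0, x1, x2, x3

for i in range(3):
    gradient = 8 * x                   # derivative of 4x^2 at the current x
    x = x - learning_rate * gradient   # gradient descent update
    history.append(x)
    print(f"Iteration {i + 1}: x = {x:.6f}")

# Iteration 1: x = 2.760000
# Iteration 2: x = 2.539200
# Iteration 3: x = 2.336064
```

Each update multiplies x by (1 - 0.01 * 8) = 0.92, which is why the values shrink steadily toward 0.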
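From the three iterations above, we can see that the value of x decreases with every iteration, and that running gradient descent for more iterations will slowly converge to 0 (the local minimum). You might now ask: how many iterations of gradient descent should we run?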
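We can set a stopping threshold: when the difference between the previous and the current value of x becomes smaller than the threshold, we stop iterating. When gradient descent is used in machine learning and deep learning algorithms, the function being minimized is the cost function. Now that the inner workings of gradient descent are clear, let us look at a Python implementation in which we minimize the cost function of linear regression and find the best-fit line. That implementation relies on the following components, each described in its own section below.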
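Before turning to linear regression, the stopping threshold on x can be sketched for the parabola y = 4x^2. The function name `minimize_parabola`, the threshold value, and the `max_iterations` safety cap are assumptions made for this illustration.

```python
def minimize_parabola(x0=3.0, learning_rate=0.01,
                      stopping_threshold=1e-6, max_iterations=10_000):
    """Run gradient descent on y = 4x^2 until x stops changing noticeably."""
    x = x0
    for iteration in range(1, max_iterations + 1):
        new_x = x - learning_rate * (8 * x)   # gradient of 4x^2 is 8x
        # Stop once two consecutive values of x are closer than the threshold
        if abs(new_x - x) < stopping_threshold:
            return new_x, iteration
        x = new_x
    # Safety cap reached without meeting the threshold
    return x, max_iterations


x_min, iterations_used = minimize_parabola()
print(f"x = {x_min}, reached after {iterations_used} iterations")
```

A smaller threshold brings x closer to 0 at the price of more iterations; the cap only guards against a learning rate that never converges.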
Prediction function
The prediction function of the linear regression algorithm is the linear equation y = wx + b.
prediction_function (y) = (w * x) + b
Here, x is the independent variable
y is the dependent variable
w is the weight associated with the input variable
b is the bias
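With NumPy, the prediction function can be applied to a whole array of inputs at once. The sketch below assumes a helper named `predict` and made-up values for x, w, and b.

```python
import numpy as np


def predict(x, w, b):
    # Linear prediction y = w * x + b, applied element-wise to x
    return w * x + b


x = np.array([1.0, 2.0, 3.0])
y_hat = predict(x, w=2.0, b=0.5)
print(y_hat)   # the three predictions are 2.5, 4.5 and 6.5
```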
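Cost function
The cost function computes the loss from the predictions made. In linear regression we use the mean squared error, which is the mean of the squared differences between the actual and the predicted values.
Cost function (J):
$$J = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - (w x_i + b) \right)^2$$
Here, n is the number of samples, y_i is the actual value, and w x_i + b is the predicted value for sample i.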
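As a small worked example of the mean squared error, take three actual values and three predictions. The numbers are made up for illustration.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])
y_predicted = np.array([2.5, 5.0, 8.0])

# Squared differences: 0.5^2, 0^2 and (-1)^2
squared_errors = (y_true - y_predicted) ** 2
# Mean over the n = 3 samples: (0.25 + 0 + 1) / 3
cost = np.sum(squared_errors) / len(y_true)
print(cost)
```

The result is 1.25 / 3, roughly 0.417; dividing by n is what makes this a mean rather than a plain sum of squared errors.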
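Partial derivatives (gradients)
Using the cost function, we compute its partial derivatives with respect to the weight and the bias (written dJ/dw and dJ/db in the update rule). We get:
$$\frac{\partial J}{\partial w} = -\frac{2}{n} \sum_{i=1}^{n} x_i \left( y_i - (w x_i + b) \right)$$
$$\frac{\partial J}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - (w x_i + b) \right)$$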
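The partial derivatives of the mean squared error can be sanity-checked numerically. The sketch below assumes the standard MSE gradients, dJ/dw = -(2/n) * sum(x * (y - y_hat)) and dJ/db = -(2/n) * sum(y - y_hat), which are also the expressions used in the implementation further down, and compares them with central finite differences. The data, the starting parameters, and the step `h` are illustrative.

```python
import numpy as np


def cost(w, b, x, y):
    # Mean squared error of the linear prediction w * x + b
    return np.mean((y - (w * x + b)) ** 2)


def gradients(w, b, x, y):
    # Analytic partial derivatives of the MSE with respect to w and b
    n = len(x)
    residual = y - (w * x + b)
    dJ_dw = -(2 / n) * np.sum(x * residual)
    dJ_db = -(2 / n) * np.sum(residual)
    return dJ_dw, dJ_db


x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
w, b, h = 0.5, 0.1, 1e-6

dJ_dw, dJ_db = gradients(w, b, x, y)
# Central differences approximate each partial derivative
num_dw = (cost(w + h, b, x, y) - cost(w - h, b, x, y)) / (2 * h)
num_db = (cost(w, b + h, x, y) - cost(w, b - h, x, y)) / (2 * h)

print("dJ/dw analytic vs numeric:", dJ_dw, num_dw)
print("dJ/db analytic vs numeric:", dJ_db, num_db)
```

If the analytic and numeric values agree to several decimal places, the derivative formulas are very likely implemented correctly.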
Parameter update
Update the weight and the bias by subtracting the product of the learning rate and the respective gradient.
w = w - (learning_rate * (dJ/dw))
b = b - (learning_rate * (dJ/db))
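To see the update rule at work, the sketch below applies a single update to a tiny dataset and compares the cost before and after. The data, the zero initialization, and the learning rate of 0.01 are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
n = len(x)

w, b = 0.0, 0.0
learning_rate = 0.01

# Cost and gradients at the starting parameters
y_predicted = w * x + b
cost_before = np.mean((y - y_predicted) ** 2)
dJ_dw = -(2 / n) * np.sum(x * (y - y_predicted))   # -35.0 for this data
dJ_db = -(2 / n) * np.sum(y - y_predicted)         # -12.0 for this data

# One gradient descent update
w = w - (learning_rate * dJ_dw)
b = b - (learning_rate * dJ_db)

cost_after = np.mean((y - (w * x + b)) ** 2)
print(f"w = {w}, b = {b}")
print(f"cost before = {cost_before}, cost after = {cost_after}")
```

Both gradients are negative here, so subtracting them increases w and b, and the cost should drop after the step.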
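Python implementation of gradient descent
In the implementation we write two functions. The first is the cost function, which takes the actual and the predicted outputs and returns the loss. The second is the gradient descent function itself, which takes the independent variable and the target variable as input and finds the best-fit line using the gradient descent algorithm.
The number of iterations, learning_rate, and the stopping threshold are tuning parameters of the algorithm that the user can adjust. iterations specifies the maximum number of parameter updates to perform, and the stopping threshold is the minimum change in loss between two consecutive iterations; once the change falls to or below it, gradient descent stops.
In the main function, we initialize linearly related random data and apply gradient descent to it to find the best-fit line. The optimal weight and bias found by gradient descent are then used in the main function to plot that line.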
Python3
# Importing Libraries
import numpy as np
import matplotlib.pyplot as plt


def mean_squared_error(y_true, y_predicted):
    # Calculating the loss or cost
    cost = np.sum((y_true - y_predicted) ** 2) / len(y_true)
    return cost
# Gradient Descent Function
# Here iterations, learning_rate, stopping_threshold
# are hyperparameters that can be tuned
def gradient_descent(x, y, iterations=1000, learning_rate=0.0001,
                     stopping_threshold=1e-6):
    # Initializing weight and bias
    current_weight = 0.1
    current_bias = 0.01
    n = float(len(x))

    costs = []
    weights = []
    previous_cost = None

    # Estimation of optimal parameters
    for i in range(iterations):
        # Making predictions
        y_predicted = (current_weight * x) + current_bias

        # Calculating the current cost
        current_cost = mean_squared_error(y, y_predicted)

        # If the change in cost is less than or equal to
        # stopping_threshold we stop the gradient descent
        if (previous_cost is not None
                and abs(previous_cost - current_cost) <= stopping_threshold):
            break

        previous_cost = current_cost
        costs.append(current_cost)
        weights.append(current_weight)

        # Calculating the gradients
        weight_derivative = -(2 / n) * sum(x * (y - y_predicted))
        bias_derivative = -(2 / n) * sum(y - y_predicted)

        # Updating weights and bias
        current_weight = current_weight - (learning_rate * weight_derivative)
        current_bias = current_bias - (learning_rate * bias_derivative)

        # Printing the parameters for every iteration
        print(f"Iteration {i + 1}: Cost {current_cost}, "
              f"Weight {current_weight}, Bias {current_bias}")

    # Visualizing the weights and cost for all iterations
    plt.figure(figsize=(8, 6))
    plt.plot(weights, costs)
    plt.scatter(weights, costs, marker='o', color='red')
    plt.title("Cost vs Weights")
    plt.ylabel("Cost")
    plt.xlabel("Weight")
    plt.show()

    return current_weight, current_bias
def main():
    # Data
    X = np.array([32.50234527, 53.42680403, 61.53035803, 47.47563963, 59.81320787,
                  55.14218841, 52.21179669, 39.29956669, 48.10504169, 52.55001444,
                  45.41973014, 54.35163488, 44.1640495, 58.16847072, 56.72720806,
                  48.95588857, 44.68719623, 60.29732685, 45.61864377, 38.81681754])
    Y = np.array([31.70700585, 68.77759598, 62.5623823, 71.54663223, 87.23092513,
                  78.21151827, 79.64197305, 59.17148932, 75.3312423, 71.30087989,
                  55.16567715, 82.47884676, 62.00892325, 75.39287043, 81.43619216,
                  60.72360244, 82.89250373, 97.37989686, 48.84715332, 56.87721319])

    # Estimating weight and bias using gradient descent
    estimated_weight, estimated_bias = gradient_descent(X, Y, iterations=2000)
    print(f"Estimated Weight: {estimated_weight}\nEstimated Bias: {estimated_bias}")

    # Making predictions using estimated parameters
    Y_pred = estimated_weight * X + estimated_bias

    # Plotting the regression line between the smallest and largest X.
    # Both endpoints come from the same samples, so the line is also
    # drawn correctly when the estimated weight is negative.
    lo, hi = np.argmin(X), np.argmax(X)
    plt.figure(figsize=(8, 6))
    plt.scatter(X, Y, marker='o', color='red')
    plt.plot([X[lo], X[hi]], [Y_pred[lo], Y_pred[hi]], color='blue',
             markerfacecolor='red', markersize=10, linestyle='dashed')
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.show()


if __name__ == "__main__":
    main()
Output:
Iteration 1: Cost 4352.088931274409, Weight 0.7593291142562117, Bias 0.02288558130709
Iteration 2: Cost 1114.8561474350017, Weight 1.081602958862324, Bias 0.02918014748569513
Iteration 3: Cost 341.42912086804455, Weight 1.2391274084945083, Bias 0.03225308846928192
Iteration 4: Cost 156.64495290904443, Weight 1.3161239281746984, Bias 0.03375132986012604
Iteration 5: Cost 112.49704004742098, Weight 1.3537591652024805, Bias 0.034479873154934775
Iteration 6: Cost 101.9493925395456, Weight 1.3721549833978113, Bias 0.034832195392868505
Iteration 7: Cost 99.4293893333546, Weight 1.3811467575154601, Bias 0.03500062439068245
Iteration 8: Cost 98.82731958262897, Weight 1.3855419247507244, Bias 0.03507916814736111
Iteration 9: Cost 98.68347500997261, Weight 1.3876903144657764, Bias 0.035113776874486774
Iteration 10: Cost 98.64910780902792, Weight 1.3887405007983562, Bias 0.035126910596389935
Iteration 11: Cost 98.64089651459352, Weight 1.389253895811451, Bias 0.03512954755833985
Iteration 12: Cost 98.63893428729509, Weight 1.38950491235671, Bias 0.035127053821718185
Iteration 13: Cost 98.63846506273883, Weight 1.3896276808137857, Bias 0.035122052266051224
Iteration 14: Cost 98.63835254057648, Weight 1.38968776283053, Bias 0.03511582492978764
Iteration 15: Cost 98.63832524036214, Weight 1.3897172043139192, Bias 0.03510899846107016
Iteration 16: Cost 98.63831830104695, Weight 1.389731668997059, Bias 0.035101879159522745
Iteration 17: Cost 98.63831622628217, Weight 1.389738813163012, Bias 0.03509461674147458
Estimated Weight: 1.389738813163012
Estimated Bias: 0.03509461674147458
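One way to judge the result is to compare it with a closed-form least-squares fit of the same data. The sketch below uses `np.polyfit`, which the article itself does not use; it is brought in here only as a reference point. The values `gd_weight` and `gd_bias` are copied from the output above.

```python
import numpy as np

X = np.array([32.50234527, 53.42680403, 61.53035803, 47.47563963, 59.81320787,
              55.14218841, 52.21179669, 39.29956669, 48.10504169, 52.55001444,
              45.41973014, 54.35163488, 44.1640495, 58.16847072, 56.72720806,
              48.95588857, 44.68719623, 60.29732685, 45.61864377, 38.81681754])
Y = np.array([31.70700585, 68.77759598, 62.5623823, 71.54663223, 87.23092513,
              78.21151827, 79.64197305, 59.17148932, 75.3312423, 71.30087989,
              55.16567715, 82.47884676, 62.00892325, 75.39287043, 81.43619216,
              60.72360244, 82.89250373, 97.37989686, 48.84715332, 56.87721319])


def mean_squared_error(y_true, y_predicted):
    return np.sum((y_true - y_predicted) ** 2) / len(y_true)


# Parameters reported at the end of the gradient descent run above
gd_weight, gd_bias = 1.389738813163012, 0.03509461674147458

# Least-squares fit of a degree-1 polynomial; returns [slope, intercept]
ls_weight, ls_bias = np.polyfit(X, Y, 1)

gd_cost = mean_squared_error(Y, gd_weight * X + gd_bias)
ls_cost = mean_squared_error(Y, ls_weight * X + ls_bias)

print(f"Gradient descent: weight {gd_weight}, bias {gd_bias}, cost {gd_cost}")
print(f"Least squares:    weight {ls_weight}, bias {ls_bias}, cost {ls_cost}")
```

The least-squares cost can never exceed the gradient descent cost. A noticeable gap between the two would suggest that the stopping threshold ended the run before the parameters, the bias in particular, had fully converged; a smaller threshold or a different learning rate could then be tried.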