3 Answers

I think your code is a bit too complicated and needs more structure, because otherwise you'll be lost in all the equations and operations. In the end, this regression boils down to four operations:
Calculate the hypothesis: h = X * theta
Calculate the loss: loss = h - y, and maybe the squared cost: (loss^2) / 2m
Calculate the gradient: gradient = X' * loss / m
Update the parameters: theta = theta - alpha * gradient
In your case, I guess you have confused m with n. Here m denotes the number of examples in your training set, not the number of features.
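Written in matrix form (the same four steps, with X' the transpose of X), one iteration is:

h = X\theta
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_i - y_i)^2
\nabla J(\theta) = \frac{1}{m} X^{T}(h - y)
\theta \leftarrow \theta - \alpha \, \nabla J(\theta)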
Let's have a look at my variation of your code:
import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta

def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations = 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
First, I create a small random dataset, which should look like this:
[figure: linear regression plot of the generated data points with the Excel-fitted line]
As you can see, I also added the regression line and formula that were calculated by Excel.
You need to take care about the intuition of regression using gradient descent: as you do a complete batch pass over your data X, you need to reduce the m losses of every example to a single weight update. In this case, that is the average of the sum over the gradients, hence the division by m.
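To see that the division by m is exactly that averaging, here is a small check (illustrative only; grad_sum and h_i are names made up for this sketch, and x, y, theta, m, n are reused from the code above):

# sum the per-example gradients explicitly; dividing by m gives the average
grad_sum = np.zeros(n)
for i in range(m):
    h_i = np.dot(x[i], theta)          # prediction for example i
    grad_sum += (h_i - y[i]) * x[i]    # that example's gradient contribution
gradient = grad_sum / m                # averaged update direction
# matches the vectorized form: np.dot(x.T, np.dot(x, theta) - y) / m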
The next thing to take care of is tracking convergence and adjusting the learning rate. For that, you should always track your cost at every iteration, and maybe even plot it.
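A minimal sketch of that, assuming matplotlib is installed (theta_t is just a fresh parameter vector for this sketch, so the trained theta above is left untouched; x, y, m, n, alpha are reused from the code above):

import matplotlib.pyplot as plt

theta_t = np.ones(n)
costs = []
for i in range(1000):
    loss = np.dot(x, theta_t) - y
    costs.append(np.sum(loss ** 2) / (2 * m))     # record cost per iteration
    theta_t = theta_t - alpha * np.dot(x.T, loss) / m

plt.plot(costs)             # a flattening tail suggests convergence;
plt.xlabel("iteration")     # a rising curve suggests alpha is too large
plt.ylabel("cost")
plt.show()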
If you run my example, the returned theta will look like this:
Iteration 99997 | Cost: 47883.706462
Iteration 99998 | Cost: 47883.706462
Iteration 99999 | Cost: 47883.706462
[ 29.25567368 1.01108458]
This is actually quite close to the equation that Excel calculated (y = x + 30). Note that since we passed the bias into the first column, the first theta value denotes the bias weight.
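So, to predict with the learned parameters, you prepend the bias feature of 1. For instance (x_new is an arbitrary made-up input, just for illustration):

x_new = 50.0
y_hat = theta[0] * 1.0 + theta[1] * x_new   # bias weight + slope * feature
# equivalently: np.dot([1.0, x_new], theta)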

I know this question has already been answered, but I made some updates to the GD function:
import numpy as np

### COST FUNCTION
def cost(theta, X, y):
    # evaluate half MSE (mean squared error)
    m = len(y)
    error = np.dot(X, theta) - y
    J = np.sum(error ** 2) / (2 * m)
    return J

def GD(X, y, theta, alpha):
    cost_histo = []
    theta_histo = []
    # an arbitrary gradient, to pass the initial while() check
    delta = np.repeat(1.0, len(X))
    # initial cost
    old_cost = cost(theta, X, y)
    while np.max(np.abs(delta)) > 1e-6:
        error = np.dot(X, theta) - y
        delta = np.dot(np.transpose(X), error) / len(y)
        trial_theta = theta - alpha * delta
        trial_cost = cost(trial_theta, X, y)
        # if the step overshoots, halve it until the cost stops increasing
        while trial_cost >= old_cost:
            trial_theta = (theta + trial_theta) / 2
            trial_cost = cost(trial_theta, X, y)
        cost_histo.append(trial_cost)
        theta_histo.append(trial_theta)
        old_cost = trial_cost
        theta = trial_theta
    Intercept = theta[0]
    Slope = theta[1]
    return [Intercept, Slope]

res = GD(X, y, theta, alpha)
This function decreases the effective alpha over the iterations (by halving the step whenever the cost would increase), making the function converge faster; see Estimating linear regression with Gradient Descent (Steepest Descent) for an example in R. I applied the same logic in Python.
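A hypothetical driver to try the function, mirroring the first answer's data setup (the names X, y, theta, alpha here are assumptions, not defined in the answer itself):

# hypothetical data: a bias column plus one feature, target roughly y = x + 30
m = 100
X = np.column_stack((np.ones(m), np.arange(m, dtype=float)))
y = np.arange(m, dtype=float) + 25 + np.random.uniform(0, 1, m) * 10
theta = np.ones(2)
alpha = 0.0005
print(GD(X, y, theta, alpha))   # roughly [intercept near 30, slope near 1]
# note: the 1e-6 tolerance on the gradient can take many iterations to reach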