2 Answers
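Having already upvoted the previous answer, I'll go further and demonstrate that the error is indeed caused by your scores.append() being outside the for loop:
We don't actually need to fit any models; we can simulate the situation with the following modification of your code, which doesn't change the nature of the problem: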
import numpy as np
import pandas as pd

models = ['ran', 'knn', 'log', 'xgb', 'gbc', 'svc', 'ext', 'ada', 'gnb', 'gpc', 'bag']
scores = []
cv = 10

# Sequentially fit and cross validate all models
for mod in models:
    acc = np.array([np.random.rand() for i in range(cv)])  # simulate your accuracy here
scores.append(acc.mean())  # as in your code, i.e. outside the for loop

# Create a dataframe of results
results = pd.DataFrame({
    'Model': ['Random Forest', 'K Nearest Neighbour', 'Logistic Regression', 'XGBoost', 'Gradient Boosting',
              'SVC', 'Extra Trees', 'AdaBoost', 'Gaussian Naive Bayes', 'Gaussian Process', 'Bagging Classifier'],
    'Score': scores})
Unsurprisingly, this essentially reproduces your error:
ValueError: arrays must all be same length
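This is because, as already discussed in the other answer, your scores list contains only one element, namely the acc.mean() from the last iteration of the loop: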
len(scores)
# 1
scores
# [0.47317491043203785]
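So pandas complains, because it cannot fill an 11-row dataframe from a 1-element list...
As already suggested in the other answer, moving scores.append() inside the for loop solves the problem: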
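A minimal sketch of what pandas is objecting to, assuming only pandas is installed: when a DataFrame is built from a dict of lists, every list must have the same length, so an 11-element 'Model' list next to a 1-element 'Score' list raises ValueError. The exact message wording differs between pandas versions, so the sketch only checks the exception type. The `build_results` guard is a hypothetical helper, not part of the original code; it just fails earlier with a more pointed message.

```python
import pandas as pd

names = ['Random Forest', 'K Nearest Neighbour', 'Logistic Regression', 'XGBoost',
         'Gradient Boosting', 'SVC', 'Extra Trees', 'AdaBoost',
         'Gaussian Naive Bayes', 'Gaussian Process', 'Bagging Classifier']
scores = [0.47317491043203785]  # what the buggy loop leaves behind: a single score


def build_results(names, scores):
    """Hypothetical helper: build the results table, checking the lengths first."""
    if len(names) != len(scores):
        raise ValueError(f"{len(names)} model names but {len(scores)} scores; "
                         "is scores.append() inside the loop?")
    return pd.DataFrame({'Model': names, 'Score': scores})


# pandas itself rejects columns of unequal length with a ValueError
try:
    pd.DataFrame({'Model': names, 'Score': scores})
    pandas_raised = False
except ValueError:
    pandas_raised = True

print(pandas_raised)  # True
```

With one score per model (11 values), `build_results` returns an 11-row table instead of raising.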
scores = []  # start again from an empty list (the earlier run left one score in it)
for mod in models:
    acc = np.array([np.random.rand() for i in range(cv)])
    scores.append(acc.mean())  # moved inside the loop

# Create a dataframe of results
results = pd.DataFrame({
    'Model': ['Random Forest', 'K Nearest Neighbour', 'Logistic Regression', 'XGBoost', 'Gradient Boosting',
              'SVC', 'Extra Trees', 'AdaBoost', 'Gaussian Naive Bayes', 'Gaussian Process', 'Bagging Classifier'],
    'Score': scores})
print(results)
# output:
                   Model     Score
0          Random Forest  0.492364
1    K Nearest Neighbour  0.624068
2    Logistic Regression  0.613653
3                XGBoost  0.536488
4      Gradient Boosting  0.484195
5                    SVC  0.381556
6            Extra Trees  0.274922
7               AdaBoost  0.509297
8   Gaussian Naive Bayes  0.362866
9       Gaussian Process  0.606538
10    Bagging Classifier  0.393950
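As an alternative (not what either answer proposes), you can store each model's name together with its score inside the loop, so the two lists can never get out of step. The sketch below reuses the simulated accuracies from this answer; the `named_models` dict and the fixed seed are illustrative assumptions, and in real code the dict values would be your estimator objects passed to cross_val_score.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the simulation is repeatable
cv = 10

# Hypothetical mapping from display name to model; the values are only the short
# keys here because the accuracies are simulated, not computed
named_models = {
    'Random Forest': 'ran', 'K Nearest Neighbour': 'knn', 'Logistic Regression': 'log',
    'XGBoost': 'xgb', 'Gradient Boosting': 'gbc', 'SVC': 'svc', 'Extra Trees': 'ext',
    'AdaBoost': 'ada', 'Gaussian Naive Bayes': 'gnb', 'Gaussian Process': 'gpc',
    'Bagging Classifier': 'bag',
}

records = []
for name, mod in named_models.items():
    # real code would use: acc = cross_val_score(mod, X_train, y_train, cv=cv)
    acc = rng.random(cv)  # simulate cv accuracy values
    records.append({'Model': name, 'Score': acc.mean()})  # name and score stored together

results = pd.DataFrame(records)
print(results)
```

Because each row is created in the same loop iteration as its score, a misplaced append would show up as a missing row rather than a length mismatch between two separate lists.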
You may also want to keep in mind that you don't need the model.fit() part in your code: cross_val_score does all the necessary fitting by itself...
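A minimal sketch of that point, assuming scikit-learn is installed: a small synthetic dataset from `make_classification` stands in for your `X_train`, `y_train`, and `LogisticRegression` stands in for any of your models. cross_val_score fits a fresh clone of the estimator on each fold, so no prior `fit()` is needed, and the estimator you pass in stays unfitted afterwards.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for X_train, y_train
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LogisticRegression()  # note: no clf.fit() call beforehand
acc = cross_val_score(clf, X_train, y_train, scoring="accuracy", cv=10)

print(acc.shape)  # (10,): one accuracy value per fold
print(acc.mean())

# cross_val_score fitted clones of clf, so clf itself was never fitted
print(hasattr(clf, "coef_"))  # False
```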
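It seems there is an indentation error in your code; see the edited code below. In your code, if you check len(scores), you will get 1: append is called outside the loop, so only the last value is added.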
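from sklearn.model_selection import cross_val_score

# Prepare lists (ran, knn, ... are the estimators defined in your original code)
models = [ran, knn, log, xgb, gbc, svc, ext, ada, gnb, gpc, bag]
scores = []

# Sequentially fit and cross validate all models
for mod in models:
    mod.fit(X_train, y_train)
    acc = cross_val_score(mod, X_train, y_train, scoring="accuracy", cv=10)
    scores.append(acc.mean())  # now inside the loop: one score per model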
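The buggy version raises no NameError and silently keeps only the last value because a Python for loop does not create a new scope: after the loop finishes, acc still refers to whatever was assigned in the final iteration. A minimal pure-Python sketch, with made-up values standing in for each model's cross_val_score(...).mean():

```python
# Made-up per-model accuracies standing in for cross_val_score(...).mean()
fake_accuracies = [0.81, 0.77, 0.92]

# Buggy: append after the loop, so only the final value of acc is recorded
scores_outside = []
for value in fake_accuracies:
    acc = value
scores_outside.append(acc)  # acc still exists here, holding the last iteration's value

# Fixed: append inside the loop body, once per model
scores_inside = []
for value in fake_accuracies:
    acc = value
    scores_inside.append(acc)

print(scores_outside)  # [0.92]
print(scores_inside)   # [0.81, 0.77, 0.92]
```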