
How to efficiently compare the accuracy of all models

Python

叮当猫咪 2021-11-09 20:12:41

I have split my training data and initialised 11 classifier models, and I now want to compare them. I am running VS Code on Ubuntu 18.04. This is what I tried:

# Prepare lists
models = [ran, knn, log, xgb, gbc, svc, ext, ada, gnb, gpc, bag]
scores = []

# Sequentially fit and cross validate all models
for mod in models:
    mod.fit(X_train, y_train)
    acc = cross_val_score(mod, X_train, y_train, scoring = "accuracy", cv = 10)
scores.append(acc.mean())

# Creating a table of results, ranked highest to lowest
results = pd.DataFrame({
    'Model': ['Random Forest', 'K Nearest Neighbour', 'Logistic Regression', 'XGBoost', 'Gradient Boosting',
              'SVC', 'Extra Trees', 'AdaBoost', 'Gaussian Naive Bayes', 'Gaussian Process', 'Bagging Classifier'],
    'Score': scores})

The last part returns:

ValueError: arrays must all be same length

I have counted twice, and there really are 11 models. What am I missing?


2 Answers

月关宝盒

Contributed 1772 experience points, received over 5 likes

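Having already upvoted the previous answer, I will go on to demonstrate that the error is indeed due to your scores.append() being outside the for loop.

We do not actually need to fit any models; we can simulate the situation with the following modifications to your code, which do not change the essence of the issue: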

import numpy as np
import pandas as pd

models = ['ran', 'knn', 'log', 'xgb', 'gbc', 'svc', 'ext', 'ada', 'gnb', 'gpc', 'bag']
scores = []
cv = 10

# Sequentially fit and cross validate all models
for mod in models:
    acc = np.array([np.random.rand() for i in range(cv)])  # simulate your accuracy here
scores.append(acc.mean())  # as in your code, i.e. outside the for loop

# Create a dataframe of results
results = pd.DataFrame({
    'Model': ['Random Forest', 'K Nearest Neighbour', 'Logistic Regression', 'XGBoost', 'Gradient Boosting',
              'SVC', 'Extra Trees', 'AdaBoost', 'Gaussian Naive Bayes', 'Gaussian Process', 'Bagging Classifier'],
    'Score': scores})

As expected, this essentially reproduces your error:

ValueError: arrays must all be same length

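This happens because, as already discussed in the other answer, your scores list has only one element, namely the acc.mean() from the last iteration of the loop only:

len(scores)
# 1

scores
# [0.47317491043203785]

So pandas complains, because it cannot populate an 11-row dataframe from a single score...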
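To see the mismatch in isolation, here is a minimal sketch (assuming pandas is installed; the `names` and `scores` values are made up for illustration) that builds the same kind of dictionary and catches the exception:

```python
import pandas as pd

# Hypothetical stand-ins: 11 labels but only one score, as in the buggy loop
names = [f"model_{i}" for i in range(11)]
scores = [0.47]

try:
    pd.DataFrame({"Model": names, "Score": scores})
    raised = False
except ValueError as err:
    # pandas refuses to build a frame from columns of different lengths
    raised = True
    message = str(err)

print(len(names), len(scores), raised)
```

Checking `len(scores)` against the number of labels before building the dataframe is usually the quickest way to spot this kind of problem.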

As already suggested in the other answer, moving scores.append() inside the for loop resolves the issue:

for mod in models:
    acc = np.array([np.random.rand() for i in range(cv)])
    scores.append(acc.mean())  # moved inside the loop

# Create a dataframe of results
results = pd.DataFrame({
    'Model': ['Random Forest', 'K Nearest Neighbour', 'Logistic Regression', 'XGBoost', 'Gradient Boosting',
              'SVC', 'Extra Trees', 'AdaBoost', 'Gaussian Naive Bayes', 'Gaussian Process', 'Bagging Classifier'],
    'Score': scores})

print(results)

# output:
                   Model     Score
0          Random Forest  0.492364
1    K Nearest Neighbour  0.624068
2    Logistic Regression  0.613653
3                XGBoost  0.536488
4      Gradient Boosting  0.484195
5                    SVC  0.381556
6            Extra Trees  0.274922
7               AdaBoost  0.509297
8   Gaussian Naive Bayes  0.362866
9       Gaussian Process  0.606538
10    Bagging Classifier  0.393950

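You may also want to keep in mind that you do not need the mod.fit() part in your code: cross_val_score does all the necessary fitting itself, on a fresh copy of the model for each fold...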
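Putting both points together, a minimal sketch of the comparison could look like the following. It assumes scikit-learn and pandas are installed; the synthetic data from `make_classification` and the three estimators are stand-ins chosen for illustration, not the asker's actual `X_train`, `y_train`, or 11 models.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for X_train / y_train (an assumption for this sketch)
X_train, y_train = make_classification(n_samples=200, n_features=10, random_state=0)

# Keeping each display name next to its estimator means labels and scores cannot drift apart
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K Nearest Neighbour": KNeighborsClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
}

rows = []
for name, mod in models.items():
    # No mod.fit() here: cross_val_score fits a fresh clone on every fold
    acc = cross_val_score(mod, X_train, y_train, scoring="accuracy", cv=5)
    rows.append({"Model": name, "Score": acc.mean()})

# Rank highest to lowest, as the question intended
results = (
    pd.DataFrame(rows)
    .sort_values("Score", ascending=False)
    .reset_index(drop=True)
)
print(results)
```

Because cross_val_score works on clones, the estimators in `models` are still unfitted afterwards; if predictions are needed later, the chosen model has to be fitted separately.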

2021-11-09

长风秋雁

Contributed 1757 experience points, received over 7 likes

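There appears to be an indentation error in your code; see the edited code below. In your version, len(scores) gives 1, because append is called outside the loop and so only the last value is added.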

# Prepare lists
models = [ran, knn, log, xgb, gbc, svc, ext, ada, gnb, gpc, bag]
scores = []

# Sequentially fit and cross validate all models
for mod in models:
    mod.fit(X_train, y_train)
    acc = cross_val_score(mod, X_train, y_train, scoring="accuracy", cv=10)
    scores.append(acc.mean())
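One way to sidestep this kind of indentation slip altogether is to build the list with a comprehension, so there is no separate append line to misplace. A minimal sketch using only the standard library, where `simulated_cv_accuracy` is a hypothetical stand-in for the `cross_val_score(...).mean()` call and the three names are just a sample:

```python
import random
from statistics import mean

names = ["Random Forest", "K Nearest Neighbour", "Logistic Regression"]

def simulated_cv_accuracy(rng, folds=10):
    """Hypothetical stand-in for cross_val_score(...).mean(): averages random fold scores."""
    return mean(rng.random() for _ in range(folds))

rng = random.Random(0)  # seeded so the run is repeatable

# One score per name; a comprehension cannot end up "outside the loop"
scores = [simulated_cv_accuracy(rng) for _ in names]

# Cheap sanity check before handing the lists to pd.DataFrame
assert len(scores) == len(names)

# Rank highest to lowest
ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```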

2021-11-09
