
How to get the most informative features for scikit-learn classifiers?

Python

大话西游666 2019-11-25 14:13:08

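Classifiers in machine learning packages such as liblinear and nltk provide a method show_most_informative_features(), which is really helpful for debugging features:

    viagra = None    ok : spam = 4.5 : 1.0
    hello = True     ok : spam = 4.5 : 1.0
    hello = None     spam : ok = 3.3 : 1.0
    viagra = True    spam : ok = 3.3 : 1.0
    casino = True    spam : ok = 2.0 : 1.0
    casino = None    ok : spam = 1.5 : 1.0

My question is whether something similar is implemented for the classifiers in scikit-learn. I searched the documentation but could not find anything like it. If there is no such function yet, does anybody know a workaround to get at those values? Thanks a lot!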


3 Answers

翻阅古今

Contributed 1780 experience points, received over 5 likes

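The classifiers themselves do not record feature names; they only see numeric arrays. However, if you extracted your features with a Vectorizer/CountVectorizer/TfidfVectorizer/DictVectorizer, and you are using a linear model (e.g. LinearSVC or Naive Bayes), then you can apply the same trick that the document classification example uses. Example (untested, may contain a bug or two):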

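import numpy as np

def print_top10(vectorizer, clf, class_labels):
    """Prints features with the highest coefficient values, per class"""
    # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out() (1.0+)
    feature_names = vectorizer.get_feature_names_out()
    for i, class_label in enumerate(class_labels):
        # argsort is ascending, so the last 10 indices hold the largest coefficients
        top10 = np.argsort(clf.coef_[i])[-10:]
        print("%s: %s" % (class_label,
              " ".join(feature_names[j] for j in top10)))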

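This is for multiclass classification; for the binary case, I think you should use clf.coef_[0] only. You may have to sort the class_labels so that they match the row order of clf.coef_.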

Answered 2019-11-25

饮歌长啸

Contributed 1951 experience points, received over 3 likes

With the help of larsmans' code, I came up with the following code for the binary case:

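def show_most_informative_features(vectorizer, clf, n=20):
    # get_feature_names_out() replaces get_feature_names(), removed in scikit-learn 1.2
    feature_names = vectorizer.get_feature_names_out()
    # Sort (coefficient, name) pairs in ascending order of coefficient
    coefs_with_fns = sorted(zip(clf.coef_[0], feature_names))
    # Pair the n most negative coefficients with the n most positive ones
    top = zip(coefs_with_fns[:n], coefs_with_fns[:-(n + 1):-1])
    for (coef_1, fn_1), (coef_2, fn_2) in top:
        print("\t%.4f\t%-15s\t\t%.4f\t%-15s" % (coef_1, fn_1, coef_2, fn_2))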

Answered 2019-11-25

潇潇雨雨

Contributed 1833 experience points, received over 4 likes

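Actually, I had to find feature importances for my NaiveBayes classifier, and although I used the functions above, I could not get feature importances per class. I went through the scikit-learn documentation and tweaked the functions above a bit, and found that this works for my problem. Hope it helps you too!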

def important_features(vectorizer, classifier, n=20):
    class_labels = classifier.classes_
    # get_feature_names_out() replaces get_feature_names(), removed in scikit-learn 1.2
    feature_names = vectorizer.get_feature_names_out()
    # Rank features by their raw count within each class, highest first
    topn_class1 = sorted(zip(classifier.feature_count_[0], feature_names), reverse=True)[:n]
    topn_class2 = sorted(zip(classifier.feature_count_[1], feature_names), reverse=True)[:n]
    print("Important words in negative reviews")
    for coef, feat in topn_class1:
        print(class_labels[0], coef, feat)
    print("-----------------------------------------")
    print("Important words in positive reviews")
    for coef, feat in topn_class2:
        print(class_labels[1], coef, feat)

Note that your classifier (NaiveBayes in my case) must have a feature_count_ attribute for this to work.

Answered 2019-11-25
