Saving intermediate results of an sklearn pipeline

慕姐8265434 2022-07-19 20:56:51
I have a code example: an sklearn pipeline with two components (PCA and a random forest). I want to use the pipeline's intermediate results to add some explainability. I know I can use `.get_params()` to see the intermediate steps, but is it possible to save or extract the intermediate results for further operations? I would like to apply PCA's additional functionality (parts 1.1 and 1.2 in the code).

```python
from sklearn.datasets import load_breast_cancer
import numpy as np
import pandas as pd
from sklearn.decomposition import FastICA, PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

# Convert the dataset to a data frame
cancer = load_breast_cancer()
data = np.c_[cancer.data, cancer.target]
columns = np.append(cancer.feature_names, ["target"])
df = pd.DataFrame(data, columns=columns)

# Split data into train and test
X = df.iloc[:, 0:30].values
Y = df.iloc[:, 30].values
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# Create a pipeline
n_comp = 12
clf = Pipeline([('pca', PCA(n_comp)), ('RandomForest', RandomForestClassifier(n_estimators=100))])
clf.fit(X_train, Y_train)

# Evaluate the pipeline
Y_pred = clf.predict(X_test)  # Y_pred must be computed before it is used below
cr = classification_report(Y_test, Y_pred)
print(cr)

# See the intermediate steps of the pipeline
print(clf.get_params()['pca'])

## 1.1 If I create PCA outside of the pipeline
pca = PCA(n_components=10)
principalComponents = pca.fit_transform(X)

## 1.2 Some explainability on PCA outside of the pipeline
pca.explained_variance_ratio_
```
1 Answer

智慧大石

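Indexing the dictionary returned by `get_params()` with the step name `'pca'` and assigning the result to a variable gives you the fitted PCA step itself, an object of type `sklearn.decomposition.pca.PCA`. Through it you can access all of the decomposition's methods and attributes.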
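A minimal sketch, assuming scikit-learn 0.21 or newer (where a pipeline can be indexed by step name): the step returned by `get_params()['pca']` is the very same fitted object you get from `named_steps['pca']` or `clf['pca']`, not a copy, so any of the three can be used. Data loading is shortened here with `load_breast_cancer(return_X_y=True)`; the pipeline definition matches the question's.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X, Y = load_breast_cancer(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

clf = Pipeline([('pca', PCA(12)), ('RandomForest', RandomForestClassifier(n_estimators=100))])
clf.fit(X_train, Y_train)

# Three ways to reach the fitted PCA step; all return the same object.
pca_from_params = clf.get_params()['pca']
pca_from_named = clf.named_steps['pca']
pca_from_index = clf['pca']  # indexing by step name (scikit-learn >= 0.21)

same_object = pca_from_params is pca_from_named is pca_from_index
print(same_object)                   # True
print(pca_from_named.n_components_)  # 12
```

`named_steps` (or indexing) is usually the more readable choice, since `get_params()` also returns every nested hyperparameter of every step.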


```python
from sklearn.datasets import load_breast_cancer
import numpy as np
import pandas as pd
from sklearn.decomposition import FastICA, PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

# Convert the dataset to a data frame
cancer = load_breast_cancer()
data = np.c_[cancer.data, cancer.target]
columns = np.append(cancer.feature_names, ["target"])
df = pd.DataFrame(data, columns=columns)

# Split data into train and test
X = df.iloc[:, 0:30].values
Y = df.iloc[:, 30].values
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# Create a pipeline
n_comp = 12
clf = Pipeline([('pca', PCA(n_comp)), ('RandomForest', RandomForestClassifier(n_estimators=100))])
clf.fit(X_train, Y_train)

### --- ###
# The 'pca' entry of get_params() is the fitted PCA step itself
pca = clf.get_params()['pca']

type(pca)
# sklearn.decomposition.pca.PCA
# (reported as sklearn.decomposition._pca.PCA in scikit-learn 0.22 and later)

pca.explained_variance_ratio_
# array([9.81327198e-01, 1.67333696e-02, 1.73934848e-03, 1.05758996e-04,
#        8.29268494e-05, 6.34081771e-06, 3.75309113e-06, 7.08990845e-07,
#        3.16742542e-07, 1.75055859e-07, 7.11274270e-08, 1.43003803e-08])

pca.components_.shape
# (12, 30)
```
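The question also asks about extracting the intermediate results themselves, i.e. the data after the PCA step. A minimal sketch, again assuming scikit-learn 0.21 or newer: slicing the pipeline with `clf[:-1]` yields a sub-pipeline that reuses the already-fitted PCA, so its `transform` returns the 12-component representation the random forest sees. The loadings table and the temporary `.npy` file are illustrative choices, not part of the original answer.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

cancer = load_breast_cancer()
X_train, X_test, Y_train, Y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.25, random_state=0)

n_comp = 12
clf = Pipeline([('pca', PCA(n_comp)), ('RandomForest', RandomForestClassifier(n_estimators=100))])
clf.fit(X_train, Y_train)

# All steps except the last: a sub-pipeline that only applies the fitted PCA.
X_test_pca = clf[:-1].transform(X_test)

# Same result through the named step directly.
X_test_pca_direct = clf.named_steps['pca'].transform(X_test)
print(np.allclose(X_test_pca, X_test_pca_direct))  # True
print(X_test_pca.shape)                            # (143, 12)

# Label the loadings (components_) with the original feature names.
pca = clf.named_steps['pca']
loadings = pd.DataFrame(
    pca.components_,
    columns=cancer.feature_names,
    index=[f"PC{i + 1}" for i in range(n_comp)],
)
# Feature with the largest absolute loading on each component.
print(loadings.abs().idxmax(axis=1))

# Save the intermediate result and read it back (temporary directory for the demo).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pca_test_scores.npy")
    np.save(path, X_test_pca)
    reloaded = np.load(path)
print(np.array_equal(reloaded, X_test_pca))  # True
```

Because slicing shares the fitted step objects instead of refitting or cloning them, the transformed data matches what the random forest receives inside `clf.predict(X_test)`.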

Hope this helps.


Answered 2022-07-19