
Feature importance with lightgbm


缥缈止盈 2021-08-24 16:32:27
I am trying to run my lightgbm model for feature selection, as shown below.

Initialization:

# Initialize an empty array to hold feature importances
feature_importances = np.zeros(features_sample.shape[1])

# Create the model with several hyperparameters
model = lgb.LGBMClassifier(objective='binary',
                           boosting_type='goss',
                           n_estimators=10000, class_weight='balanced')

Then I fit the model as follows:

# Fit the model twice to avoid overfitting
for i in range(2):
    # Split into training and validation set
    train_features, valid_features, train_y, valid_y = train_test_split(
        train_X, train_Y, test_size=0.25, random_state=i)
    # Train using early stopping
    model.fit(train_features, train_y, early_stopping_rounds=100,
              eval_set=[(valid_features, valid_y)],
              eval_metric='auc', verbose=200)
    # Record the feature importances
    feature_importances += model.feature_importances_

But I get the following error:

Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[6]  valid_0's auc: 0.88648
ValueError: operands could not be broadcast together with shapes (87,) (83,) (87,)
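One plausible reading of the traceback, offered as a sketch and not a confirmed diagnosis: in the in-place addition the accumulator has 87 entries (it was sized from features_sample.shape[1]), while model.feature_importances_ has 83, one per column of the matrix passed to fit(). If that is the case, sizing the accumulator from the training matrix itself avoids the mismatch. The helper below illustrates the idea with NumPy only; accumulate_importances and the fit_and_get_importances callable are hypothetical names introduced for illustration and are not part of lightgbm.

```python
import numpy as np


def accumulate_importances(fit_and_get_importances, train_X, n_rounds=2):
    """Average importance vectors over several fits.

    fit_and_get_importances is a hypothetical callable (an assumption of this
    sketch): it receives the training matrix and a seed, fits a model, and
    returns one importance value per column of train_X.
    """
    # Size the accumulator from the matrix that is actually used for fitting,
    # not from a different sample of the data.
    total = np.zeros(train_X.shape[1])
    for i in range(n_rounds):
        importances = np.asarray(fit_and_get_importances(train_X, seed=i))
        # Fail early with a readable message instead of a broadcasting error.
        if importances.shape != total.shape:
            raise ValueError(
                f"model reports {importances.shape[0]} features, "
                f"but the training matrix has {total.shape[0]} columns"
            )
        total += importances
    return total / n_rounds
```

With the code from the question, the callable would wrap the train_test_split and model.fit calls and return model.feature_importances_. Printing features_sample.shape and train_X.shape before the loop is a quick way to check whether the two matrices really differ in width.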

2 Answers

哆啦的时光机

Contributed 1779 experience points, received more than 6 likes

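Depending on whether the model was trained through the scikit-learn API or through the native lightgbm API, the importances come from the feature_importances_ property or from the feature_importance() function, respectively. The example below handles both cases (here model is the result of lgbm.fit() / lgbm.train(), and train_columns = x_train_df.columns):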


import pandas as pd


def get_lgbm_varimp(model, train_columns, max_vars=50):
    if "basic.Booster" in str(model.__class__):
        # lightgbm.basic.Booster was trained directly, so use the feature_importance() function
        cv_varimp_df = pd.DataFrame([train_columns, model.feature_importance()]).T
    else:
        # Scikit-learn API LGBMClassifier or LGBMRegressor was fitted,
        # so use the feature_importances_ property
        cv_varimp_df = pd.DataFrame([train_columns, model.feature_importances_]).T

    cv_varimp_df.columns = ['feature_name', 'varimp']
    cv_varimp_df.sort_values(by='varimp', ascending=False, inplace=True)
    cv_varimp_df = cv_varimp_df.iloc[0:max_vars]
    return cv_varimp_df

    

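Note that this relies on the assumption that the feature importance values are ordered the same way as the columns of the model matrix during training (including one-hot dummy columns); see LightGBM #209.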
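If the ordering assumption is a concern, one alternative is to read the feature names from the fitted model itself instead of passing a separate column list. The sketch below assumes a fitted model that exposes either Booster.feature_name() and Booster.feature_importance() (native API) or the feature_name_ and feature_importances_ attributes (scikit-learn API); importance_table is a hypothetical helper name, not a lightgbm function.

```python
import pandas as pd


def importance_table(model, max_vars=50):
    """Pair each importance value with the feature name stored in the model."""
    if callable(getattr(model, "feature_importance", None)):
        # Native API: a Booster exposes methods for names and importances.
        names = model.feature_name()
        values = model.feature_importance()
    else:
        # Scikit-learn API: a fitted estimator exposes attributes instead.
        names = model.feature_name_
        values = model.feature_importances_

    table = pd.DataFrame({"feature_name": list(names), "varimp": list(values)})
    # Largest importance first, truncated to at most max_vars rows.
    table = table.sort_values("varimp", ascending=False).head(max_vars)
    return table.reset_index(drop=True)
```

Because names and values come from the same object, they cannot drift apart the way an externally supplied column list can. In both APIs the default importance type is the split count, so values from the two branches are comparable only if the same importance type was used.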


Answered 2021-08-24