The root of your problem is this line:
y = np.arange(100) + np.random.rand(100)
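StratifiedKFold cannot sample from a continuous distribution, hence your error: it stratifies on class labels, so it needs a discrete target. Try changing this line and your code will run without complaint: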
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import KFold, StratifiedKFold
import numpy as np
x = np.arange(100, dtype=np.float64).reshape(-1, 1)
y = np.random.choice([0,1], size=100)
# KFold default implementation:
model_default = ElasticNetCV(cv=5)
model_default.fit(x, y) # works fine
# KFold given as cv explicitly:
model_kfexp = ElasticNetCV(cv=KFold(5))
model_kfexp.fit(x, y) # also works fine
# StratifiedKFold given as cv explicitly:
model_skf = ElasticNetCV(cv=StratifiedKFold(5))
model_skf.fit(x, y) # no ERROR
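To see the failure in isolation, here is a minimal sketch that calls StratifiedKFold.split directly on a continuous target like the one in the question. The seeded generator and the variable names are my own choices for reproducibility, not part of the original code; the splitter is expected to raise a ValueError for such a target.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
x = np.arange(100, dtype=np.float64).reshape(-1, 1)
y_continuous = np.arange(100) + rng.random(100)

error_message = None
try:
    # split() is lazy, so list() forces the folds to be computed
    list(StratifiedKFold(5).split(x, y_continuous))
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The same error surfaces through ElasticNetCV.fit, because the estimator calls split on whatever cv object it is given.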
Note
If your target is continuous, use KFold. If your target is categorical, both KFold and StratifiedKFold work, so use whichever suits your needs.
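The practical difference for a categorical target can be sketched as follows. The data is made up for illustration: an imbalanced label array that is sorted on purpose, so that unshuffled KFold produces lopsided test folds while StratifiedKFold keeps the class ratio in each one. The helper name positive_share_per_fold is hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

x = np.arange(100, dtype=np.float64).reshape(-1, 1)
# 80 samples of class 0 followed by 20 of class 1 (sorted on purpose)
y = np.array([0] * 80 + [1] * 20)

def positive_share_per_fold(cv):
    """Fraction of class-1 samples in each test fold produced by cv."""
    return [float(y[test_idx].mean()) for _, test_idx in cv.split(x, y)]

kfold_shares = positive_share_per_fold(KFold(5))
stratified_shares = positive_share_per_fold(StratifiedKFold(5))

print("KFold:          ", kfold_shares)
print("StratifiedKFold:", stratified_shares)
```

With plain KFold, four test folds contain no class-1 samples at all and the last one contains only class-1 samples, while every stratified test fold keeps the overall 20% share. Passing shuffle=True to KFold reduces the problem but does not guarantee the ratio.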
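Note 2
If you insist on emulating stratified sampling on continuous data, you may wish to apply pandas.cut to your target to bin it, run the stratified split on the binned labels, and finally pass the resulting generator of (train_id, test_id) pairs to the cv parameter: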
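import pandas as pd  # the other imports come from the snippet above
x = np.arange(100, dtype=np.float64).reshape(-1, 1)
y = np.arange(100) + np.random.rand(100)
# Bin the continuous target into 10 equal-width classes, used only for stratification
y_cat = pd.cut(y, 10, labels=range(10))
skf_gen = StratifiedKFold(5).split(x, y_cat)
model_skf = ElasticNetCV(cv=skf_gen)
model_skf.fit(x, y) # no ERROR; the model is still fit on the continuous y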
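If you want to check what the binned split actually does, a rough sketch along these lines may help. The helper binned_stratified_folds and its parameters are hypothetical names of mine, and it uses labels=False so that pandas.cut returns plain integer bin codes. It materializes the folds into a list and counts how many distinct bins appear in each test fold.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

def binned_stratified_folds(x, y, n_bins=10, n_splits=5):
    """Hypothetical helper: stratify a continuous target on equal-width bins."""
    y_bins = pd.cut(y, n_bins, labels=False)  # integer bin code per sample
    folds = list(StratifiedKFold(n_splits).split(x, y_bins))
    return y_bins, folds

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
x = np.arange(100, dtype=np.float64).reshape(-1, 1)
y = np.arange(100) + rng.random(100)

y_bins, folds = binned_stratified_folds(x, y)

# Every test fold should draw samples from every bin of the target range
bins_per_test_fold = [len(np.unique(y_bins[test_idx])) for _, test_idx in folds]
print(bins_per_test_fold)
```

Because folds is a list rather than a one-shot generator, it can be inspected first and then passed as cv=folds to ElasticNetCV, which accepts an iterable of (train, test) index pairs.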