2 Answers
Answer 1
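You can use `RepeatedStratifiedKFold`, which, as the name suggests, repeats stratified K-fold cross-validation `n` times, with a different randomization in each repetition. To repeat the process 10 times, set `n_repeats=10`, and to get a train/test size ratio of roughly 9:1, set `n_splits=10`: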
```python
from sklearn.model_selection import RepeatedStratifiedKFold

# `a` is assumed to be the asker's data array: features first, class label in the last column
X = a[:, :-1]
y = a[:, -1]

rskf = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=2)
for train_index, test_index in rskf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Percentage of class 1 in the training fold, followed by the fold sizes
    print(f'\nClass 1: {((y_train==1).sum()/len(y_train))*100:.0f}%')
    print(f'\nShape of train: {X_train.shape[0]}')
    print(f'Shape of test: {X_test.shape[0]}')
```
```text
Class 1: 73%
Shape of train: 33
Shape of test: 4
Class 1: 73%
Shape of train: 33
Shape of test: 4
Class 1: 73%
Shape of train: 33
Shape of test: 4
Class 1: 73%
Shape of train: 33
Shape of test: 4
...
```
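Because `a` comes from the question and is not shown here, the snippet below is a self-contained sketch that substitutes a hypothetical array of 37 rows (27 of class 1 and 10 of class 0, about 73% class 1, chosen only to resemble the output above). Instead of printing every fold, it collects the class-1 share of each training fold and the size of each test fold across all 10 × 10 = 100 splits, so you can check that stratification keeps the ratio steady.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# Hypothetical stand-in for the asker's array `a`: 37 rows, two feature columns
# and a binary label in the last column (27 ones, 10 zeros, ~73% class 1).
rng = np.random.default_rng(0)
labels = np.array([1] * 27 + [0] * 10)
a = np.column_stack([rng.normal(size=(37, 2)), labels])

X = a[:, :-1]
y = a[:, -1]

rskf = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=2)

train_ratios = []  # share of class 1 in each training fold
test_sizes = []    # number of rows in each test fold
for train_index, test_index in rskf.split(X, y):
    train_ratios.append((y[train_index] == 1).mean())
    test_sizes.append(len(test_index))

print("number of splits:", len(test_sizes))  # n_splits * n_repeats
print("test fold sizes:", sorted(set(test_sizes)))
print(f"class 1 share in train: {min(train_ratios):.3f} to {max(train_ratios):.3f}")
```

With 37 rows and `n_splits=10`, each test fold can only hold 3 or 4 rows, which is why the training size stays at 33 or 34 and the class-1 share barely moves between folds.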
Answer 2
A well-known way to split data into training and test sets is scikit-learn's `train_test_split`. See the API documentation for `model_selection.train_test_split`.
```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state=42)
```
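You can vary the `random_state` variable (the seed) until the proportion between your classes looks right. By default, `train_test_split` does not enforce the class ratio, although a random split usually comes close to the population proportions; passing `stratify=y` makes it preserve the ratio.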
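Rather than searching for a `random_state` that happens to give the right ratio, here is a minimal sketch using the `stratify` parameter of `train_test_split`. The 100-row dataset (70 rows of class 1, 30 of class 0) is hypothetical. With `stratify=y`, the 10-row test set holds exactly 7 rows of class 1; the loop at the end repeats the unstratified split over 50 seeds to show how that count varies without stratification.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows with 3 features; 70 rows of class 1, 30 of class 0
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([1] * 70 + [0] * 30)

# stratify=y keeps the 70/30 class ratio in both the train and the test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
print("class 1 rows in test:", int((y_test == 1).sum()))

# Without stratify, the class-1 count in the test set depends on the seed
# (index 3 of the returned list is y_test)
unstratified_counts = {
    int((train_test_split(X, y, test_size=0.10, random_state=seed)[3] == 1).sum())
    for seed in range(50)
}
print("unstratified class 1 counts:", sorted(unstratified_counts))
```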