I have a dataset with a 70/30 class balance, and I am trying to split it into stratified train/test sets with scikit-learn's train_test_split function. It works as expected in Python 3.5 but not in 3.7. Here is the code I use to reproduce the problem:

    import numpy as np
    from sklearn.model_selection import train_test_split

    # 100000 samples with 10 random features each
    data = np.random.rand(1000000).reshape(100000, 10)

    # 30000 labels of class 0 followed by 70000 labels of class 1
    y_0 = [0]*30000
    y_1 = [1]*70000
    y_2 = y_0 + y_1

    x_train, x_test, y_train, y_test = train_test_split(
        data, y_2, test_size=0.2, random_state=0, stratify=y_2)

    print('Train set size : {}'.format(len(y_train)))
    print('Value 1 repartition in train set : {}'.format(sum(y_train)/len(y_train)))
    print('Test set size : {}'.format(len(y_test)))
    print('Value 1 repartition in test set : {}'.format(sum(y_test)/len(y_test)))

Output with Python 3.7:

    Train set size : 24102
    Value 1 repartition in train set : 0.5414903327524687
    Test set size : 20000
    Value 1 repartition in test set : 0.53775

Output with Python 3.5:

    Train set size : 80000
    Value 1 repartition in train set : 0.7
    Test set size : 20000
    Value 1 repartition in test set : 0.7

Library versions for 3.7:

    Python 3.7.2
    numpy==1.16.1
    pandas==0.24.1
    python-dateutil==2.8.0
    pytz==2018.9
    scikit-learn==0.20.2
    scipy==1.2.1
    six==1.12.0

Library versions for 3.5:

    Python 3.5.1
    numpy==1.16.1
    pandas==0.24.1
    python-dateutil==2.8.0
    pytz==2018.9
    scikit-learn==0.20.2
    scipy==1.2.1
    six==1.12.0
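To compare the two environments more directly, it may help to look at per-class proportions instead of only the mean of the labels. The following is a minimal sketch that uses only the standard library; the helper name class_proportions is hypothetical and not part of scikit-learn.

```python
from collections import Counter


def class_proportions(labels):
    """Return a dict mapping each label to its fraction of the samples."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Same 30/70 label list as in the question.
y = [0] * 30000 + [1] * 70000
print(len(y), class_proportions(y))
```

Calling class_proportions(y_train) and class_proportions(y_test) after the split, together with len(x_train) and len(y_train), should show whether the features and labels still have matching sizes and whether each class keeps its 30/70 share.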