2 Answers
Contributor: 1946 experience points · 3+ upvotes
You could try building an array of indices and shuffling it first. Then use the first 80 indices for the first CSV and the remaining 20 for the second:
from random import shuffle

indices = list(range(1, 101))
shuffle(indices)

with open('C:\\train.csv', 'w') as outf:
    print('x:data,y:label', file=outf)
    for i in indices[:80]:
        print('./1/a_%s.csv, 1' % i, file=outf)

with open('C:\\test.csv', 'w') as outf:
    print('x:data,y:label', file=outf)
    for i in indices[80:]:
        print('./1/a_%s.csv, 1' % i, file=outf)
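As a quick sanity check (a hypothetical snippet, not part of the answer above), you can verify that the two slices are disjoint and together cover all 100 indices exactly once:

```python
from random import shuffle

indices = list(range(1, 101))
shuffle(indices)

# Slicing one shuffled list can never duplicate or drop an index
train, test = indices[:80], indices[80:]

assert len(train) == 80 and len(test) == 20
assert set(train).isdisjoint(test)
assert set(train) | set(test) == set(range(1, 101))
```

This holds for any shuffle outcome, since slicing partitions the list.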
Contributor: 1829 experience points · 6+ upvotes
This is a common task in machine learning, and scikit-learn provides tools for it, such as train_test_split:
from sklearn.model_selection import train_test_split
indices = list(range(1, 101))
i_a, i_b = train_test_split(indices, train_size=0.8, test_size=0.2)
Now you can iterate over i_a (80 random indices) and i_b (20 random indices) just like in the original code:
with open('C:\\train.csv', 'w') as outf:
    print('x:data,y:label', file=outf)
    for i in i_a:
        print('./1/a_%s.csv, 1' % i, file=outf)

with open('C:\\test.csv', 'w') as outf:
    print('x:data,y:label', file=outf)
    for i in i_b:
        print('./1/a_%s.csv, 1' % i, file=outf)
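Either approach produces the same CSV layout. To preview the output without writing to C:\, the same print calls can target an io.StringIO buffer from the standard library (a sketch using the shuffle-based split so it stays self-contained):

```python
import io
from random import shuffle

indices = list(range(1, 101))
shuffle(indices)

# Write the train split to an in-memory buffer instead of C:\train.csv
outf = io.StringIO()
print('x:data,y:label', file=outf)
for i in indices[:80]:
    print('./1/a_%s.csv, 1' % i, file=outf)

lines = outf.getvalue().splitlines()
assert lines[0] == 'x:data,y:label'   # header row
assert len(lines) == 81               # header + 80 data rows
assert all(line.endswith(', 1') for line in lines[1:])
```

Swapping the buffer back for open('C:\\train.csv', 'w') restores the original behavior.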
