1 answer
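If each group always has exactly 9 values, you can use numpy.reshape to turn the value column into a 2D array and pass it to the DataFrame constructor. For the column names, take the first 9 values of the key column: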
print (df['value'].values.reshape(-1, 9))
[['<5525962.1075855679785.JavaMail.evans@thyme>'
'Wed, 13 Dec 2000 07:04:00 -0800 (PST)' 'phillip.allen@enron.com'
'christi.nicolay@enron.com, james.steffes@enron...' 'Phillip K Allen'
'Christi L Nicolay, James D Steffes, Jeff Dasov...' 'None' 'None'
'Allen-P']
['<4650921.1075855679981.JavaMail.evans@thyme>'
'Tue, 5 Dec 2000 07:31:00 -0800 (PST)' 'ina.rangel@enron.com'
'amanda.huble@enron.com' 'Ina Rangel' 'Amanda Huble' 'None' 'None'
'Allen-P']]
df = pd.DataFrame(df['value'].values.reshape(-1, 9), columns=df['key'].iloc[:9])
print (df)
key Message-ID \
0 <5525962.1075855679785.JavaMail.evans@thyme>
1 <4650921.1075855679981.JavaMail.evans@thyme>
key Date From \
0 Wed, 13 Dec 2000 07:04:00 -0800 (PST) phillip.allen@enron.com
1 Tue, 5 Dec 2000 07:31:00 -0800 (PST) ina.rangel@enron.com
key To X-From \
0 christi.nicolay@enron.com, james.steffes@enron... Phillip K Allen
1 amanda.huble@enron.com Ina Rangel
key X-To X-cc: X-bcc: X-Origin
0 Christi L Nicolay, James D Steffes, Jeff Dasov... None None Allen-P
1 Amanda Huble None None Allen-P
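The reshape approach can be tried on a smaller table. This is a minimal sketch, assuming a hypothetical long-format frame with 3 fields per message instead of 9; the sample values and the names `n_fields` and `wide` are made up for illustration and are not from the original data.

```python
import pandas as pd

# Hypothetical long-format data: two messages, three fields each
df = pd.DataFrame({
    "key": ["Message-ID", "Date", "From"] * 2,
    "value": ["<id-1>", "Wed, 13 Dec 2000", "a@example.com",
              "<id-2>", "Tue, 5 Dec 2000", "b@example.com"],
})

n_fields = 3  # fields per group (9 in the Enron data above)

# reshape only works if the row count is an exact multiple of n_fields
assert len(df) % n_fields == 0

wide = pd.DataFrame(
    df["value"].to_numpy().reshape(-1, n_fields),  # one row per message
    columns=df["key"].iloc[:n_fields].tolist(),    # names from the first group
)
print(wide)
```

This relies entirely on every group having the same keys in the same order. If one message is missing a field, the rows shift and values end up under the wrong column without any error.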
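If every group always starts with a Message-ID row, you can use set_index with a helper Series and then unstack. The helper is built by comparing key against 'Message-ID' with eq (==), which gives a boolean mask marking the start of each group, and then applying cumsum to turn that mask into a group number: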
df = df.set_index([df['key'].eq('Message-ID').cumsum(), 'key'])['value'].unstack()
print (df)
key Date From \
key
1 Wed, 13 Dec 2000 07:04:00 -0800 (PST) phillip.allen@enron.com
2 Tue, 5 Dec 2000 07:31:00 -0800 (PST) ina.rangel@enron.com
key Message-ID \
key
1 <5525962.1075855679785.JavaMail.evans@thyme>
2 <4650921.1075855679981.JavaMail.evans@thyme>
key To X-From \
key
1 christi.nicolay@enron.com, james.steffes@enron... Phillip K Allen
2 amanda.huble@enron.com Ina Rangel
key X-Origin X-To X-bcc: X-cc:
key
1 Allen-P Christi L Nicolay, James D Steffes, Jeff Dasov... None None
2 Allen-P Amanda Huble None None
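The cumsum approach also copes with groups of different lengths. This is a minimal sketch, assuming a hypothetical frame in which the second message has no From row; the sample values and the names `group` and `wide` are illustrative only. Because unstack sorts the columns alphabetically (as in the output above), the sketch adds a reindex step to restore the order in which the keys first appear.

```python
import pandas as pd

# Hypothetical data: the second message lacks a "From" row
df = pd.DataFrame({
    "key": ["Message-ID", "Date", "From", "Message-ID", "Date"],
    "value": ["<id-1>", "Wed, 13 Dec 2000", "a@example.com",
              "<id-2>", "Tue, 5 Dec 2000"],
})

# True at the start of each group; cumsum turns that into 1, 1, 1, 2, 2
group = df["key"].eq("Message-ID").cumsum().rename("group")

wide = (
    df.set_index([group, "key"])["value"]
      .unstack()
      # unstack sorts the columns; restore first-appearance order
      .reindex(columns=df["key"].drop_duplicates().tolist())
)
print(wide)
```

The missing field shows up as NaN instead of shifting the other values. Note that unstack raises a ValueError if the same key occurs twice within one group, since the (group, key) pairs must be unique.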