2 Answers
Answer 1
This uses a new method available in 0.11.1 (upcoming), which provides a mechanism for filtering groups. Thanks @DanAllen.
In [49]: df
Out[49]:
ip
date_time
2013-05-30 06:00:41 173.199.116.171
2013-05-30 06:05:41 61.245.172.14
2013-05-30 06:10:42 74.86.158.106
2013-05-30 06:20:42 61.245.172.14
In [50]: df.groupby(pd.TimeGrouper('20min')).filter(lambda x: x.between_time('06:00:00', '06:20:00'))
Out[50]:
ip
date_time
2013-05-30 06:00:41 173.199.116.171
2013-05-30 06:05:41 61.245.172.14
2013-05-30 06:10:42 74.86.158.106
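`pd.TimeGrouper` has since been deprecated and removed from pandas, and `filter` expects the function to return a single boolean per group (current pandas raises an error when asked for the truth value of a whole DataFrame). A minimal sketch of the same idea for current pandas, assuming `pd.Grouper(freq=...)` as the replacement and rebuilding the sample frame from the printed data, keeps a 20-minute bin when it contains at least one row between 06:00:00 and 06:20:00:

```python
import pandas as pd

# Sample data rebuilt from the output shown above
df = pd.DataFrame(
    {"ip": ["173.199.116.171", "61.245.172.14", "74.86.158.106", "61.245.172.14"]},
    index=pd.DatetimeIndex(
        ["2013-05-30 06:00:41", "2013-05-30 06:05:41",
         "2013-05-30 06:10:42", "2013-05-30 06:20:42"],
        name="date_time",
    ),
)

# pd.Grouper(freq=...) is the current replacement for pd.TimeGrouper
grouped = df.groupby(pd.Grouper(freq="20min"))

# filter() keeps or drops each whole group based on one bool:
# keep a bin if any of its rows falls between 06:00:00 and 06:20:00
result = grouped.filter(
    lambda g: not g.between_time("06:00:00", "06:20:00").empty
)
print(result)
```

The 06:20 bin is dropped because its only row (06:20:42) lies after the 06:20:00 end of the window.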
Answer 2
The first part counts all the rows in each 20-minute period:
In [11]: df1.IP.resample('20t', how='count') # I usually prefer '20min'
Out[11]:
datetime
2013-05-30 06:00:00 3
2013-05-30 06:20:00 1
dtype: int64
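The `how=` argument to `resample` has since been removed; in current pandas the aggregation is called as a method on the resampler. A sketch assuming a frame `df1` shaped like the one in this answer (rebuilt by hand from the printed output). It also uses `'20min'` rather than `'20t'`, since recent pandas deprecates the `T` alias in favour of `min`:

```python
import pandas as pd

# df1 rebuilt from the printed output (column "IP", index named "datetime")
df1 = pd.DataFrame(
    {"IP": ["173.199.116.171", "61.245.172.14", "74.86.158.106", "61.245.172.14"]},
    index=pd.DatetimeIndex(
        ["2013-05-30 06:00:41", "2013-05-30 06:05:41",
         "2013-05-30 06:10:42", "2013-05-30 06:20:42"],
        name="datetime",
    ),
)

# Count the rows in each 20-minute bin (bins are labelled by their start time)
counts = df1["IP"].resample("20min").count()
print(counts)
```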
The second part gets the rows between certain times:
In [12]: df1.IP.between_time('06:00:00', '06:20:00')
Out[12]:
datetime
2013-05-30 06:00:41 173.199.116.171
2013-05-30 06:05:41 61.245.172.14
2013-05-30 06:10:42 74.86.158.106
Name: IP, dtype: object
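`between_time` selects on the time of day only (the date is ignored) and, by default, includes both endpoints. A sketch that checks this against an explicit mask built from `df1.index.time`; the mask is only an illustration of the semantics, not how pandas implements it, and `df1` is rebuilt from the printed output:

```python
from datetime import time

import pandas as pd

df1 = pd.DataFrame(
    {"IP": ["173.199.116.171", "61.245.172.14", "74.86.158.106", "61.245.172.14"]},
    index=pd.DatetimeIndex(
        ["2013-05-30 06:00:41", "2013-05-30 06:05:41",
         "2013-05-30 06:10:42", "2013-05-30 06:20:42"],
        name="datetime",
    ),
)

# Built-in time-of-day selection (endpoints included by default)
in_window = df1["IP"].between_time("06:00:00", "06:20:00")

# Equivalent hand-written mask on the time component of the index
t = df1.index.time  # numpy array of datetime.time objects
mask = (t >= time(6, 0)) & (t <= time(6, 20))
manual = df1.loc[mask, "IP"]

print(in_window)
```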
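There may be a neat solution to the general problem (so that you don't need to specify the times) using TimeGrouper, but the best I could manage is printing all of the groups: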
In [13]: tg = pd.TimeGrouper('20t')
In [14]: g = df1.groupby(tg)
In [15]: def f(x):
   ....:     print(x)
   ....:     return x
In [16]: _ = g.apply(f)  # the '_ =' bit just suppresses output
IP
datetime
2013-05-30 06:00:41 173.199.116.171
2013-05-30 06:05:41 61.245.172.14
2013-05-30 06:10:42 74.86.158.106
IP
datetime
2013-05-30 06:20:42 61.245.172.14
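In current pandas, iterating over the GroupBy object is a more direct way to inspect each bin than calling `apply` for its printing side effect (some older pandas versions called the applied function twice on the first group). A sketch assuming `pd.Grouper(freq="20min")` in place of the removed `pd.TimeGrouper`, with `df1` rebuilt from the printed output:

```python
import pandas as pd

df1 = pd.DataFrame(
    {"IP": ["173.199.116.171", "61.245.172.14", "74.86.158.106", "61.245.172.14"]},
    index=pd.DatetimeIndex(
        ["2013-05-30 06:00:41", "2013-05-30 06:05:41",
         "2013-05-30 06:10:42", "2013-05-30 06:20:42"],
        name="datetime",
    ),
)

# Each iteration yields (bin start timestamp, sub-DataFrame for that bin)
for bin_start, group in df1.groupby(pd.Grouper(freq="20min")):
    print(bin_start)
    print(group)
```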