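Write a program to read through mbox-short.txt and figure out the distribution of messages by hour of the day. You can pull the hour out of the "From " line by finding the time and then splitting that string a second time on the colon. Once you have accumulated the counts for each hour, print them out, sorted by hour, as shown below.

name = input('Enter file name: ')
if len(name) < 1:
    name = 'mbox-short.txt'
hand = open(name)
counts = dict()
for line in hand:
    if not line.startswith('From '):
        continue
    words = line.split(' ')
    words = words[6]
    #print(words.split(':'))
    hour = words.split(':')[0]
    counts[hour] = counts.get(hour, 0) + 1
for k, v in sorted(counts.items()):
    print(k, v)

I had to use [6] to pick the time out of the line. But shouldn't it be 5? The line I need to extract the hour from looks like this: From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 200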
1 Answer
胡说叔叔
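Yes, you're right: the index in this example should be 5. By the way, the collections module has a built-in class for this kind of counting, Counter. You can rewrite your code like this: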
from collections import Counter

counter = Counter()
name = input('Enter file name: ')
if len(name) < 1:
    name = 'mbox-short.txt'
with open(name) as fp:
    for line in fp:
        # 'From ' (with the trailing space) skips the 'From:' header lines
        if line.startswith('From '):
            # split() with no argument collapses runs of whitespace
            words = line.split()
            time = words[5]
            hour = time.split(':')[0]
            counter[hour] += 1
# Sort numerically by hour rather than by string
for hour, freq in sorted(counter.items(), key=lambda x: int(x[0])):
    print(hour, freq)
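On the index question itself, one possible explanation for needing [6] is that the raw line pads a single-digit day with an extra space, which a web page would display as a single space. The sketch below uses a hypothetical sample line (the address, the double space before the day, and the year are all assumptions, not copied from the file) to compare split(' ') with split():

```python
# Hypothetical sample line; the double space before "5" is an assumption
# about how single-digit days are padded in the raw file.
line = 'From someone@example.com Sat Jan  5 09:14:16 2008\n'

# split(' ') breaks on every single space, so two adjacent spaces
# produce an empty string in the result.
by_space = line.split(' ')

# split() with no argument treats any run of whitespace as one separator.
by_whitespace = line.split()

print(by_space.index('09:14:16'))       # 6
print(by_whitespace.index('09:14:16'))  # 5
```

If the file really is padded this way, split() keeps the time at index 5 whether the day has one digit or two.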
You can also get the most common items like this:
counter.most_common(10)  # returns the 10 most common items with their counts
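To see what most_common returns, here is a small self-contained sketch. The list of hour strings is made up for illustration and is not taken from mbox-short.txt:

```python
from collections import Counter

# Hypothetical hour strings, as if extracted from several "From " lines.
hours = ['09', '18', '09', '16', '09', '18']
counter = Counter(hours)

# most_common(n) returns a list of (item, count) pairs, highest count first.
top_two = counter.most_common(2)
print(top_two)  # [('09', 3), ('18', 2)]
```

Note that this ordering is by frequency, so printing the counts sorted by hour still needs the sorted(...) call.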