I recently got hold of my local gym's data, and I'm trying to normalize it so that I can create a "gym registration" object containing everyone who signed up for each session. The text file looks like this:

```
Sep 30th '20 at 9:00AM Until Sep 30th '20 at 10:00AM
JD John Doe
AW Alice Wonderland
IM Iron Man
Sep 30th '20 at 8:00AM Until Sep 30th '20 at 9:00AM
JD John Doe
AW Alice Wonderland
IM Iron Man
```

I've managed to use pandas to split the registrations into the columns [initials, name], but I can't figure out how to detect when a line is a time slot rather than a registered person. After the program runs, every row should have the columns [initials, name, time slot]. The easiest format for me to work with would be:

```
JD John Doe Sep 30th '20 at 9:00AM Until Sep 30th '20 at 10:00AM
AW Alice Wonderland Sep 30th '20 at 9:00AM Until Sep 30th '20 at 10:00AM
IM Iron Man Sep 30th '20 at 9:00AM Until Sep 30th '20 at 10:00AM
JD John Doe Sep 30th '20 at 8:00AM Until Sep 30th '20 at 9:00AM
AW Alice Wonderland Sep 30th '20 at 8:00AM Until Sep 30th '20 at 9:00AM
IM Iron Man Sep 30th '20 at 8:00AM Until Sep 30th '20 at 9:00AM
```

I tried looping over each line: as soon as a time-slot line appears, I attach it to each following line until a new time slot appears.

```python
def testSort():
    with open("1-weak-gym.txt") as fp:
        id = []
        totalSheet = []
        timeSlot = []
        lastLine = []
        for ln in fp:
            if ln.startswith("Sep"):  ##this is a time slot
                timeSlot.clear()
                timeSlot.append(ln[0:])  ##save that time slot as the lastDate variable
            else:
                if (timeSlot):
                    totalSheet.append(timeSlot)  ##append the time slot
                    totalSheet.append(ln[0:])  ##append the name line
                else:
                    print('Hello eror')
        print(totalSheet, file=open("newOuput.txt", "a"))
```
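As written, the loop above has two problems: `timeSlot` is a single list, so every `totalSheet.append(timeSlot)` stores a reference to the same object and clearing and refilling it on the next header changes every entry already appended; and the slot and the name are appended as separate items instead of being combined into one row. Below is a minimal sketch of the same idea that keeps the current slot in a plain string and emits one [initials, name, time slot] row per person. It assumes every header line contains " Until " (rather than checking for a "Sep" prefix, which would break for other months) and that each name line is "<initials> <name>"; `parse_sheet` is a hypothetical name.

```python
def parse_sheet(lines):
    """Return [initials, name, time_slot] rows from raw gym-sheet lines.

    Assumption: time-slot headers contain " Until "; every other
    non-blank line is "<initials> <full name>".
    """
    rows = []
    current_slot = None  # a string, so later headers cannot alter earlier rows
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if " Until " in line:
            current_slot = line  # start of a new session
        elif current_slot is None:
            print("Skipping name line before any time slot:", line)
        else:
            initials, _, name = line.partition(" ")  # split off the initials
            rows.append([initials, name, current_slot])
    return rows


sample = """Sep 30th '20 at 9:00AM Until Sep 30th '20 at 10:00AM
JD John Doe
AW Alice Wonderland
IM Iron Man
Sep 30th '20 at 8:00AM Until Sep 30th '20 at 9:00AM
JD John Doe
AW Alice Wonderland
IM Iron Man
"""

for row in parse_sheet(sample.splitlines()):
    print(row)
```

With the real file, the open file object can be passed in directly, e.g. `with open("1-weak-gym.txt") as fp: rows = parse_sheet(fp)`, since iterating a file yields its lines.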
1 Answer
慕少森
You can try this approach (assuming the time at the end of each header line always follows the same pattern):
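```python
import re

# Matches a 12-hour clock time such as "9:00AM" or "10:00AM"
TIME_RE = re.compile(r'\b((1[0-2]|0?[1-9]):([0-5][0-9])([AaPp][Mm]))')

def is_time_format(s):
    return bool(TIME_RE.match(s))

with open("1-weak-gym.txt") as fp:
    new_lines = []
    extra_info = ''
    for line in fp:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        last_bit = line.split()[-1]
        if is_time_format(last_bit):
            # Header line: remember its time slot for the names that follow
            extra_info = line
        else:
            # Name line: append the current time slot after a tab
            new_lines.append(line + '\t' + extra_info + '\n')

with open("newOutput", 'w') as out:
    out.writelines(new_lines)
```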
You will then get a file in the desired format: each name line followed by a tab and its time slot.
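If the goal is the per-session "gym registration" object mentioned in the question rather than a flat file, the same header-detection idea can group attendees under each slot. This is a sketch under stated assumptions: headers are detected by " Until " instead of the regex above, name lines are "<initials> <name>", and `Session` and `group_sessions` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """One gym session: its time-slot header and everyone registered for it."""
    time_slot: str
    attendees: list = field(default_factory=list)  # (initials, name) pairs


def group_sessions(lines):
    """Group name lines under the most recent header (assumed to contain " Until ")."""
    sessions = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if " Until " in line:
            sessions.append(Session(line))  # a new session starts here
        elif sessions:
            initials, _, name = line.partition(" ")
            sessions[-1].attendees.append((initials, name))
        # name lines before the first header are skipped
    return sessions


sample = """Sep 30th '20 at 9:00AM Until Sep 30th '20 at 10:00AM
JD John Doe
AW Alice Wonderland
IM Iron Man
Sep 30th '20 at 8:00AM Until Sep 30th '20 at 9:00AM
JD John Doe
AW Alice Wonderland
IM Iron Man
"""

for s in group_sessions(sample.splitlines()):
    print(s.time_slot, "->", [name for _, name in s.attendees])
```

Keeping the header as a single string leaves the option of splitting it on " Until " later if the start and end times are needed separately.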