Create a sample DataFrame:
import re
import pandas as pd

c = [1, 2, 3, 4]
d = ["There are a total of 2,070 people died in 2001 due to the virus",
     "20% of people in the village have diabetes in 2007 ",
     "About 70 percent of them still believe the rumor",
     "In 2003 and 2020, the pneumonia pandemic spread in the world"]
f = ['2001', '2007', '-', '2003,2020']
g = ['-', '20%', '70', '-']
# Each list is a row here, so transpose to turn the lists into columns
df = pd.DataFrame([c, d, f, g]).T
df.rename(columns={0: 'ID', 1: 'STORY', 2: 'year', 3: 'percentage'}, inplace=True)
Output:
ID  STORY                                                            year       percentage
1   There are a total of 2,070 people died in 2001 due to the virus  2001       -
2   20% of people in the village have diabetes in 2007               2007       20%
3   About 70 percent of them still believe the rumor                 -          70
4   In 2003 and 2020, the pneumonia pandemic spread in the world     2003,2020  -
Code:
def year_exits_or_not(row):
    # 1 if the cell contains a four-digit number starting with 1, 2 or 3
    if re.match(r'.*([1-3][0-9]{3})', row):
        return 1
    else:
        return 0

def perc_or_not(row):
    # 1 if the cell contains at least one digit
    if re.match(r'.*\d+', row):
        return 1
    else:
        return 0

df['existyear'] = df.year.apply(year_exits_or_not)
df['existpercentage'] = df.percentage.apply(perc_or_not)
Output:
ID  STORY                                                            existyear  year       existpercentage  percentage
1   There are a total of 2,070 people died in 2001 due to the virus  1          2001       0                -
2   20% of people in the village have diabetes in 2007               1          2007       1                20%
3   About 70 percent of them still believe the rumor                 0          -          1                70
4   In 2003 and 2020, the pneumonia pandemic spread in the world     1          2003,2020  0                -
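The two helpers above differ only in their pattern, so they could be folded into one function. The following is a minimal sketch, not part of the original answer: has_pattern, YEAR_PATTERN and NUMBER_PATTERN are hypothetical names, and it assumes re.search is an acceptable stand-in for re.match with a leading .* (the two agree for single-line cells like the ones here).

```python
import re

# Hypothetical constants; the patterns themselves are taken from the helpers above
YEAR_PATTERN = r'[1-3][0-9]{3}'   # four-digit number starting with 1, 2 or 3
NUMBER_PATTERN = r'\d+'           # any run of digits


def has_pattern(pattern, text):
    """Return 1 if pattern occurs anywhere in text, otherwise 0."""
    # str() guards against non-string cells such as integers
    return int(re.search(pattern, str(text)) is not None)


if __name__ == "__main__":
    for cell in ['2001', '-', '2003,2020']:
        print(cell, has_pattern(YEAR_PATTERN, cell))
    for cell in ['-', '20%', '70']:
        print(cell, has_pattern(NUMBER_PATTERN, cell))
```

With the DataFrame from this answer, it could be applied as df['existyear'] = df.year.apply(lambda v: has_pattern(YEAR_PATTERN, v)), and likewise for the percentage column with NUMBER_PATTERN.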
Edit:
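# Pull every year out of STORY; str(list)[1:-1] strips the surrounding brackets
df.year = df.STORY.apply(lambda row: str(re.findall(r'.*?([1-3][0-9]{3})', row))[1:-1])
# Pull every number that is followed by "%" or " percent"
df.percentage = df.STORY.apply(lambda row: str(re.findall(r"(\d+)(?:%| percent)", row))[1:-1])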
Output:
   ID  STORY                                              year            percentage
0  1   There are a total of 2,070 people died in 2001...  '2001'
1  2   20% of people in the village have diabetes in ...  '2007'          '20'
2  3   About 70 percent of them still believe the rumor                   '70'
3  4   In 2003 and 2020, the pneumonia pandemic sprea...  '2003', '2020'
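The str(...)[1:-1] trick keeps the quote characters in each cell, as the output above shows. If plain comma-separated values are preferred, joining the matches is one option. This is a minimal sketch under that assumption, not part of the original answer: extract_years and extract_percentages are hypothetical helper names, and the patterns are the ones from the edit above without the lazy .*? prefix, which findall does not need.

```python
import re


def extract_years(text):
    """Join every four-digit number starting with 1, 2 or 3 found in text."""
    return ", ".join(re.findall(r'[1-3][0-9]{3}', text))


def extract_percentages(text):
    """Join every number that is directly followed by '%' or ' percent'."""
    return ", ".join(re.findall(r'(\d+)(?:%| percent)', text))


if __name__ == "__main__":
    stories = [
        "There are a total of 2,070 people died in 2001 due to the virus",
        "20% of people in the village have diabetes in 2007 ",
        "About 70 percent of them still believe the rumor",
        "In 2003 and 2020, the pneumonia pandemic spread in the world",
    ]
    for story in stories:
        print(repr(extract_years(story)), repr(extract_percentages(story)))
```

On the DataFrame from this answer, these could be used as df.year = df.STORY.apply(extract_years) and df.percentage = df.STORY.apply(extract_percentages). Rows without a match get an empty string.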