
How to check whether each row of a CSV contains a year or a percentage, and extract them, in Python

Python

明月笑刀无情 2022-09-13 09:56:08

I have a CSV file, news.csv, that contains a lot of data. For each row I want to check whether it contains any year: if it does, the result is 1, otherwise 0. The same goes for percentages: 1 if the row contains a percentage, otherwise 0. I also want to extract the years and percentages themselves. Below is my code so far. When I try to extract the percentages I get an error (ValueError: Wrong number of items passed 2, placement implies 1):

news = pd.read_csv("news.csv")
news['year'] = news['STORY'].str.extract(r'(?!\()\b(\d+){1}')
news["howmanyyear"] = news["STORY"].str.count(r'(?!\()\b(\d+){1}')
news["existyear"] = news["howmanyyear"] != 0
news["existyear"] = news["existyear"].astype(int)
news['percentage'] = news['STORY'].str.extract(r'(\s100|\s\d{1})(\.\d+)+%')
news.to_csv('news.csv')

The year-extraction code seems to work, but it also extracts ordinary numbers, and it only extracts one year per row.

A sample of my CSV file:

ID  STORY
1   There are a total of 2,070 people died in 2001 due to the virus
2   20% of people in the village have diabetes in 2007
3   About 70 percent of them still believe the rumor
4   In 2003 and 2020, the pneumonia pandemic spread in the world

This is the output I want:

ID  STORY                                                            existyear  year       existpercentage  percentage
1   There are a total of 2,070 people died in 2001 due to the virus  1          2001       0                -
2   20% of people in the village have diabetes in 2007               1          2007       1                20%
3   About 70 percent of them still believe the rumor                 0          -          1                70
4   In 2003 and 2020, the pneumonia pandemic spread in the world     1          2003,2020  0                -
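The ValueError comes from the percentage pattern `(\s100|\s\d{1})(\.\d+)+%`: it has two capture groups, so `str.extract` returns a two-column DataFrame, which cannot be assigned to the single column `news['percentage']`. The sketch below uses a made-up two-row Series instead of news.csv, and the simplified percentage pattern is an assumption for illustration, not the asker's exact rule.

```python
import pandas as pd

stories = pd.Series([
    "20% of people in the village have diabetes in 2007",
    "About 70 percent of them still believe the rumor",
])

# Two capture groups -> str.extract returns a DataFrame with two columns,
# which is what breaks the assignment to a single column.
two_groups = stories.str.extract(r'(\d+)(\.\d+)?%')
print(two_groups.shape)  # (2, 2)

# One capture group + expand=False -> a Series that fits a single column.
# (?:...) groups are non-capturing, so they do not add extra columns.
one_group = stories.str.extract(r'(\d+(?:\.\d+)?)(?:%| percent)', expand=False)
print(one_group.tolist())  # ['20', '70']
```

Keeping exactly one capturing group (or passing `expand=False` with one group) is what lets the result be assigned to one column.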


1 Answer

MYYA


Create a sample DataFrame:

import pandas as pd

c = [1, 2, 3, 4]
d = ["There are a total of 2,070 people died in 2001 due to the virus",
     "20% of people in the village have diabetes in 2007",
     "About 70 percent of them still believe the rumor",
     "In 2003 and 2020, the pneumonia pandemic spread in the world"]
f = ['2001', '2007', '-', '2003,2020']
g = ['-', '20%', '70', '-']

# Build the frame column by column (avoids the stray space in the 'ID ' column name)
df = pd.DataFrame({'ID': c, 'STORY': d, 'year': f, 'percentage': g})

Output:

ID  STORY                                                            year       percentage
1   There are a total of 2,070 people died in 2001 due to the virus  2001       -
2   20% of people in the village have diabetes in 2007               2007       20%
3   About 70 percent of them still believe the rumor                 -          70
4   In 2003 and 2020, the pneumonia pandemic spread in the world     2003,2020  -

Code:

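import re

def year_exists_or_not(value):
    # 1 if the value contains a 4-digit number starting with 1-3, else 0
    return 1 if re.search(r'[1-3][0-9]{3}', value) else 0

def perc_or_not(value):
    # 1 if the value contains any digit, else 0
    return 1 if re.search(r'\d+', value) else 0

# Insert the flag columns next to the columns they describe
df.insert(2, 'existyear', df['year'].apply(year_exists_or_not))
df.insert(4, 'existpercentage', df['percentage'].apply(perc_or_not))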
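The functions above test the pre-filled `year` and `percentage` columns. If the flags should come straight from `STORY` instead, `Series.str.contains` can produce them without a Python-level function. A minimal sketch, assuming a hypothetical year rule (a standalone 4-digit number from 1000 to 3999) and treating "a number followed by `%` or ` percent`" as a percentage:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'STORY': [
        "There are a total of 2,070 people died in 2001 due to the virus",
        "20% of people in the village have diabetes in 2007",
        "About 70 percent of them still believe the rumor",
        "In 2003 and 2020, the pneumonia pandemic spread in the world",
    ],
})

# Assumed rules (hypothetical, adjust to your data):
YEAR = r'\b[1-3]\d{3}\b'                  # standalone 4-digit number 1000-3999
PERCENT = r'\d+(?:\.\d+)?(?:%| percent)'  # number followed by "%" or " percent"

# str.contains gives a boolean Series; astype(int) maps True/False to 1/0
df['existyear'] = df['STORY'].str.contains(YEAR).astype(int)
df['existpercentage'] = df['STORY'].str.contains(PERCENT).astype(int)

print(df[['ID', 'existyear', 'existpercentage']])
```

Both patterns use only non-capturing groups `(?:...)`, so `str.contains` does not warn about match groups.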

Output:

ID  STORY                                                            existyear  year       existpercentage  percentage
1   There are a total of 2,070 people died in 2001 due to the virus  1          2001       0                -
2   20% of people in the village have diabetes in 2007               1          2007       1                20%
3   About 70 percent of them still believe the rumor                 0          -          1                70
4   In 2003 and 2020, the pneumonia pandemic spread in the world     1          2003,2020  0                -

Edit: to extract the years and percentages from STORY directly:

# Every 4-digit number starting with 1-3; several years are joined with ","
df['year'] = df['STORY'].str.findall(r'\b[1-3]\d{3}\b').str.join(',').replace('', '-')
# The number in front of "%" or " percent"
df['percentage'] = df['STORY'].str.findall(r'(\d+)(?:%| percent)').str.join(',').replace('', '-')

Output:

ID  STORY                                                            year       percentage
1   There are a total of 2,070 people died in 2001 due to the virus  2001       -
2   20% of people in the village have diabetes in 2007               2007       20
3   About 70 percent of them still believe the rumor                 -          70
4   In 2003 and 2020, the pneumonia pandemic spread in the world     2003,2020  -
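Putting the pieces together for the asker's original setup (read news.csv, add the four columns, write the result), a minimal sketch might look like the following. It assumes the STORY field is quoted in the CSV wherever it contains commas, uses an in-memory string as a stand-in for news.csv, and treats any standalone 4-digit number from 1000 to 3999 as a year, so a count such as "1500 people" would still be picked up. Unlike `str.extract`, which returns only the first match, `str.findall` returns every match, which is why both 2003 and 2020 are found.

```python
import io
import pandas as pd

# Stand-in for news.csv; STORY values containing commas are quoted.
csv_text = '''ID,STORY
1,"There are a total of 2,070 people died in 2001 due to the virus"
2,20% of people in the village have diabetes in 2007
3,About 70 percent of them still believe the rumor
4,"In 2003 and 2020, the pneumonia pandemic spread in the world"
'''
news = pd.read_csv(io.StringIO(csv_text))

# Assumed rules (hypothetical): a year is a standalone 4-digit number 1000-3999,
# a percentage is a number followed by "%" or " percent" (only the number is kept).
YEAR = r'\b[1-3]\d{3}\b'
PERCENT = r'(\d+(?:\.\d+)?)(?:%| percent)'

# findall gives one list of matches per row
years = news['STORY'].str.findall(YEAR)
percents = news['STORY'].str.findall(PERCENT)

# Flag = "at least one match"; value = matches joined with "," or "-" if none
news['existyear'] = (years.str.len() > 0).astype(int)
news['year'] = years.str.join(',').replace('', '-')
news['existpercentage'] = (percents.str.len() > 0).astype(int)
news['percentage'] = percents.str.join(',').replace('', '-')

print(news.drop(columns='STORY'))
# news.to_csv('news_out.csv', index=False)  # hypothetical output file name
```

Writing to a separate file instead of overwriting news.csv keeps the original data intact if a pattern needs adjusting.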

Answered 2022-09-13
