I need to extract the strings that start with "Year" and end with "\n", and I need to do it for every line inside each cell of a Pandas DataFrame. I also want to remove the \n at the end of each cell. This is the DataFrame:
df
Column1
not_important1\nnot_important2\nE012-855 Year-1972\nE012-856 Year-1983\nnot_important3\nE012-857 Year-1977\nnot_important4\nnot_important5\nE012-858 Year-2012\n
not_important6\nnot_important7\nE013-200 Year-1982\nE013-201 Year-1984\nnot_important8\nE013-202 Year-1987\n
not_important9\nnot_important10\nE014-652 Year-1988\nE014-653 Year-1980\nnot_important11\nE014-654 Year-1989\n
This is what I want to get:
df
Column1
Year-1972\nYear-1983\nYear-1977\nYear-2012
Year-1982\nYear-1984\nYear-1987
Year-1988\nYear-1980\nYear-1989
How can I do this?
1 Answer
明月笑刀无情
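You can use findall with the regular expression r'Year.*?\\n' to capture the substrings. Then build a single string from the list of matches with ''.join, and drop the trailing \n with [:-2]. Note that this pattern matches a literal backslash followed by n (two characters), which is why two characters are sliced off the end: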
import re
df['Column1'] = df['Column1'].apply(lambda x: ''.join(re.findall('Year.*?\\n', x))[:-2])
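To sanity-check the expression outside of pandas, here is a minimal sketch that applies the same steps to a single cell. It assumes the cell holds the literal two-character sequence \n rather than a real line break; the helper name extract_years and the shortened sample string are made up for illustration.

```python
import re

def extract_years(cell: str) -> str:
    """Keep only the 'Year...' pieces of a cell and drop the final literal backslash-n."""
    # r'Year.*?\\n' matches 'Year', then as little as possible, then a literal backslash + 'n'.
    matches = re.findall(r'Year.*?\\n', cell)
    # Every match ends with the two characters '\' and 'n'; [:-2] removes the last pair.
    return ''.join(matches)[:-2]

# Sample cell; the raw string keeps each \n as two literal characters (assumption).
cell = r'not_important1\nnot_important2\nE012-855 Year-1972\nE012-856 Year-1983\nnot_important3\nE012-857 Year-1977\n'
result = extract_years(cell)
print(result)
```

If the cells contain real line breaks instead, the pattern would need '\n' in place of '\\n', and the slice would be [:-1].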
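Alternatively, if the four-digit year is always followed directly by \n, you can match only the year and rejoin the matches:
# '\\n' joins with the same literal backslash-n the cells already contain; use '\n' if you want real line breaks instead.
df['Column1'] = df['Column1'].apply(lambda x: '\\n'.join(re.findall(r'Year-\d{4}', x)))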
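The same idea can also be written with pandas' vectorized string methods instead of apply. The following is a sketch, not part of the original answer: it rebuilds the question's DataFrame under the assumption that each \n is stored as two literal characters, then uses Series.str.findall and Series.str.join.

```python
import pandas as pd

# Sample frame rebuilt from the question; raw strings keep \n as two literal characters (assumption).
df = pd.DataFrame({
    'Column1': [
        r'not_important1\nnot_important2\nE012-855 Year-1972\nE012-856 Year-1983\nnot_important3\nE012-857 Year-1977\nnot_important4\nnot_important5\nE012-858 Year-2012\n',
        r'not_important6\nnot_important7\nE013-200 Year-1982\nE013-201 Year-1984\nnot_important8\nE013-202 Year-1987\n',
        r'not_important9\nnot_important10\nE014-652 Year-1988\nE014-653 Year-1980\nnot_important11\nE014-654 Year-1989\n',
    ]
})

# str.findall returns a list of matches per cell; str.join glues each list back together
# with a literal backslash-n between the items, so no trailing separator is left over.
df['Column1'] = df['Column1'].str.findall(r'Year-\d{4}').str.join(r'\n')
print(df['Column1'].tolist())
```

Because the separator is only placed between matches, there is nothing to slice off at the end of the cell.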