Splitting a string with regular expressions in Python

浮云间 2021-05-30 08:48:58
I have several strings, for example:

a = 'avg yearly income 25,07,708.33 '
b = 'current balance 1,25,000.00 in cash\n'
c = 'target savings 50,00,000.00 within next five years 1,000,000.00 '

I am trying to split them into chunks of text and chunks of numbers. The expected output is:

aa = [('avg yearly income', '25,07,708.33')]
bb = [('current balance', '1,25,000.00', 'in cash')]
cc = [('target savings', '50,00,000.00', 'within next five years', '1,000,000.00')]

I am using the following code:

import re
b = b.replace("\n","")
aa = re.findall(r'(.*)\s+(\d+(?:,\d+)*(?:\.\d){1,2})', a)
bb = re.findall(r'(.*)\s+(\d+(?:,\d+)*(?:\.\d){1,2})(.*)\s+', b)
cc = re.findall(r'(.*)\s+(\d+(?:,\d+)*(?:\.\d){1,2})(.*)\s+(\d+(?:,\d+)*(?:\.\d{1,2})?)', c)

I get the following output:

aa = [('avg yearly income', '25,07,708.3')]
bb = [('current balance', '1,25,000.0', '0 in')]
cc = [('target savings', '50,00,000.0', '0 within next five years', '1,000,000.00')]

What is wrong with my regex patterns?

3 Answers

萧十郎

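Instead of re.findall, you can use re.split to split the string on any whitespace character that has a letter on one side and a digit on the other: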


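import re

d = ['avg yearly income 25,07,708.33 ', 'current balance 1,25,000.00 in cash\n', 'target savings 50,00,000.00 within next five years 1,000,000.00 ']

# Split on a whitespace character that sits between a letter and a digit.
# The pattern is a raw string so that \s and \d reach the regex engine unchanged.
final_results = [re.split(r'(?<=[a-zA-Z])\s(?=\d)|(?<=\d)\s(?=[a-zA-Z])', i) for i in d]

# Remove trailing whitespace and newlines from every piece.
new_results = [[i.rstrip() for i in b] for b in final_results]
print(new_results)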

Output:


[['avg yearly income', '25,07,708.33'], ['current balance', '1,25,000.00', 'in cash'], ['target savings', '50,00,000.00', 'within next five years', '1,000,000.00']]
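
The question asks for each result as a list holding one tuple. A minimal sketch of how the split above could be wrapped to produce that shape is shown here; the helper name `split_chunks` and the constant `BOUNDARY` are hypothetical and not part of the original answer.

```python
import re

# Same boundary pattern as in the answer: one whitespace character
# with a letter on one side and a digit on the other.
BOUNDARY = re.compile(r'(?<=[a-zA-Z])\s(?=\d)|(?<=\d)\s(?=[a-zA-Z])')


def split_chunks(text):
    """Return the text and number chunks of `text` as a tuple (hypothetical helper)."""
    # Strip first so trailing spaces and newlines never end up inside a chunk.
    return tuple(BOUNDARY.split(text.strip()))


a = 'avg yearly income 25,07,708.33 '
b = 'current balance 1,25,000.00 in cash\n'
c = 'target savings 50,00,000.00 within next five years 1,000,000.00 '

aa = [split_chunks(a)]
bb = [split_chunks(b)]
cc = [split_chunks(c)]
print(aa)  # [('avg yearly income', '25,07,708.33')]
print(bb)  # [('current balance', '1,25,000.00', 'in cash')]
print(cc)  # [('target savings', '50,00,000.00', 'within next five years', '1,000,000.00')]
```

Stripping the whole string before splitting removes the need for a separate cleanup pass, since the split itself consumes the whitespace between chunks.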



Answered 2021-06-01
繁花不似锦

You can use re.split with the pattern r'(?<=\d)\s+(?=\w)|(?<=\w)\s+(?=\d)', stored below as ptrn:


>>> import re
>>> # a, b and c are the strings from the question
>>> ptrn = r'(?<=\d)\s+(?=\w)|(?<=\w)\s+(?=\d)'
>>> re.split(ptrn, a)
['avg yearly income', '25,07,708.33 ']
>>> re.split(ptrn, b)
['current balance', '1,25,000.00', 'in cash\n']
>>> re.split(ptrn, c)
['target savings', '50,00,000.00', 'within next five years', '1,000,000.00 ']
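
This pattern differs from the one in the first answer in two ways: it accepts a run of whitespace (`\s+`) rather than a single character, and it uses `\w`, which also matches digits. The sketch below compares the two on made-up inputs (the strings `two_numbers` and `double_space` are assumptions for illustration, not from the question) to show where the behavior diverges.

```python
import re

# Pattern from the first answer: one whitespace character, letters as [a-zA-Z].
letters_only = re.compile(r'(?<=[a-zA-Z])\s(?=\d)|(?<=\d)\s(?=[a-zA-Z])')
# Pattern from this answer: one or more whitespace characters, \w next to the digit.
word_chars = re.compile(r'(?<=\d)\s+(?=\w)|(?<=\w)\s+(?=\d)')

# Hypothetical inputs, chosen to expose the differences.
two_numbers = 'paid 1,000.00 2,000.00 total'
double_space = 'income  25'

# Because \w matches digits, two adjacent numbers are separated as well.
print(letters_only.split(two_numbers))   # ['paid', '1,000.00 2,000.00', 'total']
print(word_chars.split(two_numbers))     # ['paid', '1,000.00', '2,000.00', 'total']

# A single \s cannot bridge two spaces, so only the \s+ pattern splits here.
print(letters_only.split(double_space))  # ['income  25']
print(word_chars.split(double_space))    # ['income', '25']
```

Which behavior is preferable depends on the data: if two numbers can appear side by side and should stay in one chunk, the stricter letter class is the safer choice.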


Answered 2021-06-01
杨魅力

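Use re.split(). This example uses the number pattern from your question, with the {1,2} quantifier applied to \d inside the group, and it works fine: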


>>> import re
>>> r = re.compile(r'(\d+(?:,\d+)*(?:\.\d{1,2}))')
>>> r.split('avg yearly income 25,07,708.33 ')
['avg yearly income ', '25,07,708.33', ' ']
>>> r.split('current balance 1,25,000.00 in cash\n')
['current balance ', '1,25,000.00', ' in cash\n']
>>> r.split('target savings 50,00,000.00 within next five years 1,000,000.00 ')
['target savings ', '50,00,000.00', ' within next five years ', '1,000,000.00', ' ']
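
The truncated decimals in the question come from where the quantifier sits. `(?:\.\d){1,2}` repeats the whole unit "a dot plus one digit", so it can match `.3` or `.3.3` but never `.33`, while `(?:\.\d{1,2})` matches one dot followed by one or two digits. The sketch below illustrates the difference and then cleans up the split result; the names `outside`, `inside` and `chunks` are hypothetical.

```python
import re

# Quantifier outside the group: repeats the unit "dot + one digit".
outside = re.compile(r'\d+(?:,\d+)*(?:\.\d){1,2}')
# Quantifier inside the group: one dot followed by one or two digits.
inside = re.compile(r'\d+(?:,\d+)*(?:\.\d{1,2})')

s = 'avg yearly income 25,07,708.33 '
print(outside.search(s).group())  # 25,07,708.3
print(inside.search(s).group())   # 25,07,708.33


def chunks(text):
    """Split around numbers, then drop padding and empty pieces (hypothetical helper)."""
    # The capturing group makes re.split keep the matched numbers in the result.
    parts = re.split(r'(\d+(?:,\d+)*(?:\.\d{1,2}))', text)
    return tuple(p.strip() for p in parts if p.strip())


print(chunks(s))  # ('avg yearly income', '25,07,708.33')
print(chunks('current balance 1,25,000.00 in cash\n'))
# ('current balance', '1,25,000.00', 'in cash')
```

Filtering on `p.strip()` removes the whitespace-only pieces that `re.split` leaves after the last number.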


Answered 2021-06-01