
Extracting numbers from paragraphs in Python with a multi-condition regular expression


慕后森 2023-03-08 10:16:19
I have this text in a .txt file:

crt - 00:00:00 up 200 days, 23:35, 0 users, load average: 0.04, 0.05, 0.02
Tasks: 300 total, 2 running, 298 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.0%us, 2.5%sy, 0.0%ni, 89.2%id, 0.0%hi, 0.1%si, 0.0%st
Mem: 123456K total, 1234567k used, 989991k free, 11156793k buffers
Swap: 456K total, 30897564k used, 785431k free, 23445897k cached

PID User Pr NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24 455  36  63  700 800 900 456 87 35 46
2 root 80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 thread

crt - 00:00:04 up 200 days, 23:39, 0 users, load average: 0.04, 0.05, 0.02
Tasks: 300 total, 2 running, 298 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.0%us, 2.5%sy, 0.0%ni, 89.2%id, 0.0%hi, 0.1%si, 0.0%st
Mem: 123456K total, 1234567k used, 989991k free, 11156793k buffers
Swap: 456K total, 30897564k used, 785431k free, 23445897k cached

I want all the numeric values from every paragraph that runs from crt to cached, but not the values between PID and thread. So far I am using this:

regex.findall(r'(?<!\d)(?<=\bcrt\b.*?)(?:\d{2}:\d{2}(?::\d{2})?|\d*\.?\d+)(?!\d)(?=.*\bcached\b)', text, regex.S)

But this returns all the numbers, including the ones between PID and thread. Any ideas?

2 Answers

长风秋雁

Contributed 1757 experience points, received more than 7 likes

Since you are already using the regex module (which supports variable-length lookbehind), you can just as easily use \G and \K:

(?:^crt|\G(?!\A))(?:(?!^$)\D)*\K[.:\d]+

See the demo on regex101.com.
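For context, the lookaround-based attempt in the question over-matches because, with the S flag, `(?<=\bcrt\b.*?)` only requires a `crt` somewhere earlier in the text and `(?=.*\bcached\b)` only requires a `cached` somewhere later. Any number in a PID block that sits between two crt paragraphs satisfies both. The sketch below mimics those two conditions with the standard `re` module on a shortened sample; the sample text and the helper name `has_crt_before_and_cached_after` are made up for illustration.

```python
import re

# Shortened sample in the same shape as the question's file (an assumption
# made to keep the example small).
text = """crt - 00:00:00 up 200 days, 23:35, 0 users
Swap: 456K total, 785431k free, 23445897k cached

PID User Pr NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2 root 80 0 0 0 thread

crt - 00:00:04 up 200 days, 23:39, 0 users
Swap: 456K total, 785431k free, 23445897k cached
"""


def has_crt_before_and_cached_after(pos):
    """Mimic the two lookarounds: 'crt' anywhere earlier, 'cached' anywhere later."""
    crt_before = re.search(r"\bcrt\b", text[:pos]) is not None
    cached_after = re.search(r"\bcached\b", text[pos:]) is not None
    return crt_before and cached_after


# Position of the number 80 on the "2 root 80 ..." line, which should be excluded.
pos = text.index("80 0 0 0 thread")
print(has_crt_before_and_cached_after(pos))  # True
```

Both conditions hold for a number that should be excluded, which is why the lookarounds alone cannot confine matches to a single crt paragraph.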


Answered 2023-03-08
眼眸繁星

Contributed 1873 experience points, received more than 9 likes

Broken down, the pattern makes a few assumptions:

(?:
    ^crt        # start a line with crt
    |           # or
    \G(?!\A)    # start after the previous match (unless it is the very start of the string)
)
(?:(?!^$)\D)*\K # match any non-digit characters, but stop at empty lines; \K drops them from the reported match
[.:\d]+         # character class with ., : and digits

In Python code, this could look like:


# Third-party "regex" module (pip install regex); the standard "re" module
# does not support \G or \K.
import regex as re

# Paragraphs must be separated by an empty line: the pattern stops there.
junk = """
crt - 00:00:00 up 200 days, 23:35, 0 users, load average: 0.04, 0.05, 0.02
Tasks: 300 total, 2 running, 298 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.0%us, 2.5%sy, 0.0%ni, 89.2%id, 0.0%hi, 0.1%si, 0.0%st
Mem: 123456K total, 1234567k used, 989991k free, 11156793k buffers
Swap: 456K total, 30897564k used, 785431k free, 23445897k cached

PID User Pr NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24 455  36  63  700 800 900 456 87 35 46
2 root 80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 thread

crt - 00:00:04 up 200 days, 23:39, 0 users, load average: 0.04, 0.05, 0.02
Tasks: 300 total, 2 running, 298 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.0%us, 2.5%sy, 0.0%ni, 89.2%id, 0.0%hi, 0.1%si, 0.0%st
Mem: 123456K total, 1234567k used, 989991k free, 11156793k buffers
Swap: 456K total, 30897564k used, 785431k free, 23445897k cached
"""

# re.M makes ^ and $ match at line boundaries, which both ^crt and (?!^$) rely on.
rx = re.compile(r'(?:^crt|\G(?!\A))(?:(?!^$)\D)*\K[.:\d]+', re.M)

# Because of \K, group(0) holds only the number, not the skipped non-digits.
for match in rx.finditer(junk):
    print(match.group(0))

Output (abbreviated):

00:00:00
200
23:35
...
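If installing the third-party regex module is not an option, a different technique gets a similar result with only the standard library: split the text on blank lines, keep the blocks that start with crt, and collect the numbers from those. This is a minimal sketch, not the approach from the answer above. It makes the same assumption that paragraphs are separated by an empty line, and the `NUMBER` pattern and the function name `numbers_in_crt_blocks` are illustrative choices (the pattern does not handle values such as `.5` with no leading digit).

```python
import re

# Digits, optionally followed by more digit groups joined by "." or ":",
# e.g. 200, 0.04, 23:35, 00:00:00. An illustrative simplification.
NUMBER = re.compile(r"\d+(?:[.:]\d+)*")


def numbers_in_crt_blocks(text):
    """Return number-like tokens from blank-line-separated blocks starting with 'crt'."""
    found = []
    for block in re.split(r"\n\s*\n", text.strip()):
        if block.startswith("crt"):
            found.extend(NUMBER.findall(block))
    return found


# Shortened sample in the same shape as the question's file.
sample = """crt - 00:00:00 up 200 days, 23:35, 0 users, load average: 0.04, 0.05, 0.02
Swap: 456K total, 30897564k used, 785431k free, 23445897k cached

PID User Pr NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2 root 80 0 0 0 thread

crt - 00:00:04 up 200 days, 23:39, 0 users, load average: 0.04, 0.05, 0.02
Swap: 456K total, 30897564k used, 785431k free, 23445897k cached
"""

print(numbers_in_crt_blocks(sample)[:3])  # ['00:00:00', '200', '23:35']
```

Splitting first keeps the number pattern simple, at the cost of a second pass over each block; the \G and \K pattern does everything in a single scan.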

