
Extract information from a string and convert it to a list

Python

临摹微笑 2023-09-05 21:10:46

I have a string that looks like this:

[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,
[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue
[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)

I want to extract the value of "X" together with its associated text and convert them into a list. Expected output:

['X=250.44', 'DECEMBER 31,']
['X=307.5', 'respectively. The net decrease in the revenue']
['X=49.5', '(US$ in millions)']

How can we solve this in Python?

My approach:

mylist = []
for line in data.split("\n"):
    if line.strip():
        x_coord = re.findall('^(X=.*)\,$', line)
        text = re.findall('^(]\w +)', line)
        mylist.append([x_coord, text])

My approach does not find any value for x_coord or text.
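A minimal sketch, run on the first sample line only, of why the attempt above returns empty lists: `^` anchors a match to the start of the line, but every line begins with `[Base Font`, so neither pattern can match. Also, `\w +` means one word character followed by one or more spaces, not a run of words. The unanchored patterns `X=[^,]*` and `\]([^\]]*)$` are illustrative choices of mine, not taken from the answers.

```python
import re

line = "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,"

# '^' anchors a match to the very start of the line, but every line begins with
# '[Base Font', so neither pattern from the question can ever match.
anchored_x = re.findall(r'^(X=.*)\,$', line)
anchored_text = re.findall(r'^(]\w +)', line)

# Without the anchors: stop the X value at the comma, and take the text that
# follows the last ']' on the line.
x_coord = re.findall(r'X=[^,]*', line)
text = re.findall(r'\]([^\]]*)$', line)

print(anchored_x, anchored_text)  # [] []
print(x_coord, text)              # ['X=250.44'] ['DECEMBER 31,']
```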


3 Answers

郎朗坤


A solution using re:

import re

lines = [
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,",
    "[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue",
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)",
]

def extract(s):
    # Group 1: "X=" plus the number; group 2: everything after the next "]".
    match = re.search(r"(X=\d+(?:\.\d*)?).*?\](.*?)$", s)
    return match.groups()

output = [extract(item) for item in lines]
print(output)

Output:

[('X=250.44', 'DECEMBER 31,'), ('X=307.5', 'respectively. The net decrease in the revenue'), ('X=49.5', '(US$ in millions)')]
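Instead of calling extract() on each line, the same pattern can be run once over the whole multi-line string with `re.findall` and the `re.MULTILINE` flag. This is a sketch assuming the input arrives as a single string named `data`, as in the question; `findall` returns tuples, so they are converted to the lists the question asked for.

```python
import re

data = """[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,
[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue
[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)"""

# re.MULTILINE lets '$' match at the end of every line; '.' still does not
# match a newline, so each match is confined to a single line of input.
pairs = re.findall(r"(X=\d+(?:\.\d*)?).*?\](.*?)$", data, flags=re.MULTILINE)

# findall returns one tuple per match; convert them to lists.
result = [list(pair) for pair in pairs]
print(result)
```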

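Explanation:

\d ... a digit
\d+ ... one or more digits
(?:...) ... non-capturing ("plain") parentheses
\.\d* ... a dot followed by zero or more digits
(?:\.\d*)? ... an optional (zero or one) "fractional part"
(X=\d+(?:\.\d*)?) ... first group: X=number
.*? ... zero or more of any character (non-greedy)
\] ... the ] character
$ ... end of the string
\](.*?)$ ... second group: everything between the ] and the end of the string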
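The piece-by-piece explanation above can also live inside the pattern itself. A sketch using the standard `re.VERBOSE` flag, which ignores whitespace and allows `#` comments in the pattern; the regex is the same one used in extract(), only laid out differently.

```python
import re

# Same pattern as extract(), annotated. In verbose mode whitespace is ignored,
# so any literal space would need escaping (this pattern has none).
PATTERN = re.compile(r"""
    (X=\d+(?:\.\d*)?)   # group 1: 'X=' + digits + optional '.digits'
    .*?                 # skip as few characters as possible (non-greedy) ...
    \]                  # ... up to the next ']'
    (.*?)$              # group 2: everything after that ']' up to the end
""", re.VERBOSE)

line = "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,"
print(PATTERN.search(line).groups())  # ('X=250.44', 'DECEMBER 31,')
```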

2023-09-05

斯蒂芬大帝


Try this:

(X=[^,]*)(?:.*])(.*)

import re

source = """[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,
[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue
[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)""".split('\n')

pattern = r"(X=[^,]*)(?:.*])(.*)"

for line in source:
    print(re.search(pattern, line).groups())

Output:

('X=250.44', 'DECEMBER 31,')
('X=307.5', 'respectively. The net decrease in the revenue')
('X=49.5', '(US$ in millions)')

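You want X= in front of every captured value, so I just kept it inside a single capture group. If that matters, feel free to move it into a non-capturing group.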
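A short sketch of the variant suggested above, assuming only the number is wanted: wrapping `X=` in a non-capturing group `(?:X=)` keeps it in the match but drops it from the returned groups. The rest of the pattern is unchanged; the greedy `(?:.*])` still runs to the last `]` on the line.

```python
import re

line = "[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue"

# As in the answer: 'X=' sits inside the capture group, so it is returned too.
with_prefix = re.search(r"(X=[^,]*)(?:.*])(.*)", line).groups()

# Variant: 'X=' moved into a non-capturing group, so only the number is captured.
number_only = re.search(r"(?:X=)([^,]*)(?:.*])(.*)", line).groups()

print(with_prefix)  # ('X=307.5', 'respectively. The net decrease in the revenue')
print(number_only)  # ('307.5', 'respectively. The net decrease in the revenue')
```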

2023-09-05

MYYA


Use a regular expression with named groups to capture the relevant bits:

>>> import re
>>> line = "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,"
>>> m = re.search(r'(?:\(X=)(?P<x_coord>.*?)(?:,.*])(?P<text>.*)$', line)
>>> m.groups()
('250.44', 'DECEMBER 31,')
>>> m['x_coord']
'250.44'
>>> m['text']
'DECEMBER 31,'
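To get from single matches to the list of lists the question asks for, the named-group pattern can be applied line by line. A minimal sketch; `to_rows` is a hypothetical helper name, and it assumes lines without an `(X=...` marker (such as blank lines) should simply be skipped rather than raise, since `re.search` returns `None` for them.

```python
import re

# Named-group pattern from the answer above, compiled once.
PATTERN = re.compile(r'(?:\(X=)(?P<x_coord>.*?)(?:,.*])(?P<text>.*)$')

def to_rows(data):
    """Hypothetical helper: build [['X=...', text], ...] from multi-line input."""
    rows = []
    for line in data.splitlines():
        m = PATTERN.search(line)
        if m is None:  # blank or unrelated line: nothing to extract
            continue
        rows.append(['X=' + m['x_coord'], m['text']])
    return rows

data = """[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,
[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue

[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)"""

for row in to_rows(data):
    print(row)
```

The blank line in `data` is deliberate: it shows the `None` check doing its job instead of raising `AttributeError`.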

2023-09-05
