
How to parse parameters from text?

Python

守着星空守着你 2021-12-26 10:11:07

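I have text that looks like this:

ENGINE = CollapsingMergeTree ( first_param ,( second_a ,second_b, second_c, ,second d), third, fourth)

The engine can vary (instead of CollapsingMergeTree there can be a different word, e.g. ReplacingMergeTree, SummingMergeTree, ...), but the text always has the format ENGINE = word(). There may be spaces around the "=" sign, but they are not mandatory. Inside the parentheses there are several parameters, usually a word followed by a comma, but some parameters are themselves in parentheses, like second in the example above. Line breaks can occur anywhere, and a line can end with a comma, a parenthesis, or anything else.

I need to extract n parameters (I don't know in advance how many). In the example above there are 4 parameters:

first = first_param
second = (second_a, second_b, second_c, second_d) [extracted with the parentheses]
third = third
fourth = fourth

How can I do this with Python (regular expressions or anything else)?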


2 Answers

拉风的咖菲猫


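I came up with a regular expression that solves your problem. I tried to keep the pattern as "generic" as possible, because I don't know whether the text will always contain newlines and spaces. This means the pattern picks up a lot of whitespace, which is then removed.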

# Import the module for regular expressions
import re

# Text to search. I corrected it a bit: your example said "second d", and second_c was followed by two commas. I am assuming those were typos.
text = '''ENGINE = CollapsingMergeTree (
first_param
,(second_a
,second_b, second_c
,second_d), third, fourth)'''

# Regex search pattern. re.S (DOTALL) makes "." match ANY character, including \n (newlines).
# Raw strings (r'...') pass the backslashes through to the regex engine unchanged.
pattern = re.compile(r'ENGINE = CollapsingMergeTree \((.*?),\((.*?)\),(.*?), (.*?)\)', re.S)

# Apply the pattern to the text and save the match in 'result'. result[0] returns the whole match.
# The items you want are sub-expressions enclosed in parentheses () and are accessed with result[1] and above.
result = pattern.match(text)
if result is None:
    raise ValueError('text does not match the pattern')

# result[1] gets everything after the parenthesis following CollapsingMergeTree, up to the first ",(",
# but including whitespace and newlines. re.sub replaces all whitespace, including newlines, with nothing.
first = re.sub(r'\s', '', result[1])
# result[2] gets second_a to second_d (without the surrounding parentheses); whitespace is removed the same way.
second = re.sub(r'\s', '', result[2])
third = re.sub(r'\s', '', result[3])
fourth = re.sub(r'\s', '', result[4])

print(first)
print(second)
print(third)
print(fourth)

Output:

first_param
second_a,second_b,second_c,second_d
third
fourth
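The pattern above hard-codes CollapsingMergeTree and exactly four parameters, while the question asks for any engine name and an unknown number n of parameters. A minimal sketch of one way to handle that (not part of the original answer): a regex locates "ENGINE = Name(" with optional whitespace, then a loop walks the remaining characters and tracks parenthesis depth, so only top-level commas split parameters and a nested group stays together with its parentheses. The function name parse_engine_params is hypothetical, and the sketch assumes the text contains no quoted strings with commas or parentheses inside them. Like the answer above, it removes all whitespace (so "second d" would become "secondd") and drops empty pieces.

```python
import re


def parse_engine_params(text):
    """Split 'ENGINE = Name(p1, (p2a, p2b), p3, ...)' into (Name, [params]).

    Only commas at nesting depth 0 separate parameters, so a parenthesised
    group stays together as one parameter. Assumption: the text contains no
    quoted strings with commas or parentheses inside them.
    """
    # Engine name and opening parenthesis; whitespace/newlines are optional.
    m = re.search(r"ENGINE\s*=\s*(\w+)\s*\(", text)
    if m is None:
        raise ValueError("no 'ENGINE = Name(' found")
    params, current, depth = [], [], 0
    for ch in text[m.end():]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:  # this ")" closes ENGINE = Name( ... )
                break
            depth -= 1
        elif ch == "," and depth == 0:
            params.append("".join(current))  # top-level comma ends a parameter
            current = []
            continue
        current.append(ch)
    else:
        # Loop finished without finding the closing ")".
        raise ValueError("unbalanced parentheses")
    params.append("".join(current))
    # As in the answer above, remove all whitespace; drop empty pieces.
    cleaned = [re.sub(r"\s", "", p) for p in params]
    return m.group(1), [p for p in cleaned if p]


text = '''ENGINE = CollapsingMergeTree (
first_param
,(second_a
,second_b, second_c
,second_d), third, fourth)'''

print(parse_engine_params(text))
# ('CollapsingMergeTree', ['first_param', '(second_a,second_b,second_c,second_d)', 'third', 'fourth'])
```

Because the number of parameters is not fixed in the pattern, the same function works for ReplacingMergeTree, SummingMergeTree, and so on, with any number of arguments.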

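Regex explanation:

\ = escapes a control character, i.e. a character that the regex engine would otherwise interpret as having a special meaning.

\( = an escaped parenthesis, which matches a literal "(".

() = marks the enclosed expression as a subgroup (capturing group). See result[1] etc.

. = matches any character (including newlines, because of re.S).

* = matches 0 or more occurrences of the preceding expression.

? = on its own, matches 0 or 1 occurrence of the preceding expression.

Note: the combination *? is called non-greedy (lazy) repetition. Here the ? does not mean "0 or 1"; it modifies the *. The preceding expression is still repeated 0 or more times, but as few times as possible, so .*? stops at the first point where the rest of the pattern can match instead of running as far as it can.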
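To make the difference between .* and .*? and the effect of re.S concrete, here is a small illustrative example (not from the original answer; the sample strings are made up):

```python
import re

s = "(a), (b)"
# Greedy: .* matches as much as possible, so the group runs to the LAST ")".
greedy = re.search(r"\((.*)\)", s).group(1)
# Non-greedy: .*? matches as little as possible, so the group stops at the FIRST ")".
lazy = re.search(r"\((.*?)\)", s).group(1)
print(greedy)  # a), (b
print(lazy)    # a

multi = "(first\nsecond)"
# Without re.S, "." does not match "\n", so no match is found.
no_dotall = re.search(r"\((.*?)\)", multi)
# With re.S (DOTALL), "." also matches "\n".
dotall = re.search(r"\((.*?)\)", multi, re.S).group(1)
print(no_dotall)     # None
print(repr(dotall))  # 'first\nsecond'
```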

I'm not an expert, but I hope my explanation is correct.

I hope this helps.

2021-12-26

慕尼黑8549860


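In general you probably want to use a proper parser (so look up how to hand-roll a parser for a simple language). However, since the small part you show here looks compatible with Python syntax, you can parse it as if it were Python using the ast module (from the standard library) and then work with the result.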

2021-12-26
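A minimal sketch of the ast approach described in the second answer, assuming Python 3.9+ (for ast.unparse) and assuming the text is valid Python syntax once the typos are fixed. The question's literal example, with ", ," and "second d", is not valid Python, so ast.parse raises SyntaxError on it. The function name parse_with_ast is hypothetical; it only handles a single statement of the form ENGINE = Name(...) where Name is a plain identifier, and it ignores keyword arguments. Each positional argument is converted back to source text, so the nested group keeps its parentheses, as the question asks.

```python
import ast


def parse_with_ast(text):
    """Parse 'ENGINE = Name(arg, ...)' as if it were a Python assignment.

    Assumes the text is valid Python syntax and that Name is a plain
    identifier. Returns (name, [each positional argument as source text]).
    """
    tree = ast.parse(text.strip())  # raises SyntaxError on invalid input
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Assign):
        raise ValueError("expected a single 'ENGINE = Name(...)' statement")
    call = tree.body[0].value
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
        raise ValueError("right-hand side is not a call like Name(...)")
    # ast.unparse (Python 3.9+) converts each argument node back to source,
    # so a nested tuple such as (second_a, ...) keeps its parentheses.
    return call.func.id, [ast.unparse(arg) for arg in call.args]


text = '''ENGINE = CollapsingMergeTree (
first_param
,(second_a
,second_b, second_c
,second_d), third, fourth)'''

print(parse_with_ast(text))
# ('CollapsingMergeTree', ['first_param', '(second_a, second_b, second_c, second_d)', 'third', 'fourth'])
```

Newlines inside the parentheses are no problem here, because Python treats them as implicit line continuations.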
