2 Answers
Contributor with 1995 experience points, 2+ upvotes
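I came up with a regular expression that solves your problem. I tried to keep the pattern as "generic" as I could, because I don't know whether your text will always contain newlines and spaces. As a result, the pattern captures a lot of whitespace, which is then stripped out.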
# Import the module for regular expressions
import re
# Text to search. I corrected it slightly: in your example, second_d and second_c were followed by two commas. I am assuming those were typos.
text = '''ENGINE = CollapsingMergeTree (
first_param
,(
second_a
,second_b, second_c
,second_d), third, fourth)'''
# Regex search pattern. re.S makes . (which matches ANY character) also match \n (newlines).
pattern = re.compile(r'ENGINE = CollapsingMergeTree \((.*?),\((.*?)\),(.*?), (.*?)\)', re.S)
# Apply the pattern to the text and save the match in the variable 'result'. result[0] returns the whole matched text.
# The items you want are the sub-expressions enclosed in parentheses (); access them with result[1] and above.
result = pattern.match(text)
# result[1] holds everything after the parenthesis following CollapsingMergeTree, up to the ,( that opens the nested group, still including whitespace and newlines. re.sub replaces all whitespace, including newlines, with nothing.
first = re.sub(r'\s', '', result[1])
# result[2] holds second_a through second_d, still including whitespace and newlines. re.sub replaces all whitespace, including newlines, with nothing.
second = re.sub(r'\s', '', result[2])
third = re.sub(r'\s', '', result[3])
fourth = re.sub(r'\s', '', result[4])
print(first)
print(second)
print(third)
print(fourth)
Output:
first_param
second_a,second_b,second_c,second_d
third
fourth
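As a variation on the approach above, the whitespace tolerance can be moved into the pattern itself with \s*, so the captured groups need no cleanup afterwards. This is only a sketch, not part of the original answer: the names PATTERN, parse_engine and sample are hypothetical, and it assumes the same four-part structure with exactly one nested parenthesized group.

```python
import re

# Hypothetical variant: \s* absorbs optional whitespace/newlines around
# every delimiter, so the groups come back already trimmed.
PATTERN = re.compile(
    r'ENGINE\s*=\s*CollapsingMergeTree\s*\(\s*(.*?)\s*,\s*\(\s*(.*?)\s*\)'
    r'\s*,\s*(.*?)\s*,\s*(.*?)\s*\)',
    re.S,
)


def parse_engine(text):
    """Return (first, [second items], third, fourth), or None if no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    first, second, third, fourth = match.groups()
    # Split the nested group on commas and trim each item.
    second_items = [item.strip() for item in second.split(',')]
    return first, second_items, third, fourth


sample = '''ENGINE = CollapsingMergeTree (
first_param
,(
second_a
,second_b, second_c
,second_d), third, fourth)'''

print(parse_engine(sample))
```

Returning None on a failed match avoids the TypeError that indexing a failed match object would raise.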
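Regex explanation:
\ = escapes a metacharacter, i.e. a character that the regex engine would otherwise interpret as having a special meaning.
\( = an escaped (literal) parenthesis
() = marks the expression inside the parentheses as a subgroup. See result[1] and so on.
. = matches any character (including newlines, because of re.S)
* = matches 0 or more occurrences of the preceding expression.
? = matches 0 or 1 occurrence of the preceding expression. Placed directly after another quantifier such as *, it makes that quantifier non-greedy instead.
Note: the *? combination is called a non-greedy (lazy) repeat. The preceding expression matches as few characters as possible, rather than as many as possible, while still allowing the rest of the pattern to match.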
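The difference between a greedy and a non-greedy repeat is easiest to see side by side. The following is a small illustrative sketch; the string sample_pairs is made up for the demonstration and does not come from the question.

```python
import re

# Made-up input containing several parenthesized groups.
sample_pairs = "(a),(b),(c)"

# Greedy: .* grabs as much as possible, then backtracks to the LAST ')'.
greedy = re.match(r"\((.*)\)", sample_pairs)

# Non-greedy: .*? grabs as little as possible, stopping at the FIRST ')'.
lazy = re.match(r"\((.*?)\)", sample_pairs)

print(greedy[1])  # a),(b),(c
print(lazy[1])    # a
```

This is why the pattern in the answer uses (.*?) for each group: with a greedy (.*), the first group would swallow everything up to the last possible delimiter.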
I'm no expert, but I hope my explanation is correct.
I hope this helps.
Contributor with 1818 experience points, 11+ upvotes
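For any language, you probably want to use a proper parser (so look up how to hand-roll a parser for a simple language). However, since the small fragment you show here looks Python-compatible, you could parse it as if it were Python using the ast module (from the standard library) and then manipulate the result.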
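A minimal sketch of that idea, assuming the fragment really is valid Python syntax (it parses as an assignment whose right-hand side is a function call). The helper name describe and the variable names are hypothetical, and only bare names and tuples are handled.

```python
import ast

text = '''ENGINE = CollapsingMergeTree (
first_param
,(
second_a
,second_b, second_c
,second_d), third, fourth)'''


def describe(node):
    """Convert a Name or Tuple AST node into plain strings and lists."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Tuple):
        return [describe(element) for element in node.elts]
    raise ValueError(f"unsupported node type: {type(node).__name__}")


tree = ast.parse(text)
assignment = tree.body[0]        # ENGINE = ...
call = assignment.value          # CollapsingMergeTree(...)
engine_name = call.func.id
params = [describe(arg) for arg in call.args]

print(engine_name)
print(params)
```

Because the parser handles whitespace and nesting itself, no pattern has to be adapted when the layout of the parameters changes. Input that is not valid Python raises SyntaxError from ast.parse.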