Regular expression for phrases that span multiple newlines

Python

UYOU 2023-08-08 10:34:30

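I'm learning regular expressions with Python, and I want to write a regex that matches and collects the sentences in the following input:

1. Food : Cake : Baked sweet food made from flour, sugar and other ingredients.
2. Electronics : Computer : A machine to carry out a computer programming operation.
Computers mainly consists of a CPU, monitor, keyboard and a mouse.
3. Automobile : Car : Car is a four wheeled motor vehicle used for transportation.

The expected output should give me the category, the item, and the description of that item. So for the first item, "Cake", the regex should capture "Food", "Cake", and "Baked sweet food made from flour, sugar and other ingredients." as groups.

My current regex looks like this:

[0-9]+\s*.\s*(\w*)\s*:\s*(\w*)\s*:\s*(.*)

This seems to work for items whose description contains no newline. If the description does contain a newline, as with "Computer" in the example, the regex only matches the description up to the newline and drops the second sentence. Please help me understand what I am missing here.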
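The behaviour described in the question can be reproduced with a minimal sketch. It assumes the input is numbered the way the pattern's leading `[0-9]+` implies, and the names `txt`, `default_matches`, and `dotall_matches` are illustrative only. By default `.` does not match a newline, so `(.*)` stops at the end of the line; switching on `re.DOTALL` alone is not a fix either, because the greedy `(.*)` then runs to the end of the text.

```python
import re

# Assumed sample input: numbered entries, the second one spanning two lines.
txt = """1. Food : Cake : Baked sweet food made from flour, sugar and other ingredients.

2. Electronics : Computer : A machine to carry out a computer programming operation.
Computers mainly consists of a CPU, monitor, keyboard and a mouse.

3. Automobile : Car : Car is a four wheeled motor vehicle used for transportation."""

# The pattern from the question, unchanged.
pattern = r'[0-9]+\s*.\s*(\w*)\s*:\s*(\w*)\s*:\s*(.*)'

# Default flags: "." never crosses a newline, so each description
# ends at the end of its first line.
default_matches = re.findall(pattern, txt)
print(len(default_matches))      # 3
print(default_matches[1][2])     # only the first sentence about computers

# re.DOTALL lets "." match newlines, but the greedy (.*) now swallows
# everything after the first entry, including the other two entries.
dotall_matches = re.findall(pattern, txt, flags=re.DOTALL)
print(len(dotall_matches))       # 1
```

So the description group needs both a way to cross newlines and a stopping condition, which is what the answers supply.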

2 Answers

大话西游666

Contributed 1817 experience points, received over 14 likes

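If the entries (category, item, and description) are separated from each other by a blank line, that is, a double newline, you can parse them like this (regex101):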

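import re

txt = '''1. Food : Cake : Baked sweet food made from flour, sugar and other ingredients.

2. Electronics : Computer : A machine to carry out a computer programming operation.
Computers mainly consists of a CPU, monitor, keyboard and a mouse.

3. Automobile : Car : Car is a four wheeled motor vehicle used for transportation.'''

# re.M lets ^ match at the start of every line; re.S lets . match newlines.
# The lazy (.*?) stops at the first blank line or at the end of the text.
pattern = r'^(?:\d+)\.([^:]+):([^:]+):(.*?)(?:\n\n|\Z)'

for cat, item, desc in re.findall(pattern, txt, flags=re.M | re.S):
    # strip() removes the spaces captured around the colons
    print(cat.strip())
    print(item.strip())
    print(desc.strip())
    print('-' * 80)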

Prints:

Food
Cake
Baked sweet food made from flour, sugar and other ingredients.
--------------------------------------------------------------------------------
Electronics
Computer
A machine to carry out a computer programming operation.
Computers mainly consists of a CPU, monitor, keyboard and a mouse.
--------------------------------------------------------------------------------
Automobile
Car
Car is a four wheeled motor vehicle used for transportation.
--------------------------------------------------------------------------------
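For readers who want to see what each part of that pattern does, here is a hedged sketch of the same expression written with `re.VERBOSE` and named groups. The group names (`category`, `item`, `description`) and the `ENTRY` and `entries` variables are my own labels, not part of the answer, and the sketch keeps the answer's assumption that entries are separated by a blank line.

```python
import re

txt = """1. Food : Cake : Baked sweet food made from flour, sugar and other ingredients.

2. Electronics : Computer : A machine to carry out a computer programming operation.
Computers mainly consists of a CPU, monitor, keyboard and a mouse.

3. Automobile : Car : Car is a four wheeled motor vehicle used for transportation."""

# Same pattern as in the answer, annotated. re.X (VERBOSE) ignores the
# layout whitespace and allows the trailing comments.
ENTRY = re.compile(r'''
    ^\d+\.                  # numeric index at the start of a line (needs re.M)
    (?P<category>[^:]+):    # everything up to the first colon
    (?P<item>[^:]+):        # everything up to the second colon
    (?P<description>.*?)    # lazy, allowed to span newlines (needs re.S)
    (?:\n\n|\Z)             # stop at a blank line or at the end of the text
''', flags=re.M | re.S | re.X)

# One dict per entry, with the spaces around the colons stripped.
entries = [
    {name: value.strip() for name, value in match.groupdict().items()}
    for match in ENTRY.finditer(txt)
]

for entry in entries:
    print(entry)
```

Named groups make the result self-describing, which can help once the records are passed on to other code. If a description may itself contain a blank line, this stopping rule ends it early.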

Answered 2023-08-08

慕少森

Contributed 2019 experience points, received over 9 likes

This may be a basic approach, but it works for the sample input you provided:

[0-9]+\s*.\s*(\w*)\s*:\s*(\w*)\s*:\s*((?:.*[\n\r]?)+?)(?=$|\d\s*\.)

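Basically, the description group consumes text one line at a time, newlines included, and stops as soon as the lookahead sees either the end of the input or the next numeric index.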
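A minimal sketch of how that pattern could be applied in Python, assuming the same numbered sample input as elsewhere on this page (the `txt` and `records` names are illustrative). No flags are needed, because the newline is matched explicitly by `[\n\r]?` rather than by `.`.

```python
import re

txt = """1. Food : Cake : Baked sweet food made from flour, sugar and other ingredients.

2. Electronics : Computer : A machine to carry out a computer programming operation.
Computers mainly consists of a CPU, monitor, keyboard and a mouse.

3. Automobile : Car : Car is a four wheeled motor vehicle used for transportation."""

# The pattern from this answer, unchanged.
pattern = r'[0-9]+\s*.\s*(\w*)\s*:\s*(\w*)\s*:\s*((?:.*[\n\r]?)+?)(?=$|\d\s*\.)'

# The description group keeps the trailing newlines it consumed before
# the lookahead succeeded, so strip them off.
records = [
    (category, item, description.strip())
    for category, item, description in re.findall(pattern, txt)
]

for record in records:
    print(record)
```

Two caveats, as far as I can tell: the `.` after the index is unescaped, so it matches any character rather than only a literal period, and a description line that itself begins with a number followed by a period would be treated as the start of a new entry.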

Answered 2023-08-08
