
Python Regular Expressions - Find all sentences that start with a hyphen and put them in a list

Python

米琪卡哇伊 2023-03-01 16:48:59

I have a text file that I want to parse, putting the questions into a question list and the options into an options list. Sample text [updated to include all the variations in question types and options]:

- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ; IM
A. observe
B. HBV DNA study\
C. Interferon
D. take liver biopsy
- Trauma è skin erythema and Partiel skin loss ,ttt: surgery
A. H2o irrigation
B. Bicarb. Irrigation
C. Surgical debridment\
- Old female, obese on diet control ,polydipsia , invest. Hba1c 7.5 ,all (random,Fasting, post prandial ) sugar are mild elevated urine ketone (+) ttt: IM
A. Insulin “ ketonuria “\
B. pioglitazone
C. Thiazolidinediones
D. fourth i forgot (not Metformin nor sulfonylurea)
- Day to day variation of this not suitable for patients under warfarin therapy: IM
A. retinols
B. Fresh fruits and vegitables
C. Meet and paultry\
D. Old cheese

I am new to Python, and especially new to regular expressions. I am trying to find a regex that matches a sentence starting with "-" up to the new line that begins with "A.", slices it just before "A.", and puts the question into a list. Note: some questions are two lines long. I also need a regex that extracts each set of options into a list. So the final result would be:

question list = ['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ; IM','- Old female, obese on diet control ,polydipsia , invest. Hba1c 7.5 ,all (random,Fasting, post prandial ) sugar are mild elevated urine ketone (+) ttt:IM ','etc','and so on']

options list = [['A. observe','B. HBV DNA study\','C. Interferon','D. take liver biopsy'],['A. H2o irrigation\','B. Bicarb. Irrigation','C. Surgical debridment',[['A. Something Else','B. Something Else',......,'D. ']],[etc]]

I guess this will be a bit complicated, but any help with the regex part, or even a starting point, would be great. I have a text file with 1000 questions and options like these, and ideally I would like to extract all of them.

import re

with open("julysmalltext.txt") as file:
    content = file.read()
    question_list = re.findall(r'', content)
    options_list = re.findall(r'', content)


3 answers

MMMHUHU

Contributed 1834 experience points, received more than 8 likes

This will do it:

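import re

question_list = []
options_list = []

with open("data.txt") as fp:
    for line in fp:
        line = line.strip()
        # A question is any line that starts with a hyphen.
        question = re.match(r'-.*', line)
        if question:
            question_list.append(question.group(0))
            continue
        # An option starts with a capital letter A-D followed by a dot.
        answer = re.match(r'[ABCD]\..*', line)
        if answer is None:
            continue  # blank or unrecognised line: skip instead of crashing
        if answer.group(0)[0] == 'A':
            # "A." opens a new group of options.
            options_list.append([answer.group(0)])
        elif options_list:
            options_list[-1].append(answer.group(0))

print(question_list)
print(options_list)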

Output:

['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM', '- Trauma è skin erythema and Partiel skin loss ,ttt: surgery']

[['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy'], ['A. H2o irrigation', 'B. Bicarb. Irrigation', 'C. Surgical debridment']]

Another option, if you don't need the options nested per question:

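import re

with open("data.txt") as file:
    content = file.read()

# MULTILINE makes '^' match at every line start, so a hyphen or an
# "A." in the middle of a sentence is not picked up by mistake.
question_list = re.findall(r'^-.*', content, re.MULTILINE)
options_list = re.findall(r'^[ABCD]\..*', content, re.MULTILINE)

print(question_list)
print(options_list)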

Output:

['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM', '- Trauma è skin erythema and Partiel skin loss ,ttt: surgery']

['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy', 'A. H2o irrigation', 'B. Bicarb. Irrigation', 'C. Surgical debridment']
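The line-by-line approach above assumes every question fits on one line, while the original post mentions two-line questions and trailing backslashes. A minimal sketch of one way to handle both follows; the `SAMPLE` text, the `parse` helper and the rule "a line that is neither a question nor an option continues the previous item" are assumptions made for illustration, not part of the answer.

```python
import re

# Hypothetical sample; the second question is deliberately split over two lines.
SAMPLE = """\
- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ; IM
A. observe
B. HBV DNA study\\
C. Interferon
D. take liver biopsy
- Old female, obese on diet control ,polydipsia ,
invest. Hba1c 7.5 ttt: IM
A. Insulin
B. pioglitazone
"""

# Assumed option prefix: a capital letter A-E, a dot, then whitespace or end of line.
OPTION_RE = re.compile(r'[A-E]\.(\s|$)')


def parse(text):
    """Return (question_list, options_list) for hyphen-led question blocks."""
    question_list, options_list = [], []
    in_options = False  # True once the current question's options have begun
    for raw in text.splitlines():
        # Drop surrounding whitespace and a stray trailing backslash.
        line = raw.strip().rstrip('\\').rstrip()
        if not line:
            continue
        if line.startswith('-'):
            question_list.append(line)
            options_list.append([])
            in_options = False
        elif not question_list:
            continue  # ignore anything before the first question
        elif OPTION_RE.match(line):
            options_list[-1].append(line)
            in_options = True
        elif in_options:
            options_list[-1][-1] += ' ' + line  # continuation of an option
        else:
            question_list[-1] += ' ' + line  # continuation of a question
    return question_list, options_list


questions, options = parse(SAMPLE)
print(questions)
print(options)
```

A continuation line that itself happens to start with something like "A. " would still be read as an option, so this is only a starting point for the real 1000-question file.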

2023-03-01

斯蒂芬大帝

Contributed 1827 experience points, received more than 8 likes

Try this example.

import json

# Raw string: the trailing backslash in the sample stays a literal character
# instead of acting as a line continuation that glues two options together.
text = r'''
- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM
A. observe
B. HBV DNA study
C. Interferon
D. take liver biopsy
- Trauma è skin erythema and Partiel skin loss ,ttt: surgery
A. H2o irrigation\
B. Bicarb. Irrigation
C. Surgical debridment
'''

questions = {}
letters = ['A', 'B', 'C', 'D', 'E']
question = ''

# Strip whitespace and stray trailing backslashes, then drop empty lines.
lines = [x.strip().rstrip('\\').strip() for x in text.split('\n')]
lines = [x for x in lines if x]

for line in lines:
    if line[0] == '-':
        # New question: remove the leading "- " and start an empty option dict.
        question = line[2:]
        questions[question] = {}
    elif line[0] in letters and line[1:2] == '.':
        # "A. observe" -> key "A", value "observe"
        questions[question][line[0]] = line[2:].strip()

print(json.dumps(questions, indent=2, ensure_ascii=False))

The output is nicely structured:

{
  "26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM": {
    "A": "observe",
    "B": "HBV DNA study",
    "C": "Interferon",
    "D": "take liver biopsy"
  },
  "Trauma è skin erythema and Partiel skin loss ,ttt: surgery": {
    "A": "H2o irrigation",
    "B": "Bicarb. Irrigation",
    "C": "Surgical debridment"
  }
}
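The question asked for two parallel lists rather than a dictionary. As a rough sketch, a dictionary of this shape can be flattened into them as shown below; the small `questions` value is a hypothetical stand-in for the parsed result, and the "- " and "A. " prefixes are re-added only to match the format requested in the question.

```python
# Hypothetical parsed result, shaped like the dictionary printed above.
questions = {
    "first question ;IM": {"A": "observe", "B": "HBV DNA study"},
    "second question": {"A": "H2o irrigation", "B": "Bicarb. Irrigation"},
}

# Dictionaries preserve insertion order (Python 3.7+), so both lists stay aligned.
question_list = ["- " + question for question in questions]
options_list = [
    [f"{letter}. {text}" for letter, text in options.items()]
    for options in questions.values()
]

print(question_list)
print(options_list)
```

One caveat with the dictionary approach: two questions with identical text share a key, so the later one silently replaces the earlier one. With around 1000 questions that may be worth checking.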

2023-03-01

HUWWW

Contributed 1874 experience points, received more than 12 likes

Simple:

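import re

with open("julysmalltext.txt") as file:
    content = file.read()

# Raw strings keep the backslashes in the patterns intact.
# MULTILINE anchors '^' at every line start; DOTALL lets '.' span a two-line question.
questions = re.findall(r'^-.*?(?=\nA\.)', content, re.MULTILINE | re.DOTALL)
# An option is a capital letter A-E plus a dot at the start of a line.
# No trailing lookahead, so the last option is found even without a final newline.
options = re.findall(r'^[A-E]\..*', content, re.MULTILINE)

print(questions)
print(options)

Output:

['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM', '- Trauma è skin erythema and Partiel skin loss ,ttt: surgery']

['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy', 'A. H2o irrigation\\', 'B. Bicarb. Irrigation', 'C. Surgical debridment']

Breaking it down:

'^-' means the match must start with '-' at the beginning of a line.

'.*?' means match everything in between, but non-greedily.

'(?=\nA\.)' is a lookahead: the match must be followed by a newline and then 'A.', and neither of them is included in the result.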
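A flat list of options loses track of which options belong to which question, whereas the question asked for one sub-list per question. One possible way to keep the grouping with regular expressions is to split the text into one block per question first. This is a sketch under assumptions: the inline `SAMPLE` is hypothetical, and every question is assumed to start with '-' at the beginning of a line.

```python
import re

# Hypothetical sample; the second question spans two lines.
SAMPLE = (
    "- first question ;IM\n"
    "A. observe\n"
    "B. HBV DNA study\n"
    "- second question,\n"
    "continued on a second line\n"
    "A. H2o irrigation\n"
    "B. Bicarb. Irrigation\n"
    "C. Surgical debridment"
)

# Split at each newline that is followed by a hyphen. The lookahead is
# zero-width, so the hyphen stays at the start of the next block.
blocks = re.split(r'\n(?=-)', SAMPLE.strip())

question_list = []
options_list = []
for block in blocks:
    # The question is everything before the first line starting with "A." to "E.".
    first_option = re.search(r'^[A-E]\.', block, re.MULTILINE)
    question = block[:first_option.start()] if first_option else block
    question_list.append(question.strip().replace('\n', ' '))
    # All option lines inside this block form one sub-list.
    options_list.append(re.findall(r'^[A-E]\..*', block, re.MULTILINE))

print(question_list)
print(options_list)
```

Because both lists are filled once per block, `question_list[i]` and `options_list[i]` always refer to the same question, even when a question has no options at all.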

2023-03-01
