
Python Regular Expressions - find all sentences that start with a hyphen and put them in a list


米琪卡哇伊 2023-03-01 16:48:59
I have a text file that I want to parse, putting the questions and the options into a question list and an options list.

Sample text: [sample updated to include all the variations in question types and options]

- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ; IM
A. observe
B. HBV DNA study\
C. Interferon
D. take liver biopsy
- Trauma è skin erythema and Partiel skin loss ,ttt: surgery
A. H2o irrigation
B. Bicarb. Irrigation
C. Surgical debridment\
- Old female, obese on diet control ,polydipsia , invest. Hba1c 7.5 ,all (random,Fasting, post prandial ) sugar are mild elevated urine ketone (+) ttt: IM
A. Insulin “ ketonuria “\
B. pioglitazone
C. Thiazolidinediones
D. fourth i forgot (not Metformin nor sulfonylurea)
- Day to day variation of this not suitable for patients under warfarin therapy: IM
A. retinols
B. Fresh fruits and vegitables
C. Meet and paultry\
D. Old cheese

I'm new to Python, and especially new to regular expressions. I'm trying to find a regex that matches the sentences starting with "-" up to the new line that starts with "A.", cuts the text just before the "A." and puts the question into a list. Note: some questions are two lines long. I also need a regex that extracts each set of options into a list. So the end result would be:

question list = ['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ; IM','- Old female, obese on diet control ,polydipsia , invest. Hba1c 7.5 ,all (random,Fasting, post prandial ) sugar are mild elevated urine ketone (+) ttt:IM ','etc','and so on']

options list = [['A. observe','B. HBV DNA study\','C. Interferon','D. take liver biopsy'],['A. H2o irrigation\','B. Bicarb. Irrigation','C. Surgical debridment',[['A. Something Else','B. Something Else',......,'D.  ']],[etc]]

I guess this will be a bit complicated, but any help with the regex part, even just a starting point, would be great. I have a text file with 1000 of these questions and options, and ideally I'd like to extract all of them.

import re
with open("julysmalltext.txt") as file:
    content = file.read()
    question_list = re.findall(r'', content)
    options_list = re.findall(r'', content)

3 Answers

MMMHUHU

Contributed 1834 experience points, received 8+ likes

This will do it:


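import re

question_list = []
options_list = []

with open("data.txt") as fp:
    for line in fp:
        # A question is any line that starts with '-'
        question = re.match(r'-.*', line)
        if question:
            question_list.append(question.group(0))
            continue
        # An option is a line that starts with 'A.' to 'D.'
        answer = re.match(r'[ABCD]\..*', line)
        if answer is None:
            continue  # blank line or anything else: skip instead of crashing
        if answer.group(0)[0] == 'A':
            # 'A.' opens a new group of options
            options_list.append([answer.group(0)])
        elif options_list:
            options_list[-1].append(answer.group(0))

print(question_list)
print(options_list)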

Output:


['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM', '- Trauma è skin erythema and Partiel skin loss ,ttt: surgery']

[['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy'], ['A. H2o irrigation', 'B. Bicarb. Irrigation', 'C. Surgical debridment']]
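The loop above treats every line as either a question or an option. The asker mentions that some questions run over two lines and that some lines end in a stray backslash. A minimal sketch that handles both follows; the function name `parse_questions`, the continuation rule (any other non-blank line is appended to whatever came last) and the place where the first sample question is broken across two lines are all assumptions made for illustration, not something stated in the thread.

```python
import re

# An option line looks like "A. observe" (letters A-E are an assumption).
OPTION_RE = re.compile(r'[A-E]\.\s*\S')

def parse_questions(text):
    """Return (question_list, options_list) with one option group per question."""
    question_list, options_list = [], []
    for raw_line in text.splitlines():
        # Drop surrounding whitespace and a trailing stray backslash.
        line = raw_line.strip().rstrip('\\').rstrip()
        if not line:
            continue
        if line.startswith('-'):
            question_list.append(line)
            options_list.append([])
        elif not question_list:
            continue  # text before the first question is ignored
        elif OPTION_RE.match(line):
            options_list[-1].append(line)
        elif not options_list[-1]:
            # No options seen yet: treat as the second line of the question.
            question_list[-1] += ' ' + line
        else:
            # Otherwise treat it as a continuation of the last option.
            options_list[-1][-1] += ' ' + line
    return question_list, options_list

# Raw string, so the trailing backslashes stay in the text as they would in a file.
sample = r"""- Day to day variation of this not suitable for
patients under warfarin therapy: IM
A. retinols
B. Fresh fruits and vegitables
C. Meet and paultry\
D. Old cheese
- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ; IM
A. observe
B. HBV DNA study\
C. Interferon
D. take liver biopsy
"""

questions, options = parse_questions(sample)
print(questions)
print(options)
```

One limitation of this sketch: a continuation line that itself begins with something like "A. " would be mistaken for an option.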

Another option, if you don't need the lists to be nested:


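import re

with open("data.txt") as file:
    content = file.read()

# re.MULTILINE makes '^' match at the start of every line, so a '-' or
# a letter followed by '.' in the middle of a sentence is not picked up.
question_list = re.findall(r'^-.*', content, flags=re.MULTILINE)
options_list = re.findall(r'^[ABCD]\..*', content, flags=re.MULTILINE)

print(question_list)
print(options_list)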

Output:


['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM', '- Trauma è skin erythema and Partiel skin loss ,ttt: surgery']

['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy', 'A. H2o irrigation', 'B. Bicarb. Irrigation', 'C. Surgical debridment']
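With two flat lists, the link between a question and its options is lost. One way to keep it while still relying on regular expressions is to first cut the text into one block per question and then search inside each block. This is only a sketch under assumptions: the helper name `pair_questions_with_options`, the option letters A-E and the inline sample (including where the second question wraps) are made up for the demo.

```python
import re

# One block = a line starting with '-' plus everything up to the next such line.
BLOCK_RE = re.compile(r'^-.*?(?=^-|\Z)', re.MULTILINE | re.DOTALL)
# One option = a whole line starting with a letter A-E and a dot.
OPTION_RE = re.compile(r'^[A-E]\..*$', re.MULTILINE)

def pair_questions_with_options(content):
    """Return a list of (question, [options]) tuples."""
    pairs = []
    for block in BLOCK_RE.findall(content):
        first_option = OPTION_RE.search(block)
        # Everything before the first option line is the question text.
        question = block[:first_option.start()] if first_option else block
        # Strip trailing backslashes and spaces from each option.
        options = [opt.rstrip('\\ ') for opt in OPTION_RE.findall(block)]
        # Collapse line breaks so a two-line question becomes one string.
        pairs.append((' '.join(question.split()), options))
    return pairs

content = (
    "- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ; IM\n"
    "A. observe\n"
    "B. HBV DNA study\\\n"
    "C. Interferon\n"
    "D. take liver biopsy\n"
    "- Day to day variation of this not suitable for\n"
    "patients under warfarin therapy: IM\n"
    "A. retinols\n"
    "D. Old cheese"
)

pairs = pair_questions_with_options(content)
for question, options in pairs:
    print(question, options)
```

If separate lists are still wanted, `[q for q, _ in pairs]` and `[o for _, o in pairs]` give them back in matching order.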


2023-03-01
斯蒂芬大帝

Contributed 1827 experience points, received 8+ likes

Try this example.


import json

# Note: this is a normal (non-raw) string, so the backslash at the end of
# the "A. H2o irrigation" line acts as a line continuation and joins that
# line with the "B." line that follows it.
text = '''
- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM
A. observe
B. HBV DNA study
C. Interferon
D. take liver biopsy
- Trauma è skin erythema and Partiel skin loss ,ttt: surgery
A. H2o irrigation\
B. Bicarb. Irrigation
C. Surgical debridment
'''

questions = {}
letters = ['A', 'B', 'C', 'D', 'E']
# Split into lines and drop the empty ones
lines = [x for x in text.split('\n') if x]
question = ''
for line in lines:
    if line[0] == '-':
        # New question: strip the leading "- " and start an empty option dict
        question = line[2:]
        questions[question] = {}
    elif line[0] in letters:
        # Option: split "A. text" into the letter and the option text
        letter, option = (part.strip() for part in line.split('.', 1))
        questions[question][letter] = option

print(json.dumps(questions, indent=2, ensure_ascii=False))

The output is nicely structured:


{
  "26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM": {
    "A": "observe",
    "B": "HBV DNA study",
    "C": "Interferon",
    "D": "take liver biopsy"
  },
  "Trauma è skin erythema and Partiel skin loss ,ttt: surgery": {
    "A": "H2o irrigationB. Bicarb. Irrigation",
    "C": "Surgical debridment"
  }
}
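Note that options A and B of the second question are merged in this output. That comes from the backslash at the end of `A. H2o irrigation\`: inside a normal triple-quoted string, a backslash followed by a newline is a line continuation. The sketch below shows the difference a raw string makes. It only matters when the text is pasted into source code as in this answer; text read from a file keeps its backslashes and line breaks. The variable names are illustrative.

```python
# Normal string: backslash + newline is swallowed, the two lines are joined.
normal = '''A. H2o irrigation\
B. Bicarb. Irrigation'''

# Raw string: the backslash and the newline are both kept.
raw = r'''A. H2o irrigation\
B. Bicarb. Irrigation'''

print(normal.split('\n'))  # ['A. H2o irrigationB. Bicarb. Irrigation']
print(raw.split('\n'))     # ['A. H2o irrigation\\', 'B. Bicarb. Irrigation']

# Strip the stray trailing backslash once the lines are separate.
cleaned = [line.rstrip('\\') for line in raw.split('\n')]
print(cleaned)             # ['A. H2o irrigation', 'B. Bicarb. Irrigation']
```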


2023-03-01
HUWWW

Contributed 1874 experience points, received 12+ likes

Simple:


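import re

with open("julysmalltext.txt") as file:
    content = file.read()

# Raw strings keep the regex escapes (\n, \w, \.) intact.
# Question: from '-' up to, but not including, a newline followed by 'A'.
questions = re.findall(r'-.*?(?=\nA)', content)
# Option: a word character and a dot, up to the next newline. A last line
# without a trailing newline is therefore not matched.
options = re.findall(r'\w\..*?(?=\n)', content)

print(questions)
print(options)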

Output:


['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM', '- Trauma è skin erythema and Partiel skin loss ,ttt: surgery']

['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy', 'A. H2o irrigation\\', 'B. Bicarb. Irrigation']

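Breaking it down:

'-' means the match must start with a '-'.

'.*?' means match everything in between, but non-greedily.

'(?=\nA)' is a lookahead: the match must be immediately followed by a newline and then an 'A', neither of which is included in the result.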
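Because '.' does not match a newline by default, the question pattern as written only finds questions that fit on one line. A hedged sketch of how the `re.DOTALL` and `re.MULTILINE` flags change that is shown below. The inline `content` string, and in particular the place where the first question is broken across two lines, is an assumption for the demo, as is limiting option letters to A-E.

```python
import re

content = (
    "- Day to day variation of this not suitable for\n"
    "patients under warfarin therapy: IM\n"
    "A. retinols\n"
    "B. Fresh fruits and vegitables\n"
    "- Trauma è skin erythema and Partiel skin loss ,ttt: surgery\n"
    "A. H2o irrigation\n"
    "B. Bicarb. Irrigation"
)

# Without DOTALL, '.' stops at a line break, so the two-line question is missed.
single_line = re.findall(r'-.*?(?=\nA)', content)

# MULTILINE pins '-' to the start of a line; DOTALL lets '.*?' cross line breaks.
multi_line = re.findall(r'^-.*?(?=\nA\.)', content, flags=re.MULTILINE | re.DOTALL)

# With MULTILINE, '$' also matches at the very end of the text,
# so the last option is kept even without a trailing newline.
options = re.findall(r'^[A-E]\..*$', content, flags=re.MULTILINE)

print(len(single_line), len(multi_line))  # 1 2
print(options)
```

The two-line question comes back with the line break still inside it; `' '.join(q.split())` is one way to flatten it into a single line.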


2023-03-01