3 Answers
Contributor: 1834 experience points, 8+ upvotes
This will do it:
import re

question_list = []
options_list = []
with open("data.txt") as fp:
    for line in fp:
        # A line starting with '-' is a question
        question = re.match(r'-.*', line)
        if question:
            question_list.append(question.group(0))
            continue
        answer = re.match(r'[ABCD]\..*', line)
        if not answer:
            continue  # skip blank or unrecognized lines
        if answer.group(0)[0] == 'A':
            # 'A' opens a new group of options
            options_list.append([answer.group(0)])
        elif options_list:
            options_list[-1].append(answer.group(0))
print(question_list)
print(options_list)
Output:
['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM', '- Trauma è skin erythema and Partiel skin loss ,ttt: surgery']
[['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy'], ['A. H2o irrigation', 'B. Bicarb. Irrigation', 'C. Surgical debridment']]
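If the two lists need to be consumed together, they can be paired by position. This is only a minimal sketch, assuming the lists come out aligned (one option group per question, in the same order) as in the output above. The name `quiz` is illustrative and not part of the original answer.

```python
# Lists copied from the output above; in practice they come from the parsing loop.
question_list = [
    '- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM',
    '- Trauma è skin erythema and Partiel skin loss ,ttt: surgery',
]
options_list = [
    ['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy'],
    ['A. H2o irrigation', 'B. Bicarb. Irrigation', 'C. Surgical debridment'],
]

# zip() pairs the i-th question with the i-th option group.
# Assumption: both lists have the same length and order.
quiz = dict(zip(question_list, options_list))

for question, options in quiz.items():
    print(question)
    for option in options:
        print('   ', option)
```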
Another option, if you don't need the options grouped per question:
import re

with open("data.txt") as file:
    content = file.read()

# findall returns every match in the file as one flat list
question_list = re.findall(r'-.*', content)
options_list = re.findall(r'[ABCD]\..*', content)
print(question_list)
print(options_list)
Output:
['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM', '- Trauma è skin erythema and Partiel skin loss ,ttt: surgery']
['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy', 'A. H2o irrigation', 'B. Bicarb. Irrigation', 'C. Surgical debridment']
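One thing to watch with unanchored patterns: `-.*` also matches a hyphen in the middle of a line. A hedged variant, assuming every question and option starts at the beginning of its line, is to anchor the patterns with `^` and `re.MULTILINE`. The inline `content` stands in for data.txt, and the `stray` line is a made-up example, not taken from the real file.

```python
import re

# Inline sample standing in for data.txt (assumption: same layout as the real file).
content = """- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM
A. observe
B. HBV DNA study
- Trauma è skin erythema and Partiel skin loss ,ttt: surgery
A. H2o irrigation
B. Bicarb. Irrigation
"""

# With re.MULTILINE, ^ matches at the start of every line, not only the string.
question_list = re.findall(r'^-.*', content, flags=re.MULTILINE)
options_list = re.findall(r'^[ABCD]\..*', content, flags=re.MULTILINE)
print(question_list)
print(options_list)

# Hypothetical option line containing a hyphen, to show the difference.
stray = "A. Non-operative care\n"
unanchored = re.findall(r'-.*', stray)                     # picks up '-operative care'
anchored = re.findall(r'^-.*', stray, flags=re.MULTILINE)  # no match
print(unanchored, anchored)
```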
Contributor: 1827 experience points, 8+ upvotes
Try this example.
import json

text = '''
- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM
A. observe
B. HBV DNA study
C. Interferon
D. take liver biopsy
- Trauma è skin erythema and Partiel skin loss ,ttt: surgery
A. H2o irrigation
B. Bicarb. Irrigation
C. Surgical debridment
'''

questions = {}
letters = ['A', 'B', 'C', 'D', 'E']
question = ''
# Drop empty lines, then walk the text line by line
for line in [x for x in text.split('\n') if x]:
    if line[0] == '-':
        # A leading '-' starts a new question; strip the "- " prefix
        question = line[2:]
        questions[question] = {}
    elif line[0] in letters:
        # Split "A. observe" into the letter and the option text
        letter, option = (part.strip() for part in line.split('.', 1))
        questions[question][letter] = option
print(json.dumps(questions, indent=2, ensure_ascii=False))
The output is neatly structured:
{
  "26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM": {
    "A": "observe",
    "B": "HBV DNA study",
    "C": "Interferon",
    "D": "take liver biopsy"
  },
  "Trauma è skin erythema and Partiel skin loss ,ttt: surgery": {
    "A": "H2o irrigation",
    "B": "Bicarb. Irrigation",
    "C": "Surgical debridment"
  }
}
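A caution when pasting file content into a triple-quoted string: in a regular (non-raw) string literal, a backslash at the very end of a line acts as a line continuation, so that line is silently joined with the next one. The sketch below shows the effect and two possible workarounds; the variable names are illustrative only.

```python
# Regular triple-quoted string: backslash + newline is a line continuation,
# so both options end up on one line.
regular = '''A. H2o irrigation\
B. Bicarb. Irrigation'''

# Raw string: the backslash and the newline are both kept literally.
raw = r'''A. H2o irrigation\
B. Bicarb. Irrigation'''

print(regular.split('\n'))
print(raw.split('\n'))

# If the trailing backslash is just noise in the data, strip it per line.
cleaned = [line.rstrip('\\').rstrip() for line in raw.split('\n')]
print(cleaned)
```

Either removing the backslash from the pasted text or cleaning each line as above keeps options A and B as separate entries.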
Contributor: 1874 experience points, 12+ upvotes
A simple approach:
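import re

with open("julysmalltext.txt") as file:
    content = file.read()

# Raw strings keep the regex escapes (\w, \., \n) intact
questions = re.findall(r'-.*?(?=\nA)', content)
# (?=\n) requires a newline after each option, so a final line
# without a trailing newline is not matched
options = re.findall(r'\w\..*?(?=\n)', content)
print(questions)
print(options)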
Output:
['- 26 yrs Man Hbsag +ve ,hbeag +ve on routine screening ..what is next ;IM', '- Trauma è skin erythema and Partiel skin loss ,ttt: surgery']
['A. observe', 'B. HBV DNA study', 'C. Interferon', 'D. take liver biopsy', 'A. H2o irrigation\\', 'B. Bicarb. Irrigation']
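Breaking it down:
This part: '-' means the match must start with '-'.
This part: '.*?' means capture everything in between, but non-greedily.
This part: '(?=\nA)' is a lookahead: the match must be immediately followed by a newline and then an 'A', neither of which is included in the result.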
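To see the lookahead in isolation, here is a small sketch on an inline string. The sample text uses shortened placeholder questions rather than the real file, and the `$` variant at the end is an alternative offered under the assumption that each option occupies exactly one line.

```python
import re

# Placeholder text; note there is no trailing newline after the last option.
content = (
    "- first question\n"
    "A. observe\n"
    "B. HBV DNA study\n"
    "- second question\n"
    "A. H2o irrigation"
)

# '-.*?' consumes text lazily; '(?=\nA)' asserts that a newline and 'A'
# follow, without including them in the match.
questions = re.findall(r'-.*?(?=\nA)', content)
print(questions)

# '(?=\n)' needs a newline after the option, so the last line is skipped.
options = re.findall(r'\w\..*?(?=\n)', content)
print(options)

# With re.MULTILINE, '$' matches at the end of every line and at the end of
# the string, so the last option is picked up as well.
options_all = re.findall(r'\w\..*?$', content, flags=re.MULTILINE)
print(options_all)
```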