Python regular expressions on XML text: finding tags

I am using Python to search the XML of research papers for items that contain a particular string. That part works, but I also need the section heading that immediately precedes each match, that is, the TITLE and LABEL tags and their contents.

<..... some XML .....>
<sec id="aj387295s3"><label>3.</label><title><italic>CHANDRA</italic> OBSERVATIONS</title><p>The 13 candidates were observed with the Advanced CCD Imaging Spectrometer (ACIS; Burke et al. <xref ref-type="bibr" rid="aj387295r8">1997</xref>) on board <italic>Chandra</italic> (Weisskopf et al. <xref ref-type="bibr" rid="aj387295r46">1996</xref>). We chose the S3 chip to image the sources because of its better low-energy sensitivity. The standard TIMED readout with a frame time of 3.2 s was used, and the data were collected in VFAINT mode. In 12 cases, our <italic>Chandra</italic> observations led us to conclude that the RASS detection was not of a candidate INS (see Table <xref ref-type="table" rid="aj387295t1">1</xref>; the <xref ref-type="sec" rid="aj387295app1">Appendix</xref> includes a case-by-case discussion of these sources).</p>
<..... more XML ....>

I have a regular expression that gets the lines containing "Chandra", but I have been struggling to get "3. CHANDRA OBSERVATIONS". This may be very obvious, but I have not had much training in regular expressions. My regex for Chandra and the rest of the line is:

(.*)(c|C)handra\b


2 Answers

12345678_0001


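Once you have found the right <sec> element, you only need to get the text inside its <label> and <title>. The title can contain nested markup such as <italic>, so join the pieces returned by itertext() instead of reading .text alone.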

title = '{} {}'.format(sec.findtext('label'), ''.join(sec.find('title').itertext()))
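To show where sec comes from, here is a minimal sketch using the standard library's xml.etree.ElementTree. It assumes the sections are <sec> elements with <p> children, as in the question's excerpt. The <article>/<body> wrapper, the second section, and the helper name headings_for are made up for illustration, and the paragraph text is shortened.

```python
import xml.etree.ElementTree as ET

# Shortened sample; the wrapper elements and section 4 are hypothetical.
xml_text = """<article><body>
<sec id="aj387295s3"><label>3.</label><title><italic>CHANDRA</italic> OBSERVATIONS</title>
<p>The 13 candidates were observed on board <italic>Chandra</italic>.</p></sec>
<sec id="s4"><label>4.</label><title>DISCUSSION</title><p>No match here.</p></sec>
</body></article>"""


def headings_for(root, needle):
    """Return 'label title' for every <sec> with a <p> containing needle."""
    found = []
    for sec in root.iter('sec'):
        for p in sec.findall('p'):
            # itertext() also yields text nested in <italic>, <xref>, ...
            if needle.lower() in ''.join(p.itertext()).lower():
                title = '{} {}'.format(
                    sec.findtext('label'),
                    ''.join(sec.find('title').itertext()))
                found.append(title)
                break  # one heading per section is enough
    return found


root = ET.fromstring(xml_text)
headings = headings_for(root, 'chandra')
print(headings)
```

The search is a case-insensitive substring test, which is looser than the question's \b word boundary. Real papers may also have sections without a <label> or <title>, in which case findtext returns None and find('title') needs a None check.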

2021-06-01

墨色风雨


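Using regular expressions to read XML values is not recommended, as noted in the comments. If you want to use them anyway, replace tag with the element name in this pattern: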

<tag>[\s\S]*?<\/tag>

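Whatever sits between the opening and closing tag is the value. The lazy quantifier *? stops at the first closing tag, and [\s\S] also matches newlines.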
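A rough sketch of how that pattern might be used from Python's re module, with a capture group added so the value can be pulled out directly. The helper name tag_value is hypothetical, the sample string is shortened from the question, and the pattern only handles tags written without attributes.

```python
import re

# Shortened sample from the question; the <p> body is elided here.
xml_text = ('<sec id="aj387295s3"><label>3.</label>'
            '<title><italic>CHANDRA</italic> OBSERVATIONS</title>'
            '<p>...</p></sec>')


def tag_value(tag, text):
    """Return the raw content of the first <tag>...</tag>, or None."""
    # Same idea as the pattern above, with a group around the value.
    pattern = r'<{0}>([\s\S]*?)</{0}>'.format(re.escape(tag))
    match = re.search(pattern, text)
    return match.group(1) if match else None


label = tag_value('label', xml_text)
raw_title = tag_value('title', xml_text)
# The title still contains <italic> markup; strip any remaining tags.
title = re.sub(r'<[^>]+>', '', raw_title)
heading = '{} {}'.format(label, title)
print(heading)
```

This breaks as soon as the tag carries attributes or the same element is nested inside itself, which is the main reason an XML parser is the safer choice.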

2021-06-01
