2 Answers

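Contributor: 1878 experience points, more than 4 likes
Here is a snippet that uses XPath to reach the deepest "real" tag, then goes from there through its children (getchildren) to the tail of the first child, which holds the actual text.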
import lxml.etree  # import the submodule explicitly; a bare "import lxml" does not load etree
xml=""" <claim id="CLM-00027" num="00027">
<claim-text> <?insert-start id="REI-00005" date="20191203" ?>27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys. <?insert-end id="REI-00005" ?></claim-text>
</claim>"""
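root = lxml.etree.fromstring(xml)
# xpath() returns a list of the matching <claim-text> elements
e = root.xpath("/claim/claim-text")
# The first child is the <?insert-start ...?> processing instruction, and the
# sentence is stored in its tail. e[0][0] is equivalent to the deprecated
# e[0].getchildren()[0].
res = e[0][0].tail
print(res)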
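Output:
27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.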
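If lxml is not available, the same "text lives in the tail of the processing instruction" layout can probably be reproduced with the standard library. The following is only a sketch, assuming Python 3.8 or later (where TreeBuilder accepts insert_pis); the function name claim_text_with_pis and the shortened sample XML are made up for illustration and are not part of the answer above.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample with the same structure as the claim above.
SAMPLE_XML = """<claim id="CLM-00027" num="00027">
<claim-text> <?insert-start id="REI-00005" date="20191203" ?>27. The method according to claim 23. <?insert-end id="REI-00005" ?></claim-text>
</claim>"""


def claim_text_with_pis(xml_text):
    """Return the text of <claim-text>, reading it from the tails of its children."""
    # insert_pis=True keeps processing instructions in the tree (Python 3.8+),
    # so the sentence ends up in the tail of the first child, as with lxml.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
    root = ET.fromstring(xml_text, parser=parser)
    claim_text = root.find("claim-text")

    # Text before the first child, then the tail that follows each child.
    # Text nested inside real child elements is intentionally not collected.
    pieces = [claim_text.text or ""]
    pieces.extend(child.tail or "" for child in claim_text)
    return "".join(pieces).strip()


print(claim_text_with_pis(SAMPLE_XML))
```

Joining every tail instead of indexing only the first child means the result does not depend on how many insert markers surround the sentence.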

Contributor: 1872 experience points, more than 3 likes
Access the specific child node by index.
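from xml.etree import ElementTree as ET
tree = ET.parse('path_to_your.xml')  # path to the XML file
root = tree.getroot()                # the <claim> element
# root[0] is the first child, <claim-text>. ElementTree's default parser
# discards processing instructions, so the sentence ends up in .text.
print(root[0].text)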
Output:
27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.
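For trying this without a file on disk, the index-based access can be sketched with ET.fromstring. This is a minimal example under assumptions: the sample XML is shortened, the helper name first_claim_text is hypothetical, and find() is used in place of root[0] so that a missing <claim-text> returns None instead of raising IndexError.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample with the same structure as the claim above.
SAMPLE_XML = """<claim id="CLM-00027" num="00027">
<claim-text> <?insert-start id="REI-00005" date="20191203" ?>27. The method according to claim 23. <?insert-end id="REI-00005" ?></claim-text>
</claim>"""


def first_claim_text(xml_text):
    """Return the stripped text of the first <claim-text>, or None if absent."""
    # The default parser discards the <?insert-start?>/<?insert-end?>
    # processing instructions, so the surrounding text is merged into .text.
    root = ET.fromstring(xml_text)
    node = root.find("claim-text")
    if node is None:
        return None
    # itertext() also picks up text inside any nested elements.
    return "".join(node.itertext()).strip()


print(first_claim_text(SAMPLE_XML))
```

The strip() call removes the whitespace left over around the discarded insert markers.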