How to parse text from a tag that contains "<?>"

哔哔one 2023-02-12 19:00:31
My goal is to get this text: 27. The method according to claim 23 wherein...

How do I go about retrieving the text inside a tag that contains <? ... ?>? From a Google search, I believe these are called PHP short tags. I am using lxml and XPath, and they do not seem to be registered as tags or nodes. I tried itertext(), but it did not work well.

<claim id="CLM-00027" num="00027">
    <claim-text>
        <?insert-start id="REI-00005" date="20191203" ?>27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.
        <?insert-end id="REI-00005" ?></claim-text>
</claim>
2 Answers

UYOU

1878 experience points contributed, more than 4 likes received

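The code below uses XPath to reach the deepest "valid" element, then takes that element's first child (what getchildren() returns first) and reads its tail to get to the actual text. The <? ... ?> markers are XML processing instructions: lxml keeps them as child nodes of <claim-text>, and the text that follows a processing instruction is stored in that node's tail.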


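# "import lxml" alone does not load the etree submodule, so import it explicitly
from lxml import etree

xml = """<claim id="CLM-00027" num="00027">
            <claim-text>                <?insert-start id="REI-00005" date="20191203" ?>27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.                <?insert-end id="REI-00005" ?></claim-text>
        </claim>"""

root = etree.fromstring(xml)
# XPath stops at the deepest real element; the <?...?> nodes are not elements
e = root.xpath("/claim/claim-text")
# e[0][0] is the insert-start processing instruction, the same node as the
# deprecated e[0].getchildren()[0]; the text after it is stored in .tail
res = e[0][0].tail
print(res)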

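Output:


27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.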
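The index-based lookup above depends on the processing instruction being the first child of <claim-text>. As a hedged alternative, the sketch below shows two lookups that do not rely on position: selecting the processing instruction by its target name with the XPath node test processing-instruction('insert-start'), and asking XPath for the string value of <claim-text>, which by definition concatenates text nodes and skips processing instructions. The sample XML is copied from the question; the variable names are only illustrative, and this is a sketch rather than the original poster's method.

```python
from lxml import etree

# Sample data copied from the question.
xml = """<claim id="CLM-00027" num="00027">
    <claim-text>
        <?insert-start id="REI-00005" date="20191203" ?>27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.
        <?insert-end id="REI-00005" ?></claim-text>
</claim>"""

root = etree.fromstring(xml)
claim_text = root.find("claim-text")

# Option 1: select the processing instruction by its target name,
# then read the text that follows it (stored in .tail).
pis = claim_text.xpath("processing-instruction('insert-start')")
from_tail = pis[0].tail.strip()

# Option 2: take the XPath string value of <claim-text>.
# normalize-space() trims the ends and collapses runs of whitespace.
from_xpath = root.xpath("normalize-space(claim-text)")

print(from_tail)
print(from_tail == from_xpath)
```

Option 2 is the shorter of the two when only the visible text matters; Option 1 is more useful when the id or date inside the processing instruction also needs to be inspected.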


Answered 2023-02-12
守着一只汪

1872 experience points contributed, more than 3 likes received

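Access the specific child node by index. The default xml.etree.ElementTree parser does not keep processing instructions in the tree, so the text around them ends up in the .text of <claim-text>, which is root[0] when <claim> is the root element of the file.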


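from xml.etree import ElementTree as ET

# 'path_to_your.xml' is a placeholder for a file whose root element is <claim>
tree = ET.parse('path_to_your.xml')
root = tree.getroot()
# root[0] is the first child of the root element, here <claim-text>
print(root[0].text)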

Output:


        27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.                
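To see why the standard-library version returns the text directly, the following sketch parses the same sample twice from a string instead of a file: once with the default parser, and once with a TreeBuilder created with insert_pis=True (available from Python 3.8), which keeps the processing instructions as child nodes. The variable names are illustrative assumptions, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question.
XML = """<claim id="CLM-00027" num="00027">
    <claim-text>
        <?insert-start id="REI-00005" date="20191203" ?>27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.
        <?insert-end id="REI-00005" ?></claim-text>
</claim>"""

# Default parser: processing instructions are not inserted into the tree,
# so <claim-text> has no children and all of its text lands in .text.
root = ET.fromstring(XML)
claim_text = root.find("claim-text")
default_text = claim_text.text.strip()

# Python 3.8+: ask the tree builder to keep processing instructions.
# The claim text then sits in the .tail of the first one, as in lxml.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
root_with_pis = ET.fromstring(XML, parser=parser)
claim_text_pis = root_with_pis.find("claim-text")
kept_text = claim_text_pis[0].tail.strip()

print(len(claim_text), len(claim_text_pis))
print(default_text == kept_text)
```

The default behaviour is the simpler choice when the insert-start and insert-end markers carry no information you need; keeping them is only worthwhile if their id or date attributes matter.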



Answered 2023-02-12