Parsing an XML file with emphasis tags in Python

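I am writing a Python script that extracts all the text from an XML file. I am using the ElementTree library to parse the data, but I run into a problem when the data is structured like this:

<Segment StartTime="639.752" EndTime="642.270" Participant="fe016"> But I bet it's a good <Pause/> superset of it. </Segment>

When I try to read the text, I only get the first half of the segment, the part before the Pause tag ("Okay. So what we have"). Is there a way to ignore the tags inside a segment and print all of its text?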
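The behavior described in the question follows from how ElementTree stores mixed content: the text before the first child sits on the parent's `.text`, while the text after a child sits on that child's `.tail`. A minimal sketch that shows where each half of the sentence ends up (the variable names `before` and `after` are only illustrative):

```python
from xml.etree import ElementTree as ET

xml = ('<Segment StartTime="639.752" EndTime="642.270" Participant="fe016">'
       " But I bet it's a good <Pause/> superset of it. </Segment>")

root = ET.fromstring(xml)
pause = root.find("Pause")

# Text before the first child element is stored on the parent.
before = root.text
# Text that follows a child element is stored on that child's .tail.
after = pause.tail

print(repr(before))
print(repr(after))
```

Reading only `root.text` therefore stops at the first nested tag, which matches the symptom in the question.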


2 Answers

守着星空守着你

Another solution.

from simplified_scrapy import SimplifiedDoc

html = '''<Segment StartTime="639.752" EndTime="642.270" Participant="fe016">
But I bet it's a good <Pause/> superset of it.
</Segment>'''

doc = SimplifiedDoc(html)
print(doc.Segment)       # the element with its attributes and inner markup
print(doc.Segment.text)  # the text only, with nested tags stripped

Result:

{'StartTime': '639.752', 'EndTime': '642.270', 'Participant': 'fe016', 'tag': 'Segment', 'html': "\n But I bet it's a good <Pause /> superset of it.\n"}

But I bet it's a good superset of it.

More examples are available here: https://github.com/yiyedata/simplified-scrapy-demo/blob/master/doc_examples

2022-10-05

萧十郎

# Solution using ElementTree
from xml.etree import ElementTree as ET

xml = '''<Segment StartTime="639.752" EndTime="642.270" Participant="fe016">
But I bet it's a good <Pause/> superset of it.
</Segment>'''

root = ET.fromstring(xml)
pause = root.find('./Pause')

# root.text holds the text before <Pause/>; pause.tail holds the text after it.
print(root.text + pause.tail)
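Concatenating `root.text` and `pause.tail` covers a segment with exactly one child tag. For segments with several tags, or nested ones, `Element.itertext()` walks every text fragment in document order, so a more general version might look like the sketch below. The `<Transcript>` wrapper, the second segment with its `<Emphasis>` tag, and the helper name `segment_text` are hypothetical, added only for illustration.

```python
from xml.etree import ElementTree as ET

def segment_text(segment):
    """Join all text inside `segment`, ignoring tags and collapsing whitespace."""
    return " ".join("".join(segment.itertext()).split())

# Hypothetical sample: the first segment is from the question,
# the wrapper and the second segment are made up for this sketch.
sample = '''<Transcript>
<Segment StartTime="639.752" EndTime="642.270" Participant="fe016">
But I bet it's a good <Pause/> superset of it.
</Segment>
<Segment Participant="fe016"> It is <Emphasis> really </Emphasis> good. </Segment>
</Transcript>'''

root = ET.fromstring(sample)
texts = [segment_text(seg) for seg in root.iter("Segment")]

for line in texts:
    print(line)
```

Collapsing whitespace with `split()` is a design choice here: it removes the double space left behind where a tag used to be, at the cost of also normalizing any intentional spacing in the source.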

2022-10-05