Getting a value from an XML file by absolute path in Python

慕桂英3389331 2023-10-25 10:49:16
I have the absolute path of a value I want to retrieve from an XML file. The path has the form "A/B/C". How can I do this in Python?

3 Answers

GCT1015

Contributed 1827 experience points · received more than 4 likes

Use the ElementTree library. (Note that my answer uses a core Python library, whereas the other answers use external libraries.)
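A minimal sketch of the ElementTree approach, using only the standard library. The sample XML, the `value_at` helper, and the `data.xml` file name in the comment are assumptions made for illustration; the question does not show its actual file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (the question does not include its XML).
xml_text = "<ROOT><A><B><C>The Value</C></B></A></ROOT>"

# For a file on disk, ET.parse("data.xml").getroot() would be used instead.
root = ET.fromstring(xml_text)


def value_at(root, path):
    """Hypothetical helper: return the text found at a path such as "A/B/C".

    ElementTree resolves paths relative to the element they are called on,
    so a leading root tag is stripped if the path happens to include it.
    Assumes the root tag is not repeated as its own direct child.
    """
    parts = path.strip("/").split("/")
    if parts[0] == root.tag:
        parts = parts[1:]
    if not parts:
        return root.text
    # findtext() returns None when nothing matches the path.
    return root.findtext("/".join(parts))


print(value_at(root, "A/B/C"))       # The Value
print(value_at(root, "ROOT/A/B/C"))  # The Value
print(value_at(root, "A/B/X"))       # None
```

If several elements share the same path, `root.findall("A/B/C")` returns all of them, while `findtext` only looks at the first match.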


To grab the first three sentences, just add these lines to your code:

section = soup.find('section', class_="article_text post")  # find the <section> tag with class "article_text post"
txt = section.p.text  # text of the first <p> tag inside that section
print(txt)

Output:

Many people will land on this page after learning that their email address has appeared in a data breach I've called "Collection #1". Most of them won't have a tech background or be familiar with the concept of credential stuffing so I'm going to write this post for the masses and link out to more detailed material for those who want to go deeper.

Hope this helps!


Answered 2023-10-25

繁星coding

Contributed 1797 experience points · received more than 4 likes

Another approach.


from simplified_scrapy import SimplifiedDoc

# Basic: a single element at A/B/C
xml = '''<ROOT><A><B><C>The Value</C></B></A></ROOT>'''
doc = SimplifiedDoc(xml)
print(doc.select('A>B>C'))

# Multiple: several A elements, each with its own B/C
xml = '''<ROOT><A><B><C>The Value 1</C></B></A><A><B><C>The Value 2</C></B></A></ROOT>'''
doc = SimplifiedDoc(xml)
# print(doc.selects('A').select('B').select('C'))
print(doc.selects('A').select('B>C'))

# Mixed structure: some A elements have no B, or a B with no C
xml = '''<ROOT><A><other>no B</other></A><A><other></other><B>no C</B></A><A><B><C>The Value</C></B></A></ROOT>'''
doc = SimplifiedDoc(xml)
nodes = doc.selects('A').selects('B').select('C')
for node in nodes:
    for c in node:
        if c:  # skip empty results
            print(c)

Result:

{'tag': 'C', 'html': 'The Value'}
[{'tag': 'C', 'html': 'The Value 1'}, {'tag': 'C', 'html': 'The Value 2'}]
{'tag': 'C', 'html': 'The Value'}


Answered 2023-10-25

慕慕森

Contributed 1856 experience points · received more than 17 likes

You can use lxml, which can be installed with pip install lxml.
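A rough sketch of what the lxml route could look like. It is not the answerer's own code: the sample XML and the `texts_at` helper are assumptions for illustration, and the import falls back to the standard library's ElementTree (which shares `fromstring`, `findtext` and `findall`) so the sketch still runs where lxml is not installed.

```python
try:
    from lxml import etree  # third-party: pip install lxml
except ImportError:
    # Fallback so the sketch still runs without lxml installed.
    import xml.etree.ElementTree as etree

# Hypothetical sample document with two elements at the same path.
xml_text = (
    "<ROOT>"
    "<A><B><C>The Value 1</C></B></A>"
    "<A><B><C>The Value 2</C></B></A>"
    "</ROOT>"
)
root = etree.fromstring(xml_text)


def texts_at(root, path):
    """Hypothetical helper: text of every element matching a path like "A/B/C"."""
    if hasattr(root, "xpath"):
        # lxml only: XPath 1.0 expression selecting the text nodes directly.
        return [str(t) for t in root.xpath(path + "/text()")]
    # Standard-library fallback: limited path syntax via findall().
    return [el.text for el in root.findall(path)]


first = root.findtext("A/B/C")        # text of the first match only
all_values = texts_at(root, "A/B/C")  # text of every match

print(first)       # The Value 1
print(all_values)  # ['The Value 1', 'The Value 2']
```

As with ElementTree, the path is resolved relative to the root element, so "A/B/C" here starts from the children of ROOT.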



Answered 2023-10-25