
Parsing deeply nested XML into a pandas DataFrame

茅侃侃 2022-01-11 16:48:32
I am trying to take specific parts of an XML file and move them into a pandas DataFrame. I have followed some tutorials on xml.etree, but I am still stuck on getting any output. So far I have managed to find the child nodes, but I cannot access them (that is, I cannot get the actual data out of them). This is what I have so far:

import xml.etree.ElementTree as ET

tree = ET.parse('data.xml')
root = tree.getroot()
root.tag

# find all nodes within xml data
tree.findall(".//")

# access the node
tree.findall(".//{http://someUrl.nl/schema/enterprise/program}programSummaryText")

What I want is the data from the programDescriptions node, specifically from the child programDescriptionText xml:lang="nl", and of course a few extra fields. But I want to focus on this one first.

Some data to work with:

<?xml version="1.0" encoding="UTF-8"?>
<programs xmlns="http://someUrl.nl/schema/enterprise/program">
<program xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://someUrl.nl/schema/enterprise/program http://someUrl.nl/schema/enterprise/program.xsd">
<customizableOnRequest>true</customizableOnRequest>
<editor>webmaster@url</editor>
<expires>2019-04-21</expires>
<format>Edu-dex 1.0</format>
<generator>www.Url.com</generator>
<includeInCatalog>Catalogs</includeInCatalog>
<inPublication>true</inPublication>
<lastEdited>2019-04-12T20:03:09Z</lastEdited>
<programAdmission>
    <applicationOpen>true</applicationOpen>
    <applicationType>individual</applicationType>
    <maxNumberOfParticipants>12</maxNumberOfParticipants>
    <minNumberOfParticipants>8</minNumberOfParticipants>
    <paymentDue>up-front</paymentDue>
    <requiredLevel>academic bachelor</requiredLevel>
    <startDateDetermination>fixed starting date</startDateDetermination>
</programAdmission>
<programCurriculum>
    <instructionMode>training</instructionMode>
    <teacher>
        <id>{D83FFC12-0863-44A6-BDBB-ED618627F09D}</id>
        <name>SomeName</name>
        <summary xml:lang="nl">
        Long text of the summary. Not needed.
        </summary>
    </teacher>
    <studyLoad period="hour">26</studyLoad>
</programCurriculum>

1 Answer

皈依舞


Try the code below (55703748.xml contains the XML you posted):


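import xml.etree.ElementTree as ET

# Parse the file and get the root <programs> element
tree = ET.parse('55703748.xml')
root = tree.getroot()

# The document declares a default namespace, so the tag must be namespace-qualified
nodes = root.findall(".//{http://someUrl.nl/schema/enterprise/program}programSummaryText")

# findall returns Element objects; the actual data is in each element's .text
for node in nodes:
    print(node.text)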

Output


short Program Course Name summary
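
The same approach may extend to the element the question actually asks about, programDescriptionText with xml:lang="nl". The following is only a sketch: the programDescriptions fragment in SAMPLE is a guess at the structure (that part of the file is not shown in the post), and the names NS, XML_LANG, SAMPLE and texts_for_language are invented for illustration. It relies on ElementTree exposing the xml:lang attribute under the key {http://www.w3.org/XML/1998/namespace}lang.

```python
import xml.etree.ElementTree as ET

# Prefix map so paths can be written as "p:tag" instead of "{uri}tag".
NS = {"p": "http://someUrl.nl/schema/enterprise/program"}

# ElementTree stores xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment: the real layout of programDescriptions is assumed,
# because that part of the file is not included in the question.
SAMPLE = """
<programs xmlns="http://someUrl.nl/schema/enterprise/program">
  <program>
    <programDescriptions>
      <programDescriptionText xml:lang="nl">Nederlandse beschrijving</programDescriptionText>
      <programDescriptionText xml:lang="en">English description</programDescriptionText>
    </programDescriptions>
  </program>
</programs>
"""


def texts_for_language(root, lang):
    """Return the text of every programDescriptionText whose xml:lang equals lang."""
    found = root.iterfind(".//p:programDescriptions/p:programDescriptionText", NS)
    return [(el.text or "").strip() for el in found if el.get(XML_LANG) == lang]


root = ET.fromstring(SAMPLE)
print(texts_for_language(root, "nl"))
```

Filtering on the attribute in plain Python, rather than in the path expression, keeps the lookup readable and avoids depending on how much XPath predicate syntax ElementTree supports.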


2022-01-11
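
For the DataFrame part of the question, one possible route is to flatten each program element into a dictionary first. This is a rough sketch rather than a complete solution: SAMPLE is a shortened version of the posted XML, and local_name and flatten are hypothetical helpers, not part of xml.etree. Nested tags are joined with a dot, and repeated sibling tags would overwrite each other, so real data with lists would need extra handling.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://someUrl.nl/schema/enterprise/program"}

# Shortened version of the XML from the question.
SAMPLE = """
<programs xmlns="http://someUrl.nl/schema/enterprise/program">
  <program>
    <editor>webmaster@url</editor>
    <expires>2019-04-21</expires>
    <programAdmission>
      <maxNumberOfParticipants>12</maxNumberOfParticipants>
      <minNumberOfParticipants>8</minNumberOfParticipants>
    </programAdmission>
  </program>
</programs>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def flatten(element, prefix=""):
    """Map each leaf under element to a 'parent.child' key holding its text.

    Assumption: sibling tags are unique; a repeated tag overwrites the earlier one.
    """
    row = {}
    for child in element:
        key = prefix + local_name(child.tag)
        if len(child):  # element has children of its own: recurse
            row.update(flatten(child, key + "."))
        else:
            row[key] = (child.text or "").strip()
    return row


root = ET.fromstring(SAMPLE)
# One dictionary per <program>, i.e. one future DataFrame row each.
rows = [flatten(program) for program in root.findall("p:program", NS)]
print(rows)
```

Passing rows to pandas.DataFrame(rows) should then give one row per program, with the dotted keys as column names.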