Problem traversing an XML tree with Python xml.etree.ElementTree

I have an XML file with the structure shown below (simplified for the purposes of this question). For each record I want to extract the article title and the DOI, which is the text of the "ArticleId" element whose "IdType" attribute is "doi" (sometimes the DOI may be missing). I then want to store the article title in a dictionary with the DOI as the key.

    <PubmedArticleSet>
    <PubmedArticle>
      <MedlineCitation Status="MEDLINE" Owner="NLM">
        <Article PubModel="Print-Electronic">
          <ArticleTitle>Malathion and dithane induce DNA damage in Vicia faba.</ArticleTitle>
        </Article>
      </MedlineCitation>
      <PubmedData>
        <ArticleIdList>
          <ArticleId IdType="pubmed">28950791</ArticleId>
          <ArticleId IdType="doi">10.1177/0748233717726877</ArticleId>
        </ArticleIdList>
      </PubmedData>
    </PubmedArticle>
    </PubmedArticleSet>

To do this, I used xml.etree.ElementTree as follows:

    import xml.etree.ElementTree as ET

    xmldoc = ET.parse('sample.xml')
    root = xmldoc.getroot()
    pubs = {}
    for elem in xmldoc.iter(tag='ArticleTitle'):
        title = elem.text
        for subelem in xmldoc.iter(tag='ArticleId'):
            if subelem.get("IdType") == "doi":
                doi = subelem.text
                pubs[doi] = title

    if len(pubs) == 0:
        print("No articles found")
    else:
        for pub in pubs.keys():
            print(pub + ' ' + pubs[pub])

But something is wrong with the loops that walk the document tree, because the code above produces:

    10.1177/0748233717726877 [Influence of Four Kinds of PPCPs on Micronucleus Rate of the Root-Tip Cells of Vicia-faba and Garlic].
    10.1016/j.crvi.2015.02.001 [Influence of Four Kinds of PPCPs on Micronucleus Rate of the Root-Tip Cells of Vicia-faba and Garlic].

That is, I get the correct DOIs, but each one is paired with a copy of the title of the last article, which has no DOI! The correct output should be:

    10.1177/0748233717726877 Malathion and dithane induce DNA damage in Vicia faba.
    10.1016/j.crvi.2015.02.001 Impact of dual inoculation with Rhizobium and PGPR on growth and antioxidant status of Vicia faba L. under copper stress.

Can anyone give me some hints on how to solve this annoying problem?
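In the code above, the inner loop calls xmldoc.iter(tag='ArticleId') on the whole document, not on the current record. On every pass of the outer loop, every DOI in the file is therefore reassigned the current title, so after the last pass they all carry the final article's title. Below is a minimal sketch of one way to scope the search. It assumes each record is a PubmedArticle element, as in the sample. It loops over the records and uses findtext relative to each one, so a title is only ever paired with a DOI from the same record. The SAMPLE string and the doi_to_title helper are hypothetical: the sample is assembled from the titles and DOIs quoted in the question, and the third record deliberately has no DOI.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample built from the titles/DOIs quoted in the question.
# The third record has no DOI, mirroring "sometimes the DOI may be missing".
SAMPLE = """<PubmedArticleSet>
<PubmedArticle>
  <MedlineCitation Status="MEDLINE" Owner="NLM">
    <Article PubModel="Print-Electronic">
      <ArticleTitle>Malathion and dithane induce DNA damage in Vicia faba.</ArticleTitle>
    </Article>
  </MedlineCitation>
  <PubmedData>
    <ArticleIdList>
      <ArticleId IdType="pubmed">28950791</ArticleId>
      <ArticleId IdType="doi">10.1177/0748233717726877</ArticleId>
    </ArticleIdList>
  </PubmedData>
</PubmedArticle>
<PubmedArticle>
  <MedlineCitation>
    <Article>
      <ArticleTitle>Impact of dual inoculation with Rhizobium and PGPR on growth and antioxidant status of Vicia faba L. under copper stress.</ArticleTitle>
    </Article>
  </MedlineCitation>
  <PubmedData>
    <ArticleIdList>
      <ArticleId IdType="doi">10.1016/j.crvi.2015.02.001</ArticleId>
    </ArticleIdList>
  </PubmedData>
</PubmedArticle>
<PubmedArticle>
  <MedlineCitation>
    <Article>
      <ArticleTitle>[Influence of Four Kinds of PPCPs on Micronucleus Rate of the Root-Tip Cells of Vicia-faba and Garlic].</ArticleTitle>
    </Article>
  </MedlineCitation>
</PubmedArticle>
</PubmedArticleSet>"""


def doi_to_title(root):
    """Map DOI -> title, searching each PubmedArticle independently (hypothetical helper)."""
    pubs = {}
    for article in root.iter('PubmedArticle'):
        # Paths starting with './/' search only the descendants of *this* record.
        title = article.findtext('.//ArticleTitle')
        # ElementPath attribute predicate: the ArticleId whose IdType is "doi".
        # findtext returns None when no such element exists in this record.
        doi = article.findtext(".//ArticleId[@IdType='doi']")
        if doi and title is not None:
            pubs[doi] = title
    return pubs


if __name__ == "__main__":
    pubs = doi_to_title(ET.fromstring(SAMPLE))
    if not pubs:
        print("No articles found")
    else:
        for doi, title in pubs.items():
            print(doi + " " + title)
```

With a real file, the same helper could be called as doi_to_title(ET.parse('sample.xml').getroot()). Records without a DOI are simply skipped rather than being paired with some other record's identifier.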
