Python iterparse is skipping values

Python

MM们 2021-09-14 16:15:30

I'm using iterparse to parse a large XML file (1.8 GB) and write all of the data to a CSV file. The script I wrote runs fine, but for some reason it skips values at seemingly random rows. Here is my script:

    import xml.etree.cElementTree as ET
    import csv

    xml_data_to_csv = open('Out2.csv', 'w', newline='', encoding='utf8')
    Csv_writer = csv.writer(xml_data_to_csv, delimiter=';')

    file_path = "Products_50_producten.xml"
    context = ET.iterparse(file_path, events=("start", "end"))

    EcommerceProductGuid = ""
    ProductNumber = ""
    Description = ""
    ShopSalesPriceInc = ""
    Barcode = ""
    AvailabilityStatus = ""
    Brand = ""

    # turn it into an iterator
    #context = iter(context)
    product_tag = False
    for event, elem in context:
        tag = elem.tag

        if event == 'start':
            if tag == "Product":
                product_tag = True
            elif tag == 'EcommerceProductGuid':
                EcommerceProductGuid = elem.text
            elif tag == 'ProductNumber':
                ProductNumber = elem.text
            elif tag == 'Description':
                Description = elem.text
            elif tag == 'SalesPriceInc':
                ShopSalesPriceInc = elem.text
            elif tag == 'Barcode':
                Barcode = elem.text
            elif tag == 'AvailabilityStatus':
                AvailabilityStatus = elem.text
            elif tag == 'Brand':
                Brand = elem.text

        if event == 'end' and tag == 'Product':
            product_tag = False

            List_nodes = []
            List_nodes.append(EcommerceProductGuid)
            List_nodes.append(ProductNumber)
            List_nodes.append(Description)
            List_nodes.append(ShopSalesPriceInc)
            List_nodes.append(Barcode)
            List_nodes.append(AvailabilityStatus)
            List_nodes.append(Brand)

            Csv_writer.writerow(List_nodes)
            print(EcommerceProductGuid)

            List_nodes.clear()
            EcommerceProductGuid = ""
            ProductNumber = ""
            Description = ""
            ShopSalesPriceInc = ""
            Barcode = ""
            AvailabilityStatus = ""
            Brand = ""

            elem.clear()

For example, if I copy the Product element 300 times, the EcommerceProductGuid value is left empty on row 155 of the CSV file. If I copy Product 400 times, rows 155, 310, and 368 get an empty value. How is this possible?


2 Answers

狐的传说


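For what it's worth, and for anyone who may be searching: the answer above also applies to the lxml library's iterparse(). I ran into a similar problem with lxml and thought I'd give it a try; it works almost exactly the same way.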

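When you read XML data on the start event, the element's text has not necessarily been parsed yet at the moment the event fires, so it is occasionally missing. Reading the element on the end event instead seems to have solved my problem with large XML files. It looks like Daniel Haley's check for whether the text exists adds another layer of protection.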
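As a rough illustration of the idea, here is a minimal sketch that listens only for end events and falls back to an empty string when an element has no text. The field names are taken from the script in the question; the function name products_to_rows and the assumption that every field is a descendant of a Product element are mine, not part of the original code, and the same loop should carry over to lxml's iterparse().

```python
import xml.etree.ElementTree as ET

# Tag names copied from the question; their order defines the CSV column order.
FIELDS = [
    "EcommerceProductGuid",
    "ProductNumber",
    "Description",
    "SalesPriceInc",
    "Barcode",
    "AvailabilityStatus",
    "Brand",
]


def products_to_rows(source):
    """Yield one list of field values per <Product> element.

    Hypothetical helper: `source` is a filename or a binary file object.
    Text is read only on 'end' events, when the element is fully parsed.
    """
    row = dict.fromkeys(FIELDS, "")
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in row:
            # elem.text is None for an empty element, so fall back to "".
            row[elem.tag] = elem.text or ""
        elif elem.tag == "Product":
            yield [row[name] for name in FIELDS]
            row = dict.fromkeys(FIELDS, "")
            # Free the finished product's children to keep memory low.
            elem.clear()
```

Each yielded list can be passed straight to csv.writer(...).writerow(), as in the original script.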

2021-09-14
