
Extracting a subset of data from multi-level XML when variables share the same name


扬帆大鱼 2023-02-22 16:43:28
I have a large amount of XML data that looks like this (only a small part of the data is shown):

<weatherdata xmlns:xsi="http://www.website.com" xsi:noNamespaceSchemaLocation="www.website.com" created="2020-07-06T14:53:48Z">
  <meta>
    <model name="xxxxxx" termin="2020-07-06T06:00:00Z" runended="2020-07-06T09:48:31Z" nextrun="2020-07-06T16:00:00Z" from="2020-07-06T15:00:00Z" to="2020-07-08T12:00:00Z"/>
    <model name="xxxxxx" termin="2020-07-06T00:00:00Z" runended="2020-07-06T09:48:31Z" nextrun="2020-07-06T18:00:00Z" from="2020-07-08T13:00:00Z" to="2020-07-09T18:00:00Z"/>
    <model name="xxxxxx" termin="2020-07-06T00:00:00Z" runended="2020-07-06T09:48:31Z" nextrun="2020-07-06T18:00:00Z" from="2020-07-09T21:00:00Z" to="2020-07-12T00:00:00Z"/>
    <model name="xxxxxx" termin="2020-07-06T00:00:00Z" runended="2020-07-06T09:48:31Z" nextrun="2020-07-06T18:00:00Z" from="2020-07-12T06:00:00Z" to="2020-07-16T00:00:00Z"/>
  </meta>
  <product class="pointData">
    <time datatype="forecast" from="2020-07-06T15:00:00Z" to="2020-07-06T15:00:00Z">
     <location altitude="10" latitude="123" longitude="123">
      <temperature id="TTT" unit="celsius" value="18.8"/>
      <windDirection id="dd" deg="296.5" name="NW"/>
      <windSpeed id="ff" mps="5.8" beaufort="4" name="Laber bris"/>
      <globalRadiation value="524.2" unit="W/m^2"/>
      <humidity value="59.0" unit="percent"/>
      <pressure id="pr" unit="hPa" value="1022.9"/>
      <cloudiness id="NN" percent="22.7"/>
      <lowClouds id="LOW" percent="22.7"/>
      <mediumClouds id="MEDIUM" percent="0.0"/>
      <highClouds id="HIGH" percent="0.0"/>
      <dewpointTemperature id="TD" unit="celsius" value="10.6"/>
     </location>
    </time>
    <time datatype="forecast" from="2020-07-06T14:00:00Z" to="2020-07-06T15:00:00Z">
     <location altitude="10" latitude="123" longitude="123">
      <precipitation unit="mm" value="0.0" minvalue="0.0" maxvalue="0.0" probability="2.0"/>
      <symbol id="LightCloud" number="2"/>
     </location>

1 Answer

蛊毒传说


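Consider building the temperature data frames and the precipitation data frames separately, combining each set with concat, and then joining the two sets with merge on the common values in the time and location nodes. Also consider using list/dictionary comprehensions to bind all attribute values together, prefixing each attribute with its element's tag so that same-named attributes such as value, unit, and id stay distinct.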
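To make the comprehension idea concrete, here is a minimal sketch on a single time element. The inline XML string is a trimmed, hypothetical sample based on the question's structure, and the helper name flatten_attributes is an illustrative assumption, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element sample, trimmed from the structure shown in the question
SAMPLE = """
<time datatype="forecast" from="2020-07-06T15:00:00Z" to="2020-07-06T15:00:00Z">
  <location altitude="10" latitude="123" longitude="123">
    <temperature id="TTT" unit="celsius" value="18.8"/>
    <humidity value="59.0" unit="percent"/>
  </location>
</time>
"""


def flatten_attributes(elem):
    """Return one flat dict with keys of the form '<tag>_<attribute>'."""
    return {
        f"{node.tag}_{key}": value
        for node in elem.iter()               # elem itself plus all descendants
        for key, value in node.attrib.items()
    }


row = flatten_attributes(ET.fromstring(SAMPLE))
print(row["temperature_value"])  # 18.8
print(row["humidity_value"])     # 59.0
```

Because every key carries its tag as a prefix, temperature and humidity can both have a value attribute without overwriting each other in the resulting dictionary.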


import xml.etree.ElementTree as et
import pandas as pd

tree = et.parse('Input.xml')     # load in the data
root = tree.getroot()            # get the element tree root

temp_list = []; precip_list = []

for x in root.iter('time'):
    # GET LIST OF DICTIONARIES OF ALL ATTRIBUTES, KEYED AS <tag>_<attribute>
    x_list = [{i.tag+'_'+k:v for k,v in i.attrib.items()} for i in x.iter('*')]

    # COMBINE INTO SINGLE DICTIONARY
    x_dict = {k:v for d in x_list for k,v in d.items()}

    # BUILD ONE-ROW DATA FRAME
    df = pd.DataFrame(x_dict, index=[0])

    # SEPARATELY SAVE TO LIST OF DATA FRAMES
    if 'temperature_unit' in df.columns: temp_list.append(df)
    if 'precipitation_unit' in df.columns: precip_list.append(df)

# MERGE CONCATENATED SETS BY COMMON VARS
df = pd.merge(pd.concat(temp_list),
              pd.concat(precip_list),
              on=['time_to', 'time_datatype',
                  'location_altitude', 'location_latitude',
                  'location_longitude'],
              suffixes=['_t','_p'])
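
For readers who want to try the approach without an Input.xml file, the following is a self-contained sketch of the same idea on a small in-memory sample. It is a variation rather than the answer's exact code: the SAMPLE string is a hypothetical, closed-off excerpt of the question's data, rows are collected as plain dictionaries and turned into one data frame per set instead of concatenating one-row frames, and the final type conversions are an added assumption about what a reader would likely need next, since XML attribute values are always parsed as strings.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample: a trimmed, well-formed excerpt of the structure in the question
SAMPLE = """
<weatherdata>
  <product class="pointData">
    <time datatype="forecast" from="2020-07-06T15:00:00Z" to="2020-07-06T15:00:00Z">
      <location altitude="10" latitude="123" longitude="123">
        <temperature id="TTT" unit="celsius" value="18.8"/>
      </location>
    </time>
    <time datatype="forecast" from="2020-07-06T14:00:00Z" to="2020-07-06T15:00:00Z">
      <location altitude="10" latitude="123" longitude="123">
        <precipitation unit="mm" value="0.0" minvalue="0.0" maxvalue="0.0" probability="2.0"/>
      </location>
    </time>
  </product>
</weatherdata>
"""

# Columns shared by the temperature and precipitation rows
KEYS = ["time_to", "time_datatype", "location_altitude",
        "location_latitude", "location_longitude"]

root = ET.fromstring(SAMPLE)
temp_rows, precip_rows = [], []

for time_node in root.iter("time"):
    # Flatten every attribute under this <time> node into '<tag>_<attribute>' keys
    row = {f"{n.tag}_{k}": v for n in time_node.iter() for k, v in n.attrib.items()}
    if "temperature_unit" in row:
        temp_rows.append(row)
    if "precipitation_unit" in row:
        precip_rows.append(row)

# Inner join on the shared keys; the overlapping time_from column gets a suffix
merged = pd.merge(pd.DataFrame(temp_rows), pd.DataFrame(precip_rows),
                  on=KEYS, suffixes=("_t", "_p"))

# Attribute values arrive as strings; convert the ones needed for arithmetic
merged["time_to"] = pd.to_datetime(merged["time_to"])
merged["temperature_value"] = pd.to_numeric(merged["temperature_value"])
merged["precipitation_value"] = pd.to_numeric(merged["precipitation_value"])

print(merged[["time_to", "temperature_value", "precipitation_value"]])
```

Note that the join uses time_to rather than time_from: in the sample, the temperature node is an instantaneous reading while the precipitation node covers the preceding hour, so only the end time matches between the two.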


Answered 2023-02-22