Reformatting the items in a list into different types and sublists

阿晨1998 2021-08-02 17:11:08
The list below is an uncleaned version made up of three sublists:

UNCLEANED = [
    ['1 -  32/', 'Highway', '403', '43.167233', '-80.275567', '1965', '2014', '2009', '4',
     'Total=64  (1)=12;(2)=19;(3)=21;(4)=12;', '65', '04/13/2012', '72.3', '',
     '72.3', '', '69.5', '', '70', '', '70.3', '', '70.5', '', '70.7', '72.9', ''],
    ['1 -  43/', 'WEST', '403', '43.164531', '-80.251582', '1963', '2014', '2007', '4',
     'Total=60.4  (1)=12.2;(2)=18;(3)=18;(4)=12.2;', '61', '04/13/2012',
     '71.5', '', '71.5', '', '68.1', '', '69', '', '69.4', '', '69.4', '', '70.3', '73.3', ''],
    ['2 -   4/', 'STOKES', '6', '45.036739', '-81.33579', '1958', '2013', '', '1',
     'Total=16  (1)=16;', '18.4', '08/28/2013', '85.1',
     '85.1', '', '67.8', '', '67.4', '', '69.2', '70', '70.5', '', '75.1', '', '90.1', '']
]

I need to convert it into a cleaner version that might look like this:

CLEANED = [
    [1, 'Highway', '403', 43.167233, -80.275567, '1965', '2014', '2009', 4,
     [12.0, 19.0, 21.0, 12.0], 65.0, '04/13/2012',
     [72.3, 69.5, 70.0, 70.3, 70.5, 70.7, 72.9]],
    [2, 'WEST', '403', 43.164531, -80.251582, '1963', '2014', '2007', 4,
     [12.2, 18.0, 18.0, 12.2], 61.0, '04/13/2012',
     [71.5, 68.1, 69.0, 69.4, 69.4, 70.3, 73.3]],
    [3, 'STOKES', '6', 45.036739, -81.33579, '1958', '2013', '', 1,
     [16.0], 18.4, '08/28/2013',
     [85.1, 67.8, 67.4, 69.2, 70.0, 70.5, 75.1, 90.1]]
]

I think the pattern is as follows. For index[0] of the uncleaned version I keep only the first character. index[1] and index[2] stay the same, and index[3] and index[4] are converted to numbers (floats in the target above). At index[9] I have to ignore the Total and extract only the remaining numbers into a sublist. The last step is to put the numbers after the date into a sublist while excluding the first one.

I'm very confused about how to keep looping until everything in UNCLEANED has been "cleaned". What if UNCLEANED has more than these three elements? If it is very long, how would I iterate over it? Thanks a lot for your help.

3 Answers

慕斯王

Contributed 1864 experience points, received over 2 likes

Here is a solution that performs the conversion described above. It comes down to a simple loop over the sublists:


UNCLEANED = [
['1 -  32/', 'Highway', '403', '43.167233',
 '-80.275567', '1965', '2014', '2009', '4',
 'Total=64  (1)=12;(2)=19;(3)=21;(4)=12;', '65', '04/13/2012', '72.3', '',
 '72.3', '', '69.5', '', '70', '', '70.3', '', '70.5', '', '70.7', '72.9',
 ''],
['1 -  43/', 'WEST', '403', '43.164531', '-80.251582',
 '1963', '2014', '2007', '4',
 'Total=60.4  (1)=12.2;(2)=18;(3)=18;(4)=12.2;', '61', '04/13/2012',
 '71.5', '', '71.5', '', '68.1', '', '69', '', '69.4', '', '69.4', '',
 '70.3', '73.3', ''],
['2 -   4/', 'STOKES', '6', '45.036739', '-81.33579', '1958',
 '2013', '', '1', 'Total=16  (1)=16;', '18.4', '08/28/2013', '85.1',
 '85.1', '', '67.8', '', '67.4', '', '69.2', '70', '70.5', '', '75.1', '',
 '90.1', '']
]


# Function that performs the conversion described above.
# Note: it modifies the sublist it is given in place.
def cleanElement(elem):
    # '1 -  32/' -> '1' (kept as a string)
    elem[0] = elem[0].split(' - ')[0]
    elem[3] = float(elem[3])
    elem[4] = float(elem[4])

    elem[8] = int(elem[8])

    # Drop the 'Total=...' part, then keep the number after each '(n)='.
    tempList = elem[9].split('  ')[1].split(';')
    tempList = [float(i.split('=')[1]) for i in tempList if i != '']
    elem[9] = tempList

    elem[10] = float(elem[10])

    # Collect the non-empty values after the first number that follows the date,
    # then remove that first number (index 12) so the new sublist takes its place.
    elem[13] = [float(i) for i in elem[13:] if i != '']
    elem.pop(12)

    return elem[:13]


# Function that loops over the uncleaned list and performs the conversion for each element.
def cleanList(uncleaned):
    return [cleanElement(elem) for elem in uncleaned]


cleaned = cleanList(UNCLEANED)

for i in cleaned:
    print(i)

Output:


['1', 'Highway', '403', 43.167233, -80.275567, '1965', '2014', '2009', 4, [12.0, 19.0, 21.0, 12.0], 65.0, '04/13/2012', [72.3, 69.5, 70.0, 70.3, 70.5, 70.7, 72.9]]

['1', 'WEST', '403', 43.164531, -80.251582, '1963', '2014', '2007', 4, [12.2, 18.0, 18.0, 12.2], 61.0, '04/13/2012', [71.5, 68.1, 69.0, 69.4, 69.4, 70.3, 73.3]]

['2', 'STOKES', '6', 45.036739, -81.33579, '1958', '2013', '', 1, [16.0], 18.4, '08/28/2013', [85.1, 67.8, 67.4, 69.2, 70.0, 70.5, 75.1, 90.1]]
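One thing to keep in mind with cleanElement above is that it rewrites each sublist in place, so calling cleanList a second time on the same UNCLEANED would fail once elem[9] is already a list. The sketch below is one possible non-mutating variant, not part of the original answer; the name clean_element_copy is hypothetical, and it assumes every row has the same 27-item layout as the sample data.

```python
def clean_element_copy(elem):
    """Return a cleaned copy of one row; the input list is left untouched."""
    # Same span parsing as the answer: drop 'Total=...' and keep the '(n)=x' values.
    spans = [float(part.split('=')[1])
             for part in elem[9].split('  ')[1].split(';') if part != '']
    # Everything after the first reading (index 12), minus the empty strings.
    readings = [float(value) for value in elem[13:] if value != '']
    return [
        elem[0].split(' - ')[0],          # kept as a string, as in the answer
        elem[1], elem[2],
        float(elem[3]), float(elem[4]),
        elem[5], elem[6], elem[7],
        int(elem[8]),
        spans,
        float(elem[10]),
        elem[11],
        readings,
    ]


row = ['2 -   4/', 'STOKES', '6', '45.036739', '-81.33579', '1958',
       '2013', '', '1', 'Total=16  (1)=16;', '18.4', '08/28/2013', '85.1',
       '85.1', '', '67.8', '', '67.4', '', '69.2', '70', '70.5', '', '75.1', '',
       '90.1', '']
original = list(row)

first = clean_element_copy(row)
second = clean_element_copy(row)  # calling it again on the same row is safe
print(first)
```

Because a new list is built instead of assigning into elem, the raw data stays available for re-runs or for comparing against the cleaned result.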



Posted 2021-08-03
三国纷争

Contributed 1804 experience points, received over 7 likes

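Here is another way to clean the list of lists, using a collection of functions.

The tricky part is slicing the tail of each list, where the alternating strings have to be collected into a list and the empty strings filtered out.

I assume that the non-empty string among the first 3 items of each sublist's tail is the value that is wanted. arrange puts those first 3 items into a consistent order, so that the later slicing returns consistent values.

IMHO, the advantage of this approach is that the code is easier to change if you want to treat any particular item differently.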


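import itertools as it


def get_first_char_int(item):
    # '1 -  32/' -> 1 (only the first character is used)
    first_char, *_ = item
    return int(first_char)


def identity(item):
    return item


def get_floats(item):
    # 'Total=64  (1)=12;(2)=19;' -> [12.0, 19.0]; the 'Total=...' part is dropped.
    tokens = ''.join(item.split(' ')[2:]).split('=')[1:]
    return [float(token.split(';')[0]) for token in tokens]


def get_float(item):
    # Empty strings are passed through unchanged and filtered out later.
    return float(item) if item else item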

UNCLEANED = [
    ['1 -  32/', 'Highway', '403', '43.167233',
     '-80.275567', '1965', '2014', '2009', '4',
     'Total=64  (1)=12;(2)=19;(3)=21;(4)=12;', '65', '04/13/2012', '72.3', '',
     '72.3', '', '69.5', '', '70', '', '70.3', '', '70.5', '', '70.7', '72.9',
     ''],
    ['1 -  43/', 'WEST', '403', '43.164531', '-80.251582',
     '1963', '2014', '2007', '4',
     'Total=60.4  (1)=12.2;(2)=18;(3)=18;(4)=12.2;', '61', '04/13/2012',
     '71.5', '', '71.5', '', '68.1', '', '69', '', '69.4', '', '69.4', '',
     '70.3', '73.3', ''],
    ['2 -   4/', 'STOKES', '6', '45.036739', '-81.33579', '1958',
     '2013', '', '1', 'Total=16  (1)=16;', '18.4', '08/28/2013', '85.1',
     '85.1', '', '67.8', '', '67.4', '', '69.2', '70', '70.5', '', '75.1', '',
     '90.1', ''],
]


functions = [  # 1:1 mapping of functions to the leading items of each list in UNCLEANED.
    get_first_char_int,
    identity,
    identity,
    float,
    float,
    identity,
    identity,
    identity,
    int,
    get_floats,
    float,
    identity,
]
end = len(functions)
# All sublists must have the same length; this unpacking raises ValueError otherwise.
item_length, = {len(items) for items in UNCLEANED}
# Number of tail items, i.e. the items after the first `end` ones.
extra_count = item_length - end
# Extend functions with one get_float per tail item.
functions.extend(it.repeat(get_float, extra_count))

# Handle items up to the start of the alternating strings and empty strings.
head_results = (
    [f(item)
     for f, item
     in zip(functions[0:end], collection[0:end])]
    for collection in UNCLEANED
)


def arrange(items):
    """Handle varying order of first 3 items of items."""
    item, *_ = items
    items[0:3] = [item, '', item]
    return items


# Apply arrange to the tail of each sublist and chain the tails into one iterator.
collection_ = it.chain.from_iterable(arrange(collection[end:])
                                     for collection in UNCLEANED)

# Handle items starting with the alternating strings and empty strings.
# Each zip consumes one sublist's tail from collection_; islice skips its first 2 items.
tail_results = (
    [f(item)
     for f, item
     in it.islice(zip(functions[end:], collection_), 2, item_length)]
    for collection in UNCLEANED
)

# Compare against '' so that a value of 0.0 would not be filtered out.
results = [[head, [item for item in tail if item != '']]
           for head, tail in zip(head_results, tail_results)]

for item in results:
    print(item)

Output:


[[1, 'Highway', '403', 43.167233, -80.275567, '1965', '2014', '2009', 4, [12.0, 19.0, 21.0, 12.0], 65.0, '04/13/2012'], [72.3, 69.5, 70.0, 70.3, 70.5, 70.7, 72.9]]

[[1, 'WEST', '403', 43.164531, -80.251582, '1963', '2014', '2007', 4, [12.2, 18.0, 18.0, 12.2], 61.0, '04/13/2012'], [71.5, 68.1, 69.0, 69.4, 69.4, 70.3, 73.3]]

[[2, 'STOKES', '6', 45.036739, -81.33579, '1958', '2013', '', 1, [16.0], 18.4, '08/28/2013'], [85.1, 67.8, 67.4, 69.2, 70.0, 70.5, 75.1, 90.1]]
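The same "one function per position" idea can also be written per row, without the chained iterators. The following is only a sketch under assumptions, not the answerer's code: the names HEAD_CONVERTERS, span_floats and convert_row are hypothetical, and it assumes the unwanted first reading always sits directly after the date (index 12), as it does in the sample data.

```python
def first_char_int(item):
    return int(item[0])


def identity(item):
    return item


def span_floats(item):
    # 'Total=64  (1)=12;(2)=19;' -> [12.0, 19.0]; the Total entry is skipped.
    spans = ''.join(item.split()[1:])
    return [float(part.split('=')[1]) for part in spans.split(';') if part != '']


# One converter per leading position (indexes 0 to 11 of a row).
HEAD_CONVERTERS = [
    first_char_int, identity, identity, float, float,
    identity, identity, identity, int, span_floats, float, identity,
]


def convert_row(row):
    n = len(HEAD_CONVERTERS)
    # zip stops after the last converter, so only the head is converted here.
    head = [convert(item) for convert, item in zip(HEAD_CONVERTERS, row)]
    # Skip the first reading after the date, then drop the empty strings.
    tail = [float(item) for item in row[n + 1:] if item != '']
    return head + [tail]


row = ['1 -  32/', 'Highway', '403', '43.167233',
       '-80.275567', '1965', '2014', '2009', '4',
       'Total=64  (1)=12;(2)=19;(3)=21;(4)=12;', '65', '04/13/2012', '72.3', '',
       '72.3', '', '69.5', '', '70', '', '70.3', '', '70.5', '', '70.7', '72.9',
       '']
cleaned = convert_row(row)
print(cleaned)
```

Here the readings are appended as the last element of the row, which matches the flat shape asked for in the question. Changing how one column is treated only means swapping one entry in HEAD_CONVERTERS.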


Posted 2021-08-03
呼唤远方

Contributed 1856 experience points, received over 11 likes

Create a clean_row(row) function and call all of the "cleaning rules" from it. Then you can simply write CLEANED = [clean_row(uncleaned) for uncleaned in UNCLEANED]
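As a rough illustration of that suggestion, each rule can live in its own small helper that clean_row calls. The helper names (parse_route_id, parse_spans, parse_readings) are made up for this sketch, and two choices are assumptions rather than something stated in the thread: the id is taken as everything before the dash (so multi-digit ids would also work), and the span values are pulled out with a regular expression.

```python
import re


def parse_route_id(text):
    # '1 -  43/' -> 1; int() ignores the surrounding whitespace.
    return int(text.split('-')[0])


def parse_spans(text):
    # Capture the number after each '(n)='; 'Total=...' has no '(n)' and is ignored.
    return [float(value) for value in re.findall(r'\(\d+\)=([\d.]+)', text)]


def parse_readings(items):
    # Drop the first reading after the date, then the empty strings.
    return [float(value) for value in items[1:] if value != '']


def clean_row(row):
    return (
        [parse_route_id(row[0]), row[1], row[2], float(row[3]), float(row[4])]
        + row[5:8]
        + [int(row[8]), parse_spans(row[9]), float(row[10]), row[11],
           parse_readings(row[12:])]
    )


UNCLEANED = [
    ['1 -  43/', 'WEST', '403', '43.164531', '-80.251582',
     '1963', '2014', '2007', '4',
     'Total=60.4  (1)=12.2;(2)=18;(3)=18;(4)=12.2;', '61', '04/13/2012',
     '71.5', '', '71.5', '', '68.1', '', '69', '', '69.4', '', '69.4', '',
     '70.3', '73.3', ''],
    ['2 -   4/', 'STOKES', '6', '45.036739', '-81.33579', '1958',
     '2013', '', '1', 'Total=16  (1)=16;', '18.4', '08/28/2013', '85.1',
     '85.1', '', '67.8', '', '67.4', '', '69.2', '70', '70.5', '', '75.1', '',
     '90.1', ''],
]

# The comprehension visits every sublist, however many there are.
CLEANED = [clean_row(uncleaned) for uncleaned in UNCLEANED]
for cleaned_row in CLEANED:
    print(cleaned_row)
```

The list comprehension is what answers the "what if it is very long" part of the question: it applies clean_row once per sublist regardless of how many sublists UNCLEANED holds.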


Posted 2021-08-03