You can use itertools.groupby recursively:
s = """
category1 : 0120391123123
- subcategory : 0120391123123
-- subsubcategory : 019301948109
--- subsubsubcategory : 013904123908
---- subsubsubsubcategory : 019341823908
- subcategory2 : 0934810923801
-- subsubcategory2 : 09341829308123
category2: 1309183912309
- subcategory : 10293182094
"""
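import re, itertools, json
# Drop the empty strings produced by the leading and trailing newlines.
data = list(filter(None, s.split('\n')))
def group_data(d):
    # Base case: a single line becomes a one-entry dict.
    if len(d) == 1:
        return [dict([re.split(r'\s*:\s*', d[0])])]
    # Split into alternating runs: lines at this level, then their '-'-prefixed children.
    grouped = [[a, list(b)] for a, b in itertools.groupby(d, key=lambda x: not x.startswith('-'))]
    # Pair each run of parent lines with the run of child lines that follows it.
    _group = [[grouped[i][-1], grouped[i+1][-1]] for i in range(0, len(grouped), 2)]
    # Drop the first character (one '-') from each child line and recurse.
    _c = [[dict([re.split(r'\s*:\s*', i) for i in a]), group_data([c[1:] for c in b])] for a, b in _group]
    return [i for b in _c for i in b]
print(json.dumps(group_data(data), indent=4))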
Output:
[
    {
        "category1": "0120391123123"
    },
    [
        {
            " subcategory": "0120391123123"
        },
        [
            {
                " subsubcategory": "019301948109"
            },
            [
                {
                    " subsubsubcategory": "013904123908"
                },
                [
                    {
                        " subsubsubsubcategory": "019341823908"
                    }
                ]
            ]
        ],
        {
            " subcategory2": "0934810923801"
        },
        [
            {
                " subsubcategory2": "09341829308123"
            }
        ]
    ],
    {
        "category2": "1309183912309"
    },
    [
        {
            " subcategory": "10293182094"
        }
    ]
]
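The nested keys in the output keep a leading space (for example " subcategory"), because each level of recursion removes only the "-" and not the space after it. If that is unwanted, one possible tweak is to strip each line before splitting it. A minimal sketch, assuming keys and values never carry meaningful surrounding whitespace; parse_line is a hypothetical helper name, not part of the answer's code:

```python
import re

def parse_line(line):
    """Split 'key : value' into a (key, value) pair without surrounding whitespace."""
    # strip() removes the space left behind once the leading '-' has been sliced off.
    # maxsplit=1 keeps any later ':' inside the value.
    key, value = re.split(r'\s*:\s*', line.strip(), maxsplit=1)
    return key, value

print(parse_line(" subcategory : 0120391123123"))
```

Using something like dict([parse_line(i) for i in a]) in place of the re.split calls would give keys without the leading space.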
Note: this answer assumes that "category2" should end up at the same level as "category1" in your final output, since neither of them is preceded by a "-".
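To see why "category2" lands next to "category1", it can help to look at the groupby step on its own. The sketch below is only an illustration: split_runs and the shortened sample values are hypothetical and not part of the answer's code. It shows how consecutive lines are partitioned into alternating runs of lines at the current level and runs of lines that start with "-".

```python
import itertools

# Shortened sample input (values are placeholders for illustration).
lines = [
    "category1 : 1",
    "- subcategory : 2",
    "-- subsubcategory : 3",
    "category2 : 4",
    "- subcategory : 5",
]

def split_runs(lines):
    """Return (is_current_level, run) pairs for consecutive lines."""
    return [
        (is_current, list(run))
        for is_current, run in itertools.groupby(
            lines, key=lambda x: not x.startswith("-")
        )
    ]

runs = split_runs(lines)
for is_current, run in runs:
    print(is_current, run)
```

Each run flagged True holds lines at the current level, and the False run after it holds their children. Because the recursion pairs runs two at a time, this approach relies on every parent run being followed by a child run, except when only a single line is left.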