1 回答
TA贡献2016条经验 获得超9个赞
传递d['hits']
给json_normalize
结果:
d = json.loads(json_text)
In [136]: %time pd.json_normalize(d['hits'])
CPU times: user 2.1 ms, sys: 41 µs, total: 2.14 ms
Wall time: 2.12 ms
Out[136]:
uuid text_about objectID search_space is_searchspace nice_to_have must_have some key some_key
0 00000000-0000-0000-0000-000000000000 some_text 00000000-0000-0000-0000-000000000000-text_about NaN NaN NaN NaN NaN NaN
1 00000000-0000-0000-0000-000000000000 NaN 00000000-0000-0000-0000-000000000000-search_space some json object True NaN NaN NaN NaN
2 00000000-0000-0000-0000-000000000000 NaN 00000000-0000-0000-0000-000000000000-nice_to_have NaN NaN [{'operator': 'AND', 'operands': [{'category':... NaN NaN NaN
3 00000000-0000-0000-0000-000000000000 NaN 00000000-0000-0000-0000-000000000000-must_have NaN NaN NaN [{'operator': 'AND', 'operands': [{'category':... NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN some json object NaN
5 10000000-0000-0000-0000-000000000001 some text 10000000-0000-0000-0000-000000000001-text_about NaN NaN NaN NaN NaN NaN
6 10000000-0000-0000-0000-000000000001 NaN 10000000-0000-0000-0000-000000000001-search_space some json object True NaN NaN NaN NaN
7 10000000-0000-0000-0000-000000000001 NaN 10000000-0000-0000-0000-000000000001-nice_to_have NaN NaN [{'operator': 'AND', 'operands': [{'category':... NaN NaN NaN
8 10000000-0000-0000-0000-000000000001 NaN 10000000-0000-0000-0000-000000000001-must_have NaN NaN NaN [{'operator': 'AND', 'operands': [{'category':... NaN NaN
9 NaN NaN NaN NaN NaN NaN NaN NaN some json object
在那里你可以选择nice_to_have:
df = pd.json_normalize(d, record_path=['hits'])
In [263]: %time df['nice_to_have'].dropna().sum()
CPU times: user 705 µs, sys: 11 µs, total: 716 µs
Wall time: 713 µs
Out[263]:
[{'operator': 'AND',
'operands': [{'category': 'Skill',
'values': [{'value': 'MySQL ', 'clusters': []}]}]},
{'operator': 'AND',
'operands': [{'category': 'Skill',
'values': [{'value': 'Frontend Programming Language ',
'clusters': [{'key': 'Programming Language~>Frontend Programming Language',
'name': 'Frontend Programming Language',
'path': ['Programming Language', 'Frontend Programming Language'],
'uuid': 'e8c5cc6c-d92b-4098-8965-41e6818fe337',
'category': 'skill',
'pretty_lineage': ['Programming Language']}]}]}]}]
希望这有用。
编辑:
回应您的评论:此 json 的主要问题是级别不一致,因此无法执行规范化并引发 KeyError。
获得以下解决方法nice_to_have:
f = list(filter(lambda x: 'nice_to_have' in x, d['hits']))
>> pd.json_normalize(f, ['nice_to_have', 'operands', 'values', 'clusters'])
key name path uuid category pretty_lineage
0 Programming Language~>Frontend Programming Lan... Frontend Programming Language [Programming Language, Frontend Programming La... e8c5cc6c-d92b-4098-8965-41e6818fe337 skill [Programming Language]
从那里你可以得到你想要得到的值。可以应用类似的解决方法来获取must_have.
添加回答
举报