3 Answers
Contributed 1831 experience points, received 4+ likes
I think you need to flatten it yourself. Fortunately, that isn't complicated:
import pandas as pd

# one row per report: [model, id, *record values] (here: message, timestamp)
s = [[k, i, *j.values()] for k, v in data["reports"].items() for i, j in v.items()]
print(pd.DataFrame(s))
                   0                     1                                                  2              3
0  Google-Pixel 2 XL  -MIoCtD9YUF2G9Esfrfz     04 Oct 2020 23:25:17:047 onCreate MainActivity  1601825117067
1  Google-Pixel 2 XL  -MIoCtFVOxu8wdEHtm6q          04 Oct 2020 23:25:17:214 onCreate Service  1601825117216
2  Google-Pixel 2 XL  -MIoCyBtKMQqQzUHEXsW    04 Oct 2020 23:25:37:682 onStartCommand Service  1601825137685
3  Google-Pixel 2 XL  -MIoFWll9r3qwzWNoGMn  04 Oct 2020 23:36:47:687: (1.3212517, 103.860314)  1601825807693
4          Vivo 1820  -MIoF14JUm6JMZrOzDlL     04 Oct 2020 23:34:37:623 onCreate MainActivity  1601825677653
5          Vivo 1820  -MIoF1A9ZZNqTu5W-rQD          04 Oct 2020 23:34:38:016 onCreate Service  1601825678026
6          Vivo 1820  -MIoF2gNDua9FfLBTg6q     04 Oct 2020 23:34:44:235 onCreate MainActivity  1601825684248
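A self-contained sketch of the same flattening, runnable without the original file. The `data` dict is a hypothetical subset reconstructed from the printed output above, assuming the structure reports -> model -> push id -> {"message", "timestamp"}. Unlike the one-liner, it pulls the fields out by name and labels the columns, so the result does not depend on the key order inside each record.

```python
import pandas as pd

# Hypothetical sample: a subset of the question's data, reconstructed from the
# printed output (assumed structure: reports -> model -> push id -> record).
data = {
    "reports": {
        "Google-Pixel 2 XL": {
            "-MIoCtD9YUF2G9Esfrfz": {
                "message": "04 Oct 2020 23:25:17:047 onCreate MainActivity",
                "timestamp": 1601825117067,
            },
            "-MIoCtFVOxu8wdEHtm6q": {
                "message": "04 Oct 2020 23:25:17:214 onCreate Service",
                "timestamp": 1601825117216,
            },
        },
        "Vivo 1820": {
            "-MIoF14JUm6JMZrOzDlL": {
                "message": "04 Oct 2020 23:34:37:623 onCreate MainActivity",
                "timestamp": 1601825677653,
            },
        },
    }
}

# Same two-level walk as the comprehension above, but fields are selected by
# name instead of *j.values(), and the columns get explicit labels.
rows = [
    [model, report_id, rec["message"], rec["timestamp"]]
    for model, reports in data["reports"].items()
    for report_id, rec in reports.items()
]
df = pd.DataFrame(rows, columns=["model", "id", "message", "timestamp"])
print(df)
```

Selecting by name also means a record with an unexpected extra key will not shift the columns; a record missing a key raises `KeyError` instead of silently producing a short row.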
Contributed 1852 experience points, received 7+ likes
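According to the official documentation, pd.json_normalize() expects record-style input: a dict or, more usefully, a list of dicts, each of which becomes one row. The original JSON, however, is nothing like a list of records (the records are buried under two levels of keys), and, most importantly, the key "id" does not exist at all. So I think a hand-written parser is definitely needed.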
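To see why, here is a small sketch of what pd.json_normalize() does when given the raw nested dict. The `data` dict is a hypothetical sample with the assumed nesting reports -> model -> push id -> record. A dict is treated as a single record, so every nested key is folded into a dotted column name and the ids end up in the header instead of in the rows.

```python
import pandas as pd

# Hypothetical sample with the assumed nesting: reports -> model -> id -> record
data = {
    "reports": {
        "Google-Pixel 2 XL": {
            "-MIoCtD9YUF2G9Esfrfz": {
                "message": "04 Oct 2020 23:25:17:047 onCreate MainActivity",
                "timestamp": 1601825117067,
            },
        },
        "Vivo 1820": {
            "-MIoF14JUm6JMZrOzDlL": {
                "message": "04 Oct 2020 23:34:37:623 onCreate MainActivity",
                "timestamp": 1601825677653,
            },
        },
    }
}

# The whole dict is normalized as ONE record: nested keys are joined with "."
wide = pd.json_normalize(data)
print(wide.shape)  # (1, 4): a single row, one column per leaf value
print(list(wide.columns))
```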
Code:
import pandas as pd
import json

file_path = "/mnt/ramdisk/in.json"
with open(file_path) as f:
    dic = json.load(f)

# discard the redundant "reports" layer
dic = dic["reports"]

# produce a flattened list of dicts
ls = []
for k1, v1 in dic.items():
    # k1 = model
    for k2, v2 in v1.items():
        # k2 = the hash-like id; copy both keys into the record itself
        v2["model"] = k1
        v2["id"] = k2
        ls.append(v2)

df = pd.json_normalize(ls)
Output:
# Trim the message for printing purpose
df2 = df.copy()
df2["message"] = df["message"].apply(lambda s: s[:10])
df2
Out[28]:
      message      timestamp              model                    id
0  04 Oct 202  1601825117067  Google-Pixel 2 XL  -MIoCtD9YUF2G9Esfrfz
1  04 Oct 202  1601825117216  Google-Pixel 2 XL  -MIoCtFVOxu8wdEHtm6q
2  04 Oct 202  1601825137685  Google-Pixel 2 XL  -MIoCyBtKMQqQzUHEXsW
3  04 Oct 202  1601825807693  Google-Pixel 2 XL  -MIoFWll9r3qwzWNoGMn
4  04 Oct 202  1601825677653          Vivo 1820  -MIoF14JUm6JMZrOzDlL
5  04 Oct 202  1601825678026          Vivo 1820  -MIoF1A9ZZNqTu5W-rQD
6  04 Oct 202  1601825684248          Vivo 1820  -MIoF2gNDua9FfLBTg6q
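One side effect of the loop above is that it writes "model" and "id" into the inner dicts of the loaded JSON. A minimal non-mutating variant, assuming the same model -> id -> record layout (the `dic` below is a hypothetical sample, already stripped of the "reports" layer), builds a fresh dict per record with `{**v2, ...}` instead:

```python
import pandas as pd

# Hypothetical sample in the assumed layout (model -> id -> record),
# i.e. what dic looks like after dic = dic["reports"].
dic = {
    "Google-Pixel 2 XL": {
        "-MIoCtD9YUF2G9Esfrfz": {
            "message": "04 Oct 2020 23:25:17:047 onCreate MainActivity",
            "timestamp": 1601825117067,
        },
    },
    "Vivo 1820": {
        "-MIoF14JUm6JMZrOzDlL": {
            "message": "04 Oct 2020 23:34:37:623 onCreate MainActivity",
            "timestamp": 1601825677653,
        },
    },
}

# {**v2, ...} copies each record and adds the two keys to the copy,
# so the loaded JSON itself is left unchanged.
ls = [
    {**v2, "model": k1, "id": k2}
    for k1, v1 in dic.items()
    for k2, v2 in v1.items()
]
df = pd.json_normalize(ls)
print(df)
```

The column order (message, timestamp, model, id) matches the output above because the copied record keys come first and the two added keys follow.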
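Note: it seems necessary to drill down to the level where the hash-like ids live. The ids start out as keys, and apparently they have to be turned into values before pd.json_normalize can interpret them correctly. A quick search online also turned up no example of parsing this kind of nested structure with a simple built-in method.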
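For comparison, a sketch of a pandas-only alternative that none of the answers uses: build one small frame per model with DataFrame.from_dict(orient="index") (index = push id, columns = record fields), then let pd.concat stack them with the model name as the outer index level. It still assumes the same two-level model -> id nesting; `reports` below is a hypothetical sample of data["reports"].

```python
import pandas as pd

# Hypothetical sample of data["reports"]: model -> push id -> record
reports = {
    "Google-Pixel 2 XL": {
        "-MIoCtD9YUF2G9Esfrfz": {
            "message": "04 Oct 2020 23:25:17:047 onCreate MainActivity",
            "timestamp": 1601825117067,
        },
    },
    "Vivo 1820": {
        "-MIoF14JUm6JMZrOzDlL": {
            "message": "04 Oct 2020 23:34:37:623 onCreate MainActivity",
            "timestamp": 1601825677653,
        },
    },
}

df = (
    pd.concat(
        # dict keys (model names) become the outer index level;
        # each inner frame is indexed by push id
        {
            model: pd.DataFrame.from_dict(recs, orient="index")
            for model, recs in reports.items()
        }
    )
    .rename_axis(["model", "id"])  # name the two index levels
    .reset_index()                 # turn them into ordinary columns
)
print(df)
```

The id-to-value conversion still happens, but reset_index() does it instead of a hand-written loop.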
Contributed 1807 experience points, received 9+ likes
Try this (see my comment above):
import pandas as pd

data = []
for k, v in test['reports'].items():
    model_name = k
    for model in v.items():
        # model is an (id, record) pair
        _data = {}
        _data['model'] = model_name
        _data['id'] = model[0]
        _data['message'] = model[1]['message']
        _data['timestamp'] = model[1]['timestamp']
        data.append(_data)
df = pd.DataFrame(data)
where test is your data, so test['reports'] gives you the nested information you want to parse.
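A usage sketch of the loop above, wrapped in a function so it can be run against a hypothetical `test` dict shaped like the question's data (reports -> model -> id -> record with "message" and "timestamp"). The function name `reports_to_frame` is made up for illustration.

```python
import pandas as pd


def reports_to_frame(test):
    """Hypothetical helper wrapping the loop above: one row per report."""
    data = []
    for model_name, reports in test["reports"].items():
        for report_id, record in reports.items():
            data.append({
                "model": model_name,
                "id": report_id,
                "message": record["message"],
                "timestamp": record["timestamp"],
            })
    return pd.DataFrame(data)


# Hypothetical sample shaped like the question's data
test = {
    "reports": {
        "Vivo 1820": {
            "-MIoF1A9ZZNqTu5W-rQD": {
                "message": "04 Oct 2020 23:34:38:016 onCreate Service",
                "timestamp": 1601825678026,
            },
        },
    }
}
df = reports_to_frame(test)
print(df)
```

Because only "message" and "timestamp" are copied, any other fields in a record are dropped, and a record missing either key raises `KeyError`.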