I downloaded a JSON file of the ICD-11 classification of human diseases from the web. The data is nested up to 8 levels deep, for example:

    "name":"br08403",
    "children":[
    {
      "name":"01 Certain infectious or parasitic diseases",
      "children":[
      {
        "name":"Gastroenteritis or colitis of infectious origin",
        "children":[
        {
          "name":"Bacterial intestinal infections",
          "children":[
          {
            "name":"1A00 Cholera",
            "children":[
            {
              "name":"H00110 Cholera"
            }

I tried the following code:

    def flatten_json(nested_json):
        """
        Flatten json object with nested keys into a single level.
        Args:
            nested_json: A nested json object.
        Returns:
            The flattened json object if successful, None otherwise.
        """
        out = {}

        def flatten(x, name=''):
            if type(x) is dict:
                for a in x:
                    flatten(x[a], name + a + '_')
            elif type(x) is list:
                i = 0
                for a in x:
                    flatten(a, name + str(i) + '_')
                    i += 1
            else:
                out[name[:-1]] = x

        flatten(nested_json)
        return out

    df2 = pd.Series(flatten_json(dictionary)).to_frame()

The output I get is:

    name                                              br08403
    children_0_name                                   01 Certain infectious or parasitic diseases
    children_0_children_0_name                        Gastroenteritis or colitis of infectious origin
    children_0_children_0_children_0_name             Bacterial intestinal infections
    children_0_children_0_children_0_children_0_name  1A00 Cholera
    ...                                               ...
    children_21_children_17_children_10_name          NF0A Certain early complications of trauma, n...
    children_21_children_17_children_11_name          NF0Y Other specified effects of external causes
    children_21_children_17_children_12_name          NF0Z Unspecified effects of external causes
    children_21_children_18_name                      NF2Y Other specified injury, poisoning or cer...
    children_21_children_19_name                      NF2Z Unspecified injury, poisoning or certain..

But the desired output is a data frame with 8 columns, one per nesting level, holding the name keys down to the deepest level, for example:
1 Answer
肥皂起泡泡
Contributed 1829 experience points, received more than 6 likes
A simple iterative approach with pandas.
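import pandas as pd
import requests

res = requests.get("https://www.genome.jp/kegg-bin/download_htext?htext=br08403.keg&format=json&filedir=")
js = res.json()
df = pd.json_normalize(js)
for i in range(20):  # upper bound on depth; the loop breaks earlier
    # turn each list of children into rows, then flatten the child dicts into columns
    df = pd.json_normalize(df.explode("children").to_dict(orient="records"))
    # rows without children leave a plain "children" column behind; drop it
    if "children" in df.columns:
        df = df.drop(columns="children")
    df = df.rename(columns={"children.name": f"level{i}", "children.children": "children"})
    # stop once there is no deeper level to expand
    if df[f"level{i}"].isna().all() or "children" not in df.columns:
        break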
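
As a rough alternative to the loop above, and not part of the original answer, the same table can be built by walking the tree recursively and collecting one row per leaf. This is only a sketch: the `leaf_paths` helper and the `sample` tree are hypothetical names made up for illustration, and it assumes every node is a dict with a "name" key and an optional "children" list, as in the excerpt from the question. Note that here `level0` holds the root name, whereas the loop above keeps the root in the `name` column.

```python
import pandas as pd


def leaf_paths(node, prefix=()):
    """Yield one tuple of names per root-to-leaf path in a name/children tree."""
    path = prefix + (node["name"],)
    children = node.get("children")
    if not children:  # leaf: no "children" key, or an empty list
        yield path
        return
    for child in children:
        yield from leaf_paths(child, path)


# Hypothetical miniature of the tree from the question (not the real download).
sample = {
    "name": "br08403",
    "children": [
        {
            "name": "01 Certain infectious or parasitic diseases",
            "children": [
                {
                    "name": "Gastroenteritis or colitis of infectious origin",
                    "children": [
                        {
                            "name": "Bacterial intestinal infections",
                            "children": [
                                {
                                    "name": "1A00 Cholera",
                                    "children": [{"name": "H00110 Cholera"}],
                                }
                            ],
                        }
                    ],
                },
                # made-up shallow branch, to show how shorter paths are padded
                {"name": "Hypothetical shallow branch"},
            ],
        }
    ],
}

rows = list(leaf_paths(sample))
depth = max(len(r) for r in rows)
# Pad shorter paths with None so every row has the same number of columns.
padded = [r + (None,) * (depth - len(r)) for r in rows]
df = pd.DataFrame(padded, columns=[f"level{i}" for i in range(depth)])
print(df)
```

With the real file, `leaf_paths(js)` could be called on the parsed JSON instead of `sample`. The number of columns then follows from the deepest path in the data rather than from a fixed loop bound.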