Parsing JSON-formatted elements in a DataFrame

Smart猫小萌 2023-04-11 14:42:39
I have this dataframe of geoids, and it looks like this. What I want to do is put every geoid number for each msaid into a list. Ideally I'd like a dataframe that looks like this. I hope this makes sense; any help would be greatly appreciated. Here are two examples:

159 [{"geoid":"02020000101"},{"geoid":"02020000204"},{"geoid":"02020000300"},{"geoid":"02020000400"},{"geoid":"02020000500"},{"geoid":"02020000600"},{"geoid":"02020000802"},{"geoid":"02020000901"},{"geoid":"02020000902"},{"geoid":"02020001000"},{"geoid":"02020001500"},{"geoid":"02020001601"},{"geoid":"02020001602"},{"geoid":"02020001701"},{"geoid":"02020001802"},{"geoid":"02020001900"},{"geoid":"02020002000"},{"geoid":"02020002100"},{"geoid":"02020002201"},{"geoid":"02020002400"},{"geoid":"02020002501"},{"geoid":"02020002502"},{"geoid":"02020002601"},{"geoid":"02020002712"},{"geoid":"02020002811"},{"geoid":"02020002812"},{"geoid":"02020002813"},{"geoid":"02122000100"},{"geoid":"02122000300"},{"geoid":"02170001300"},{"geoid":"02170000300"},{"geoid":"02170001100"},{"geoid":"02170000800"},{"geoid":"02261000300"},{"geoid":"02290000400"},{"geoid":"02240000400"},{"geoid":"02170000102"},{"geoid":"02170000402"},{"geoid":"02170000101"},{"geoid":"02170001201"},{"geoid":"02170001001"},{"geoid":"02170000706"},{"geoid":"02170001202"},{"geoid":"02170001004"},{"geoid":"02170000705"},{"geoid":"02170000603"},{"geoid":"02020000102"},{"geoid":"02020000201"},{"geoid":"02020000202"},{"geoid":"02020000203"},{"geoid":"02020000701"},{"geoid":"02020000702"},{"geoid":"02020000703"},{"geoid":"02020000801"},{"geoid":"02020001100"},{"geoid":"02020001200"},

2 Answers

翻过高山走不出你


I downloaded the file, saved it as a CSV file on my computer, and then ran the following code:


import pandas as pd

df = pd.read_csv('parse_this.csv')

# strip the outer brackets, then split on commas into a list of '{"geoid":"..."}' strings
df['tracts'] = df['tracts'].apply(lambda x: x.strip('][').split(','))

# explode the tracts column: one row per list element
df = df.explode('tracts')

# reset the index and rename the column
df.reset_index(drop=True, inplace=True)
df.rename(columns={"tracts": "geoid"}, inplace=True)

# str.strip removes any of these characters from both ends,
# leaving only the digits of the geoid
df['geoid'] = df['geoid'].apply(lambda x: x.strip('geoid{}:"'))

df
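The `strip('][')` / `split(',')` approach above depends on the exact text layout: a space after a comma, for instance, leaves a leading space that `strip('geoid{}:"')` does not remove. A minimal sketch that parses each cell with the standard `json` module instead, assuming (as in the code above) a `tracts` column of JSON array strings; the `msaid` column name comes from the question, and the two-row inline CSV is hypothetical sample data standing in for `parse_this.csv`:

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for parse_this.csv. Note the space after the comma
# in the first row, which the split/strip approach would not handle.
csv_text = '''msaid,tracts
159,"[{""geoid"":""02020000101""}, {""geoid"":""02020000204""}]"
160,"[{""geoid"":""02122000100""}]"
'''
df = pd.read_csv(io.StringIO(csv_text))

# Parse each JSON string and keep only the geoid values: one list per msaid.
df["geoid"] = df["tracts"].apply(lambda s: [item["geoid"] for item in json.loads(s)])

# Long format, one row per (msaid, geoid) pair, like the explode step above.
long_df = df.drop(columns="tracts").explode("geoid").reset_index(drop=True)
print(long_df)
```

The intermediate `geoid` column already holds one Python list of geoid strings per msaid, which may be closer to what the question asks for; `explode` is only needed if you want the long format. Because the values stay strings, leading zeros in the geoids are preserved.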


Answered 2023-04-11
江户川乱折腾


I hope this example helps:


import pandas as pd

# create an example dataframe
d = [{'A': 3, 'B': [{'id': '001'}, {'id': '002'}]},
     {'A': 4, 'B': [{'id': '003'}, {'id': '004'}]},
     {'A': 5, 'B': [{'id': '005'}, {'id': '006'}]},
     {'A': 6, 'B': [{'id': '007'}, {'id': '008'}]}]
df = pd.DataFrame(d)
df
#    A   B
# 0  3   [{'id': '001'}, {'id': '002'}]
# 1  4   [{'id': '003'}, {'id': '004'}]
# 2  5   [{'id': '005'}, {'id': '006'}]
# 3  6   [{'id': '007'}, {'id': '008'}]

# explode column B (one row per dict) and reset the index
df = df.explode('B')
df.reset_index(drop=True, inplace=True)
df
# now it looks like this:
#    A   B
# 0  3   {'id': '001'}
# 1  3   {'id': '002'}
# 2  4   {'id': '003'}
# 3  4   {'id': '004'}
# 4  5   {'id': '005'}
# 5  5   {'id': '006'}
# 6  6   {'id': '007'}
# 7  6   {'id': '008'}

# extract the 'id' value from each dict and rename column B to id
df['B'] = df['B'].apply(lambda x: x['id'])
df.rename(columns={"B": "id"}, inplace=True)

# this is the final product:
df
#    A   id
# 0  3   001
# 1  3   002
# 2  4   003
# 3  4   004
# 4  5   005
# 5  5   006
# 6  6   007
# 7  6   008


Answered 2023-04-11