
How to insert specific keys from a JSON file into a DataFrame in Python

Python

手掌心 2022-06-22 15:38:02

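Sorry if this is very simple or has already been asked. I'm new to Python and to working with JSON files, so I'm quite confused. I have a 9 GB JSON file that I scraped from a website. The data contains information on about 3 million people. Each person has attributes, but not every person has the same attributes. An attribute corresponds to a key in the JSON file, like this:

{
    "_id": "in-00000001",
    "name": {
        "family_name": "Trump",
        "given_name": "Donald"
    },
    "locality": "United States",
    "skills": ["Twitter", "Real Estate", "Golf"],
    "industry": "Government",
    "experience": [
        {
            "org": "Republican",
            "end": "Present",
            "start": "January 2017",
            "title": "President of the United States"
        },
        {
            "org": "The Apprentice",
            "end": "2015",
            "start": "2003",
            "title": "The guy that fires people"
        }
    ]
}

So here, _id, name, locality, skills, industry and experience are attributes (keys). Another profile may have other attributes, such as education, awards or interests, or may lack an attribute found in another profile, such as skills.

What I want to do is scan every profile in the JSON file and, if a profile contains the attributes industry, experience and skills, extract that information and insert it into a data frame (I suppose I need pandas for this?). From experience, I specifically want to extract the name of their current employer, i.e. the org of the most recent entry in experience. The data frame would look like this:

Industry   | Current employer | Skills
___________________________________________________________________
Government | Republican       | Twitter, Real Estate, Golf
Marketing  | Marketers R Us   | Branding, Social Media, Advertising
...

and so on for all profiles that have these three attributes. I'm struggling to find a good resource that explains how to do this kind of thing, hence my question. I guess the rough pseudocode would be:

for each profile in open(path to .json file):
    if profile has keys "experience", "industry" AND "skills":
        on the same row of the data frame:
            insert current employer into "current employer" column of data frame
            insert industry into "industry" column of data frame
            insert list of skills into "skills" column of data frame

I just need to know how to write this in Python.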


1 Answer

炎炎设计


I assume the file contains all the profiles, like this:

{
    "profile 1": {
        # Full object as in the example above
    },
    "profile 2": {
        # Full object as in the example above
    }
}

Before continuing, let me show a better way to lay out this data in a pandas DataFrame:

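A pandas DataFrame is not meant to hold lists as values (an object column can technically store them, but filtering, grouping and most other operations do not work well with them). So we duplicate the rows instead, one row per skill, as in the example below. See the question "How to use lists as values in a pandas DataFrame?" and JD Long's answer to it for more details.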

ID    | Industry   | Current employer | Skill
___________________________________________________________________
in-01 | Government | Republican       | Twitter
in-01 | Government | Republican       | Real Estate
in-01 | Government | Republican       | Golf
in-02 | Marketing  | Marketers R Us   | Branding
in-02 | Marketing  | Marketers R Us   | Social Media
in-02 | Marketing  | Marketers R Us   | Advertising
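If you end up with one row per profile and the skills still stored as a list, pandas can produce the duplicated-row layout above for you with DataFrame.explode. A minimal sketch on made-up sample rows (the column names simply mirror the table above):

```python
import pandas as pd

# Hypothetical sample: one row per profile, skills still held as a list
wide = pd.DataFrame({
    "ID": ["in-01", "in-02"],
    "Industry": ["Government", "Marketing"],
    "Current employer": ["Republican", "Marketers R Us"],
    "Skill": [["Twitter", "Real Estate", "Golf"],
              ["Branding", "Social Media", "Advertising"]],
})

# explode() repeats the other columns once per list element;
# reset_index(drop=True) replaces the repeated labels with 0..n-1
long_df = wide.explode("Skill").reset_index(drop=True)
print(long_df)
```

This is convenient when the rows are first collected per profile and only later need to be "long", one row per skill.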

See the comments in the following code for an explanation:

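import json

import pandas as pd

# Placeholder: replace with the path to your .json file
path = "path/to/file.json"

# Collect one row per (profile, skill) pair, then build the DataFrame once at the end
rows = []

# Load the file as JSON: json.load() parses the file object straight into a dict
with open(path, encoding="utf-8") as file:
    obj = json.load(file)

# Then iterate over the profiles (the dict's values).
# This line depends on my first assumption about the file format;
# depending on the format, it might have to differ.
for profile in obj.values():
    # Skip profiles that do not contain all the required keys
    if not all(key in profile for key in ("_id", "experience", "industry", "skills")):
        continue
    # Current employer: org of the entry whose "end" is "Present" (None if there is none)
    employer = next((x.get("org") for x in profile["experience"] if x.get("end") == "Present"), None)
    # One row per skill, as in the table above
    for skill in profile["skills"]:
        rows.append([profile["_id"], profile["industry"], employer, skill])

# Create the DataFrame with the columns as in the example
df = pd.DataFrame(rows, columns=["ID", "Industry", "Employer", "Skill"])

Collecting the rows in a list avoids the df.loc[-1] = ... pattern: -1 is treated as an index label, not as "after the last row", so each such assignment overwrites the same single row instead of appending a new one. Building the DataFrame once at the end is also much faster than adding rows one at a time when there are millions of profiles.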
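Matching only on "end" == "Present" finds no current employer for profiles where no entry carries that value, while the question defines the current employer as the most recent entry. A more defensive sketch is below; it assumes, as in the example profile, that the experience list is ordered most recent first, so it falls back to the first entry. The helper name current_employer is hypothetical.

```python
from typing import Optional


def current_employer(experience: list) -> Optional[str]:
    """Return the org of the current job, or None if experience is empty.

    Assumption: entries are listed most recent first, as in the example profile.
    """
    # Prefer an entry explicitly marked as ongoing
    for job in experience:
        if job.get("end") == "Present":
            return job.get("org")
    # Otherwise fall back to the first (assumed most recent) entry
    return experience[0].get("org") if experience else None


example = [
    {"org": "Republican", "end": "Present", "start": "January 2017"},
    {"org": "The Apprentice", "end": "2015", "start": "2003"},
]
print(current_employer(example))      # Republican
print(current_employer(example[1:]))  # The Apprentice
print(current_employer([]))           # None
```

If the ordering assumption does not hold for your data, you would have to parse the "start"/"end" strings into dates and pick the latest instead.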

When you later want to use this information per profile, you will have to use df.groupby('ID').
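For example, to get back the layout from the question (one row per profile, skills joined into a single string), you could group by ID and aggregate. A sketch on a few made-up rows in the long format, using pandas named aggregation:

```python
import pandas as pd

# Hypothetical sample rows in the long, one-row-per-skill format
long_df = pd.DataFrame({
    "ID": ["in-01", "in-01", "in-01", "in-02", "in-02"],
    "Industry": ["Government"] * 3 + ["Marketing"] * 2,
    "Employer": ["Republican"] * 3 + ["Marketers R Us"] * 2,
    "Skill": ["Twitter", "Real Estate", "Golf", "Branding", "Social Media"],
})

# Industry and Employer are identical within each group, so take the first value;
# the skills of each profile are joined into one comma-separated string
summary = long_df.groupby("ID").agg(
    Industry=("Industry", "first"),
    Employer=("Employer", "first"),
    Skills=("Skill", ", ".join),
).reset_index()
print(summary)
```

groupby sorts the group keys by default, so the profiles come out in ID order.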

Let me know if your file has a different format, and whether this explanation is enough to get you started or you need more.
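With a 9 GB file, json.load() has to hold the whole parsed file in memory at once, which may not fit. If your file turns out to be in JSON Lines format instead (one complete profile object per line; this is an assumption, so check your file first), you can stream it line by line with only the standard library. A minimal sketch; the function name rows_from_json_lines and the file name profiles.jsonl are hypothetical:

```python
import io
import json
from typing import Iterable, Iterator

import pandas as pd

REQUIRED = ("_id", "experience", "industry", "skills")


def rows_from_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one row per (profile, skill), parsing one JSON object per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        profile = json.loads(line)
        if not all(key in profile for key in REQUIRED):
            continue  # profile lacks one of the required attributes
        employer = next((job.get("org") for job in profile["experience"]
                         if job.get("end") == "Present"), None)
        for skill in profile["skills"]:
            yield {"ID": profile["_id"], "Industry": profile["industry"],
                   "Employer": employer, "Skill": skill}


# Usage with an in-memory sample; for a real file you would pass the open file
# object instead, e.g. open("profiles.jsonl", encoding="utf-8")
sample = io.StringIO(
    '{"_id": "in-01", "industry": "Government", "skills": ["Twitter", "Golf"],'
    ' "experience": [{"org": "Republican", "end": "Present"}]}\n'
    '{"_id": "in-02", "industry": "Marketing"}\n'
)
df = pd.DataFrame(list(rows_from_json_lines(sample)),
                  columns=["ID", "Industry", "Employer", "Skill"])
print(df)
```

Only one line (one profile) is parsed at a time, so memory use is dominated by the collected rows rather than by the raw file.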

Answered 2022-06-22
