I am assuming that the file contains all the profiles, for example:
{
    "profile 1" : {
        # Full object as in the example above
    },
    "profile 2" : {
        # Full object as in the example above
    }
}
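Before continuing, let me show a correct way of using Pandas DataFrames.
An example of better use of Pandas DataFrames:
Values in a Pandas DataFrame should not be lists: pandas can store a list in an object column, but most column operations then stop working as expected. We will therefore have to duplicate rows, as in the example below. See this question and JD Long's answer for more details: How to use lists as values in a pandas dataframe?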
ID    | Industry   | Current employer | Skill
___________________________________________________________________
in-01 | Government | Republican       | Twitter
in-01 | Government | Republican       | Real Estate
in-01 | Government | Republican       | Golf
in-02 | Marketing  | Marketers R Us   | Branding
in-02 | Marketing  | Marketers R Us   | Social Media
in-02 | Marketing  | Marketers R Us   | Advertising
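The duplicated-row layout above can also be produced from records that still hold their skills as a list. This is only a minimal sketch, assuming a pandas version that provides DataFrame.explode (0.25 or later); the records are the sample values from the table, and the names records and long_df are illustrative, not part of the original question.

```python
import pandas as pd

# Sample values from the table above; "Skill" is still a list per profile.
records = [
    {"ID": "in-01", "Industry": "Government", "Current employer": "Republican",
     "Skill": ["Twitter", "Real Estate", "Golf"]},
    {"ID": "in-02", "Industry": "Marketing", "Current employer": "Marketers R Us",
     "Skill": ["Branding", "Social Media", "Advertising"]},
]

# explode() emits one row per list element and repeats the other columns.
long_df = pd.DataFrame(records).explode("Skill").reset_index(drop=True)
print(long_df)
```

Each profile ends up with one row per skill, which is the shape the rest of this answer builds by hand.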
Look for the explanations in the comments of the following code:
import json
import pandas as pd

# Create an empty DataFrame df with the columns as in the example
df = pd.DataFrame(columns=['ID', 'Industry', 'Employer', 'Skill'])

# Placeholder: replace with the path to your .json file
json_path = "path to .json file"

# Load the file as JSON; json.load() parses the open file into a dict
with open(json_path) as file:
    obj = json.load(file)

# Then iterate its items() as key-value pairs.
# The loop below depends on my first assumption;
# depending on the file format, it might have to differ.
for prof_key, profile in obj.items():
    # Verify that the profile contains all the required keys
    if all(key in profile for key in ("_id", "experience", "industry", "skills")):
        # Current employer: the org of the experience entry whose end is "Present".
        # This raises IndexError if the profile has no such entry.
        employer = [x for x in profile["experience"] if x["end"] == "Present"][0]["org"]
        for skill in profile["skills"]:
            # len(df) is the next unused integer label, so this appends a row
            df.loc[len(df)] = [profile["_id"], profile["industry"], employer, skill]

The line df.loc[len(df)] = ... appends a row at the end of the dataframe. Assigning to df.loc[-1] instead would reuse the label -1 on every iteration, so each new row would overwrite the previous one.
When you want to use this information later, you will have to use df.groupby('ID').
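As a rough illustration of that groupby step, here is a minimal sketch that rebuilds the skill list for each ID. The rows are the sample values from the table above, and the names skills_per_id and profile_info are illustrative assumptions rather than anything from the original question.

```python
import pandas as pd

# Same shape as the df built above: one row per (profile, skill) pair.
df = pd.DataFrame(
    [
        ["in-01", "Government", "Republican", "Twitter"],
        ["in-01", "Government", "Republican", "Real Estate"],
        ["in-01", "Government", "Republican", "Golf"],
        ["in-02", "Marketing", "Marketers R Us", "Branding"],
        ["in-02", "Marketing", "Marketers R Us", "Social Media"],
        ["in-02", "Marketing", "Marketers R Us", "Advertising"],
    ],
    columns=["ID", "Industry", "Employer", "Skill"],
)

# Collect all skills of a profile back into one list per ID.
skills_per_id = df.groupby("ID")["Skill"].apply(list)

# The remaining columns repeat within a profile, so the first value is enough.
profile_info = df.groupby("ID")[["Industry", "Employer"]].first()

print(skills_per_id)
print(profile_info)
```

Both results are indexed by ID, so they can be looked up with skills_per_id["in-01"] or profile_info.loc["in-01"].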
Let me know if your file has a different format, and whether this explanation is enough to get you started or you need more.