2 Answers
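Contributed 2041 experience points · received more than 4 likes
Here is one possible solution. You do have to know all the possible keys in advance; I assume that could be done programmatically, but I have hard-coded them here. Also, if a key's value list contains more than one item, only the first item is used.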
import pandas as pd
import json
# original dataframe
df = pd.DataFrame({'x':['''[{"key":"Gender","value":["Men"]},
{"key":"Shoe Size","value":["M"]},
{"key":"Shoe Category","value":["Men's Shoes"]},
{"key":"Color","value":["Multicolor"]},
{"key":"Manufacturer Part Number","value":["8190-W-NAVY-7.5"]},
{"key":"Brand","value":["Josmo"]}]''',
'''[{"key":"Gender","value":["Women"]},
{"key":"Shoe Size","value":["M"]},
{"key":"Shoe Category","value":["Women's Shoes"]},
{"key":"Color","value":["Multicolor"]},
{"key":"Manufacturer Part Number","value":["8190-W-NAVY-7.5"]}]'''],
'y':['A','B']})
expanded_columns = ['Gender', 'Shoe Size', 'Shoe Category', 'Color',
'Manufacturer Part Number', 'Brand']
# function to create list of values from json text
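def json_to_cols(s):
    l = json.loads(s)
    # start with None for every expected column so missing keys stay empty
    d = {i: None for i in expanded_columns}
    for row in l:
        # keep only the first item of each value list
        d[row['key']] = row['value'][0]
    return list(d.values())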
# Create new dataframe with expanded columns
df1 = df.apply(lambda row: pd.Series(json_to_cols(row['x']), index=expanded_columns),
axis=1)
new_df = df.join(df1)
print(new_df)
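The hard-coded `expanded_columns` list could probably be built from the data instead. The following is only a minimal sketch of that idea, not part of the original answer: the helper name `collect_keys` is hypothetical, and the two-row dataframe is a shortened stand-in for the one above.

```python
import json
import pandas as pd

# Shortened stand-in for the dataframe above (assumed sample data).
df = pd.DataFrame({
    'x': [
        '[{"key":"Gender","value":["Men"]},{"key":"Brand","value":["Josmo"]}]',
        '[{"key":"Gender","value":["Women"]},{"key":"Color","value":["Multicolor"]}]',
    ],
    'y': ['A', 'B'],
})

def collect_keys(series):
    """Return every key found in the JSON column, in first-seen order."""
    seen = {}  # a dict keeps insertion order and drops duplicates
    for text in series:
        for pair in json.loads(text):
            seen.setdefault(pair['key'], None)
    return list(seen)

expanded_columns = collect_keys(df['x'])
print(expanded_columns)  # ['Gender', 'Brand', 'Color']
```

This costs one extra pass over the column, but the resulting list can then be passed to `json_to_cols` unchanged.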
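Contributed 1810 experience points · received more than 5 likes
It isn't entirely clear what you want, but the following code produces a dataframe whose column names come from y, whose index comes from the keys in x, and whose cell values come from the values in x, with NaN for any key that does not appear in a given row.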
import ast

# df.iterrows() yields (index, row) tuples, so input_row[1] is the row itself
output_df = pd.DataFrame(
    {
        input_row[1]['y']: {
            pair['key']: pair['value'][0]
            for pair in ast.literal_eval(input_row[1]['x'])
        }
        for input_row in df.iterrows()
    }
)
Output:
                          A                B
Brand                     Josmo            NaN
Color                     Multicolor       NaN
Gender                    Men              Women
Heel Height               NaN              1 Inches
Manufacturer Part Number  8190-W-NAVY-7.5  NaN
Shoe Category             Men's Shoes      NaN
Shoe Size                 M                NaN
Size                      NaN              XL
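If one row per original record is preferred, the same dict-of-dicts construction can be transposed. This is a hedged, self-contained variant rather than the answer's own code: it swaps `ast.literal_eval` for `json.loads` (which assumes the strings are valid JSON) and uses a small made-up two-row dataframe.

```python
import json
import pandas as pd

# Small assumed sample; the real data has more keys per row.
df = pd.DataFrame({
    'x': [
        '[{"key":"Gender","value":["Men"]},{"key":"Brand","value":["Josmo"]}]',
        '[{"key":"Gender","value":["Women"]},{"key":"Size","value":["XL"]}]',
    ],
    'y': ['A', 'B'],
})

# One inner dict per input row; keys missing from a row become NaN.
wide = pd.DataFrame({
    y: {pair['key']: pair['value'][0] for pair in json.loads(x)}
    for x, y in zip(df['x'], df['y'])
})

# Transpose so that each original row (labelled by y) is a row again.
per_row = wide.T
print(per_row)
```

Here `zip` over the two columns replaces `df.iterrows()`, which avoids the `input_row[1]` indexing, and `per_row` could be joined back to `df` on `y` if the original columns are still needed.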