2 Answers
Contributor with 1725 experience points, more than 7 upvotes
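I see the problem here: you are joining on ['year','Emirate'], and that key is not unique. For example, 2001 Abu Dhabi in one dataframe matches 2001 Abu Dhabi in the other for both "Performed" and "Not Performed". This is essentially an m x n (many-to-many) join between the two datasets. Unless you specify a key that uniquely identifies each row, you will keep getting the same result.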
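To make the m x n effect concrete, here is a minimal sketch. The two small frames are hypothetical stand-ins built from values in the question, and the names left, right and keys are assumptions for illustration. It shows the row multiplication, a quick uniqueness check on the key, and the validate argument of merge, which raises an error instead of silently duplicating rows.

```python
import pandas as pd

# Hypothetical miniature frames: the key (Year, Emirate) repeats in both.
left = pd.DataFrame({
    "Year": ["2001", "2001"],
    "Emirate": ["Abu Dhabi", "Abu Dhabi"],
    "Number": ["462", "44"],
})
right = pd.DataFrame({
    "Year": ["2001", "2001"],
    "Emirate": ["Abu Dhabi", "Abu Dhabi"],
    "Type": ["Performed", "Not Performed"],
})

keys = ["Year", "Emirate"]

# Each left row pairs with every right row that shares the key: 2 x 2 rows.
many_to_many = left.merge(right, on=keys)
row_count = len(many_to_many)

# duplicated() flags repeated key combinations, exposing the non-unique key.
left_has_duplicate_keys = bool(left.duplicated(subset=keys).any())

# validate= makes pandas raise MergeError when the key is not unique.
try:
    left.merge(right, on=keys, validate="one_to_one")
    validation_failed = False
except pd.errors.MergeError:
    validation_failed = True

print(row_count, left_has_duplicate_keys, validation_failed)
```

Running the duplicated() check on both frames before merging is a cheap way to confirm whether the chosen key really identifies each row.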
Contributor with 1804 experience points, more than 2 upvotes
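I assume your data is not quite right yet: your expected output is achievable, but it does not follow from your current logic.
You are missing a third key column in second_df, namely Capacity. If we add this column and perform a left merge, we get your expected output.
Incidentally, there is no need to set the columns as the index, so the solution looks like this.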
import pandas as pd

# Clean up and create correct dataframes
first_df = pd.DataFrame(
    [['2001', 'Abu Dhabi', '100-', '462'],
     ['2001', 'Abu Dhabi', '100', '44'],
     ['2001', 'Abu Dhabi', '200', '657'],
     ['2001', 'Dubai', '100-', '40'],
     ['2001', 'Dubai', '100', '30'],
     ['2001', 'Dubai', '200', '51'],
     ['2002', 'Abu Dhabi', '100-', '300'],
     ['2002', 'Abu Dhabi', '100', '220'],
     ['2002', 'Abu Dhabi', '200', '56'],
     ['2002', 'Dubai', '100-', '55'],
     ['2002', 'Dubai', '100', '67'],
     ['2002', 'Dubai', '200', '89']],
    columns=['Year', 'Emirate', 'Capacity', 'Number'])

# second_df now carries Capacity as a third key column
second_df = pd.DataFrame(
    [['2001', 'Abu Dhabi', '100-', 'Performed', '45'],
     ['2001', 'Abu Dhabi', '100', 'Not Performed', '76'],
     ['2001', 'Abu Dhabi', '', '', ''],
     ['2001', 'Dubai', '100-', 'Performed', '90'],
     ['2001', 'Dubai', '100', 'Not Performed', '50'],
     ['2001', 'Dubai', '', '', ''],
     ['2002', 'Abu Dhabi', '100-', 'Performed', '78'],
     ['2002', 'Abu Dhabi', '100', 'Not Performed', '45'],
     ['2002', 'Abu Dhabi', '', '', ''],
     ['2002', 'Dubai', '100-', 'Performed', '76'],
     ['2002', 'Dubai', '100', 'Not Performed', '58'],
     ['2002', 'Dubai', '', '', '']],
    columns=['Year', 'Emirate', 'Capacity', 'Type', 'Value'])

# Perform a left merge on all three key columns to get the correct output
merged = first_df.merge(second_df, how='left', on=['Year', 'Emirate', 'Capacity'])
print(merged)
Output
    Year    Emirate  Capacity  Number           Type  Value
 0  2001  Abu Dhabi      100-     462      Performed     45
 1  2001  Abu Dhabi       100      44  Not Performed     76
 2  2001  Abu Dhabi       200     657            NaN    NaN
 3  2001      Dubai      100-      40      Performed     90
 4  2001      Dubai       100      30  Not Performed     50
 5  2001      Dubai       200      51            NaN    NaN
 6  2002  Abu Dhabi      100-     300      Performed     78
 7  2002  Abu Dhabi       100     220  Not Performed     45
 8  2002  Abu Dhabi       200      56            NaN    NaN
 9  2002      Dubai      100-      55      Performed     76
10  2002      Dubai       100      67  Not Performed     58
11  2002      Dubai       200      89            NaN    NaN
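The NaN rows appear because Capacity 200 has no counterpart in second_df (the rows with an empty Capacity never match). As a rough sketch, using only the 2001 Abu Dhabi subset of the data above, the indicator argument of merge can be used to list which rows found no partner. Adding indicator and validate here is a suggestion, not part of the original answer.

```python
import pandas as pd

# Subset of the answer's data: 2001 Abu Dhabi only.
first_df = pd.DataFrame(
    [["2001", "Abu Dhabi", "100-", "462"],
     ["2001", "Abu Dhabi", "100", "44"],
     ["2001", "Abu Dhabi", "200", "657"]],
    columns=["Year", "Emirate", "Capacity", "Number"],
)
second_df = pd.DataFrame(
    [["2001", "Abu Dhabi", "100-", "Performed", "45"],
     ["2001", "Abu Dhabi", "100", "Not Performed", "76"],
     ["2001", "Abu Dhabi", "", "", ""]],
    columns=["Year", "Emirate", "Capacity", "Type", "Value"],
)

keys = ["Year", "Emirate", "Capacity"]

# indicator=True adds a _merge column: "both" or "left_only" for a left merge.
# validate="one_to_one" confirms the three-column key is unique on both sides.
merged = first_df.merge(
    second_df, how="left", on=keys, indicator=True, validate="one_to_one"
)

# Rows from first_df that had no matching key in second_df.
unmatched = merged[merged["_merge"] == "left_only"]
unmatched_capacities = unmatched["Capacity"].tolist()

print(unmatched_capacities)
```

Dropping the _merge column afterwards with merged.drop(columns="_merge") returns the frame to the shape shown in the output above.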