Use MultiIndex.from_product on the levels of the MultiIndex (MultiIndex.levels) to build all combinations of the Name and Type values, then pass the result to DataFrame.reindex:
df = df.set_index(['Name','Type'])
df = df.reindex(pd.MultiIndex.from_product(df.index.levels), fill_value='0000-00-00')
print (df)
                 Date
Name Type
A    X     2019-08-06
     Y     2019-08-08
     Z     0000-00-00
B    X     0000-00-00
     Y     2019-08-01
     Z     0000-00-00
C    X     0000-00-00
     Y     0000-00-00
     Z     2019-10-12
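To see what reindex is being asked to do, it can help to inspect the product index on its own. The sketch below is only an illustration: the input frame is an assumption reconstructed from the non-filled rows of the output above, and the variable names full_index and missing are made up for this example.

```python
import pandas as pd

# Assumed input, reconstructed from the rows that were not filled above.
df = pd.DataFrame({'Date': ['2019-08-06', '2019-08-08', '2019-08-01', '2019-10-12'],
                   'Name': ['A', 'A', 'B', 'C'],
                   'Type': ['X', 'Y', 'Y', 'Z']})
df = df.set_index(['Name', 'Type'])

# Cartesian product of the unique values found in each index level.
full_index = pd.MultiIndex.from_product(df.index.levels)
print(len(full_index))  # 3 names x 3 types

# Pairs in the product that are absent from the data; these are the
# rows that reindex creates and fills with fill_value.
missing = full_index.difference(df.index)
print(missing.tolist())
```

Every pair listed in missing corresponds to one of the 0000-00-00 rows in the output.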
EDIT: The error ValueError: cannot handle a non-unique multi-index! means there are duplicated pairs in the Name and Type columns. Consider this sample data:
df = pd.DataFrame({'Date':['2019-08-06','2019-08-08','2019-08-01','2019-10-12'],
'Name':['A','A','B','C'],
'Type':['X','X','Y','Z'],
'col':list('abcd')})
print (df)
Date Name Type col
0 2019-08-06 A X a
1 2019-08-08 A X b <-duplicated pair `A, X` - Name, Type
2 2019-08-01 B Y c
3 2019-10-12 C Z d
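Before reindexing, a quick check can confirm whether any (Name, Type) pair is repeated. This is a minimal sketch on the same sample data; dupes and has_unique_pairs are names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2019-08-06', '2019-08-08', '2019-08-01', '2019-10-12'],
                   'Name': ['A', 'A', 'B', 'C'],
                   'Type': ['X', 'X', 'Y', 'Z'],
                   'col': list('abcd')})

# keep=False flags every row that belongs to a repeated (Name, Type) pair,
# not only the second and later occurrences.
dupes = df[df.duplicated(['Name', 'Type'], keep=False)]
print(dupes)

# An index built from these columns is not unique, which is why reindex fails.
has_unique_pairs = df.set_index(['Name', 'Type']).index.is_unique
print(has_unique_pairs)
```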
The solution is to first set aside the duplicated rows with DataFrame.duplicated, then apply reindex with all combinations to the remaining rows:
mask = df.duplicated(['Name','Type'])
df1 = df[~mask].set_index(['Name','Type'])
df1 = (df1.reindex(pd.MultiIndex.from_product(df1.index.levels))
.fillna({'Date':'0000-00-00', 'col':'missing'}).reset_index())
print (df1)
Name Type Date col
0 A X 2019-08-06 a
1 A Y 0000-00-00 missing
2 A Z 0000-00-00 missing
3 B X 0000-00-00 missing
4 B Y 2019-08-01 c
5 B Z 0000-00-00 missing
6 C X 0000-00-00 missing
7 C Y 0000-00-00 missing
8 C Z 2019-10-12 d
Finally, add all the duplicated rows back with concat:
df = pd.concat([df1, df[mask]]).sort_values(['Name','Type'], ignore_index=True)
print (df)
Name Type Date col
0 A X 2019-08-06 a
1 A X 2019-08-08 b
2 A Y 0000-00-00 missing
3 A Z 0000-00-00 missing
4 B X 0000-00-00 missing
5 B Y 2019-08-01 c
6 B Z 0000-00-00 missing
7 C X 0000-00-00 missing
8 C Y 0000-00-00 missing
9 C Z 2019-10-12 d