Goal: get the percentage of missing values for each column of df, broken down by client.

My df is about created tickets:

       id      type  ... priority   Client
0  56 113  Incident  ...      Low  client1
1  56 267   Demande  ...     High  client1
2  56 294  Incident  ...      Nan      NaN
3  56 197   Demande  ...      Low  client3
4  56 143   Demande  ...      Nan  client4

First attempt:

df.notna().sum()/len(agg_global)*100

Out[29]:
id          97.053453
type        76.415869
priority    82.626625
client      84.596443

This is very useful, but I would like more detail in the output, with the "Client" dimension in the columns, like this.

Desired output:

                   Client1     Client2     Client3        NaN
id              100.000000  100.000000  100.000000  66.990424
type             76.415869   66.990424   76.415869  43.761970
status          100.000000  100.000000   66.990424  76.415869
category         66.990424   43.761970   76.415869  43.761970
entity           43.761970  100.000000   76.415869  76.415869
source_demande   84.596443  100.000000   76.415869  43.761970

I tried using "groupby" but could not get the desired output:

               id       type  ...  priority    Client
client                        ...
True    97.053453  76.415869  ...  29.98632  29.98632

Any suggestion would be appreciated. Thanks for your attention!
2 Answers
一只斗牛犬
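You can drop the Client column so that it is not itself tested for missing values, test the remaining columns with DataFrame.isna, and group by Client with its NaN values replaced by the string 'NaN' so that those rows are not lost. Then aggregate with mean and finally transpose with DataFrame.T: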
print(df)
       id      type priority   Client
0     NaN  Incident      Low  client1
1     NaN       NaN     High  client1
2  56 294  Incident      Nan      NaN
3  56 197       NaN      Low  client3
4     NaN   Demande      NaN  client4

# Drop Client so it is not itself tested, flag missing cells, then average
# the flags per client; fillna('NaN') keeps the rows whose Client is missing.
df = (df.drop(columns='Client')
        .isna()
        .groupby(df['Client'].fillna('NaN'))
        .mean()
        .rename_axis(None)
        .T)
print(df)
          NaN  client1  client3  client4
id        0.0      1.0      0.0      1.0
type      0.0      0.5      1.0      0.0
priority  0.0      0.0      0.0      1.0
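
The result above is the fraction of missing values per client, while the question's first attempt reported the percentage of non-missing values. A minimal self-contained sketch of that variant follows. The sample frame is rebuilt by hand from the printed data and the name filled_pct is made up for illustration. Note that the string "Nan" in priority is assumed to be ordinary text, so it does not count as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, rebuilt from the frame printed in the answer.
# The "Nan" entry in priority is a plain string, not a missing value.
df = pd.DataFrame({
    "id": [np.nan, np.nan, "56 294", "56 197", np.nan],
    "type": ["Incident", np.nan, "Incident", np.nan, "Demande"],
    "priority": ["Low", "High", "Nan", "Low", np.nan],
    "Client": ["client1", "client1", np.nan, "client3", "client4"],
})

filled_pct = (
    df.drop(columns="Client")              # do not test the grouping column itself
      .notna()                             # True where a value is present
      .groupby(df["Client"].fillna("NaN")) # keep rows whose Client is missing
      .mean()                              # share of present values per client
      .mul(100)                            # fraction -> percentage
      .rename_axis(None)
      .T                                   # clients become columns
)
print(filled_pct)
```

Swapping notna for isna gives the percentage of missing values instead. In pandas 1.1 and later, passing dropna=False to groupby is another way to keep the rows with a missing Client without replacing them by a string.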