Pandas: percentage of NaN for each value of a column

Goal: get the percentage of missing values for each column of df, broken down by client.

My df describes created tickets:

       id      type  ... priority   Client
0  56 113  Incident  ...      Low  client1
1  56 267   Demande  ...     High  client1
2  56 294  Incident  ...      Nan      NaN
3  56 197   Demande  ...      Low  client3
4  56 143   Demande  ...      Nan  client4

First attempt:

df.notna().sum()/len(agg_global)*100

Out[29]:
id          97.053453
type        76.415869
priority    82.626625
client      84.596443

This is very useful, but I would like more detail in the output, with the "client" dimension as columns.

The output I would like to create:

                   Client1     Client2     Client3         NaN
id              100.000000  100.000000  100.000000   66.990424
type             76.415869   66.990424   76.415869   43.761970
status          100.000000  100.000000   66.990424   76.415869
category         66.990424   43.761970   76.415869   43.761970
entity           43.761970  100.000000   76.415869   76.415869
source_demande   84.596443  100.000000   76.415869   43.761970

I tried using "groupby" but could not get the desired output:

               id       type  ...  priority    Client
client                        ...
True    97.053453  76.415869  ...  29.98632  29.98632

Any suggestions are welcome. Thank you for your attention!


2 Answers

一只斗牛犬

Contributed 1784 experience points, received more than 2 upvotes

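You can drop the Client column so that it is not itself tested for missing values, flag the missing cells in the remaining columns with DataFrame.isna, group by Client with its NaN values replaced by the string 'NaN' so that those rows are not lost, aggregate with the mean, and finally transpose with DataFrame.T: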

print(df)

       id      type priority   Client
0     NaN  Incident      Low  client1
1     NaN       NaN     High  client1
2  56 294  Incident      Nan      NaN
3  56 197       NaN      Low  client3
4     NaN   Demande      NaN  client4

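# The mean of the boolean isna() flags is the fraction of missing values
# per client; multiply by 100 if you need percentages.
df = (df.drop(columns='Client')               # keyword form; the positional axis argument was removed in pandas 2.0
        .isna()
        .groupby(df['Client'].fillna('NaN'))  # keep the rows whose Client is missing
        .mean()
        .rename_axis(None)
        .T)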

print(df)

          NaN  client1  client3  client4
id        0.0      1.0      0.0      1.0
type      0.0      0.5      1.0      0.0
priority  0.0      0.0      0.0      1.0
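For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach that reports percentages instead of fractions. The sample frame is an assumption rebuilt from the printed data above (the string "Nan" in priority is treated as ordinary text, not as a missing value), and the names client, missing_pct and present_pct are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the frame printed in this answer.
df = pd.DataFrame({
    "id": [np.nan, np.nan, "56 294", "56 197", np.nan],
    "type": ["Incident", np.nan, "Incident", np.nan, "Demande"],
    "priority": ["Low", "High", "Nan", "Low", np.nan],  # "Nan" is a plain string
    "Client": ["client1", "client1", np.nan, "client3", "client4"],
})

# Label rows whose Client is missing so that groupby does not drop them.
client = df["Client"].fillna("NaN")

missing_pct = (
    df.drop(columns="Client")  # do not test the grouping column itself
      .isna()                  # True where a cell is missing
      .groupby(client)         # one group per client label
      .mean()                  # fraction of missing cells per column
      .mul(100)                # fraction -> percentage
      .rename_axis(None)
      .T                       # clients as columns, original columns as rows
)

# The question asked for the share of values that are present.
present_pct = 100 - missing_pct

print(missing_pct)
print(present_pct)
```

Since the question's first attempt used notna(), present_pct is probably the table that matches the desired layout most closely.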

2023-03-22

撒科打诨

Contributed 1934 experience points, received more than 2 upvotes

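As far as I know, this can be done by brute force. I would use the isna function together with a sum to count the NaN values in each row or column, and then compute the percentage from those counts.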
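A rough sketch of that counting idea, assuming a small made-up frame (the data and the variable names are hypothetical, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 4 rows, 2 columns.
df = pd.DataFrame({
    "type": ["Incident", np.nan, "Demande", "Demande"],
    "priority": ["Low", "High", np.nan, np.nan],
})

# Count the NaN values per column, then divide by the number of rows.
nan_per_column = df.isna().sum()
pct_per_column = nan_per_column / len(df) * 100

# Count the NaN values per row, then divide by the number of columns.
nan_per_row = df.isna().sum(axis=1)
pct_per_row = nan_per_row / df.shape[1] * 100

print(pct_per_column)
print(pct_per_row)
```

This gives one overall percentage per column or per row; it does not split the result by client, which is what the groupby in the other answer adds.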

2023-03-22
