You can return a DataFrame from the function:
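import numpy as np
import pandas as pd
from scipy import stats

def z_score(x):
    # absolute z-score within the group, plus a 0/1 flag for values above 5
    z = np.abs(stats.zscore(x))
    c = np.where(x > 5, 1, 0)
    return pd.DataFrame({'zscore': z, 'label': c}, index=x.index)

# group_keys=False keeps the original row index on recent pandas versions
df[['zscore', 'label']] = df.groupby('GROUP', group_keys=False)['VALUE'].apply(z_score)
print(df)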
   GROUP  VALUE    zscore  label
0      1      5  1.135550      0
1      2      2  1.000000      0
2      1     10  1.297771      1
3      2     20  1.000000      1
4      1      7  0.162221      1
For better performance, though, you can use groupby only for the z-score and compute the label column outside the groupby:
def z_score(x):
    # absolute z-score of each value within its group
    z = np.abs(stats.zscore(x))
    return z

df['zscore'] = df.groupby('GROUP')['VALUE'].transform(z_score)
# lambda function alternative
# df['zscore'] = df.groupby('GROUP')['VALUE'].transform(lambda x: np.abs(stats.zscore(x)))

# the label does not depend on the group, so it is computed on the whole column
df['label'] = np.where(df['VALUE'] > 5, 1, 0)
print(df)
   GROUP  VALUE    zscore  label
0      1      5  1.135550      0
1      2      2  1.000000      0
2      1     10  1.297771      1
3      2     20  1.000000      1
4      1      7  0.162221      1
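If you would rather not depend on SciPy, the per-group z-score can probably be reproduced with pandas alone. The sketch below is only an illustration, not part of the original answer: the sample frame is rebuilt from the printed output above, and it assumes `stats.zscore` is used with its default `ddof=0` (population standard deviation), which is why `std(ddof=0)` is passed explicitly.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output (an assumption,
# the original question's frame is not shown here)
df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1],
                   'VALUE': [5, 2, 10, 20, 7]})

g = df.groupby('GROUP')['VALUE']

# Per-group mean and population standard deviation, broadcast back to each row
group_mean = g.transform('mean')
group_std = g.transform(lambda x: x.std(ddof=0))

# Absolute z-score within each group; the label is computed outside the groupby
df['zscore'] = ((df['VALUE'] - group_mean) / group_std).abs()
df['label'] = np.where(df['VALUE'] > 5, 1, 0)

print(df)
```

Note that pandas' own `std` defaults to `ddof=1`, so leaving out `ddof=0` would give different numbers from `stats.zscore`.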