Pandas: which threshold applies to each row?

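Given a column of scores, e.g.,

scores = pd.DataFrame({"score": np.random.randn(10)})

and thresholds

thresholds = pd.DataFrame({"threshold": [0.2, 0.5, 0.8]}, index=[7, 13, 33])

I want to find the applicable threshold for each score, e.g.,

      score  threshold
0 -1.613293        NaN
1 -1.357980        NaN
2  0.325720          7
3  0.116000        NaN
4  1.423171         33
5  0.282557          7
6 -1.195269        NaN
7  0.395739          7
8  1.072041         33
9  0.197853        NaN

In other words, for each score s I want the threshold label t such that t = max(t: thresholds.threshold[t] < s), i.e. (with thresholds increasing along the index) the label of the largest threshold strictly below s, or NaN if no threshold is below s.

How do I do that?

PS. Based on a deleted answer:

pd.cut(scores.score, bins=[-np.inf] + list(thresholds.threshold) + [np.inf], labels=["low"] + list(thresholds.index))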
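For a reproducible check, here is a minimal sketch of a variant of the pd.cut approach from the PS. It uses the ten scores from the example table instead of np.random.randn(10), and it drops the -inf edge and the "low" label, so scores at or below the lowest threshold fall outside every bin and come out as NaN, matching the desired output. Because pd.cut bins are right-closed by default, a score exactly equal to a threshold is not assigned to that threshold, which follows the strict < in the question. The resulting column is categorical.

```python
import numpy as np
import pandas as pd

# Scores copied from the example table (instead of random data) for reproducibility.
scores = pd.DataFrame({"score": [-1.613293, -1.357980, 0.325720, 0.116000, 1.423171,
                                 0.282557, -1.195269, 0.395739, 1.072041, 0.197853]})
thresholds = pd.DataFrame({"threshold": [0.2, 0.5, 0.8]}, index=[7, 13, 33])

# Edges 0.2, 0.5, 0.8, inf give the bins (0.2, 0.5], (0.5, 0.8], (0.8, inf].
# Scores <= 0.2 are outside every bin, so pd.cut returns NaN for them.
bins = list(thresholds["threshold"]) + [np.inf]
# One label per bin: the threshold's index label (7, 13, 33).
labels = list(thresholds.index)

scores["threshold"] = pd.cut(scores["score"], bins=bins, labels=labels)
print(scores)
```

This assumes the thresholds are already sorted in ascending order; pd.cut raises an error if the bin edges are not monotonically increasing.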


3 Answers

梵蒂冈之花


You can do it with np.digitize:

# Position 0 means "below every threshold"; positions 1..n map to the index labels.
indices = [None] + thresholds.index.tolist()
scores["score"].apply(lambda x: indices[np.digitize(x, thresholds["threshold"])])
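The apply above calls np.digitize once per row. A minimal sketch of a vectorized variant, assuming the thresholds are sorted in ascending order (np.digitize requires monotonic bins): digitize the whole score column in one call, then use the returned positions to index an array of labels. Passing right=True makes a score equal to a threshold count as not above it, following the strict < in the question; with the default right=False, such a score would be assigned to that threshold. Using NaN instead of None for "no threshold" is a choice made here so the result is a float column.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({"score": [-1.613293, -1.357980, 0.325720, 0.116000, 1.423171,
                                 0.282557, -1.195269, 0.395739, 1.072041, 0.197853]})
thresholds = pd.DataFrame({"threshold": [0.2, 0.5, 0.8]}, index=[7, 13, 33])

# Position 0 = below every threshold (NaN); position i = thresholds.index[i - 1].
labels = np.array([np.nan] + thresholds.index.tolist(), dtype=float)

# One call for the whole column. With right=True the bins are (t[i-1], t[i]],
# so the returned position counts the thresholds strictly below each score.
positions = np.digitize(scores["score"].to_numpy(),
                        thresholds["threshold"].to_numpy(),
                        right=True)

scores["threshold"] = labels[positions]
print(scores)
```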

Answered 2021-04-13

肥皂起泡泡


You can use merge_asof plus some reshaping to get exactly the result you want:

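(pd.merge_asof(scores.reset_index().sort_values('score'),  # merge_asof needs sorted keys
               thresholds.reset_index(),
               left_on='score', right_on='threshold', suffixes=('', '_'))
   .drop(columns='threshold')                  # keep only the matched label
   .rename(columns={'index_': 'threshold'})    # the thresholds' index is the answer
   .set_index('index')                         # restore the original row order
   .sort_index())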

With your data, you get:

          score  threshold
index
0     -1.613293        NaN
1     -1.357980        NaN
2      0.325720        7.0
3      0.116000        NaN
4      1.423171       33.0
5      0.282557        7.0
6     -1.195269        NaN
7      0.395739        7.0
8      1.072041       33.0
9      0.197853        NaN
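A minimal sketch of the same merge_asof idea without the suffix-and-rename step: naming the thresholds' index "label" before resetting it means no columns collide, and allow_exact_matches=False restricts matches to thresholds strictly below the score (the default, used in the answer, also matches a threshold equal to the score). The column name "label" is only an illustrative choice.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({"score": [-1.613293, -1.357980, 0.325720, 0.116000, 1.423171,
                                 0.282557, -1.195269, 0.395739, 1.072041, 0.197853]})
thresholds = pd.DataFrame({"threshold": [0.2, 0.5, 0.8]}, index=[7, 13, 33])

# Keep the original row position in an "index" column, then sort on the key.
left = scores.reset_index().sort_values("score")
# Turn the threshold labels (7, 13, 33) into an ordinary column named "label".
right = thresholds.rename_axis("label").reset_index().sort_values("threshold")

merged = pd.merge_asof(
    left,
    right,
    left_on="score",
    right_on="threshold",
    direction="backward",       # last threshold at or below the score...
    allow_exact_matches=False,  # ...restricted to strictly below
)

result = (
    merged.set_index("index")
    .sort_index()
    .rename_axis(None)[["score", "label"]]
    .rename(columns={"label": "threshold"})
)
print(result)
```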

Answered 2021-04-13
