2 Answers

Contributor with 1775 experience points, 11+ upvotes
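Working with string data in pandas is slow, so use a list comprehension that looks up each word in the `all_scores` Series and averages the scores with `mean`: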
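import pandas as pd
from statistics import mean

# Split each string on '-', look up every word's score in all_scores, then average
L = [mean(all_scores.get(y) for y in x.split('-')) for x in word_series]
a = pd.Series(L, index=word_series.index)
print(a)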
0 0.340000
1 0.760000
2 0.263333
dtype: float64
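A self-contained version of the snippet above for trying it out. The sample data is an assumption: it reuses the words and scores printed in the second answer, with `all_scores` as a Series indexed by word and `word_series` as a Series of hyphen-joined strings.

```python
from statistics import mean

import pandas as pd

# Assumed sample data (values copied from the second answer's printout)
all_scores = pd.Series({
    'the': 0.34, 'cat': 0.56, 'best': 0.01, 'ever': 0.77, 'is': 0.12,
    'pink': 0.34, 'job': 0.01, 'sea': 0.87, 'blue': 0.65,
})
word_series = pd.Series(['the-cat-is-pink', 'blue-sea', 'best-job-ever'])

# Split each string on '-', look up every word's score, and average them
L = [mean(all_scores.get(y) for y in x.split('-')) for x in word_series]
a = pd.Series(L, index=word_series.index)
print(a)
```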
Or:
# Hand-written mean; len() needs a list, so the scores are collected with [...]
def mean(a):
    return sum(a) / len(a)

L = [mean([all_scores.get(y) for y in x.split('-')]) for x in word_series]
a = pd.Series(L, index=word_series.index)
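The hand-written `mean` calls `len`, which is why this version builds a list instead of passing a generator the way the `statistics.mean` version does. A small check of that difference, using two illustrative scores (`blue` and `sea` from the sample data):

```python
from statistics import mean as stats_mean


def mean(a):
    # Plain arithmetic mean; needs a sized sequence because of len()
    return sum(a) / len(a)


scores = [0.65, 0.87]

print(mean(scores))                    # works on a list
print(stats_mean(s for s in scores))   # statistics.mean also accepts a generator

try:
    mean(s for s in scores)            # a generator has no len()
except TypeError as exc:
    print('TypeError:', exc)
```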
If some words might not match any key in `all_scores`, pass `np.nan` as the default value to `get` and use `numpy.nanmean`:
import numpy as np

# Unknown words become NaN, and np.nanmean ignores NaN values
L = [np.nanmean([all_scores.get(y, np.nan) for y in x.split('-')]) for x in word_series]
a = pd.Series(L, index=word_series.index)
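A sketch of how the `np.nanmean` version behaves with unmatched words. The scores and strings are assumed for illustration; `dog` is deliberately absent from `all_scores`. A string whose words are all unknown comes out as NaN, and NumPy emits a "Mean of empty slice" RuntimeWarning for it.

```python
import numpy as np
import pandas as pd

# Assumed sample scores; 'dog' is deliberately missing from the lookup table
all_scores = pd.Series({'the': 0.34, 'cat': 0.56, 'blue': 0.65, 'sea': 0.87})
word_series = pd.Series(['the-dog', 'blue-sea', 'dog'])

# Unknown words become NaN and np.nanmean skips them
L = [np.nanmean([all_scores.get(y, np.nan) for y in x.split('-')]) for x in word_series]
a = pd.Series(L, index=word_series.index)
print(a)
```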
Or:
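def mean(a):
    return sum(a) / len(a)

# Skip words missing from all_scores (raises ZeroDivisionError if no word matches)
L = [mean([all_scores.get(y, np.nan) for y in x.split('-') if y in all_scores.index])
     for x in word_series]
a = pd.Series(L, index=word_series.index)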
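With the `if y in all_scores.index` filter, unmatched words are dropped before averaging, so a string in which no word matches leaves an empty list and the division fails. A sketch with a guard that returns NaN instead; the helper name `mean_known` and the sample data are hypothetical, not from the answer.

```python
import numpy as np
import pandas as pd

# Assumed sample data; 'dog' is not in the score table
all_scores = pd.Series({'the': 0.34, 'cat': 0.56, 'blue': 0.65, 'sea': 0.87})
word_series = pd.Series(['the-dog', 'blue-sea', 'dog'])


def mean_known(words, scores):
    # Hypothetical helper: average only the words present in the score table
    known = [scores[w] for w in words if w in scores.index]
    # Return NaN instead of dividing by zero when nothing matches
    return sum(known) / len(known) if known else np.nan


L = [mean_known(x.split('-'), all_scores) for x in word_series]
a = pd.Series(L, index=word_series.index)
print(a)
```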

Contributor with 1820 experience points, 9+ upvotes
Here is one approach:
print(a)
words
0 the-cat-is-pink
1 blue-sea
2 best-job-ever
print(b)
all_scores
the 0.34
cat 0.56
best 0.01
ever 0.77
is 0.12
pink 0.34
job 0.01
sea 0.87
blue 0.65
b = b.reset_index()
print(b)
index all_scores
0 the 0.34
1 cat 0.56
2 best 0.01
3 ever 0.77
4 is 0.12
5 pink 0.34
6 job 0.01
7 sea 0.87
8 blue 0.65
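The frames printed above can be rebuilt as follows, assuming `a` is a DataFrame with a `words` column and `b` is a DataFrame whose index holds the words and whose single column is `all_scores` (the answer does not show how they were created). Because that index has no name, `reset_index()` turns it into a column literally called `index`, which is the column the next line filters on.

```python
import pandas as pd

# Assumed construction; only the printed contents come from the answer
a = pd.DataFrame({'words': ['the-cat-is-pink', 'blue-sea', 'best-job-ever']})
b = pd.DataFrame(
    {'all_scores': [0.34, 0.56, 0.01, 0.77, 0.12, 0.34, 0.01, 0.87, 0.65]},
    index=['the', 'cat', 'best', 'ever', 'is', 'pink', 'job', 'sea', 'blue'],
)

# An unnamed index becomes a column named 'index'
b = b.reset_index()
print(b.columns.tolist())
```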
# For each word, filter b to that word's row, take its score, then average per string
a['score'] = a['words'].str.split('-').apply(
    lambda x: sum([b[b['index'] == w].reset_index()['all_scores'][0] for w in x]) / len(x)
)
print(a)
Output:
words score
0 the-cat-is-pink 0.340000
1 blue-sea 0.760000
2 best-job-ever 0.263333
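The `apply` above filters the whole of `b` once per word (`b[b['index'] == w]`), so every lookup rescans the table. A lighter sketch of the same calculation builds a plain `dict` from `b` once and looks each word up in it. This is a swapped-in lookup, not the answer's code, and the frames are assumed to match the reset `b` printed above. Like the original one-liner, it fails on a word that is missing from `b`.

```python
import pandas as pd

# Assumed data matching the printed frames (b after reset_index)
a = pd.DataFrame({'words': ['the-cat-is-pink', 'blue-sea', 'best-job-ever']})
b = pd.DataFrame({
    'index': ['the', 'cat', 'best', 'ever', 'is', 'pink', 'job', 'sea', 'blue'],
    'all_scores': [0.34, 0.56, 0.01, 0.77, 0.12, 0.34, 0.01, 0.87, 0.65],
})

# Build the word -> score mapping once instead of filtering b per word
score_of = dict(zip(b['index'], b['all_scores']))

a['score'] = a['words'].str.split('-').apply(
    lambda ws: sum(score_of[w] for w in ws) / len(ws)
)
print(a)
```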