2 Answers
This user has contributed 1806 experience points and received more than 8 likes
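Here is a faster approach using a list comprehension:
import pandas as pd
from collections import Counter
# Collect the valid ids in a set for fast membership tests
vocab = set(df1['Id'].astype(int))
# For each row of df2, keep only the ids that also appear in df1
filtered = [[int(j) for j in i.split(',') if int(j) in vocab] for i in df2['Tag Id']]
# Count each combination; Counter replaces np.unique, which fails on rows of unequal length in recent NumPy versions
counts = Counter(map(tuple, filtered))
# Display the counts, sorted by combination, as a DataFrame
pd.DataFrame([(list(k), v) for k, v in sorted(counts.items())], columns=['Combinations', 'Counts'])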
      Combinations  Counts
0       [181, 987]       2
1  [300, 653, 987]       1
2            [456]       1
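If the order of ids inside a row should not matter (so that "987,181" and "181,987" count as the same combination), one option is to sort each filtered row before counting. The following is only a sketch: the sample frames and the helper name `count_combinations` are hypothetical and not part of the original question, and `collections.Counter` is used for the counting step.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data; the real df1 and df2 come from the question.
df1 = pd.DataFrame({"Id": [181, 300, 456, 653, 987]})
df2 = pd.DataFrame({"Tag Id": ["181,987", "987,181,555", "300,653,987", "456,111"]})


def count_combinations(df1, df2):
    """Count order-insensitive combinations of known ids across the rows of df2."""
    vocab = set(df1["Id"].astype(int))  # a set gives fast membership tests
    counter = Counter(
        # Sorting makes "987,181" and "181,987" the same key.
        tuple(sorted(int(tag) for tag in row.split(",") if int(tag) in vocab))
        for row in df2["Tag Id"]
    )
    # Sort by combination so the row order of the result is deterministic.
    rows = [(list(combo), n) for combo, n in sorted(counter.items())]
    return pd.DataFrame(rows, columns=["Combinations", "Counts"])


result = count_combinations(df1, df2)
print(result)
```

Ids that are missing from df1 (555 and 111 in the sample) are dropped before counting, so the first two sample rows collapse into the same combination.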
This user has contributed 1851 experience points and received more than 5 likes
Let's try splitting the Tag Id strings in df2 and exploding them, then merging with df1 and counting:
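# Split each comma-separated 'Tag Id' string and put one tag on each row;
# reset_index() keeps the original df2 row label in a column named 'index'
s = (df2['Tag Id'].str.split(',')
        .explode()
        .reset_index()
    )
# The inner merge keeps only tags present in df1; this assumes df1['Id'] holds
# strings, since it is matched against string tags and joined with ','
(df1.merge(s, left_on='Id', right_on='Tag Id')
    .sort_values('Tag Id')
    .groupby('index')
    .agg(Combination=('Id', ','.join))
    ['Combination']
    .value_counts().reset_index()
)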
Output:
         index  Combination
0      181,987            2
1  653,987,300            1
2          456            1
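To try the explode-and-merge idea end to end, here is a small self-contained sketch. The sample frames are hypothetical, the ids are assumed to be stored as strings, and the result columns are named explicitly with `rename_axis` and `reset_index(name=...)` so that the labels do not depend on the pandas version.

```python
import pandas as pd

# Hypothetical sample data with string ids; the real frames come from the question.
df1 = pd.DataFrame({"Id": ["181", "300", "456", "653", "987"]})
df2 = pd.DataFrame({"Tag Id": ["181,987", "181,987,555", "300,653,987", "456,111"]})

# One tag per row; the 'index' column remembers which df2 row each tag came from.
tags = df2["Tag Id"].str.split(",").explode().reset_index()

# The inner merge drops tags that do not appear in df1["Id"].
matched = tags.merge(df1, left_on="Tag Id", right_on="Id")

# Rebuild one comma-joined combination per original row, then count them.
combos = matched.sort_values(["index", "Id"]).groupby("index")["Id"].agg(",".join)
result = (
    combos.value_counts()
    .rename_axis("Combination")
    .reset_index(name="Counts")
)
print(result)
```

Sorting by both `index` and `Id` before grouping keeps the ids inside each combination in a consistent (string) order, so rows that list the same ids in a different order are counted together.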