Counting matches between 2 pandas DataFrames

Python

largeQ 2021-08-24 16:36:24

I have 2 DataFrames, each of which holds text as a list in every row. The first one is called df:

                Datum       File                      File_type  Text
    Datum
    2000-01-27  2000-01-27  0864820040_000127_04.txt  _04        [business, date, jan, heineken, starts, integr..

I also have another one, df_lm, which looks like this:

       List_type     Words
    0  LM_cnstrain.  [abide, abiding, bound, bounded, commit, commi...
    1  LM_litigius.  [abovementioned, abrogate, abrogated, abrogate...
    2  LM_modal_me.  [can, frequently, generally, likely, often, ou...
    3  LM_modal_st.  [always, best, clearly, definitely, definitive...
    4  LM_modal_wk.  [almost, apparently, appeared, appearing, appe...

I want to create new columns in df that count the word matches, for example how many of the words in df_lm.Words[0] occur in df.Text[0].

Note: df has about 500 rows and df_lm has 6, so I need to create 6 new columns in df, and the updated df should look like this:

    Datum       ...  LM_cnstrain  LM_litigius  Lm_modal_me  ...
    2000-01-27  ...  5            3            4
    2000-02-25  ...  7            1            0

I hope my question is clear. Thanks in advance!

Edit: I have already done something similar by creating a list and looping over it, but since the lists in df_lm are very long, that is not an option. The code looks like this:

    result_list = []
    for file in file_list:
        count_growth = 0
        for word in text.split():
            if word in growth:
                count_growth = count_growth + 1
        a = {'Growth': count_growth}
        result_list.append(a)


2 Answers

慕的地6264312

Contributed 1817 experience points, received more than 6 likes

So I came up with the following solution:

    result_list = []
    for file in file_list:
        # name and text are expected to be set for each file (not shown in the post)
        count_lm_constraint = 0
        count_lm_litigious = 0
        count_lm_modal_me = 0
        for word in text.split():
            # df_lm.iloc[i, 1] is the word list of the i-th category
            if word in df_lm.iloc[0, 1]:
                count_lm_constraint += 1
            if word in df_lm.iloc[1, 1]:
                count_lm_litigious += 1
            if word in df_lm.iloc[2, 1]:
                count_lm_modal_me += 1
        # the remaining categories are counted and stored the same way
        a = {"File": name, "Text": text,
             "lm_constraint": count_lm_constraint,
             "lm_litigious": count_lm_litigious,
             "lm_modal_me": count_lm_modal_me}
        result_list.append(a)
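The per-category counters above can be folded into a single loop over named word lists. The following is a minimal sketch of that idea rather than the poster's actual code: the helper `count_matches` and the miniature word lists are hypothetical, and the lists are stored as sets on the assumption that membership tests against long lists are what made the original loop slow.

```python
from collections import Counter

def count_matches(words, word_sets):
    """Return {list_name: number of tokens in `words` found in that list}."""
    counts = Counter()
    for word in words:
        for name, vocab in word_sets.items():
            if word in vocab:          # set lookup, average O(1)
                counts[name] += 1
    # Report every list, including those with zero matches
    return {name: counts[name] for name in word_sets}

# Hypothetical miniature word lists standing in for the rows of df_lm
word_sets = {
    "LM_cnstrain": {"abide", "bound", "commit"},
    "LM_litigius": {"abrogate", "abovementioned"},
    "LM_modal_me": {"can", "likely", "often"},
}

text = "we can commit and often abide by the bound we commit to"
row = count_matches(text.split(), word_sets)
print(row)
```

Repeated tokens are counted each time they occur, which matches the behaviour of the counters in the loop above. If each distinct word should count only once, passing `set(text.split())` instead would be one way to do it.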

2021-08-24

蝴蝶不菲

Contributed 1810 experience points, received more than 4 likes

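Following up on my comment, you could try something like this:

The code below has to run in a loop in which the text column of the first df is matched against each of the 6 word lists in the second one, with the value of each new column taken from len(c), i.e. the number of matches.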

    desc = df_lm.iloc[0, 1]        # word list of the first category
    matches = df.Text.isin(desc)   # isin compares each cell as a whole value
    result = df.Text[matches]
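Because `isin` compares whole cell values, it does not look inside a cell that holds a list of tokens. One possible way to run the loop described in this answer on list-valued cells is to put each token on its own row first. This is a sketch under assumptions, not a tested solution for the asker's data: the two toy frames are invented for illustration, and `Series.explode` requires pandas 0.25 or later.

```python
import pandas as pd

# Toy stand-ins for the question's frames (values invented for illustration)
df = pd.DataFrame({
    "File": ["a.txt", "b.txt"],
    "Text": [["can", "commit", "often", "abide"], ["abrogate", "likely", "beer"]],
})
df_lm = pd.DataFrame({
    "List_type": ["LM_cnstrain", "LM_litigius", "LM_modal_me"],
    "Words": [["abide", "bound", "commit"], ["abrogate"], ["can", "likely", "often"]],
})

# One token per row; the original row label is repeated in the index
tokens = df["Text"].explode()

for list_type, words in zip(df_lm["List_type"], df_lm["Words"]):
    matches = tokens.isin(set(words))   # True where the token is in this word list
    # Sum the matches per original row and store them as a new column
    df[list_type] = matches.astype(int).groupby(level=0).sum()

print(df.drop(columns="Text"))
```

With 6 rows in df_lm the loop runs 6 times and adds 6 columns, each named after its `List_type`.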

Let me know if this helps; otherwise I will update or delete the answer.

2021-08-24
