I have a list of 4,000 strings that I need to remove from a pandas DataFrame column. The code below works on the example shown, but when I run it on my DataFrame of 20k+ rows it takes a very long time. Any ideas on how to speed it up?

import re

import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Hello Sam how is it going today? oh yeah",
            "Hello Jane how is it going today? oh yeah",
            "It is an Hello example how are you doing today?",
            "how is it going today?n[soldjgf ",
            "how is it going today Hello World",
        ],
    }
)

my_list = ['how is it going today?n[soldjgf', 'how are you doing today?']

# Build one alternation pattern from the escaped strings, then substitute row by row.
p = re.compile('|'.join(map(re.escape, my_list)))
df['cleaned_text'] = [p.sub(' ', text) for text in df['name']]
1 Answer
绝地无双
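Use Series.str.replace() on the column (df['name'].str.replace, not df.str.replace) with the compiled pattern. On pandas 2.0 and later, pass regex=True explicitly, because the default is regex=False and a compiled pattern is rejected in that mode:
p = re.compile('|'.join(map(re.escape, my_list)))
df['cleaned_text'] = df['name'].str.replace(p, ' ', regex=True)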
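For reference, here is a minimal self-contained sketch that runs both variants on the question's sample data and checks that they produce the same column. It assumes a pandas version that accepts the regex keyword. The name loop_result is only illustrative. Whether str.replace is actually faster than the list comprehension on 20k+ rows is not shown here and is worth timing on the real data.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Hello Sam how is it going today? oh yeah",
            "Hello Jane how is it going today? oh yeah",
            "It is an Hello example how are you doing today?",
            "how is it going today?n[soldjgf ",
            "how is it going today Hello World",
        ],
    }
)
my_list = ["how is it going today?n[soldjgf", "how are you doing today?"]

# One alternation pattern; re.escape makes "?" and "[" match literally.
p = re.compile("|".join(map(re.escape, my_list)))

# Variant from the question: Python-level loop over the column.
loop_result = [p.sub(" ", text) for text in df["name"]]

# Variant from the answer: let pandas apply the compiled pattern.
df["cleaned_text"] = df["name"].str.replace(p, " ", regex=True)

# Both variants should agree row for row.
print(df["cleaned_text"].tolist() == loop_result)
```

Rows that contain none of the listed strings (for example the first one) are left unchanged, since only the exact strings in my_list are matched.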