I am trying to extract names from a DataFrame column.

df['target_name'].head()

3     Minnie
4     Albert [unclear]Gles[/unclear]
5     Eliza [unclear]Gles[/unclear]
6     John Slaltery
7     [unclear]P.[/unclear] Slaltery
23    ? Stewart
34    John Maddison
35    Herbert Olney
36    William Iverach
37    [unclear][/unclear]
38    Peter Blacksmith
39    William Oliver
40    Emily
Name: target_name, dtype: object

This is the output. I only want to strip the unnecessary characters and keep the names. Here is what I did:

import re
df['target_name'] = df['target_name'].astype(str)  # convert the column to strings

I tried the two approaches below, but both gave me the same output, namely NaN:

df['target_name'] = df['target_name'].str.extract('([a-zA-Z ]+)', expand=False).str.strip()
df['target_name3'] = df['target_name'].str.replace(r'\([^)]*\)', '').str.strip()
1 Answer

杨魅力
Contributed 1811 experience points, received more than 6 likes
This seems to work for me.
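import pandas as pd
import re

# A few sample values taken from the question
target_name = ["Minnie", "Albert [unclear]Gles[/unclear]",
               "Eliza [unclear]Gles[/unclear]",
               "[unclear]P.[/unclear] Slaltery", "? Stewart"]
df = pd.DataFrame(target_name, columns=['target_name'])

# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
df['target_name'] = (
    df['target_name'].astype('str')
    .str.replace(r'\/|\?', '', regex=True)       # remove slashes and question marks
    .str.replace(r'\[[a-z]+\]', '', regex=True)  # remove the remaining [unclear] markers
    .str.strip()
)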
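
As a variation on the snippet above, here is a minimal sketch that removes the opening and closing markers with a single pattern and passes `regex` explicitly on every call. The sample values are copied from the question. Treating rows that end up empty (such as `[unclear][/unclear]`) as missing is my own assumption, not something the question asks for, and the names `names` and `cleaned` are purely illustrative.

```python
import pandas as pd

# Sample values copied from the question's output
names = pd.Series(
    ["Minnie", "Albert [unclear]Gles[/unclear]",
     "[unclear]P.[/unclear] Slaltery", "? Stewart", "[unclear][/unclear]"],
    name="target_name",
)

cleaned = (
    names.astype(str)
    .str.replace(r"\[/?[a-z]+\]", "", regex=True)  # drop [unclear] and [/unclear]
    .str.replace("?", "", regex=False)             # drop literal question marks
    .str.strip()
)

# Assumption: a row with nothing left after cleaning counts as missing
cleaned = cleaned.mask(cleaned == "")

print(cleaned)
```

Unlike the two-step version, this leaves any slash that is not part of a marker untouched, which may or may not be what you want for your data.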