I have a DataFrame df with a column hashtags:

df['hashtags']
>>>
0         NaN
1         NaN
2         ['COVID19']
3         ['COVID19']
4         ['CoronaVirusUpdates', 'COVID19']
                ...
132596    ['coronacrise', 'covid19', 'JN', 'NãoÉSóUmNúme...
132597    ['covid19']
132598    ['corona', 'covid19']
132599    NaN
132600    ['covid19']
Name: hashtags, Length: 132601, dtype: object

I want to create a single list that contains all the elements of the lists in this column (excluding NaN). I tried building the list like this:

li = df['hashtags'].tolist()

but it seems to turn the lists into strings, so I end up with a list of strings. For example:

li[:5]
>>> [nan, nan, "['COVID19']", "['COVID19']", "['CoronaVirusUpdates', 'COVID19']"]

The output I want for li[:5] looks like this:

['COVID19', 'COVID19', 'CoronaVirusUpdates', 'COVID19', 'coronavirus', 'covid19']
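Note that Series.tolist() returns the stored objects unchanged, so if the result contains strings such as "['COVID19']", the column most likely already held strings. One common way this happens (an assumption about how this data was loaded, not something stated above) is a CSV round trip: to_csv writes each list as its text repr, and read_csv reads it back as plain text. A minimal sketch, with a hypothetical `id` column added so the NaN row is not written as a blank line:

```python
import io

import numpy as np
import pandas as pd

# A column that really holds Python lists (and NaN for a missing row)
original = pd.DataFrame({
    'id': [0, 1, 2],
    'hashtags': pd.Series([np.nan, ['COVID19'], ['CoronaVirusUpdates', 'COVID19']],
                          dtype=object),
})

# Write to CSV and read it back: each list is stored as its string repr
buf = io.StringIO()
original.to_csv(buf, index=False)
buf.seek(0)
reloaded = pd.read_csv(buf)

print(original['hashtags'].tolist())  # [nan, ['COVID19'], ['CoronaVirusUpdates', 'COVID19']]
print(reloaded['hashtags'].tolist())  # [nan, "['COVID19']", "['CoronaVirusUpdates', 'COVID19']"]
```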
1 Answer
慕田峪7331174
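The idea is to first drop the missing values with Series.dropna, then convert each list repr (a string such as "['COVID19']") into a real list with ast.literal_eval, and finally flatten the nested lists in a list comprehension: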
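import ast

import numpy as np
import pandas as pd

df = pd.DataFrame({'hashtags': [np.nan, np.nan,
                                "['COVID19']", "['COVID19']",
                                "['CoronaVirusUpdates', 'COVID19']"]})

# Drop NaN rows, parse each string repr into a list, then flatten
out = [y for x in df['hashtags'].dropna() for y in ast.literal_eval(x)]
print(out)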
['COVID19', 'COVID19', 'CoronaVirusUpdates', 'COVID19']
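If the column might mix real Python lists in some rows with string reprs in others (an assumption; the question only shows strings), a slightly more defensive variant parses only the strings and passes lists through. This is a sketch, not part of the answer above: the helper name `flatten_hashtags` is hypothetical, and `itertools.chain.from_iterable` is swapped in for the nested list comprehension.

```python
import ast
from itertools import chain

import numpy as np
import pandas as pd


def flatten_hashtags(col: pd.Series) -> list:
    """Hypothetical helper: flatten a column of lists or list reprs, skipping NaN."""
    def to_list(value):
        # Strings such as "['COVID19']" are parsed safely; real lists are copied as-is
        return ast.literal_eval(value) if isinstance(value, str) else list(value)

    # dropna() removes missing rows before anything is parsed
    return list(chain.from_iterable(to_list(v) for v in col.dropna()))


mixed = pd.Series([np.nan, "['COVID19']", ['CoronaVirusUpdates', 'COVID19']], dtype=object)
print(flatten_hashtags(mixed))  # ['COVID19', 'CoronaVirusUpdates', 'COVID19']
```

ast.literal_eval only evaluates Python literals, so unlike eval it will not run arbitrary code found in the data.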