I am creating a word cloud of the keywords in a journal using Python. The problem is that I don't want the individual words within a keyword to be split apart; each keyword should be treated as a single unit. I managed to do this by replacing the space character ' ' with '_', but now the final image of course contains the underscore characters. This is the code:
import numpy as np
import pandas as pd
from os import path
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import matplotlib.pyplot as plt
# Load in the dataframe
df = pd.read_csv("input/cmame_0.csv")
l = df['Author Keywords'].str.split(';', expand=False).tolist()
text = ';'.join([item for sublist in l if isinstance(sublist, list) for item in sublist])
text = text.replace(" ", "_")
stopwords = set(STOPWORDS)
# Create and generate a word cloud image:
wordcloud = WordCloud(stopwords=stopwords, max_font_size=50, max_words=100, background_color="white").generate(text)
# Display the generated image:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
This produces an image in which the keywords still contain underscores. I could probably use a regular expression here, but I can't seem to find the right one.
2 Answers
慕姐8265434
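Contributed 1813 experience points, received more than 2 upvotes
This may be a late answer.
You can use the generate_from_frequencies method, which takes a mapping from each phrase to its count, so multi-word keywords are never split on spaces.
Code snippet: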
from collections import Counter
from wordcloud import WordCloud
import matplotlib.pyplot as plt
words = ["hey", "hello world", "may be", "exploring world", "something fishy"]
word_cloud_lst = Counter(words)
wordcloud = WordCloud(max_font_size=50, max_words=100, background_color="white").generate_from_frequencies(word_cloud_lst)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
The result is here.
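To connect this back to the question's data, here is a minimal sketch of how the ';'-separated keyword cells might be turned into a frequency mapping without the underscore workaround. The helper name keyword_frequencies and the sample list are hypothetical; lowercasing the phrases so that differently capitalised variants are merged is an assumption, not something the original code does.

```python
from collections import Counter


def keyword_frequencies(cells):
    """Count whole keyword phrases from ';'-separated cells.

    Hypothetical helper: `cells` stands in for
    df['Author Keywords'].tolist() from the question.
    """
    counts = Counter()
    for cell in cells:
        if not isinstance(cell, str):
            continue  # skip missing values such as NaN
        for phrase in cell.split(";"):
            # Assumption: trim whitespace and lowercase to merge variants
            phrase = phrase.strip().lower()
            if phrase:
                counts[phrase] += 1
    return counts


# Made-up sample rows, for illustration only
sample = [
    "finite element method; isogeometric analysis",
    float("nan"),
    "Finite element method;topology optimization",
]
freqs = keyword_frequencies(sample)
```

The resulting freqs mapping can then be passed to WordCloud(...).generate_from_frequencies(freqs) exactly as in the snippet above, so each keyword is drawn as one phrase with its spaces intact.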