
Replace words using a user dictionary and replace all other words with 0


慕哥6287543 2021-09-11 15:36:05
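I have a dataset of reviews, one of which reads:

Simply the best. I bought this last year. Still using. No problems faced till date.Amazing battery life. Works fine in darkness or broad daylight. Best gift for any book lover.

(This is from the original dataset; in the dataset I work with, I have removed all punctuation and converted everything to lowercase.)

What I want to do is replace some words with 1 (according to my dictionary) and all other words with 0. My dictionary is:

dict = {"amazing":"1","super":"1","good":"1","useful":"1","nice":"1","awesome":"1","quality":"1","resolution":"1","perfect":"1","revolutionary":"1","and":"1","good":"1","purchase":"1","product":"1","impression":"1","watch":"1","quality":"1","weight":"1","stopped":"1","i":"1","easy":"1","read":"1","best":"1","better":"1","bad":"1"}

I want my output to look like this:

0010000000000001000000000100000

I used this code:

df['newreviews'] = df['reviews'].map(dict).fillna("0")

This always returns 0 as the output. I didn't want that, so I made the 1 and 0 strings, but I still get the same result. Any suggestions on how to fix this?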
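A likely reason for the all-zero result: when `Series.map` is given a dictionary, it looks up each cell's entire value as a key. A whole review never equals a single-word key, so the lookup yields NaN and `fillna("0")` turns it into "0". The sketch below illustrates this with a shortened, hypothetical dictionary (`word_flags`) and a two-row frame; neither comes from the original post.

```python
import pandas as pd

# Hypothetical, shortened version of the asker's dictionary.
word_flags = {"amazing": "1", "best": "1", "i": "1"}

df = pd.DataFrame({"reviews": ["simply the best", "best"]})

# map() looks up the whole cell value, not each word inside it,
# so only a cell that is exactly one dictionary key can match.
whole_cell = df["reviews"].map(word_flags).fillna("0")

# Looking up each word separately gives one flag per word.
per_word = df["reviews"].apply(
    lambda text: "".join(word_flags.get(word, "0") for word in text.split())
)

print(whole_cell.tolist())
print(per_word.tolist())
```

Only the second row matches under `map`, because its cell is exactly the key "best".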

3 Answers

慕田峪7331174

Contributed 1828 experience points, received over 13 likes

You can do this:


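# clean the sentence: replace periods with spaces so that
# "date.Amazing" splits into two words
# (sent is assumed to hold the review text, mydict the word dictionary)
import re
sent = re.sub(r'\.', ' ', sent)

# convert to a list of lowercase words
sent = sent.lower().split()

# look up each word in the dictionary using a comprehension
new_sent = ''.join(['1' if x in mydict else '0' for x in sent])
print(new_sent)

0011000000000001000000000100000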
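The same steps can be wrapped in a small function so they can be reused for each review. This is only a sketch: the helper name `flag_words` and the shortened dictionary are hypothetical, not part of the answer above.

```python
import re

def flag_words(sentence, flags):
    """Return '1' for each word found in flags and '0' for every other word."""
    # Replace every non-word, non-space character with a space,
    # then lowercase and split into words.
    words = re.sub(r"[^\w\s]", " ", sentence).lower().split()
    return "".join("1" if word in flags else "0" for word in words)

# Hypothetical, shortened version of the asker's dictionary.
mydict = {"amazing": "1", "best": "1", "i": "1"}

print(flag_words("Simply the best. I bought this last year.", mydict))
```

Because the function takes the dictionary as an argument, it can be called once per review without relying on global variables.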


Answered 2021-09-11
浮云间

Contributed 1829 experience points, received over 4 likes

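First, don't use dict as a variable name, because it shadows the Python built-in type (dict is a built-in name, not a reserved word). Then use a list comprehension with get to replace unmatched words with 0.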


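Note: If the data contains text like date.Amazing, with no space after the punctuation mark, the punctuation needs to be replaced with a space.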
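To make the note concrete, the sketch below compares deleting punctuation with replacing it by a space, using a fragment of the review from the question. The variable names are illustrative only.

```python
import re

text = "No problems faced till date.Amazing battery life."

# Deleting punctuation glues the two neighbouring words together.
glued = re.sub(r"[^\w\s]+", "", text).lower().split()

# Replacing punctuation with a space keeps them as separate words.
spaced = re.sub(r"[^\w\s]+", " ", text).lower().split()

print(glued)
print(spaced)
```

In the first list "dateamazing" is a single token that can never match the dictionary key "amazing"; in the second list "date" and "amazing" are separate tokens.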


import pandas as pd

df = pd.DataFrame({'reviews':['Simply the best. I bought this last year. Still using. No problems faced till date.Amazing battery life. Works fine in darkness or broad daylight. Best gift for any book lover.']})

d = {"amazing":"1","super":"1","good":"1","useful":"1","nice":"1","awesome":"1","quality":"1","resolution":"1","perfect":"1","revolutionary":"1","and":"1","good":"1","purchase":"1","product":"1","impression":"1","watch":"1","quality":"1","weight":"1","stopped":"1","i":"1","easy":"1","read":"1","best":"1","better":"1","bad":"1"}

# replace punctuation with spaces and lowercase the text
# (regex=True is required since pandas 2.0, where the default became False)
df['reviews'] = df['reviews'].str.replace(r'[^\w\s]+', ' ', regex=True).str.lower()

# look up each word, falling back to '0' for words not in the dictionary
df['newreviews'] = [''.join(d.get(y, '0') for y in x.split()) for x in df['reviews']]

Alternative:


df['newreviews'] = df['reviews'].apply(lambda x: ''.join(d.get(y, '0') for y in x.split()))
print(df)

                                             reviews  \
0  simply the best  i bought this last year  stil...   

                        newreviews  
0  0011000000000001000000000100000  
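Since every value in the question's dictionary is "1", a plain set of words would be enough for the lookup. The following is a sketch of that variation, assuming a shortened, hypothetical word list (`positive_words`) and two made-up reviews.

```python
import pandas as pd

# Hypothetical, shortened word list; only membership matters,
# so a set can stand in for the dictionary.
positive_words = {"amazing", "best", "i"}

df = pd.DataFrame({"reviews": ["Works fine. Best gift!", "Amazing battery life"]})

# Replace punctuation with spaces and lowercase the text.
cleaned = df["reviews"].str.replace(r"[^\w\s]+", " ", regex=True).str.lower()

# One flag per word: "1" if the word is in the set, "0" otherwise.
df["newreviews"] = [
    "".join("1" if word in positive_words else "0" for word in text.split())
    for text in cleaned
]

print(df["newreviews"].tolist())
```

A dictionary remains the better fit if different words ever need different replacement values.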


Answered 2021-09-11
人到中年有点甜

Contributed 1895 experience points, received over 7 likes

You can do this with:

df.replace(repl, regex=True, inplace=True)

where df is your DataFrame and repl is your dictionary.
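A note of caution on this approach, offered as a sketch rather than a definitive statement: with `regex=True`, the dictionary keys are treated as regular-expression patterns and matched anywhere inside a cell, so a short key such as "i" also matches inside other words. Words that are not in the dictionary are left unchanged instead of becoming 0. The frame and the shortened `repl` below are hypothetical, and the keys are assumed to be plain letters with no regex metacharacters.

```python
import pandas as pd

df = pd.DataFrame({"reviews": ["simply the best"]})
repl = {"best": "1", "i": "1"}  # hypothetical, shortened dictionary

# Keys are regex patterns, so "i" also matches the letter inside "simply".
substring_result = df.replace(repl, regex=True)

# Wrapping each key in \b...\b restricts matches to whole words.
whole_word_repl = {rf"\b{word}\b": flag for word, flag in repl.items()}
whole_word_result = df.replace(whole_word_repl, regex=True)

print(substring_result.loc[0, "reviews"])
print(whole_word_result.loc[0, "reviews"])
```

Even with word boundaries, the unmatched words stay in place, so a separate step would still be needed to turn them into 0.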


Answered 2021-09-11