spaCy is_stop does not identify stop words?

When I use spaCy to identify stop words, it does not work with the en_core_web_lg model, but it does work with en_core_web_sm. Is this a bug, or am I doing something wrong?

import spacy

nlp = spacy.load('en_core_web_lg')
doc = nlp(u'The cat ran over the hill and to my lap')
for word in doc:
    print(f' {word} | {word.is_stop}')

Result:

The | False
cat | False
ran | False
over | False
the | False
hill | False
and | False
to | False
my | False
lap | False

However, when I change this line to load the en_core_web_sm model, I get different results:

nlp = spacy.load('en_core_web_sm')

The | False
cat | False
ran | False
over | True
the | True
hill | False
and | True
to | True
my | True
lap | False


2 Answers

湖上湖


Try from spacy.lang.en.stop_words import STOP_WORDS; you can then check explicitly whether a word is in the set.

from spacy.lang.en.stop_words import STOP_WORDS
import spacy

nlp = spacy.load('en_core_web_lg')
doc = nlp(u'The cat ran over the hill and to my lap')
for word in doc:
    # Have to convert Token type to String, otherwise types won't match
    print(f' {word} | {str(word) in STOP_WORDS}')

This prints the following:

The | False
cat | False
ran | False
over | True
the | True
hill | False
and | True
to | True
my | True
lap | False

This looks like a bug to me. However, this approach also gives you the flexibility to add words to the STOP_WORDS set if you need to.
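Two details of the set-membership approach may be worth sketching: "The" comes out as False because the entries in the set are lowercase, and custom words can be merged in before the check. The snippet below is only an illustration under stated assumptions. Its STOP_WORDS is a tiny stand-in for spaCy's set so that it runs without a model installed, and build_stop_set and is_stop are hypothetical helper names, not part of spaCy.

```python
# Stand-in for spacy.lang.en.stop_words.STOP_WORDS (assumption for this sketch).
# The real set is much larger; like this one, its entries are lowercase.
STOP_WORDS = {"the", "over", "and", "to", "my"}


def build_stop_set(base, extra=()):
    """Return a new lowercase set: the base stop words plus custom additions."""
    return {w.lower() for w in base} | {w.lower() for w in extra}


def is_stop(text, stop_set):
    """Case-insensitive membership test, so 'The' matches 'the'."""
    return text.lower() in stop_set


# "lap" is a made-up custom stop word, added without mutating STOP_WORDS.
custom = build_stop_set(STOP_WORDS, extra=["lap"])

sentence = "The cat ran over the hill and to my lap"
flags = [(word, is_stop(word, custom)) for word in sentence.split()]
for word, flag in flags:
    print(f"{word} | {flag}")
```

With spaCy installed, the stand-in could be replaced by the imported STOP_WORDS and the helper called with token.text for each token in the doc. Building a new set rather than calling STOP_WORDS.add(...) keeps the library's shared set unchanged, which may be preferable when several pipelines run in the same process.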

2021-06-15
