
Shortening sentences with a Python library such as NLTK

Python

拉风的咖菲猫 2023-02-15 16:51:42

I am using NLTK to remove stop words from a sentence. For example:

Input: "I would love to fly again via American Airlines"

Desired result: "Love to fly American Airlines"

I tried the following code:

    from nltk.corpus import stopwords
    from nltk.tokenize import sent_tokenize, word_tokenize

    # Tokenizing the text
    txt = "I love to fly with American Airlines"
    stopWords = set(stopwords.words("english"))
    words = word_tokenize(txt)

    # Creating a frequency table to keep the score of each word
    freqTable = dict()
    for word in words:
        word = word.lower()
        if word in stopWords:
            continue
        if word in freqTable:
            freqTable[word] += 1
        else:
            freqTable[word] = 1

    # Creating a dictionary to keep the score of each sentence
    sentences = sent_tokenize(txt)
    sentenceValue = dict()
    for sentence in sentences:
        for word, freq in freqTable.items():
            if word in sentence.lower():
                if sentence in sentenceValue:
                    sentenceValue[sentence] += freq
                else:
                    sentenceValue[sentence] = freq

    sumValues = 0
    for sentence in sentenceValue:
        sumValues += sentenceValue[sentence]

    # Average value of a sentence from the original text
    average = int(sumValues / len(sentenceValue))

    # Storing sentences into our summary.
    summary = ''
    for sentence in sentences:
        if (sentence in sentenceValue) and (sentenceValue[sentence] > (1.2 * average)):
            summary += " " + sentence
    print("Summary: " + summary)

The result is an empty string, which I assume is because the sentence is too short for NLTK to work with. I am looking into whether there is a simpler way; I was planning to train a model for this.
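One observation on the code above: with a single input sentence, that sentence's score is the only entry in sentenceValue, so it equals the average and the `> 1.2 * average` test cannot pass. That would explain the empty summary regardless of sentence length. If the goal is only to drop stop words from one sentence, filtering the tokens directly may be enough. The following is a minimal sketch under stated assumptions, not a definitive implementation: `shorten` and `EXAMPLE_STOP_WORDS` are hypothetical names introduced here, the tiny stop-word set is hand-picked for illustration, and a regular expression stands in for NLTK's tokenizer so the snippet runs without downloading any corpus.

```python
import re

# Illustrative stop-word set (an assumption for this sketch, not NLTK's list).
# With NLTK installed, set(stopwords.words("english")) could be passed instead;
# note that NLTK's English list also contains "to", so "to" would be dropped too.
EXAMPLE_STOP_WORDS = {"i", "would", "again", "via", "with"}


def shorten(sentence, stop_words=EXAMPLE_STOP_WORDS):
    """Return the sentence with stop words removed, keeping word order and casing."""
    # Split into alphabetic tokens (apostrophes kept so contractions stay whole).
    tokens = re.findall(r"[A-Za-z']+", sentence)
    # Compare case-insensitively, but keep each surviving token as written.
    kept = [tok for tok in tokens if tok.lower() not in stop_words]
    return " ".join(kept)


print(shorten("I would love to fly again via American Airlines"))
# prints: love to fly American Airlines
```

Passing the stop-word set as a parameter keeps the choice of list explicit, which matters here because the desired result keeps "to" while a general-purpose English stop-word list would typically remove it.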


1 answer

米脂



Answered 2023-02-15
