
Shortening sentences with a Python library such as NLTK

拉风的咖菲猫 2023-02-15 16:51:42
I am using NLTK to remove stop words from a sentence. For example, "I would love to fly again via American Airlines" should become "Love to fly American Airlines".

I tried the following code:

    from nltk.corpus import stopwords
    from nltk.tokenize import sent_tokenize, word_tokenize

    # Tokenize the text
    txt = "I love to fly with American Airlines"
    stopWords = set(stopwords.words("english"))
    words = word_tokenize(txt)

    # Build a frequency table that holds the score of each non-stop word
    freqTable = dict()
    for word in words:
        word = word.lower()
        if word in stopWords:
            continue
        if word in freqTable:
            freqTable[word] += 1
        else:
            freqTable[word] = 1

    # Score each sentence by the frequencies of the words it contains
    sentences = sent_tokenize(txt)
    sentenceValue = dict()

    for sentence in sentences:
        for word, freq in freqTable.items():
            if word in sentence.lower():
                if sentence in sentenceValue:
                    sentenceValue[sentence] += freq
                else:
                    sentenceValue[sentence] = freq

    sumValues = 0
    for sentence in sentenceValue:
        sumValues += sentenceValue[sentence]

    # Average score of a sentence in the original text
    average = int(sumValues / len(sentenceValue))

    # Keep only the sentences that score above 1.2 times the average
    summary = ''
    for sentence in sentences:
        if (sentence in sentenceValue) and (sentenceValue[sentence] > (1.2 * average)):
            summary += " " + sentence

    print("Summary: " + summary)

The result is an empty string, which I assume is because the sentence is too short for NLTK to work with. I am just looking into whether there is a simpler way, as I plan to train a model for this.
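A note on the empty result, plus a simpler sketch of the stop-word step. The text holds a single sentence, so `average` equals that sentence's own score, and `score > 1.2 * average` cannot hold for a positive score; the empty summary appears to come from that threshold rather than from NLTK. For the shortening itself, filtering the tokens against a stop list may be enough. The sketch below is only an illustration under stated assumptions: `CUSTOM_STOP_WORDS` and `shorten` are hypothetical names, the stop list is hand-picked just to reproduce the example in the question, and `str.split` stands in for `word_tokenize` so the code runs without NLTK data.

```python
# Hypothetical, hand-picked stop list chosen only to reproduce the example in
# the question. It is NOT NLTK's list; with NLTK installed,
# set(stopwords.words("english")) could be passed in instead.
CUSTOM_STOP_WORDS = {"i", "would", "again", "via"}


def shorten(sentence, stop_words=CUSTOM_STOP_WORDS):
    """Drop stop words from a sentence, keeping the other words in order."""
    # Whitespace split as a stand-in for nltk.word_tokenize (assumption);
    # punctuation stays attached to its word.
    kept = [w for w in sentence.split() if w.lower() not in stop_words]
    result = " ".join(kept)
    # Upper-case only the first character; str.capitalize() would also
    # lower-case the rest and turn "American" into "american".
    return result[:1].upper() + result[1:]


print(shorten("I would love to fly again via American Airlines"))
```

With NLTK available, `set(stopwords.words("english"))` could be passed as `stop_words`. That list contains "to", so the output would likely differ from the example; a custom list gives more control over which words survive.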
