2 Answers
Contributed 1799 experience points, received more than 9 likes
line = line.translate(line.maketrans('', '', string.whitespace))
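This line deletes all whitespace from each line, including the spaces between words, so nothing is left to split on. Remove it and the code should work as expected.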
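To see why that line breaks word counting, here is a minimal sketch. The sample string is made up for illustration and is not from the original question.

```python
import string

line = "the quick  brown fox\n"  # hypothetical sample line

# Deleting every whitespace character glues the words together,
# so a later split() has nothing left to split on.
stripped = line.translate(str.maketrans("", "", string.whitespace))
print(stripped.split())  # ['thequickbrownfox']

# Without that step, split() already copes with repeated spaces and newlines.
print(line.split())  # ['the', 'quick', 'brown', 'fox']
```

Calling `split()` with no argument splits on any run of whitespace, so no extra cleanup is needed before it.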
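Contributed 1826 experience points, received more than 6 likes
Your code removes the whitespace and then tries to split on whitespace, which doesn't make sense. Since you want to extract every word from the given text, I suggest putting all the words on a single line with exactly one space between them. That means removing not only newlines, redundant spaces, special/unwanted characters, and digits, but also control characters.
This should do the trick: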
import os
import string
import sys

# Change to the directory that contains the text file
path = "/your/path"
os.chdir(path)
# Prompt for user to input filename:
fname = input("Enter the filename: ")
try:
    fhand = open(fname)
except IOError:
    # Invalid filename error
    print("\n")
    print("Sorry, file can't be opened! Please check your spelling.")
    sys.exit()
# Initialize char counts and word counts dictionary
counts = {}
worddict = {}
# Build a single line with undesired characters removed
text = fhand.read().replace("\n", " ").replace("\r", "")
fhand.close()
text = text.lower()
text = text.translate(str.maketrans("", "", string.digits))
text = text.translate(str.maketrans("", "", string.punctuation))
# Collapse runs of whitespace into single spaces
text = " ".join(text.split())
# split() with no argument avoids an empty-string "word" when the file is empty
words = text.split()
for word in words:
    # Is the word already in the word dictionary?
    if word in worddict:
        # Increase by 1
        worddict[word] += 1
    else:
        # Add word to dictionary with count of 1 if not there already
        worddict[word] = 1
# Character count (iterating over a string yields single characters)
for char in text:
    # Increase count by 1 if the character has been seen before
    if char in counts:
        counts[char] += 1
    else:
        counts[char] = 1
# Initialize lists
lst = []
countlst = []
freqlst = []
# Collect (count, letter) pairs and the raw counts
for ltrs, c in counts.items():
    # skip spaces
    if ltrs == " ":
        continue
    lst.append((c, ltrs))
    countlst.append(c)
# Sum up the count
totalcount = sum(countlst)
# Calculate the percentage frequency of each letter
for ec in countlst:
    efreq = (ec / totalcount) * 100
    freqlst.append(efreq)
# Sort lists by count and percentage frequency
freqlst.sort(reverse=True)
lst.sort(reverse=True)
# Print the ten most frequent words, highest count first
for key in sorted(worddict.keys(), key=worddict.get, reverse=True)[:10]:
    print(key, ":", worddict[key])
# Print every letter with its count and percentage frequency
# (lst holds (count, letter) tuples)
for count, letter in lst:
    print(letter, "-", count, "-", round(count / totalcount * 100, 2), "%")
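For comparison, the same counting can be sketched more compactly with `collections.Counter` from the standard library. This is an alternative under stated assumptions, not the answer's own method: the helper names `normalize` and `count_words_and_letters` and the sample string are hypothetical, and the text is passed in directly rather than read from a file.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop digits and punctuation, collapse whitespace to single spaces."""
    table = str.maketrans("", "", string.digits + string.punctuation)
    return " ".join(text.lower().translate(table).split())


def count_words_and_letters(text):
    """Return (word counts, letter counts) for the normalized text."""
    clean = normalize(text)
    word_counts = Counter(clean.split())
    # Count every remaining character except the separating spaces
    letter_counts = Counter(ch for ch in clean if ch != " ")
    return word_counts, letter_counts


sample = "The cat, the hat!\nCat 42."  # hypothetical sample text
words, letters = count_words_and_letters(sample)

# Ten most frequent words, highest count first
for word, count in words.most_common(10):
    print(word, ":", count)

# Every letter with its count and percentage frequency
total = sum(letters.values())
for letter, count in letters.most_common():
    print(letter, "-", count, "-", round(count / total * 100, 2), "%")
```

`Counter.most_common()` replaces the manual sorting, and building one translation table for digits and punctuation removes both in a single pass.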