3 Answers
Contributed 1815 experience points, received over 10 likes
I think you need to do the following to pull every record out of the file and get the review/summary values. You don't need a DataFrame.
#create a dictionary to store the list of review summary values
d = {'review summary': []}

#function to extract only the review_summary from the line
def split_review_summary(full_line):
    #find review/text and exclude it from the line
    found = full_line.find('review/text:')
    if found >= 0:
        full_line = full_line[:found]
    #find review/summary; all text to the right of it is the review summary
    #add this to the dictionary
    found = full_line.find('review/summary:')
    if found >= 0:
        #15 is the length of 'review/summary:'
        review_summary = full_line[(found + 15):]
        d['review summary'].append(review_summary)

#open the file for reading
with open("xyz.txt", "r") as f:
    #read the first line
    new_line = f.readline().rstrip('\n')
    #loop through the rest of the lines
    for line in f:
        #remove newline from the data
        line = line.rstrip('\n')
        #if the line starts with product/productId, then it's a new entry
        #process the previous record and strip out the review_summary
        #to do that, call the split_review_summary function
        if line.startswith('product/productId'):
            split_review_summary(new_line)
            #reset new_line to the current line
            new_line = line
        else:
            #append to new_line as it's part of the previous record
            new_line += line
    #the last full record has not been processed yet
    #so send it to split_review_summary to extract the review summary
    split_review_summary(new_line)

#now dictionary d has all the review summary items
print(d)
The output will be:
{'review summary': [' Good Quality Dog Food ', ' Not as Advertised ']}
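I believe the scope of your question also includes writing to a new file.
You can open a file and write the dictionary to it as a single line; that will contain all the details. I'll leave that part for you to work out.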
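One possible way to do that last step, as a minimal sketch: it assumes JSON is an acceptable single-line format, and the function name `write_summaries` and the file name `review_summaries.txt` are made up for illustration. The sample dictionary simply repeats the output shown above.

```python
import json

# Example data in the same shape as the dictionary `d` printed above.
d = {'review summary': [' Good Quality Dog Food ', ' Not as Advertised ']}

def write_summaries(summaries, path):
    """Write the dictionary to `path` as one line of JSON (hypothetical helper)."""
    with open(path, 'w', encoding='utf-8') as out:
        json.dump(summaries, out)
        out.write('\n')

# 'review_summaries.txt' is an assumed output name, not one from the question.
write_summaries(d, 'review_summaries.txt')
```

Reading the file back with `json.load` should give the same dictionary, which is the main reason to prefer JSON over writing `str(d)`.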
Contributed 1828 experience points, received over 6 likes
CSV stands for comma-separated values. I don't see any commas in your file.
It looks like a broken dictionary (each entry is missing its separating comma):
my_dict = {
    'productid': 12312312,
    'some_key': 'I am the key!',
}
Contributed 1893 experience points, received over 10 likes
I looked at the link S.Ghoshal provided and came up with the following:
#Opening your file and reading every line
with open('foods.txt') as your_file:
    reviews = your_file.readlines()

reviews_array = []
dictionary = {}

#We go through every line; a line without ":" (such as a blank line) ends the current review
for review in reviews:
    #Split on the first ":" only, so colons inside the review text are kept
    this_line = review.split(":", 1)
    if len(this_line) > 1:
        #The part before the first ":" is the key of the dictionary, and the rest is the content
        dictionary[this_line[0]] = this_line[1].strip()
    else:
        #A blank line was found, so save the object in the array and reset it
        #for the next review (skipping empty objects from repeated blank lines)
        if dictionary:
            reviews_array.append(dictionary)
        dictionary = {}

#Append the last object, because the loop ends without reaching the else branch
if dictionary:
    reviews_array.append(dictionary)

#Write the review text of every record to the output file
with open("output.txt", "a") as f1:
    for r in reviews_array:
        if 'review/text' in r:
            print(r['review/text'], file=f1)
Now the text from every line that starts with review/text is dumped into the file. Next, I need to create a list of all the unique words.
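For that next step, here is a rough sketch of one way to collect the unique words. It assumes that words are runs of letters and apostrophes and that case should be ignored; the function name `unique_words` and the sample sentences are made up for illustration.

```python
import re

def unique_words(lines):
    """Return the distinct lowercase words in `lines`, in first-seen order."""
    seen = {}  # a dict keeps insertion order, so it works as an ordered set
    for line in lines:
        # Assumption: a "word" is a run of letters and apostrophes.
        for word in re.findall(r"[A-Za-z']+", line.lower()):
            seen.setdefault(word, None)
    return list(seen)

# Made-up sample lines standing in for the contents of output.txt.
sample = ["Good quality dog food.", "Not as advertised, not good."]
print(unique_words(sample))
# prints ['good', 'quality', 'dog', 'food', 'not', 'as', 'advertised']
```

Because the function accepts any iterable of strings, an open file object can be passed directly, for example `with open("output.txt") as f: words = unique_words(f)`.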