
How to parse a text file with pandas and create a list

Python

繁星coding 2023-10-06 19:35:25

I am trying to use pandas to create a list/array containing all the words in the "review/text" field of the following text file:

    product/productId: B001E4KFG0
    review/userId: A3SGXH7AUHU8GW
    review/profileName: delmartian
    review/helpfulness: 1/1
    review/score: 5.0
    review/time: 1303862400
    review/summary: Good Quality Dog Food
    review/text: I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than most.

    product/productId: B00813GRG4
    review/userId: A1D87F6ZCVE5NK
    review/profileName: dll pa
    review/helpfulness: 0/0
    review/score: 1.0
    review/time: 1346976000
    review/summary: Not as Advertised
    review/text: Product arrived labeled as Jumbo Salted Peanuts...

(The text file foods.txt is available at http://snap.stanford.edu/data/web-FineFoods.html)

My end goal is to identify all the unique words that appear in the review/text field. I wrote the following code:

    import pandas as pd
    f=open("foods.txt","r")
    df=pd.read_csv(f,names=['product/productId','review/userId','review/profileName','review/helpfulness','review/score','review/time','review/summary'])
    selected = df[ df['review/summary'] ]
    print(selected)
    selected.to_csv('result.csv', sep=' ', header=False)

However, I get the following error:

    ValueError: cannot index with vector containing NA / NaN values

Any suggestions or comments?


3 Answers

动漫人物


I think you have to do it this way to extract every record from the file and get the review/summary values. You don't need a DataFrame.

    # create a dictionary to store the list of review summary values
    d = {'review summary': []}

    # function to extract only the review summary from the line
    def split_review_summary(full_line):
        # find review/text and exclude it from the line
        found = full_line.find('review/text:')
        if found >= 0:
            full_line = full_line[:found]
        # find review/summary; all text to its right is the review summary
        # add this to the dictionary
        found = full_line.find('review/summary:')
        if found >= 0:
            review_summary = full_line[(found + 15):]
            d['review summary'].append(review_summary)

    # open the file for reading
    with open("xyz.txt", "r") as f:
        # read the first line
        new_line = f.readline().rstrip('\n')
        # loop through the rest of the lines
        for line in f:
            # remove newline from the data
            line = line.rstrip('\n')
            # if the line starts with product/productId, then it's a new entry:
            # process the previous record and strip out the review summary
            # by calling split_review_summary
            if line[:17] == 'product/productId':
                split_review_summary(new_line)
                # reset new_line to the current line
                new_line = line
            else:
                # append to new_line as it's part of the previous record
                new_line += line

    # the last full record has not been processed yet,
    # so send it to split_review_summary to extract its review summary
    split_review_summary(new_line)

    # now dictionary d has all the review summary items
    print(d)

The output will be:

{'review summary': [' Good Quality Dog Food ', ' Not as Advertised ']}

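I think the scope of your question also includes writing to a new file.

You can open a file and write the dictionary out on a single line, which will contain all the details. I'll leave that part for you to work out.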
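For the file-writing step left open above, one possible approach is to serialize the dictionary as a single JSON line. This is only a sketch: the helper names `write_summaries` and `read_summaries` are hypothetical, and JSON is an assumed format, not something the answer specifies.

```python
import json


def write_summaries(d, path):
    """Write the whole dictionary to `path` as one JSON line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(json.dumps(d))
        out.write("\n")


def read_summaries(path):
    """Read the dictionary back from the first line of `path`."""
    with open(path, "r", encoding="utf-8") as src:
        return json.loads(src.readline())
```

Calling `write_summaries(d, "summaries.json")` after the loop above would store the dictionary, and `read_summaries("summaries.json")` would load it again; the file name here is just an example.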

Answered 2023-10-06

30秒到达战场


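CSV stands for comma-separated values. I don't see any commas separating the fields in your file.

It looks like a broken dictionary (each entry is missing its separating comma):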

    my_dict = {
        'productid': 12312312,
        'some_key': 'I am the key!',
    }
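Since the file is a sequence of `key: value` lines rather than CSV, one alternative to `pd.read_csv` is to group the lines into dictionaries first and only then hand them to pandas. The sketch below assumes one field per line and a blank line between reviews; `parse_records` and `records_to_frame` are hypothetical helper names, and `SAMPLE` is a shortened stand-in for the real file.

```python
# Shortened sample in the assumed layout: one field per line, blank line between reviews.
SAMPLE = """\
product/productId: B001E4KFG0
review/score: 5.0
review/summary: Good Quality Dog Food
review/text: I have bought several of the Vitality canned dog food products.

product/productId: B00813GRG4
review/score: 1.0
review/summary: Not as Advertised
review/text: Product arrived labeled as Jumbo Salted Peanuts...
"""


def parse_records(lines):
    """Group 'key: value' lines into one dict per review (hypothetical helper)."""
    records, current = [], {}
    for line in lines:
        line = line.strip()
        if not line:
            # Assumption: a blank line marks the end of a review.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so colons inside the value survive.
        key, sep, value = line.partition(":")
        if sep:
            current[key] = value.strip()
    if current:
        records.append(current)
    return records


def records_to_frame(records):
    """Build a DataFrame with one row per review and one column per field."""
    import pandas as pd  # imported lazily so parse_records works without pandas
    return pd.DataFrame(records)
```

With pandas installed, `records_to_frame(parse_records(SAMPLE.splitlines()))["review/text"].tolist()` would give the review texts as a plain list. Passing an open file object instead of `SAMPLE.splitlines()` should work the same way, since the function only iterates over lines.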

Answered 2023-10-06

白猪掌柜的


I looked at the link S.Ghoshal provided and came up with the following:

    # Open the file and read every line
    with open('foods.txt') as your_file:
        reviews = your_file.readlines()

    reviews_array = []
    dictionary = {}

    # Go through every line; a line without ":" (such as a blank line) ends the current review
    for review in reviews:
        # Split on the first ":" only, so colons inside the review text are kept
        this_line = review.split(":", 1)
        if len(this_line) > 1:
            # The part before the first ":" is the dictionary key; the rest is the content
            dictionary[this_line[0]] = this_line[1].strip()
        elif dictionary:
            # A blank line was found: save the finished review and reset for the next one
            reviews_array.append(dictionary)
            dictionary = {}

    # Append the last review if the file did not end with a blank line
    if dictionary:
        reviews_array.append(dictionary)

    # Write the review/text value of every review to the output file
    with open("output.txt", "a") as f1:
        for r in reviews_array:
            print(r['review/text'], file=f1)

Now all the words from the lines beginning with review/text are dumped to the file. Next I need to build a list of all the unique words.
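For that remaining step, a set is a natural fit because it discards duplicates automatically. The following is a minimal sketch, assuming that a "word" means a run of ASCII letters (optionally with one internal apostrophe) and that case should be ignored; the function name `unique_words` is hypothetical.

```python
import re

# Assumption: a word is a run of letters, optionally with one internal apostrophe.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def unique_words(texts):
    """Return the sorted unique lowercase words found in an iterable of strings."""
    words = set()
    for text in texts:
        words.update(WORD_RE.findall(text.lower()))
    return sorted(words)


if __name__ == "__main__":
    sample = ["Good dog food.", "good Food, bad dog"]
    print(unique_words(sample))  # ['bad', 'dog', 'food', 'good']
```

Because the function accepts any iterable of strings, an open handle on output.txt could be passed in directly, which would process the file line by line.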

Answered 2023-10-06
