I'm trying to scrape reviews from a website and store them in a CSV file using Python (3.7) and BeautifulSoup. The scrape seems to succeed, but when I write to the file, only one column contains the full data; the other columns contain just the first character of each value. Any tips would be appreciated, and apologies if it's obvious - this is a new hobby :)

    from urllib.request import urlopen as uReq
    from bs4 import BeautifulSoup as soup

    #URL to scrape
    my_url = "https://www.indeed.com/cmp/Capital-One/reviews?fcountry=ALL&lang="

    #open connection, grab page
    uClient = uReq(my_url)
    page_html = uClient

    #html parsing
    page_soup = soup(page_html, "lxml")

    #grab all reviews on page
    containers = page_soup.findAll("div",{"cmp-review-container"})
    uClient.close()

    #write to csv
    filename = "indeedreviewtest.csv"
    f=open(filename, "w")
    headers = "review_id, review_score, role, review_text\n"
    f.write(headers)

    #loop through each review, collect review ID, rating, role & verbatim
    for container in containers:
        reviewid_container = container.div["data-tn-entityid"]
        reviewid = reviewid_container[0]
        score_container = container.div.div.div.meta["content"]
        reviewscore = score_container[0]
        role_container = container.find("span", attrs={"class":"cmp-reviewer-job-title"}).text
        reviewerrole = role_container[0]
        reviewtext_container = container.find("span", attrs={"class":"cmp-review-text"}).text
        reviewtext = reviewtext_container
        f.write(reviewid + "," + reviewscore + "," + reviewerrole.replace(",", "|") + "," + reviewtext.replace(",", "|") + "\n")

    f.close()

Thanks!
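A likely cause, offered as a sketch rather than a confirmed diagnosis: the attribute lookups and `.text` in the loop already return whole strings, so indexing them with `[0]` keeps only the first character. That would explain why `reviewtext`, the one value not indexed, is the only complete column. The example below uses made-up values in place of the scraped ones (the `scraped` dict and the `write_reviews` helper are hypothetical names, not part of the original script) and writes them with the standard-library `csv` module, which quotes embedded commas so the `replace(",", "|")` workaround should not be needed.

```python
import csv
import io

# Hypothetical stand-ins for the values scraped from one review container.
scraped = {
    "review_id": "1d2abc",
    "review_score": "4.0",
    "role": "Analyst, Risk",
    "review_text": "Good pay, long hours",
}

# Indexing a string with [0] returns only its first character.
first_char_only = scraped["review_id"][0]


def write_reviews(rows, out):
    """Write review dicts to a file-like object, letting csv handle quoting."""
    fieldnames = ["review_id", "review_score", "role", "review_text"]
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        # Pass the whole strings through; no [0] indexing.
        writer.writerow(row)


# An in-memory buffer stands in for the real output file.
buffer = io.StringIO()
write_reviews([scraped], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

When writing to a real file instead of a buffer, the `csv` documentation recommends opening it with `newline=""`, for example `open("indeedreviewtest.csv", "w", newline="")`.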