为了账号安全,请及时绑定邮箱和手机立即绑定

如何清理此脚本的响应以使其更具可读性?

如何清理此脚本的响应以使其更具可读性?

繁星点点滴滴 2023-07-11 18:56:11
如何将该脚本的输出转换为更简洁的格式(如 csv)?当我将回复保存为文本时,它的格式很糟糕。我尝试使用 writer.writerow 但无法使用此方法来解释变量。import requestsfrom bs4 import BeautifulSoupurl = "https://www.rockauto.com/en/catalog/ford,2015,f-150,3.5l+v6+turbocharged,3308773,brake+&+wheel+hub,brake+pad,1684"response = requests.get(url)data = response.textsoup = BeautifulSoup(data, 'html.parser')meta_tag = soup.find('meta', attrs={'name': 'keywords'})category = meta_tag['content']linecodes = []partnos = []descriptions = []infos = []for tbody in soup.select('tbody[id^="listingcontainer"]'):    tmp = tbody.find('span', class_='listing-final-manufacturer')    linecodes.append(tmp.text if tmp else '-')    tmp = tbody.find('span', class_='listing-final-partnumber as-link-if-js buyers-guide-color')    partnos.append(tmp.text if tmp else '-')    tmp = tbody.find('span', class_='span-link-underline-remover')    descriptions.append(tmp.text if tmp else '-')    tmp = tbody.find('div', class_='listing-text-row')    infos.append(tmp.text if tmp else '-')for row in zip(linecodes,partnos,infos,descriptions):    result = category + ' | {:<20} | {:<20} | {:<80} | {:<80}'.format(*row)    with open('complete.txt', 'a+') as f:        f.write(result + '/n')        print(result)
查看完整描述

1 回答

?
繁华开满天机

TA贡献1816条经验 获得超4个赞

你可以将它放入 pandas 数据框中

for-loop从原始代码中删除最后一个。

# imports

import requests

from bs4 import BeautifulSoup

import pandas as pd


# set pandas display options to display more rows and columns

pd.set_option('display.max_columns', 700)

pd.set_option('display.max_rows', 400)

pd.set_option('display.min_rows', 10)


# your code

url = "https://www.rockauto.com/en/catalog/ford,2015,f-150,3.5l+v6+turbocharged,3308773,brake+&+wheel+hub,brake+pad,1684"


response = requests.get(url)

data = response.text

soup = BeautifulSoup(data, 'html.parser')


meta_tag = soup.find('meta', attrs={'name': 'keywords'})


category = meta_tag['content']


linecodes = []

partnos = []

descriptions = []

infos = []

for tbody in soup.select('tbody[id^="listingcontainer"]'):

    tmp = tbody.find('span', class_='listing-final-manufacturer')

    linecodes.append(tmp.text if tmp else '-')


    tmp = tbody.find('span', class_='listing-final-partnumber as-link-if-js buyers-guide-color')

    partnos.append(tmp.text if tmp else '-')


    tmp = tbody.find('span', class_='span-link-underline-remover')

    descriptions.append(tmp.text if tmp else '-')


    tmp = tbody.find('div', class_='listing-text-row')

    infos.append(tmp.text if tmp else '-')

添加了数据框的代码

# create dataframe

df = pd.DataFrame(zip(linecodes,partnos,infos,descriptions), columns=['codes', 'parts', 'info', 'desc'])


# add the category column

df['category'] = category


# break the category column into multiple columns if desired

# skip the last 2 columns, because they are empty

df[['cat_desc', 'brand', 'model', 'engine', 'cat_part']] = df.category.str.split(',', expand=True).iloc[:, :-2]


# drop the unneeded category column

df.drop(columns='category', inplace=True)


# save to csv

df.to_csv('complete.txt', index=False)


# display(df)

              codes       parts                            info                                 desc                   cat_desc  brand   model                 engine    cat_part

0           CENTRIC    30016020  Rear; w/ Manual parking brake   Semi-Metallic; w/Shims and Hardware  2015 FORD F-150 Brake Pad   FORD   F-150   3.5L V6 Turbocharged   Brake Pad

1           CENTRIC    30116020  Rear; w/ Manual parking brake         Ceramic; w/Shims and Hardware  2015 FORD F-150 Brake Pad   FORD   F-150   3.5L V6 Turbocharged   Brake Pad

2  DYNAMIC FRICTION  1551160200     Rear; Manual Parking Brake                5000 Advanced; Ceramic  2015 FORD F-150 Brake Pad   FORD   F-150   3.5L V6 Turbocharged   Brake Pad


查看完整回答
反对 回复 2023-07-11
  • 1 回答
  • 0 关注
  • 103 浏览
慕课专栏
更多

添加回答

举报

0/150
提交
取消
意见反馈 帮助中心 APP下载
官方微信