Error when parsing a web page with BeautifulSoup4

繁星点点滴滴 2021-11-09 13:43:32
I am trying to parse a web page and print the links (href) of the items. Can you help me figure out what is going wrong?

import requests
from bs4 import BeautifulSoup

link = "https://www.amazon.in/Power-Banks/b/ref=nav_shopall_sbc_mobcomp_powerbank?ie=UTF8&node=6612025031"

def amazon(url):
    sourcecode = requests.get(url)
    sourcecode_text = sourcecode.text
    soup = BeautifulSoup(sourcecode_text)
    for link in soup.findALL('a', {'class': 'a-link-normal aok-block a-text-normal'}):
        href = link.get('href')
        print(href)

amazon(link)

Output:

C:\Users\TIMAH\AppData\Local\Programs\Python\Python37\python.exe "C:/Users/TIMAH/OneDrive/study materials/Python_Test_Scripts/Self Basic/Class_Test.py"
Traceback (most recent call last):
  File "C:/Users/TIMAH/OneDrive/study materials/Python_Test_Scripts/Self Basic/Class_Test.py", line 15, in <module>
    amazon(link)
  File "C:/Users/TIMAH/OneDrive/study materials/Python_Test_Scripts/Self Basic/Class_Test.py", line 9, in amazon
    soup = BeautifulSoup(sourcecode_text, 'features="html.parser"')
  File "C:\Users\TIMAH\AppData\Local\Programs\Python\Python37\lib\site-packages\bs4\__init__.py", line 196, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: features="html.parser". Do you need to install a parser library?

Process finished with exit code 1

3 Answers

米琪卡哇伊

Contributed 1998 experience points, received more than 6 likes

You can add request headers (a User-Agent). Then, when you call find_all('a'), you can get the href of each link:


import requests
from bs4 import BeautifulSoup

link = "https://www.amazon.in/Power-Banks/b/ref=nav_shopall_sbc_mobcomp_powerbank?ie=UTF8&node=6612025031"

def amazon(url):
    # Send a browser-like User-Agent header with the request.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}

    sourcecode = requests.get(url, headers=headers)
    sourcecode_text = sourcecode.text
    soup = BeautifulSoup(sourcecode_text, 'html.parser')

    # href=True keeps only the <a> tags that actually have an href attribute.
    for link in soup.find_all('a', href=True):
        href = link.get('href')
        print(href)

amazon(link)


Answered 2021-11-09
慕少森

Contributed 2019 experience points, received more than 9 likes

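The problem in your code is that you used the wrong method name, findALL. The soup object has no findALL method: Beautiful Soup treats the unknown attribute as a lookup for a tag with that name and returns None, so calling it fails. To fix this, use find_all in new code; findAll (with a lowercase double l) should also work. I hope this makes it clear.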


import requests
from bs4 import BeautifulSoup

link = "https://www.amazon.in/Power-Banks/b/ref=nav_shopall_sbc_mobcomp_powerbank?ie=UTF8&node=6612025031"

def amazon(url):
    sourcecode = requests.get(url)
    sourcecode_text = sourcecode.text
    # Pass "html.parser" as the second argument so you do not get a warning.
    soup = BeautifulSoup(sourcecode_text, "html.parser")
    # Use soup.find_all in new code; soup.findAll should also work.
    for link in soup.find_all('a', {'class': 'a-link-normal aok-block a-text-normal'}):
        href = link.get('href')
        print(href)

amazon(link)
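To see both failure modes without hitting Amazon, here is a minimal sketch on an inline HTML snippet. The snippet, the hrefs, and the variable names are made up for illustration, and it assumes bs4 is installed. It reproduces the FeatureNotFound error from the question's traceback, shows that findALL evaluates to None, and compares an exact class-string match with a CSS selector, which does not depend on class order.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical markup, loosely modelled on the class names in the question.
html = """
<a class="a-link-normal aok-block a-text-normal" href="/item-1">One</a>
<a class="aok-block a-link-normal a-text-normal" href="/item-2">Two</a>
<a class="a-link-normal" href="/item-3">Three</a>
"""

# The traceback in the question comes from passing the literal string
# 'features="html.parser"' as the parser name; no such parser exists.
try:
    BeautifulSoup(html, 'features="html.parser"')
    parser_error = None
except FeatureNotFound as exc:
    parser_error = exc

soup = BeautifulSoup(html, "html.parser")

# An unknown attribute is treated as a tag-name lookup. There is no <findALL>
# tag, so this is None, and calling it would raise TypeError.
missing = soup.findALL

# Exact match on the full class string: only the first link qualifies.
exact = [a["href"] for a in soup.find_all(
    "a", {"class": "a-link-normal aok-block a-text-normal"})]

# CSS selector: matches any <a> carrying all three classes, in any order.
any_order = [a["href"] for a in soup.select(
    "a.a-link-normal.aok-block.a-text-normal")]

print(type(parser_error).__name__)
print(missing)
print(exact)
print(any_order)
```

If the real page lists the classes in a different order, or adds extra ones, the selector form is probably the safer choice.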


Answered 2021-11-09
catspeake

Contributed 1111 experience points, received more than 0 likes

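If you try to scrape Amazon with requests now, you will not get anything useful back, because Amazon detects that the request comes from a script, and headers will not help either (as far as I know).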


Instead, the response contains the following:


To discuss automated access to Amazon data please contact api-services-support@amazon.com.
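A script could check for that notice before trying to parse anything. The following is only a sketch: `looks_blocked` and the two sample strings are hypothetical, and matching on this one sentence is an assumption, since Amazon may serve other kinds of block pages as well.

```python
# Marker text taken from the blocked response quoted above.
BLOCK_MARKER = "To discuss automated access to Amazon data"


def looks_blocked(html: str) -> bool:
    """Return True if the page body contains the automated-access notice."""
    return BLOCK_MARKER in html


# Hypothetical response bodies, used only to exercise the check.
blocked_page = (
    "<!-- To discuss automated access to Amazon data please contact "
    "api-services-support@amazon.com. -->"
)
normal_page = "<html><body><a href='/dp/123'>Power bank</a></body></html>"

print(looks_blocked(blocked_page))
print(looks_blocked(normal_page))
```

In a real script the argument would be the response text, for example `sourcecode.text` from the earlier answers.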

You can scrape Amazon with requests-html or selenium, both of which render the page.


A simple requests-html example that scrapes the titles (the results will be similar if you open the same link in an incognito tab):


from requests_html import HTMLSession

session = HTMLSession()
url = 'https://www.amazon.com/s?k=apple+watch+series+6+band'
r = session.get(url)

# Render the page's JavaScript before querying it.
r.html.render(sleep=1, keep_page=True, scrolldown=1)

# Print the text of every element with the .a-size-medium class.
for container in r.html.find('.a-size-medium'):
    title = container.text
    print(f"Title: {title}")

Output:


Title: New Apple Watch Series 6 (GPS, 40mm) - (Product) RED - Aluminum Case with (Product) RED - Sport Band

Title: SUPCASE [Unicorn Beetle Pro] Designed for Apple Watch Series 6/SE/5/4 [44mm], Rugged Protective Case with Strap Bands(Black)

Title: Spigen Rugged Armor Pro Designed for Apple Watch Band with Case for 44mm Series 6/SE/5/4 - Charcoal Gray

Title: Highly rated and well-priced products

Title: Fitlink Stainless Steel Metal Band for Apple Watch 38/40/42/44mm Replacement Link Bracelet Band Compatible with Apple Watch Series 6 Apple Watch Series 5 Apple Watch Series 1/2/3/4 (Grey,42/44mm)

Title: TalkWorks Compatible for Apple Watch Band 42mm / 44mm Comfort Fit Mesh Loop Stainless Steel Adjustable Magnetic Strap for iWatch Series 6, 5, 4, 3, 2, 1, SE - Rose Gold

Title: COOYA Compatible for Apple Watch Band 44mm 42mm Women Men iWatch Wristband with Protective Rugged Case Sport Strap Adjustable Replacement Band Compatible with Apple Watch Series 6 SE 5 4 3 2, Clear

Title: Stainless Steel Metal Bands Compatible with Apple Watch Band 42mm 44mm, Gold Replacement Strap with Adapter+Case Cover Compatible with iWatch Series 6 5 4 3 2 1 SE Sport

Title: elago W2 Charger Stand Compatible with Apple Watch Series 6/SE/5/4/3/2/1 (44mm, 42mm, 40mm, 38mm), Durable Silicone, Compatible with Nightstand Mode (Black)

Title: Element Case Black Ops Watch Band for Apple Watch Series 4/5/6/SE, 44mm - Black (EMT-522-244A-01)

...


Answered 2021-11-09