
BeautifulSoup cannot scrape an element


烙印99 2023-09-19 13:57:10
Hello, I am trying to scrape the following website: https://www.footlocker.co.uk/en/all/new/

I want to scrape the price and the "href" of the following elements:

<span class=" fl-price--sale ">
    <meta itemprop="priceCurrency" content="GBP">
    <meta itemprop="price" content="84.99">
    <span>£ 84,99</span>
</span>

and this one (the "href"):

<a href="https://www.footlocker.co.uk/en/p/adidas-performance-don-issue-2-men-shoes-92815?v=314102617504#!searchCategory=all" data-product-click-link="314102617504" data-hash-key="searchCategory" data-hash-url="https://www.footlocker.co.uk/en/p/adidas-performance-don-issue-2-men-shoes-92815?v=314102617504" data-testid="fl-product-details-link-314102617504">

I tried this code:

import urllib.request
import bs4 as bs
from bs4 import BeautifulSoup
import requests

proxies = {'type':'ip:port'}
r = requests.get('https://www.footlocker.de/de/alle/new/', proxies=proxies)
soup = BeautifulSoup(r.content, 'html.parser')

# It doesn't find it...
for a in (soup.find_all('a')):
    try:
        if a['href'] == 'https://www.footlocker.co.uk/en/p/adidas-performance-don-issue-2-men-shoes-92815?v=314102617504#!searchCategory=all':
            print(a['href'])
    except:
        pass

# It doesn't find it...
for price in (soup.find_all('span', class_=' fl-price--sale ')):
    print(price.text)

I tried scraping through a proxy, but the elements are still not returned (I think the HTML I get back is not the right one). Thanks for your advice :-) (for educational purposes only)

1 Answer

不负相思意


To get the name, link, and price of each product, you can use the following example:


import requests
from bs4 import BeautifulSoup

# AJAX endpoint that returns one page of the product listing as JSON
url = 'https://www.footlocker.co.uk/INTERSHOP/web/FLE/Footlocker-Footlocker_GB-Site/en_GB/-/GBP/ViewStandardCatalog-ProductPagingAjax?SearchParameter=____&sale=new&MultiCategoryPathAssignment=all&PageNumber={}'

for page in range(3):  # <--- increase the number of pages here
    print('Page {}...'.format(page))
    # The listing HTML is stored under the 'content' key of the JSON payload
    data = requests.get(url.format(page)).json()
    soup = BeautifulSoup(data['content'], 'html.parser')

    # Each product placeholder carries the URL of its own details request
    for d in soup.select('[data-request]'):
        s = BeautifulSoup(requests.get(d['data-request']).json()['content'], 'html.parser')

        print(s.select_one('[itemprop="name"]').text)
        print(s.select_one('[itemprop="price"]')['content'], s.select_one('[itemprop="priceCurrency"]')['content'])
        print(s.a['href'])
        print('-' * 80)

Prints:


Page 0...
adidas Performance Don Issue 2 - Men Shoes
84.99 GBP
https://www.footlocker.co.uk/en/p/adidas-performance-don-issue-2-men-shoes-92815?v=314102617504
--------------------------------------------------------------------------------
Nike Air Force 1 Crater - Women Shoes
94.99 GBP
https://www.footlocker.co.uk/en/p/nike-air-force-1-crater-women-shoes-98071?v=315349054502
--------------------------------------------------------------------------------
Jordan Jumpmcn Cl Iii Camo - Baby Tracksuits
39.99 GBP
https://www.footlocker.co.uk/en/p/jordan-jumpmcn-cl-iii-camo-baby-tracksuits-91611?v=318280390044
--------------------------------------------------------------------------------
Jordan 13 Retro - Grade School Shoes
99.99 GBP
https://www.footlocker.co.uk/en/p/jordan-13-retro-grade-school-shoes-952?v=316701533404
--------------------------------------------------------------------------------


...and so on.
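
The loop above indexes straight into each selector result, so a single product fragment without a price or name would raise an exception and stop the run. A minimal sketch of a more forgiving extraction step follows. The helper name `extract_product` and the `SAMPLE_FRAGMENT` markup are hypothetical: the fragment is assembled from the markup quoted in the question so the selectors can be tried offline, and the real AJAX response may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample fragment, pieced together from the markup quoted in the
# question; the HTML actually returned by the site may look different.
SAMPLE_FRAGMENT = """
<div>
  <a href="https://www.footlocker.co.uk/en/p/adidas-performance-don-issue-2-men-shoes-92815?v=314102617504">
    <span itemprop="name">adidas Performance Don Issue 2 - Men Shoes</span>
  </a>
  <span class=" fl-price--sale ">
    <meta itemprop="priceCurrency" content="GBP">
    <meta itemprop="price" content="84.99">
    <span>£ 84,99</span>
  </span>
</div>
"""


def extract_product(fragment_html):
    """Return name, price, currency and link from one product fragment.

    Fields that are missing from the fragment come back as None
    instead of raising an exception.
    """
    soup = BeautifulSoup(fragment_html, "html.parser")
    name = soup.select_one('[itemprop="name"]')
    price = soup.select_one('[itemprop="price"]')
    currency = soup.select_one('[itemprop="priceCurrency"]')
    link = soup.find("a", href=True)  # first anchor that has an href
    return {
        "name": name.get_text(strip=True) if name else None,
        "price": price.get("content") if price else None,
        "currency": currency.get("content") if currency else None,
        "href": link["href"] if link else None,
    }


if __name__ == "__main__":
    print(extract_product(SAMPLE_FRAGMENT))
    print(extract_product("<div>no product here</div>"))
```

In the scraping loop, the same helper could be called on `requests.get(d['data-request']).json()['content']`, and products whose fields are `None` could be skipped or logged.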


Answered 2023-09-19