
Why is the list of search results I get larger than the list on the web page I'm scraping?


繁星点点滴滴 2022-01-18 17:07:54
I'm trying to collect all the href links of the houses listed for sale, but when I run my program I get a list of about 50, which is far more than the number of houses (and href links) listed on this single page (see url in the code below). I've looked at the page source and cross-referenced it against my program's output: some links match, but others can't be found on the page.

import requests
from bs4 import BeautifulSoup as bs

url='https://www.rightmove.co.uk/property-for-sale/find.html?searchType=SALE&locationIdentifier=REGION%5E1091&insId=1&radius=0.0&minPrice=&maxPrice=&minBedrooms=&maxBedrooms=&displayPropertyType=&maxDaysSinceAdded=&_includeSSTC=on&sortByPriceDescending=&primaryDisplayPropertyType=&secondaryDisplayPropertyType=&oldDisplayPropertyType=&oldPrimaryDisplayPropertyType=&newHome=&auction=false'

Web_Page = requests.get(url)
Soup = bs(Web_Page.text,'html.parser')
Web_Section_Of_Interest= Soup.find_all('a',class_="propertyCard-link")

count=0
for item in Web_Section_Of_Interest:
    print('https://www.rightmove.co.uk'+item.get('href'))
    count+=1

print(count)

I get a list of 50 href links, but I was expecting a list that matches the number of houses listed on the page, which is 25.

2 Answers

暮色呼如

Contributed 1853 experience points, received over 9 upvotes

I managed to solve the problem by replacing the class "propertyCard-link" with "propertyCard-img-link".


Working code:


import requests
from bs4 import BeautifulSoup as bs

url='https://www.rightmove.co.uk/property-for-sale/find.html?searchType=SALE&locationIdentifier=REGION%5E1091&insId=1&radius=0.0&minPrice=&maxPrice=&minBedrooms=&maxBedrooms=&displayPropertyType=&maxDaysSinceAdded=&_includeSSTC=on&sortByPriceDescending=&primaryDisplayPropertyType=&secondaryDisplayPropertyType=&oldDisplayPropertyType=&oldPrimaryDisplayPropertyType=&newHome=&auction=false'

# Download the search results page and parse it
Web_Page = requests.get(url)
Soup = bs(Web_Page.text,'html.parser')

# Match the image link of each property card instead of "propertyCard-link"
Web_Section_Of_Interest= Soup.find_all('a',class_="propertyCard-img-link")

# Print the absolute URL of every matched link and count them
count=0
for item in Web_Section_Of_Interest:
    print('https://www.rightmove.co.uk'+item.get('href'))
    count+=1

print(count)
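One fragile spot in the loop is the string concatenation: `item.get('href')` returns `None` when an anchor has no `href`, and adding `None` to a string raises a `TypeError`. A minimal sketch of a more defensive variant, using the standard library's `urllib.parse.urljoin`; the helper name `absolute_links` and the sample list are illustrative assumptions, not part of the answer:

```python
from urllib.parse import urljoin

BASE_URL = "https://www.rightmove.co.uk"


def absolute_links(hrefs, base=BASE_URL):
    """Turn relative hrefs into absolute URLs, skipping missing or empty values.

    `absolute_links` is a hypothetical helper introduced for this sketch.
    """
    return [urljoin(base, href) for href in hrefs if href]


# Sample input: one real-looking href plus the two cases that would break
# or pollute plain string concatenation (None and an empty string).
sample = ["/property-for-sale/property-61358637.html", None, ""]
links = absolute_links(sample)
print(links)
```

In the scraper this would be called as `absolute_links(item.get('href') for item in Web_Section_Of_Interest)`.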


Answered 2022-01-18
胡说叔叔

Contributed 1804 experience points, received over 8 upvotes

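If you look at the actual URLs being printed, you'll notice that every one of them is printed twice. So technically you are only getting 25 distinct links.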


print(count)
https://www.rightmove.co.uk/property-for-sale/property-61358637.html
https://www.rightmove.co.uk/property-for-sale/property-61358637.html
https://www.rightmove.co.uk/property-for-sale/property-57044346.html
https://www.rightmove.co.uk/property-for-sale/property-57044346.html
https://www.rightmove.co.uk/commercial-property-for-sale/property-70211329.html
https://www.rightmove.co.uk/commercial-property-for-sale/property-70211329.html
https://www.rightmove.co.uk/property-for-sale/property-68319664.html
https://www.rightmove.co.uk/property-for-sale/property-68319664.html
....
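If the original "propertyCard-link" selector is kept, the duplicates can also be dropped after the links are collected. A minimal sketch, assuming the hrefs are already available as plain strings; `unique_in_order` is a hypothetical helper name, and the sample list is copied from the output above:

```python
def unique_in_order(hrefs):
    """Remove duplicate hrefs while keeping the order of first appearance.

    dict keys are unique and preserve insertion order (Python 3.7+),
    so dict.fromkeys gives an order-preserving de-duplication.
    """
    return list(dict.fromkeys(hrefs))


# The first eight links from the printed output: each one appears twice.
hrefs = [
    "/property-for-sale/property-61358637.html",
    "/property-for-sale/property-61358637.html",
    "/property-for-sale/property-57044346.html",
    "/property-for-sale/property-57044346.html",
    "/commercial-property-for-sale/property-70211329.html",
    "/commercial-property-for-sale/property-70211329.html",
    "/property-for-sale/property-68319664.html",
    "/property-for-sale/property-68319664.html",
]

unique = unique_in_order(hrefs)
print(len(hrefs), len(unique))
for href in unique:
    print("https://www.rightmove.co.uk" + href)
```

A plain `set` would also remove the duplicates, but it would not keep the links in page order.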

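Just look at the first 2 of your propertyCard-link elements. Both point to the same href: the first is the 'details' link and the second is the 'summary' link: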


Web_Section_Of_Interest[0]
Out[6]:
<a class="propertyCard-link" data-bind="click: propertyCardClick('details'), attr: { href: computedDetailsLink() }" data-test="property-details" href="/property-for-sale/property-61358637.html">
<h2 class="propertyCard-title" data-bind="text: propertyTypeFullDescription" itemprop="name">
            2 bedroom semi-detached house for sale        </h2>
<address class="propertyCard-address" itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress">
<meta content="Auckland Road, Potters Bar" data-bind="attr: { content: displayAddress }" itemprop="streetAddress"/>
<meta content="GB" data-bind="attr: { content: countryCode }" itemprop="addressCountry"/>
<span data-bind="text: displayAddress">Auckland Road, Potters Bar</span>
</address>
</a>

Web_Section_Of_Interest[1]
Out[7]:
<a class="propertyCard-link" data-bind="click: propertyCardClick('summary'), attr: { href: computedDetailsLink() }" href="/property-for-sale/property-61358637.html">
<span data-bind="html: summary" data-test="property-description" itemprop="description">BPM Auckland are pleased to offer this spacious Extended 2 Double bedroom 1930's built semi detached house, situated in this popular location within easy reach of good schools including Dame Alice Owens. The property benefits from a large 190' rear garden and also potential for a loft conversion...</span>
</a>
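The markup above suggests another way to get one link per listing: in this sample only the 'details' anchor carries `data-test="property-details"`, so filtering on that attribute should skip the 'summary' anchor. The sketch below uses only the standard library's `html.parser` so it runs without extra packages; with BeautifulSoup the same filter can be written by passing `attrs={'data-test': 'property-details'}` to `find_all`. The trimmed `SAMPLE_HTML` and the class name `DetailsLinkCollector` are assumptions made for illustration, and it is also an assumption that every card on the live page follows the pattern of the two elements shown.

```python
from html.parser import HTMLParser

# Trimmed-down stand-in for two property cards (assumed structure, based on
# the two elements shown above; the real page carries more attributes).
SAMPLE_HTML = """
<a class="propertyCard-link" data-test="property-details" href="/property-for-sale/property-61358637.html">details</a>
<a class="propertyCard-link" href="/property-for-sale/property-61358637.html">summary</a>
<a class="propertyCard-link" data-test="property-details" href="/property-for-sale/property-57044346.html">details</a>
<a class="propertyCard-link" href="/property-for-sale/property-57044346.html">summary</a>
"""


class DetailsLinkCollector(HTMLParser):
    """Collect hrefs of <a> tags that have class "propertyCard-link"
    and data-test="property-details" (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        is_details = attr.get("data-test") == "property-details"
        if "propertyCard-link" in classes and is_details and attr.get("href"):
            self.hrefs.append(attr["href"])


collector = DetailsLinkCollector()
collector.feed(SAMPLE_HTML)
print(len(collector.hrefs))
for href in collector.hrefs:
    print("https://www.rightmove.co.uk" + href)
```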


Answered 2022-01-18