Why do I get a 429 response in Scrapy even when the request count is only 1?

暮色呼如 2022-12-27 16:49:24
I'm using Scrapy to crawl a website, but I get a 429 response. Here is the output log:

2020-06-06 21:39:45 [scrapy.core.engine] INFO: Spider opened
INFO:scrapy.extensions.logstats:Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-06-06 21:39:45 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
INFO:scrapy.extensions.telnet:Telnet console listening on 127.0.0.1:6023
2020-06-06 21:39:45 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
DEBUG:scrapy.core.engine:Crawled (429) <GET https://www.realestate.com.au/rent/in-aspendale+gardens,+vic+3195/list-1> (referer: None)
2020-06-06 21:39:46 [scrapy.core.engine] DEBUG: Crawled (429) <GET https://www.realestate.com.au/rent/in-aspendale+gardens,+vic+3195/list-1> (referer: None)
INFO:scrapy.spidermiddlewares.httperror:Ignoring response <429 https://www.realestate.com.au/rent/in-aspendale+gardens,+vic+3195/list-1>: HTTP status code is not handled or not allowed
2020-06-06 21:39:46 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <429 https://www.realestate.com.au/rent/in-aspendale+gardens,+vic+3195/list-1>: HTTP status code is not handled or not allowed
INFO:scrapy.core.engine:Closing spider (finished)
2020-06-06 21:39:46 [scrapy.core.engine] INFO: Closing spider (finished)
INFO:scrapy.statscollectors:Dumping Scrapy stats:
{'downloader/request_bytes': 343,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 2030,
 'downloader/response_count': 1,
 'downloader/response_status_count/429': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2020, 6, 6, 11, 39, 46, 255540),
 'httperror/response_ignored_count': 1,
 'httperror/response_ignored_status_count/429': 1,
 'log_count/DEBUG': 1,
 'log_count/INFO': 10,
 'memusage/max': 50941952,
 'memusage/startup': 50941952,
 'response_received_count': 1,
 'scheduler/dequeued': 1,

As you can see, downloader/request_count is only 1.
1 Answer

斯蒂芬大帝

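Status code 429 means "Too Many Requests". The request count on the downloader is 1 because Scrapy sent exactly one request, and the server rejected that single request with a 429. The log shows the rest: the HttpError spider middleware then ignored the 429 response ("HTTP status code is not handled or not allowed"), so it never reached the spider callback. In other words, you are not actually being rate limited. The site appears to return a 429 to any request it considers to come from a bot, whether or not that judgment is correct.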
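One way to read the stats dump in the question is to separate what was sent from what came back. The sketch below is only an illustration: `summarize_stats` is a hypothetical helper, not part of Scrapy, and the dictionary holds a few values copied from the question's log.

```python
def summarize_stats(stats):
    """Split a flat Scrapy stats dict into sent / received / ignored counts."""
    status_prefix = "downloader/response_status_count/"
    # Collect per-status response counts, e.g. {429: 1}.
    by_status = {
        int(key[len(status_prefix):]): count
        for key, count in stats.items()
        if key.startswith(status_prefix)
    }
    return {
        "requests_sent": stats.get("downloader/request_count", 0),
        "responses_by_status": by_status,
        "ignored_by_httperror": stats.get("httperror/response_ignored_count", 0),
    }


# Values copied from the stats dump in the question.
stats = {
    "downloader/request_count": 1,
    "downloader/response_count": 1,
    "downloader/response_status_count/429": 1,
    "httperror/response_ignored_count": 1,
}

summary = summarize_stats(stats)
print(summary)
```

Read this way, the numbers say that one request went out, one response with status 429 came back, and that response was dropped before any callback ran.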


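After some experimenting, I found that the site rejected my request because it lacked a Cookie header. That cookie is set by the Set-Cookie header in the response to the initial GET request. Here are a few things to try before reaching for Selenium, which should be the last option in any scraping project.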
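The cookie round trip described above can be sketched with the standard library alone: take the Set-Cookie values from the first response and turn them into the Cookie header for the next request. The cookie names and values here are made up for illustration, and `cookie_header_from_set_cookie` is a hypothetical helper; the real site's cookies will differ.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values):
    """Build a Cookie request header from a list of Set-Cookie response values."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        # load() keeps the name=value pair and records attributes
        # such as Path or HttpOnly on the same morsel.
        jar.load(value)
    # Only name=value pairs are sent back; attributes stay on the client.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Hypothetical Set-Cookie values, as they might appear on the first response.
example_set_cookie = [
    "session_token=abc123; Path=/; HttpOnly",
    "bot_check=passed; Path=/; Max-Age=3600",
]

cookie_header = cookie_header_from_set_cookie(example_set_cookie)
print(cookie_header)
```

In Scrapy itself this bookkeeping should not need to be written by hand: with COOKIES_ENABLED = True, the built-in cookies middleware stores cookies from responses and sends them on later requests to the same site.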


Try sending a full set of headers like the ones below, together with COOKIES_ENABLED = True.

Host: www.realestate.com.au
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://duckduckgo.com/
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache
TE: Trailers
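As a rough sketch, the headers above could go into a project's settings.py like this. USER_AGENT, DEFAULT_REQUEST_HEADERS and COOKIES_ENABLED are standard Scrapy settings; which headers to leave out is my own assumption, noted in the comments, and may need adjusting for this site.

```python
# settings.py (fragment)

# Keep cookies from responses and send them back on later requests.
COOKIES_ENABLED = True

# Browser-like User-Agent taken from the header list in the answer.
USER_AGENT = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) "
    "Gecko/20100101 Firefox/77.0"
)

# Assumptions: Host is omitted because it is derived from the request URL,
# Accept-Encoding is left to Scrapy's compression middleware so that only
# encodings it can decode are advertised, and the connection-level headers
# (Connection, TE) are left to the HTTP client.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Referer": "https://duckduckgo.com/",
    "Upgrade-Insecure-Requests": "1",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
}
```

If the site still answers with 429 after this, comparing these values against what a real browser sends in its network tab is a reasonable next step before falling back to Selenium.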


Answered 2022-12-27