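Here is my code. The page in start_urls is crawled and returns data, but the rule never matches any other links:

    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule

    class TmallSpider(CrawlSpider):  # class and spider names are placeholders, not in the original post
        name = 'tmall'
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
        }
        start_urls = [
            'https://chaoshi.detail.tmall.com/item.htm?id=576632421624&tbpm=3',
        ]
        rules = (
            # Follow links to other Tmall Supermarket item pages and parse each one
            Rule(LinkExtractor(allow=r'https://chaoshi\.detail\.tmall\.com/item\.htm\?id=\d+&tbpm=3'),
                 process_request='request_tagPage', callback='parse_item', follow=True),
        )

        def request_tagPage(self, request, response=None):
            # Attach the custom headers to every request the rule generates.
            # Scrapy 2.0+ also passes the originating response; older versions pass only the request.
            return request.replace(headers=self.headers)

        def parse_item(self, response):
            print(response.url)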
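One way to narrow down the problem is to test the allow pattern offline against the kinds of URLs that actually appear in the page's HTML. LinkExtractor checks each absolute link URL with a regex search, so plain Python `re` reproduces the same decision. The sketch below is only an illustration: the candidate URLs and the looser `LOOSE` pattern are hypothetical, so replace them with links copied from the real page source. Note that if the other item links are inserted by JavaScript, they are not in the HTML Scrapy downloads at all, and no allow pattern will find them.

```python
import re

# The allow pattern from the spider (dots escaped)
STRICT = re.compile(r'https://chaoshi\.detail\.tmall\.com/item\.htm\?id=\d+&tbpm=3')
# Hypothetical looser alternative: any item.htm page on a tmall.com host with a numeric id,
# regardless of which query parameter comes first
LOOSE = re.compile(r'https://[\w.-]*tmall\.com/item\.htm\?(?:[^#]*&)?id=\d+')

# Hypothetical link shapes; replace with URLs copied from the page's HTML source
candidates = [
    'https://chaoshi.detail.tmall.com/item.htm?id=576632421624&tbpm=3',
    'https://chaoshi.detail.tmall.com/item.htm?spm=a220o.1000855.0.0&id=123456789',
    'https://detail.tmall.com/item.htm?id=123456789&tbpm=3',
    'https://chaoshi.detail.tmall.com/item.htm?id=123456789',
]

def matches(pattern, url):
    # Search anywhere in the URL, not a full match, mirroring how allow patterns are applied
    return pattern.search(url) is not None

for url in candidates:
    print(matches(STRICT, url), matches(LOOSE, url), url)
```

With these sample URLs, only the first one passes the strict pattern: a different parameter order, a host without `chaoshi.`, or a missing `&tbpm=3` is enough to make the rule skip a link, while the looser pattern accepts all four.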