Let me briefly explain what I need: I have to scrape different data from the two URLs below and, in the end, put everything into a single item.

https://www.esr.com/sc/map_ch...
https://www.esr.com/sc/media_...

Spider code:

```python
# -*- coding: utf-8 -*-
import scrapy
from news.items import EsrItem


class EsrSpider(scrapy.Spider):
    name = 'esr'
    allowed_domains = ['esr.com']

    def start_requests(self):
        yield scrapy.Request('https://www.esr.com/sc/map_china.php', self.parse1)
        yield scrapy.Request('https://www.esr.com/sc/media_news.php', self.parse3)

    def parse1(self, response):
        for web in response.xpath('//div[@class="earth_hide_ul"]/ul/li'):
            url_tmp = web.xpath('.//a/@href').extract()[0]
            urlquest = "https://www.esr.com/sc/" + url_tmp
            yield scrapy.Request(url=urlquest, callback=self.parse2)

    def parse2(self, response):
        item = EsrItem()
        item['assetstitle'] = response.xpath('//div[@class="flexjustify_between_center"]/h3/text()').extract()[0]
        item['assetaddress'] = response.xpath("//ul[@class='map_item_ul'][1]/li/b/text()").extract()[0]
        tmp = response.xpath("//ul[@class='map_item_ul'][2]")
        item['assettedian'] = tmp.xpath("string(.)").extract()[0].strip()
        item['assetjiagou'] = response.xpath("//ul[@class='map_store_ul']/li[1]/div/span/text()").extract()[0]
        item['assettudimianji'] = response.xpath("//ul[@class='map_store_ul']/li[2]/div/span/text()").extract()[0].strip()
        item['assetjianzhumianji'] = response.xpath("//ul[@class='map_store_ul']/li[3]/div/span/text()").extract()[0].strip()
        item['assetjungongtime'] = response.xpath("//ul[@class='map_store_ul']/li[4]/div/span/text()").extract()[0].strip()
        assetpeople = response.xpath("//ul[@class='map_store_ul']/li[5]/div/span/a/text()").extract()[0].strip()
        assetpeople_mail = response.xpath("//ul[@class='map_store_ul']/li[5]/div/span/a/@href").extract()[0][6:]
        item['assetpeople'] = assetpeople + assetpeople_mail
        yield scrapy.Request("how should I write this?",  # <-- this is my question: what goes here?
                             callback=self.parse3, meta={'item': item})

    def parse3(self, response):
        pass
```

As shown above, I overrode start_requests and yield two requests whose callbacks are parse1 and parse3. In parse2 I have already extracted every asset* field into the item, but the news* fields have to come from the other URL. I want to pass the asset data from parse2 on to parse3, finish processing there, and yield the item once at the end. So what should I put in the url argument of the Request at the end of parse2? I don't actually need to request another URL; I just want all the data to end up in one item. Or should I simply yield the item directly in parse2? But then the data would still be incomplete.

My item:

```python
class EsrItem(scrapy.Item):
    assetstitle = scrapy.Field()
    assetaddress = scrapy.Field()
    assettedian = scrapy.Field()
    assetjiagou = scrapy.Field()
    assettudimianji = scrapy.Field()
    assetjianzhumianji = scrapy.Field()
    assetjungongtime = scrapy.Field()
    assetpeople = scrapy.Field()
    newstitle = scrapy.Field()
    newtiems = scrapy.Field()
    newslink = scrapy.Field()
```
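For context, here is a minimal sketch of the pattern the question is circling around: request the news page from parse2, carry the half-filled item along in the request's meta, and yield the completed item once in parse3. This is a sketch under assumptions, not a confirmed fix: the spider name EsrSketchSpider, the news-page XPath expressions, and dropping the media_news.php request from start_requests are all my assumptions; dont_filter=True is there because every asset page would otherwise hit Scrapy's duplicate filter for the same media_news.php URL.

```python
# Sketch only: pass a partially filled item between callbacks via meta.
# News-page selectors below are hypothetical placeholders.
import scrapy
from news.items import EsrItem


class EsrSketchSpider(scrapy.Spider):
    name = 'esr_sketch'              # hypothetical name, to avoid clashing with 'esr'
    allowed_domains = ['esr.com']

    def start_requests(self):
        # Only the map page is requested here; the news page is requested
        # later, from parse2, so the asset data can travel with that request.
        yield scrapy.Request('https://www.esr.com/sc/map_china.php', self.parse1)

    def parse1(self, response):
        for web in response.xpath('//div[@class="earth_hide_ul"]/ul/li'):
            url_tmp = web.xpath('.//a/@href').extract()[0]
            yield scrapy.Request(url="https://www.esr.com/sc/" + url_tmp,
                                 callback=self.parse2)

    def parse2(self, response):
        item = EsrItem()
        item['assetstitle'] = response.xpath(
            '//div[@class="flexjustify_between_center"]/h3/text()').extract()[0]
        # ... fill the remaining asset* fields exactly as in the original parse2 ...

        # Carry the half-filled item to the news page.  dont_filter=True is
        # needed because every asset page requests the same media_news.php URL,
        # which the duplicate filter would otherwise drop after the first one.
        yield scrapy.Request(
            'https://www.esr.com/sc/media_news.php',
            callback=self.parse3,
            meta={'item': item},
            dont_filter=True,
        )

    def parse3(self, response):
        item = response.meta['item']
        # Hypothetical selectors -- the real structure of the news page is unknown.
        item['newstitle'] = response.xpath('//ul/li[1]/a/text()').extract_first()
        item['newslink'] = response.xpath('//ul/li[1]/a/@href').extract_first()
        yield item                   # the item is yielded once, fully populated
```

On Scrapy 1.7+ the same hand-off can also be done with cb_kwargs={'item': item} and a matching item parameter on parse3, which the docs recommend over meta for passing data between callbacks. Note that with this design a copy of the news data is attached to every asset item, which may or may not be what you want.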