
Help: Scrapy fails to scrape data, and repeated debugging has not fixed it


Helenr 2019-05-21 10:40:44
Goal: scrape course information from a learning website. For initial debugging I only fetch the course name.

Spider file:

    import scrapy
    from xtzx.items import XtzxItem
    from scrapy.http import Request

    class LessonSpider(scrapy.Spider):
        name = 'lesson'
        allowed_domains = ['xuetangx.com']
        start_urls = ['http://www.xuetangx.com/courses/course-v1:TsinghuaX+80512073X+2018_T1/about']
        '''
        def start_requests(self):
            ua = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"}
            yield Request("www.xuetangx.com/courses/course-v1:TsinghuaX+80512073X+2018_T1/about", headers=ua)
        '''
        def parse(self, response):
            item = XtzxItem()
            item["title"] = response.xpath("//div[@class='title_detail'/h3[@class='courseabout_title']/text()").extract()
            print(item["title"])

Run log:

    2018-04-28 11:08:33 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: xtzx)
    2018-04-28 11:08:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.1.0, libxml2 2.9.7, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 17.9.0, Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)], pyOpenSSL 17.5.0 (OpenSSL 1.1.0h 27 Mar 2018), cryptography 2.2.2, Platform Windows-10-10.0.16299-SP0
    2018-04-28 11:08:33 [scrapy.crawler] INFO: Overridden settings: {'SPIDER_MODULES': ['xtzx.spiders'], 'BOT_NAME': 'xtzx', 'NEWSPIDER_MODULE': 'xtzx.spiders', 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko'}
    2018-04-28 11:08:33 [scrapy.middleware] INFO: Enabled extensions:
    ['scrapy.extensions.corestats.CoreStats',
     'scrapy.extensions.telnet.TelnetConsole',
     'scrapy.extensions.logstats.LogStats']
    2018-04-28 11:08:34 [scrapy.middleware] INFO: Enabled downloader middlewares:
    ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
     'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
     'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
     'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
     'scrapy.downloadermiddlewares.retry.RetryMiddleware',
     'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
     'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
     'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
     'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
     'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
     'scrapy.downloadermiddlewares.stats.DownloaderStats']
    2018-04-28 11:08:34 [scrapy.middleware] INFO: Enabled spider middlewares:
    ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
     'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
     'scrapy.spidermiddlewares.referer.RefererMiddleware',
     'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
     'scrapy.spidermiddlewares.depth.DepthMiddleware']
    2018-04-28 11:08:34 [scrapy.middleware] INFO: Enabled item pipelines:
    []
    2018-04-28 11:08:34 [scrapy.core.engine] INFO: Spider opened    ---------- (the problem seems to start here)
    2018-04-28 11:08:34 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2018-04-28 11:08:34 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
    2018-04-28 11:08:34 [scrapy.core.engine] DEBUG: Crawled (200) (referer: None)
    2018-04-28 11:08:34 [scrapy.core.scraper] ERROR: Spider error processing (referer: None)
    Traceback (most recent call last):
      File "d:\python3.5\lib\site-packages\parsel\selector.py", line 228, in xpath
        **kwargs)
      File "src\lxml\etree.pyx", line 1577, in lxml.etree._Element.xpath
      File "src\lxml\xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__
      File "src\lxml\xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result
    lxml.etree.XPathEvalError: Invalid predicate

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "d:\python3.5\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "E:\python\xtzx\xtzx\spiders\lesson.py", line 16, in parse
        item["title"] = response.xpath("//div[@class='title_detail'/h3[@class='courseabout_title']/text()").extract()
      File "d:\python3.5\lib\site-packages\scrapy\http\response\text.py", line 119, in xpath
        return self.selector.xpath(query, **kwargs)
      File "d:\python3.5\lib\site-packages\parsel\selector.py", line 232, in xpath
        six.reraise(ValueError, ValueError(msg), sys.exc_info()[2])
      File "d:\python3.5\lib\site-packages\six.py", line 692, in reraise
        raise value.with_traceback(tb)
      File "d:\python3.5\lib\site-packages\parsel\selector.py", line 228, in xpath
        **kwargs)
      File "src\lxml\etree.pyx", line 1577, in lxml.etree._Element.xpath
      File "src\lxml\xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__
      File "src\lxml\xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result
    ValueError: XPath error: Invalid predicate in //div[@class='title_detail'/h3[@class='courseabout_title']/text()
    2018-04-28 11:08:35 [scrapy.core.engine] INFO: Closing spider (finished)
    2018-04-28 11:08:35 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
    {'downloader/request_bytes': 301,
     'downloader/request_count': 1,
     'downloader/request_method_count/GET': 1,
     'downloader/response_bytes': 24409,
     'downloader/response_count': 1,
     'downloader/response_status_count/200': 1,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2018, 4, 28, 3, 8, 35, 118088),
     'log_count/DEBUG': 2,
     'log_count/ERROR': 1,
     'log_count/INFO': 7,
     'response_received_count': 1,
     'scheduler/dequeued': 1,
     'scheduler/dequeued/memory': 1,
     'scheduler/enqueued': 1,
     'scheduler/enqueued/memory': 1,
     'spider_exceptions/ValueError': 1,
     'start_time': datetime.datetime(2018, 4, 28, 3, 8, 34, 418003)}
    2018-04-28 11:08:35 [scrapy.core.engine] INFO: Spider closed (finished)

The program seems simple, but it just will not work. The items file has the usual setup, nothing new was added to pipelines, and in settings I only changed the value of ROBOTSTXT_OBEY. I searched online for this error for a long time without finding a fix, and I also tried spoofing a browser User-Agent, which did not help. I am self-taught with no teacher and completely stuck. Any help is appreciated.

2 Answers

哔哔one

In the XPath, isn't div[@class='title_detail' missing a closing ]?
item["title"] = response.xpath("//div[@class='title_detail'/h3[@class='courseabout_title']/text()").extract()
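
To make the suggested fix concrete, here is a minimal sketch of the corrected expression. The sample HTML is a hypothetical stand-in for the course page (the real markup may differ), and the check uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, rather than Scrapy itself. In the spider, the corrected string would be passed to response.xpath(...) exactly as in the original code.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the course page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
  <div class="title_detail">
    <h3 class="courseabout_title">Sample Course Title</h3>
  </div>
</body></html>
""".strip()

# Expression from the question: the first predicate is never closed.
BROKEN_XPATH = "//div[@class='title_detail'/h3[@class='courseabout_title']/text()"
# Corrected expression: "]" added right after 'title_detail'.
FIXED_XPATH = "//div[@class='title_detail']/h3[@class='courseabout_title']/text()"

# In the spider, the corrected line would read:
#   item["title"] = response.xpath(FIXED_XPATH).extract()

# ElementTree has no text() step and expects relative paths, so drop
# "/text()", prefix ".", and read .text from the matched elements instead.
root = ET.fromstring(SAMPLE_HTML)
element_path = "." + FIXED_XPATH[: -len("/text()")]
titles = [h3.text for h3 in root.findall(element_path)]
print(titles)
```

The only change between the two strings is the single "]" that closes the div predicate; everything else in the spider can stay as it is.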
                            
2019-05-21
杨__羊羊

File "src\lxml\xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result
ValueError: XPath error: Invalid predicate in //div[@class='title_detail'/h3[@class='courseabout_title']/text()
The XPath is written incorrectly; it is missing a ].
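
Since the traceback comes down to an unclosed predicate, a quick pre-check along these lines might catch this kind of typo before a crawl is started. Note that has_balanced_brackets is a hypothetical helper written for this thread, not part of Scrapy, parsel, or lxml, and it only counts brackets outside quoted strings; it is not an XPath parser.

```python
def has_balanced_brackets(xpath: str) -> bool:
    """Return True if every "[" outside a quoted string has a matching "]".

    Hypothetical helper: a bracket counter only, not an XPath validator.
    """
    depth = 0
    quote = None  # the quote character currently open, if any
    for ch in xpath:
        if quote is not None:
            # Inside a string literal: ignore everything until it closes.
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:  # a "]" with no "[" before it
                return False
    return depth == 0 and quote is None


broken = "//div[@class='title_detail'/h3[@class='courseabout_title']/text()"
fixed = "//div[@class='title_detail']/h3[@class='courseabout_title']/text()"

print(has_balanced_brackets(broken))  # False
print(has_balanced_brackets(fixed))   # True
```

A check like this only flags missing or stray brackets; an expression can pass it and still be invalid XPath for other reasons.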
                            
2019-05-21