After running/saving my script (shown below), I tried to view the results in the terminal, but without success. The code is very simple, but I can't seem to find a fix.

import scrapy

class TickersSpider(scrapy.Spider):
    name = 'tickers'
    allowed_domains = ['www.seekingalpha.com/']
    start_urls = ['https://seekingalpha.com/market-news/on-the-move']

    def parse(self, response):
        articles_all = response.xpath('//div[@class="title"]/a/text()').getall()
        articles_gainers = response.path('//div[@class="title"]/a[contains(text(), "remarket gainers")]/text()').getall()
        yield {
            'articles': articles_all,
            'articles_gainers': articles_gainers
        }

I also double-checked that I'm running it from the correct directory:

scrapy crawl tickers

This is what shows up in the terminal when I run it:

2020-07-25 16:53:35 [scrapy.utils.log] INFO: Scrapy 2.2.0 started (bot: seekingalpha)
2020-07-25 16:53:35 [scrapy.utils.log] INFO: Versions: lxml 4.5.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.7.7 (default, May 6 2020, 11:45:54) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g 21 Apr 2020), cryptography 3.0, Platform Windows-10-10.0.18362-SP0
2020-07-25 16:53:35 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2020-07-25 16:53:35 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'seekingalpha', 'NEWSPIDER_MODULE': 'seekingalpha.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['seekingalpha.spiders']}
2020-07-25 16:53:35 [scrapy.extensions.telnet] INFO: Telnet Password: 2cb47f969c26a413
2020-07-25 16:53:35 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.logstats.LogStats']
1 Answer
慕码人8056858
The problem is a typo in your code:
articles_gainers = response.path('//div[@class="title"]/a[contains(text(), "remarket gainers")]/text()').getall()
It should be response.xpath() instead of response.path(). That is exactly what the exception message is telling you:
AttributeError: 'HtmlResponse' object has no attribute 'path'