I'm new to Scrapy and I'm looking for a way to run it from a Python script. I found two resources that explain how to do this:

http://tryolabs.com/Blog/2011/09/27/calling-scrapy-python-script/
http://snipplr.com/view/67006/using-scrapy-from-a-script/

I can't figure out where I should put my spider code and how to call it from the main function. Please help. Here is the sample code:

# This snippet can be used to run scrapy spiders independent of scrapyd or the scrapy command line tool and use it from a script.
#
# The multiprocessing library is used in order to work around a bug in Twisted, in which you cannot restart an already running reactor or in this case a scrapy instance.
#
# Here (http://groups.google.com/group/scrapy-users/browse_thread/thread/f332fc5b749d401a) is the mailing-list discussion for this snippet.

#!/usr/bin/python
import os
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'project.settings')  # Must be at the top before other imports

from scrapy import log, signals, project
from scrapy.xlib.pydispatch import dispatcher
from scrapy.conf import settings
from scrapy.crawler import CrawlerProcess
from multiprocessing import Process, Queue


class CrawlerScript():

    def __init__(self):
        self.crawler = CrawlerProcess(settings)
        if not hasattr(project, 'crawler'):
            self.crawler.install()
        self.crawler.configure()
        self.items = []
        dispatcher.connect(self._item_passed, signals.item_passed)

    def _item_passed(self, item):
        self.items.append(item)

    def _crawl(self, queue, spider_name):
        spider = self.crawler.spiders.create(spider_name)
        if spider:
            self.crawler.queue.append_spider(spider)
        self.crawler.start()
        self.crawler.stop()
        queue.put(self.items)

    def crawl(self, spider):
        queue = Queue()
        p = Process(target=self._crawl, args=(queue, spider,))
        p.start()
        p.join()
        return queue.get(True)
3 Answers
LEATH
All the other answers refer to Scrapy v0.x. According to the updated documentation, Scrapy 1.0 requires:
import scrapy
from scrapy.crawler import CrawlerProcess
class MySpider(scrapy.Spider):
    # Your spider definition
    ...

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished
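To make this concrete for the original question (where the spider code goes and how to get the scraped items back in the script), here is a minimal, self-contained sketch built on the Scrapy 1.0+ API shown above. The spider name, start URL, CSS selectors, file name and the collect_item helper are illustrative placeholders, not part of the original answer; create_crawler and the item_scraped signal are documented Scrapy APIs, but on very old 1.x releases the signal wiring may need adjusting.

# run_spider.py -- a minimal sketch, assuming a reasonably recent Scrapy (1.x+).
import scrapy
from scrapy import signals
from scrapy.crawler import CrawlerProcess


class QuotesSpider(scrapy.Spider):
    # Placeholder spider: scrapes the quote texts from quotes.toscrape.com
    name = 'quotes'
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        # Yield one dict item per quote block on the page
        for quote in response.css('div.quote'):
            yield {'text': quote.css('span.text::text').extract_first()}


if __name__ == '__main__':
    scraped_items = []

    def collect_item(item, response, spider):
        # item_scraped fires for every item that passes through the pipelines
        scraped_items.append(item)

    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    })
    crawler = process.create_crawler(QuotesSpider)
    crawler.signals.connect(collect_item, signal=signals.item_scraped)
    process.crawl(crawler)
    process.start()  # blocks here until the crawl is finished

    print('%d items scraped' % len(scraped_items))

Running python run_spider.py should print the number of collected items once the crawl finishes, without needing scrapyd or the scrapy command-line tool; if you only need the items written to a file, setting FEED_URI/FEED_FORMAT in the CrawlerProcess settings is an alternative to the signal handler.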