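I'm using 0.3.9, the latest version on GitHub. After changing the project's code, I found that the contents of the schedule had not been updated. I want the page crawled once every half hour, but the crawler fetches it every 10 seconds instead. I don't know whether this is a bug, or how to fix it.
The code looks like this: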
class Handler(BaseHandler):
    crawl_config = {
        'headers': {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.89 Safari/537.36',
        }
    }

    @every(minutes=30)
    def on_start(self):
        self.crawl('http://www.xxxx.org/', callback=self.index_page)

    @config(age=10)
    def index_page(self, response):
The schedule looks like this:
Note: I originally set an itag, but later removed it.
ACTIVE xxxx.index_page > http://www.xxxx.org/ (8 seconds ago updated )
taskid: 9dfac8d63cb01eae0e33701e26de4778
lastcrawltime: 1480581196.0514488 (8 seconds ago)
updatetime: 1480581196.0515082 (8 seconds ago)
exetime: 1480581206.0514526 (1 second ago)
track.fetch 1320.64ms
{
  "content": null,
  "encoding": "GBK",
  "error": null,
  "headers": {},
  "ok": true,
  "redirect_url": null,
  "status_code": 200,
  "time": 1.3206377029418945
}
track.process 34.6ms +16
{
  "exception": null,
  "follows": 16,
  "logs": "",
  "ok": true,
  "result": null,
  "time": 0.03459787368774414
}
schedule
{
  "age": 10,
  "auto_recrawl": true,
  "exetime": 1480581206.0514526,
  "itag": "v223",
  "retried": 21
}
fetch
{}
process
{
  "callback": "index_page"
}
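
As a quick sanity check on the numbers in the dump above, here is a small sketch in plain Python (not pyspider code; the variable names are only for illustration). It compares the gap between exetime and lastcrawltime with the stored age and with the 30-minute interval I intended. The gap matches the stored age of 10 seconds, which seems consistent with the old schedule (age 10, auto_recrawl true, itag v223) still driving this task, although I have not confirmed that this is the cause.

```python
# Values copied from the task dump above.
lastcrawltime = 1480581196.0514488
exetime = 1480581206.0514526

stored_age = 10               # "age" in the stored schedule, in seconds
intended_interval = 30 * 60   # @every(minutes=30), converted to seconds

# Seconds between the last crawl and the next scheduled execution.
gap = exetime - lastcrawltime

print("gap between crawls (s):", round(gap, 3))
print("matches stored age:", round(gap) == stored_age)
print("matches intended interval:", round(gap) == intended_interval)
```

The next run is scheduled about 10 seconds after the previous crawl, so the 30-minute setting on on_start does not appear to control when this index_page task is re-executed.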