
downloader fails

while self.urls.has_new_url():
    try:
        new_url = self.urls.get_new_url()
        print("craw%d : %s" % (count, new_url))
        # While debugging, it failed here and nothing was fetched.
        # Also, when testing urllib on its own, the Baidu page would not open;
        # https did work, but the final crawl result was the same.
        html_cont = self.downloader.download(new_url)
        new_urls, new_data = self.parser.parse(new_url, html_cont)
        self.urls.add_new_urls(new_urls)
        self.outputer.collect_data(new_data)
        if count == 1000:
            break
        count = count + 1
    except:
        print("craw failed")

Output:

craw1 : http://baike.baidu.com/view/21087.htm

craw failed

scylhy

2016-02-16

Source: Python开发简单爬虫 (Developing a Simple Crawler in Python), 7-7
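The comment in the question says urllib could not open the Baidu page over http. To see *why* a fetch fails instead of getting nothing back, a downloader can raise with the HTTP status or the network reason. This is a minimal Python 3 sketch; the function name `download` and its behaviour are assumptions, not the course's `HtmlDownloader`, which may handle errors differently.

```python
import urllib.error
import urllib.request


def download(url, timeout=10):
    """Fetch url and return the body as bytes, or raise with a clear message.

    Hypothetical stand-in for a crawler's downloader: instead of quietly
    returning None when something goes wrong, it says what went wrong.
    """
    if url is None:
        raise ValueError("download() got url=None")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as exc:
        # The server answered, but with a non-2xx status (e.g. 403, 404).
        # HTTPError subclasses URLError, so it must be caught first.
        raise RuntimeError(f"HTTP {exc.code} while fetching {url}") from exc
    except urllib.error.URLError as exc:
        # No usable response: DNS failure, refused connection, SSL problem,
        # unsupported URL scheme, ...
        raise RuntimeError(f"could not reach {url}: {exc.reason}") from exc
```

Called with the seed URL from the output, `download("http://baike.baidu.com/view/21087.htm")` either returns the page bytes or raises an error naming the status code or connection problem, which is more informative than a bare "craw failed".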


1 answer

blacksea3 (answer accepted, +3 points)
2016-02-16

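It may be that a typo somewhere in the middle block, rather than self.urls.has_new_url()==0, is what ends the loop: the Baidu Baike entry for Python does contain other links. Try removing the try-except so the error is displayed directly.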
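A middle ground between a bare `except:` and removing the try-except entirely is to keep the loop running but print the full traceback of each failure. Below is a minimal sketch; `crawl` and the `download`/`parse` callables (standing in for `self.downloader.download` and `self.parser.parse`) are hypothetical simplifications, not the course's classes.

```python
import traceback


def crawl(root_url, download, parse, max_pages=1000):
    """Crawl from root_url, printing the full traceback of any page that fails.

    download(url) -> html and parse(url, html) -> (new_urls, new_data) are
    hypothetical stand-ins for the downloader and parser objects.
    """
    pending = [root_url]
    seen = {root_url}
    collected = []
    count = 1
    while pending and count <= max_pages:
        new_url = pending.pop()
        print("craw%d : %s" % (count, new_url))
        try:
            html_cont = download(new_url)
            new_urls, new_data = parse(new_url, html_cont)
        except Exception:
            # Unlike `except: print("craw failed")`, this shows the exception
            # type, its message and the exact line that raised it.
            traceback.print_exc()
            continue
        for url in new_urls:
            if url not in seen:
                seen.add(url)
                pending.append(url)
        collected.append(new_data)
        count += 1
    return collected
```

Catching `Exception` rather than using a bare `except:` also lets Ctrl+C (`KeyboardInterrupt`) stop the crawl.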

By "the middle block" I mean:

html_cont=self.downloader.download(new_url)
new_urls,new_data=self.parser.parse(new_url,html_cont)
self.urls.add_new_urls(new_urls)
self.outputer.collect_data(new_data)
if count==1000:
    break
count=count+1

Depending on where the error occurs, print the relevant variables there to find the cause.
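Applied to one iteration of the middle block, that advice might look like the sketch below: print each intermediate value right after the step that produces it, with no try-except, so the last line printed before the traceback shows where things went wrong. `debug_step` and its `download`/`parse` arguments are hypothetical stand-ins for the course's objects.

```python
def debug_step(new_url, download, parse):
    """Run one crawl iteration, printing each intermediate value (hypothetical helper)."""
    print("step 1, url:", repr(new_url))
    html_cont = download(new_url)
    print("step 2, html_cont:", type(html_cont).__name__,
          "length", None if html_cont is None else len(html_cont))
    result = parse(new_url, html_cont)
    print("step 3, parse returned:", type(result).__name__)
    # Unpacking fails here with a TypeError if parse returned None.
    new_urls, new_data = result
    print("step 4, new_urls:", len(new_urls), "new_data:", new_data)
    return new_urls, new_data
```

For example, if `download` returned `None` and `parse` then returned `None`, the output would stop after "step 3" and Python would raise a `TypeError` on the unpacking line.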


scylhy (asker)

Thank you so much! Found it.

2016-02-17

