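import urllib.request, http.cookiejar
# url is assumed to be defined earlier in the script
# A CookieJar stores the cookies that the server sends back
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
# Install the opener so that urlopen() uses the cookie handler
urllib.request.install_opener(opener)
response3 = urllib.request.urlopen(url)
print(response3.getcode())
print(len(response3.read()))
print(cj)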
2017-08-12
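When writing download, I got an error saying response has no getcode with response = urllib.request.urlopen(url). The name response was apparently already bound to something else in my code (it is not a Python reserved word, so the clash most likely comes from another variable or import). Changing the line to resp = urllib.request.urlopen(url) made the problem go away.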
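As a rough illustration of the renamed variable, here is a minimal sketch of a download function. The function shape and the `opener` parameter are assumptions made for this example (the parameter only exists so the function can be tried without a network connection); the course code may look different.

```python
import urllib.request


def download(url, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if there is nothing usable.

    `opener` is a hypothetical parameter added for this sketch; by default
    it is the real urllib.request.urlopen.
    """
    if url is None:
        return None
    # Use a local name such as `resp` so it cannot collide with another
    # object called `response` elsewhere in the module.
    resp = opener(url)
    # Note: the real urlopen raises HTTPError for most 4xx/5xx statuses,
    # so this check mainly guards against other non-200 results.
    if resp.getcode() != 200:
        return None
    return resp.read()
```

Calling `download(some_url)` with no second argument goes through the real `urlopen`, so it also picks up any opener installed with `install_opener`.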
2017-08-11
I think the for loop in the add_new_urls method should include an if check to see whether each fetched url is already in old_urls.
for url in urls:
    if url not in self.old_urls:
        self.new_urls.add(url)
——————————————————————————————
That is not needed at all. The instructor's add_new_urls() adds each url by calling add_new_url(), and add_new_url() already does this check, so the extra if is redundant.
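To make the point of this exchange concrete, here is a minimal sketch of a URL manager in which add_new_urls() delegates to add_new_url(). The names add_new_url, add_new_urls, new_urls and old_urls come from the discussion; the class name UrlManager, the get_new_url method and the exact checks are assumptions for illustration, not necessarily the course's code.

```python
class UrlManager:
    """Tracks URLs waiting to be crawled and URLs already crawled."""

    def __init__(self):
        self.new_urls = set()  # not yet crawled
        self.old_urls = set()  # already crawled

    def add_new_url(self, url):
        if url is None:
            return
        # The duplicate check lives here, in one place only.
        if url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def add_new_urls(self, urls):
        if not urls:
            return
        for url in urls:
            # Delegating means the old_urls check is applied automatically.
            self.add_new_url(url)

    def get_new_url(self):
        # Move one URL from the pending set to the crawled set.
        url = self.new_urls.pop()
        self.old_urls.add(url)
        return url
```

With this structure, a URL that has already been crawled is ignored when it shows up again in a later batch, without any extra if inside add_new_urls().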
2017-08-08
Installing on Python 2.7:
sudo python2.7 -m pip install --upgrade pip
sudo python2.7 -m pip install beautifulsoup4  # installed into python2.7/site-packages
Installing on Python 3:
pip3 install beautifulsoup4  # pip3 ships with Python 3; modules installed with pip3 go into python3.6/site-packages
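If it is unclear which interpreter a package ended up in, a small check like the following can help. This is only a sketch: module_location is a hypothetical helper name, and it relies on the standard importlib.util.find_spec to report where the running interpreter would import a top-level module from.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not installed for this interpreter
    return spec.origin


if __name__ == "__main__":
    # beautifulsoup4 is imported under the name bs4
    print(module_location("bs4"))
```

Running the script with python2.7 and then with python3 shows whether bs4 is visible to each interpreter, and in which site-packages directory.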
2017-08-08
Tutorial source code: https://github.com/huazhicai/imooc/tree/master/spider
2017-08-07
Code I studied and imitated: https://git.oschina.net/xiedongji/spider_demo.git
2017-08-07