Accepted answer / qq_匠邮心生_03449154
Add print type(link), type(links) after the loop that fetches all the links. The output is:
<class 'bs4.element.Tag'> <class 'bs4.element.ResultSet'>
This shows that links, the return value of soup.find_all, is a custom class (a ResultSet), and each node is a custom class as well (a Tag).
for i in range(3): print links[i].name, links[i]['href'], links[i].get_text(), links[...
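A minimal sketch (Python 3 with BeautifulSoup 4, using made-up sample HTML) that reproduces those types:

from bs4 import BeautifulSoup

# made-up sample HTML, just to illustrate the return types
html = '<a href="/a">one</a><a href="/b">two</a><a href="/c">three</a>'
soup = BeautifulSoup(html, 'html.parser')
links = soup.find_all('a')
print(type(links))     # <class 'bs4.element.ResultSet'>
print(type(links[0]))  # <class 'bs4.element.Tag'>
for link in links:
    print(link.name, link['href'], link.get_text())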
2016-06-09
Only one record is printed before "craw failed". Check the assignment of title_node in the get_new_data method of the html_parser module: the final .find('h1') must be chained after the closing parenthesis, i.e. the chain should read ).find('h1').
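A sketch of the intended chaining in get_new_data, assuming the Baidu Baike markup the course targets (the class name is an assumption taken from that page):

from bs4 import BeautifulSoup

def get_new_data(page_url, html_cont):
    soup = BeautifulSoup(html_cont, 'html.parser')
    res_data = {'url': page_url}
    # .find('h1') must be chained AFTER the closing parenthesis of the
    # first find(...); a misplaced ')' breaks the chain, so the crawler
    # stops with "craw failed" after the first record
    title_node = soup.find('dd', class_="lemmaWgt-lemmaTitle-title").find('h1')
    res_data['title'] = title_node.get_text()
    return res_data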
2016-06-08
For the third code snippet under Python 3.5: urllib2 has been replaced by urllib.request in 3.5.
print("第三种方法")
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
urllib.request.install_opener(opener)
response3 = urllib.request.urlopen(url)
print(response3.getcode())
print(len(response3.read()))
print("第三种方法")
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
urllib.request.install_opener(opener)
response3 = urllib.request.urlopen(url)
print(response3.getcode())
print(len(response3.read()))
2016-06-07
import requests

class HtmlDownloader(object):
    def download(self, url):
        if url is None:
            return None
        r = requests.get(url)
        # treat anything other than HTTP 200 as a failed download
        if r.status_code != 200:
            return None
        return r.text
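A quick hypothetical check of the downloader (the URL is just an example):

downloader = HtmlDownloader()
html = downloader.download('http://www.baidu.com')
print(len(html) if html else 'download failed')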
2016-06-07
Top answer / 死瘦子
That's because even though fout.write(data['title'].encode('utf-8')) specifies the encoding, the browser doesn't necessarily open the page as UTF-8; it may default to GBK. If you switch the browser's encoding, the page displays correctly. You can also fix it in code: after fout.write("<html>"), add fout.write('<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'). This <meta ...
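A minimal sketch of where that line goes in the output step (Python 2 to match the answer above; the file name and sample record are assumptions):

data = {'title': u'示例标题'}  # hypothetical record produced by the crawler
fout = open('output.html', 'w')
fout.write("<html>")
# declare UTF-8 explicitly so the browser does not fall back to GBK
fout.write('<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />')
fout.write("<body>")
fout.write(data['title'].encode('utf-8'))
fout.write("</body></html>")
fout.close()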
2016-06-07