Why does the third method give a different result from the instructor's? It looks like the page content is not printed.

Method 3

200

<CookieJar[<Cookie BAIDUID=B47DAA3FB9F11031507F5932552A3BAB:FG=1 for .baidu.com/>, <Cookie BIDUPSID=B47DAA3FB9F11031507F5932552A3BAB for .baidu.com/>, <Cookie H_PS_PSSID=1451_21116_26350_20927 for .baidu.com/>, <Cookie PSTM=1525337354 for .baidu.com/>, <Cookie BDSVRTM=0 for www.baidu.com/>, <Cookie BD_HOME=0 for www.baidu.com/>]>

114788

浪小仙

2018-05-03

From: Python开发简单爬虫 (Developing a Simple Crawler in Python), section 5-3

6 answers

人在梦游中
2018-05-07

from http import cookiejar
from urllib import request

url = "http://www.baidu.com"

# Method 1: plain urlopen()
print("Method 1")
response1 = request.urlopen(url)
resp1 = response1.read()
print(response1.getcode())
print(len(resp1))
print(resp1)  # raw bytes, not decoded

# Method 2: Request object with a custom User-Agent header
print("Method 2")
req = request.Request(url)
req.add_header("user-agent", "Mozilla/5.0")
response2 = request.urlopen(req)
print(response2.getcode())
resp2 = response2.read()
print(len(resp2))
print(resp2.decode("utf-8"))

# Method 3: opener with cookie handling
print("Method 3")
cj = cookiejar.CookieJar()
opener = request.build_opener(request.HTTPCookieProcessor(cj))
request.install_opener(opener)
response3 = request.urlopen(url)
print(response3.getcode())
resp3 = response3.read()  # read() consumes the stream, so call it only once
print(len(resp3))
print(cj)
print(resp3.decode("utf-8"))
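If you want to try the cookie part of the third method without hitting baidu.com, here is a rough sketch that runs against a throwaway local server. `DemoHandler`, `fetch_with_cookies`, and the `demo_id` cookie are made-up names for illustration only. It also uses `opener.open()` directly instead of `install_opener()`, which is an alternative to the approach in the thread, so the global `urlopen()` is left alone.

```python
import threading
from http import cookiejar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local page that sets one cookie; stands in for a real site."""

    def do_GET(self):
        body = "<html>你好</html>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Set-Cookie", "demo_id=abc123; Path=/")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_with_cookies(url):
    """Fetch url through a private opener; return (status, text, cookie names)."""
    cj = cookiejar.CookieJar()
    opener = request.build_opener(
        request.ProxyHandler({}),           # skip any system proxy for the local demo
        request.HTTPCookieProcessor(cj),
    )
    with opener.open(url) as resp:          # no install_opener(), urlopen() is untouched
        status = resp.getcode()
        text = resp.read().decode("utf-8")  # read once, then reuse the result
    return status, text, [c.name for c in cj]


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
demo_url = "http://127.0.0.1:%d/" % server.server_port

status, text, names = fetch_with_cookies(demo_url)
server.shutdown()
server.server_close()
print(status, names, text)
```

Swapping `demo_url` for a real URL should behave the same way, though the cookies you get back depend on the site.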


慕后端4582086

I got this error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xc4 in position 2320: invalid continuation byte

2018-05-20

人在梦游中 replying to 慕后端4582086

Add # -*- coding:utf-8 -*- before the imports.

2018-05-21
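A note on the UnicodeDecodeError above: as far as I know, the `# -*- coding:utf-8 -*-` line only declares the encoding of the .py source file and does not change how response bytes are decoded. An "invalid continuation byte" error usually means the bytes are not valid UTF-8. A minimal sketch of a more forgiving decode is below. `decode_body` and its fallback list are my own assumptions, not something from the course.

```python
def decode_body(raw, declared=None, fallbacks=("utf-8", "gbk")):
    """Decode raw bytes: try the declared charset first, then the fallbacks.

    `declared` is whatever charset the server announced (may be None).
    The fallback list is a guess suited to Chinese sites, not a standard.
    """
    candidates = ([declared] if declared else []) + list(fallbacks)
    for enc in candidates:
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown codec, try the next one
    # Last resort: keep going and mark undecodable bytes with U+FFFD
    return raw.decode("utf-8", errors="replace")


# "你好" encoded as GBK starts with byte 0xc4, which is not valid UTF-8 here
gbk_bytes = "你好".encode("gbk")
print(decode_body(gbk_bytes))
```

With a real response, the declared charset can be taken from `response.headers.get_content_charset()` and passed as `declared`.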

G王
2018-05-07

Why does the third method overwrite the earlier ones, and still show no content?


人在梦游中
2018-05-07

resp2 = response2.read()
print(len(resp2))
print(resp2.decode("utf-8"))

That does the trick.


G王

Could you post your code?

2018-05-07

人在梦游中
2018-05-07

Same thing happens to me.


林二小
2018-05-03

You need to remove len() and print the content of response3.read() instead of its length.
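One thing to watch for if you want both the length and the content: `read()` consumes the response stream, so a second `read()` returns empty bytes. A small sketch of this, using `io.BytesIO` as a stand-in for the object returned by `urlopen()` (the `summarize` helper is a made-up name):

```python
import io


def summarize(resp):
    """Read a file-like response once; return (byte length, decoded text)."""
    body = resp.read()  # single read, reused for both results
    return len(body), body.decode("utf-8")


page = "<html>百度一下</html>".encode("utf-8")

# io.BytesIO stands in for an HTTP response here (assumption for the demo)
fake = io.BytesIO(page)
first = fake.read()
second = fake.read()  # the stream is already at its end
print(len(first), repr(second))

length, text = summarize(io.BytesIO(page))
print(length, text)
```

So calling `print(len(response3.read()))` and then `print(response3.read())` would show the length but an empty body. Storing the result of one `read()` in a variable avoids that.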
