
Developing a Simple Crawler in Python: Getting the urllib2 Example to Run on Python 3

Tags:

Python

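I have gone through the basics of Python and feel that I need hands-on examples to practice, so today I am going to learn how to write a crawler. Python 3 changed considerably from Python 2; Python 3 is what I have been studying for the past few days, and the version I use is 3.6.0. While practicing the code from lesson 5-3 of the course "Python开发简单爬虫" (Developing a Simple Crawler in Python), I found that the instructor's code, typed in as given, does not run on Python 3. I am writing this note so that later I can more easily take Python 2 code and adapt it for Python 3.
The instructor's code (Python 2) is: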

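# Python 2 only: urllib2 and cookielib do not exist under these names in Python 3
import urllib2
import cookielib

url = "http://www.baidu.com"

# Method 1: fetch the URL directly
print 'Method 1'
response1 = urllib2.urlopen(url)
print response1.getcode()
print len(response1.read())

# Method 2: build a Request object and add a header before sending
print 'Method 2'
request = urllib2.Request(url)
request.add_header("user-agent", "Mozilla/5.0")
response2 = urllib2.urlopen(request)
print response2.getcode()
print len(response2.read())

# Method 3: install an opener that handles cookies
print 'Method 3'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
response3 = urllib2.urlopen(url)
print response3.getcode()
print cj
print response3.read()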

Written this way, the code does not work in Python 3. The following is the working Python 3 version:

import urllib.request
import http.cookiejar

url = "https://www.baidu.com"

# Method 1: fetch the URL directly
print('Method 1')
response1 = urllib.request.urlopen(url)
print(response1.getcode())
print(len(response1.read()))

# Method 2: build a Request object and add a header before sending
print('Method 2')
request = urllib.request.Request(url)
request.add_header("user-agent", "Mozilla/5.0")
response2 = urllib.request.urlopen(request)
print(response2.getcode())
print(len(response2.read()))

# Method 3: install an opener that handles cookies
print('Method 3')
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
urllib.request.install_opener(opener)
response3 = urllib.request.urlopen(url)
print(response3.getcode())
print(cj)
print(response3.read())
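
The second and third methods can also be checked without touching the network. The snippet below is only a minimal sketch of my own, not part of the course code: it builds the same Request and opener objects and inspects them before anything is sent. The variable names user_agent, method and cookie_count are ones I made up for illustration.

```python
import http.cookiejar
import urllib.request

url = "https://www.baidu.com"

# Method 2: build a Request and attach a header; nothing is sent yet.
request = urllib.request.Request(url)
request.add_header("user-agent", "Mozilla/5.0")

# add_header() stores the key capitalized, so look it up as "User-agent".
user_agent = request.get_header("User-agent")
method = request.get_method()  # "GET", because no data was supplied

# Method 3: an opener that routes cookies into our CookieJar.
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
cookie_count = len(cj)  # the jar stays empty until a response sets cookies

print(user_agent, method, cookie_count)
```

Calling opener.open(url) afterwards would perform the actual request and, if the server sets cookies, fill cj.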

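In Python 3, the urllib2 module was folded into the urllib package (split across urllib.request and urllib.error), so you need to import urllib.request and replace every urllib2 with urllib.request. The cookielib module used in the third method was renamed, so you have to import http.cookiejar instead. The urlencode function now lives in urllib.parse.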
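
As a rough sketch of where these names live in each version, the snippet below tries the Python 3 locations first and falls back to the Python 2 ones. The try/except pattern, the params dictionary and the search URL are my own illustrative assumptions, not something from the course.

```python
try:
    # Python 3 locations
    from urllib.request import Request, urlopen
    from urllib.parse import urlencode
    from http.cookiejar import CookieJar
except ImportError:
    # Python 2 locations of the same names
    from urllib2 import Request, urlopen
    from urllib import urlencode
    from cookielib import CookieJar

# Hypothetical query parameters, used only to show urlencode at work.
params = {"wd": "python"}
query = urlencode(params)

# Assumed URL layout for illustration; no request is actually sent here.
search_url = "https://www.baidu.com/s?" + query
print(search_url)
```

Code written against Request, urlopen, urlencode and CookieJar imported this way does not need to mention urllib2 or urllib.request by name.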
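I have only just started learning Python and am very much a beginner. If anything here is wrong, or if you know a better approach, please correct me or point it out so that we can learn from each other.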
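As for the most basic print: Python 3 treats it as a function, so print 'xxx' no longer works. The argument has to be wrapped in parentheses, as in print('xxx').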
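
Because print is an ordinary function in Python 3, it also accepts keyword arguments such as sep, end and file. Here is a small sketch (Python 3 only; the buffer and captured names are just for illustration) that writes into an in-memory buffer instead of the console:

```python
import io

buffer = io.StringIO()

# sep controls the separator between arguments; file redirects the output.
print("Method", 1, sep=" ", file=buffer)

# end replaces the trailing newline that print adds by default.
print("no newline", end="", file=buffer)

captured = buffer.getvalue()
print(repr(captured))
```

In Python 2.6 and later, putting from __future__ import print_function at the top of a file makes the same function-call syntax available, which can help when moving code over gradually.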
