Fundamentals of Python Network Programming (Python网络编程基础), 4th edition
Implemented in PyCharm, Python version 2.7.5
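urllib2 offers better extensibility. Topics covered:
1. Downloading web pages
2. Authenticating against a remote HTTP server
3. Submitting form data
4. Handling errors
5. Communicating with non-HTTP protocols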
1. Downloading web pages
(1)
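#coding=utf-8
# Usage: python dump_page.py URL
import sys, urllib2

req = urllib2.Request(sys.argv[1])   # build a request for the URL given on the command line
fd = urllib2.urlopen(req)            # send it; fd is a file-like response object
while 1:
    data = fd.read(1024)             # read the body in 1 KB chunks
    if not len(data):                # an empty string means end of file
        break
    sys.stdout.write(data)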
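sys.stdout is the standard output file, and write() writes data to that file.
Together they print the data to standard output. This is similar to print, except that write() adds no newline or space.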
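The read loop above follows a general pattern: request fixed-size chunks until read() returns an empty string. Below is a minimal sketch of the same loop as a reusable function. The name copy_stream is hypothetical, and io.BytesIO stands in for the urlopen() response so the sketch runs offline. It uses only features shared by Python 2.7 and Python 3.

```python
import io


def copy_stream(src, dst, chunk_size=1024):
    """Copy src to dst in fixed-size chunks and return the number of bytes copied.

    src is any object with read(n), such as the response returned by urlopen();
    dst is any object with write(), such as sys.stdout on Python 2.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:          # read() returns an empty string at end of file
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # Stand-in for a network response, so the sketch runs without a connection
    fake_response = io.BytesIO(b"<html>" + b"x" * 3000 + b"</html>")
    sink = io.BytesIO()
    print(copy_stream(fake_response, sink))   # prints 3013
```

The chunk size only limits how much is held in memory at once. It does not change the bytes that end up in dst.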
Output:
D:\python\python.exe E:/code/python/unit6/dump_page.py http://www.example.com
<!doctype html> <html> <head> <title>Example Domain</title> <meta charset="utf-8" /> <meta http-equiv="Content-type" content="text/html; charset=utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <style type="text/css"> body { background-color: #f0f0f2; margin: 0; padding: 0; font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif; } div { width: 600px; margin: 5em auto; padding: 50px; background-color: #fff; border-radius: 1em; } a:link, a:visited { color: #38488f; text-decoration: none; } @media (max-width: 700px) { body { background-color: #fff; } div { width: auto; margin: 0 auto; border-radius: 0; padding: 1em; } } </style> </head> <body> <div> <h1>Example Domain</h1> <p>This domain is established to be used for illustrative examples in documents. You may use this domain in examples without prior coordination or asking for permission.</p> <p><a href="http://www.iana.org/domains/example">More information...</a></p> </div> </body> </html>
Process finished with exit code 0
(2)
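#coding=utf-8
# Usage: python dump_info.py URL
import sys, urllib2

req = urllib2.Request(sys.argv[1])
fd = urllib2.urlopen(req)
print "Retrieved", fd.geturl()   # the final URL, after any redirects
info = fd.info()                 # the HTTP response headers
for key, value in info.items():
    print "%s=%s" % (key, value)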
Output:
D:\python\python.exe E:/code/python/unit6/dump_info.py http://httpd.apache.org/dev
Retrieved http://httpd.apache.org/dev/
content-length=8870
accept-ranges=bytes
vary=Accept-Encoding
server=Apache/2.4.7 (Ubuntu)
last-modified=Wed, 25 Jan 2017 14:38:55 GMT
connection=close
etag="22a6-546ec313cb061"
date=Fri, 17 Mar 2017 06:29:52 GMT
content-type=text/html
Process finished with exit code 0
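Note: the value returned by geturl() differs from the URL passed to Request: it has an extra trailing slash. The remote server issued an HTTP redirect, and urllib2 followed it automatically.
The remaining lines are the HTTP response headers.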
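fd.info() returns a message object that maps header names to values. In Python 2 this is a mimetools.Message, which stores header names in lower case in a plain dict. That is why the keys above are lower-case and appear in no particular order. Below is a minimal offline sketch of the same kind of parsing using the standard email package, which works in Python 2.7 and 3. headers_to_dict is a hypothetical helper. The header values are copied from the output above, and the capitalization of the names is assumed.

```python
from email.parser import Parser

# A few of the headers shown in the dump_info.py output (name casing assumed)
RAW_HEADERS = (
    "Content-Length: 8870\n"
    "Server: Apache/2.4.7 (Ubuntu)\n"
    "Content-Type: text/html\n"
    "\n"
)


def headers_to_dict(raw):
    """Parse an RFC 822-style header block into a dict with lower-cased names."""
    msg = Parser().parsestr(raw, headersonly=True)
    return dict((name.lower(), value) for name, value in msg.items())


if __name__ == "__main__":
    # Sorted here for stable output; the original script prints in dict order
    for key, value in sorted(headers_to_dict(RAW_HEADERS).items()):
        print("%s=%s" % (key, value))
```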
2. Authenticating against a remote HTTP server
(1)
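#coding=utf-8
# Usage: pass the URL as the first command-line argument
import sys, urllib2, getpass

class TerminalPassword(urllib2.HTTPPasswordMgr):
    """Password manager that asks the operator for credentials when none are stored."""
    def find_user_password(self, realm, authuri):
        ret = urllib2.HTTPPasswordMgr.find_user_password(self, realm, authuri)
        if ret[0] is None and ret[1] is None:
            # Nothing stored for this realm/URI: prompt on the terminal
            sys.stdout.write("Login required for %s at %s\n" % (realm, authuri))
            sys.stdout.write("Username: ")
            username = sys.stdin.readline().rstrip()
            password = getpass.getpass().rstrip()   # reads without echoing
            return (username, password)
        else:
            return ret

req = urllib2.Request(sys.argv[1])
# Add an HTTPBasicAuthHandler, backed by our password manager, to the handler chain
opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(TerminalPassword()))
response = opener.open(req)
print response.read()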
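This extends the urllib2.HTTPPasswordMgr class so that the program can ask the operator for a username and password when they are needed.
build_opener() lets you specify additional handlers. This code has to support authentication, so it adds HTTPBasicAuthHandler to the handler chain.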
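TerminalPassword prompts interactively. When the credentials are known in advance, a non-interactive alternative is the standard HTTPPasswordMgrWithDefaultRealm. Its add_password() stores credentials, and the handler later looks them up through find_user_password(), the same method TerminalPassword overrides. Below is a minimal sketch that imports urllib2 on Python 2 or urllib.request on Python 3. make_auth_opener, the URL, and the credentials are hypothetical. The sketch only builds the opener, so it runs without a network connection.

```python
try:                                   # Python 2
    import urllib2 as request
except ImportError:                    # Python 3
    import urllib.request as request


def make_auth_opener(url, username, password):
    """Build an opener that answers Basic auth challenges for url with fixed credentials."""
    mgr = request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: use these credentials whatever realm the server announces
    mgr.add_password(None, url, username, password)
    opener = request.build_opener(request.HTTPBasicAuthHandler(mgr))
    return opener, mgr


if __name__ == "__main__":
    # Hypothetical URL and credentials, for illustration only
    opener, mgr = make_auth_opener("http://example.com/private/", "alice", "secret")
    # Any page below the registered URL matches, whatever the realm
    print(mgr.find_user_password("Some Realm", "http://example.com/private/page.html"))
    # In real use: opener.open("http://example.com/private/page.html")
```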
3. Submitting form data
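GET method: the form data is encoded into the URL. A question mark is appended to the address of the requested page, followed by the form fields. Each key=value pair is separated by "&", and some characters must be escaped. GET is unsuitable when there is a large amount of data.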
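urllib.urlencode() does the encoding just described. It joins key=value pairs with & and escapes reserved characters: spaces become +, and & and = become %26 and %3D. Below is a small sketch with made-up field names and values that runs on Python 2.7 and 3.

```python
try:                                   # Python 2
    from urllib import urlencode
except ImportError:                    # Python 3
    from urllib.parse import urlencode

# A list of pairs keeps the field order; a dict does not guarantee it in Python 2
fields = [("query", "new york"), ("type", "2"), ("note", "a&b=c")]
print(urlencode(fields))   # query=new+york&type=2&note=a%26b%3Dc
```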
(1)
Code:
#coding=utf-8
# Usage: python dump_page.py "URL"
import sys, urllib2

req = urllib2.Request(sys.argv[1])
fd = urllib2.urlopen(req)
while 1:
    data = fd.read(1024)
    if not len(data):
        break
    sys.stdout.write(data)
Output:
D:\python\python.exe E:/code/python/unit6/dump_page.py http://weixin.sogou.com/weixin?p=01030402&query=%E5%8D%9A%E5%AE%A2%E5%9B%AD&type=2&ie=utf8
<!doctype html>
<html>
<head>
<link rel="shortcut icon" href="http://logo.www.sogou.com/images/logo2014/new/favicon.ico" type="image/x-icon">
<link href="/logo-safari.png?v=20170315" id="apple-touch-icon" rel="apple-touch-icon-precomposed"/>
<link href="https://www.sogou.com/sug/css/m3.min.v.7.css" rel="stylesheet" type="text/css">
<link href="/new/pc/css/weixin-public-new.min.css?v=20170315" rel="stylesheet" type="text/css">
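Note: when running the script from a command shell, the URL must be enclosed in quotes. Otherwise the shell interprets each & itself (as a command separator in cmd.exe, or as the background operator in Unix shells), and the script receives the URL cut off at the first &.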
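The quoting matters because this URL contains three & separators. Below is a small offline sketch that splits the same URL into its path and GET fields with the standard urlparse() and parse_qsl() functions, imported from the urlparse module on Python 2 or urllib.parse on Python 3. split_get_url is a hypothetical helper. When the URL is quoted, the script receives all four fields (p, query, type, ie). When it is unquoted, only p survives.

```python
try:                                   # Python 2
    from urlparse import urlparse, parse_qsl
except ImportError:                    # Python 3
    from urllib.parse import urlparse, parse_qsl

URL = ("http://weixin.sogou.com/weixin?p=01030402"
       "&query=%E5%8D%9A%E5%AE%A2%E5%9B%AD&type=2&ie=utf8")


def split_get_url(url):
    """Return (path, [(key, value), ...]) with the query string decoded."""
    parts = urlparse(url)
    return parts.path, parse_qsl(parts.query)


if __name__ == "__main__":
    path, pairs = split_get_url(URL)
    print(path)                             # /weixin
    print([key for key, value in pairs])    # ['p', 'query', 'type', 'ie']
```

The percent-escaped query value decodes to the UTF-8 bytes of 博客园. On Python 3, parse_qsl returns them already decoded as text.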
(2)
Code:
#coding=utf-8
# Usage: pass the zip code as the first command-line argument
import sys, urllib2, urllib

def addGETdata(url, data):
    """Append the url-encoded data to url after a '?'."""
    return url + '?' + urllib.urlencode(data)

zipcode = sys.argv[1]
url = addGETdata('http://www.weather.com.cn/cgi-bin/findweather/getForecast',
                 [('query', zipcode)])
print "using URL", url
req = urllib2.Request(url)
fd = urllib2.urlopen(req)
while 1:
    data = fd.read(1024)
    if not len(data):
        break
    sys.stdout.write(data)
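Note: the function addGETdata(url, data) appends all of the data to the end of the URL. Internally, it inserts a question mark between the URL and the data produced by urllib.urlencode().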
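addGETdata() always inserts "?", so it would produce a malformed URL if the URL already carried a query string. Below is a slightly more defensive variant, as a sketch. The name add_get_data, the example.com URL, and the zip code are hypothetical. It runs on Python 2.7 and 3.

```python
try:                                   # Python 2
    from urllib import urlencode
except ImportError:                    # Python 3
    from urllib.parse import urlencode


def add_get_data(url, data):
    """Append url-encoded data to url, using '&' if url already has a query string."""
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode(data)


if __name__ == "__main__":
    base = "http://example.com/search"
    print(add_get_data(base, [("query", "94043")]))
    # http://example.com/search?query=94043
    print(add_get_data(base + "?lang=en", [("query", "94043")]))
    # http://example.com/search?lang=en&query=94043
```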
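POST method: the data is sent in a separate part of the request, and the URL is never modified. The additional data is passed to urlopen() as its second argument.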
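Whether urllib2 sends GET or POST depends only on whether the request carries data. This holds whether the data is passed to urlopen() or directly to Request. The sketch below shows this with Request.get_method() without contacting a server. It imports urllib2 on Python 2 or urllib.request on Python 3, and the form URL and field are hypothetical.

```python
try:                                   # Python 2
    import urllib2 as request
    from urllib import urlencode
except ImportError:                    # Python 3
    import urllib.request as request
    from urllib.parse import urlencode

URL = "http://example.com/cgi-bin/form"   # hypothetical form handler

get_req = request.Request(URL)
# Python 3 requires bytes for request data; on Python 2 this is still a str
post_data = urlencode([("query", "94043")]).encode("ascii")
post_req = request.Request(URL, post_data)

print(get_req.get_method())               # GET
print(post_req.get_method())              # POST
print(post_req.get_full_url() == URL)     # True: the data is not added to the URL
```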
(3)
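Code:
#coding=utf-8
# Usage: pass the zip code as the first command-line argument
import sys, urllib2, urllib

zipcode = sys.argv[1]
url = 'http://www.wunderground.com/cgi-bin/findweather/getForcecast'
postdata = urllib.urlencode([('query', zipcode)])   # form fields, encoded like a query string
req = urllib2.Request(url)
fd = urllib2.urlopen(req, postdata)   # passing data makes urllib2 send a POST
while 1:
    data = fd.read(1024)
    if not len(data):
        break
    sys.stdout.write(data)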
4. Handling errors
(1)
Code:
#coding=utf-8
# Usage: python error_basic.py URL
import sys, urllib2

req = urllib2.Request(sys.argv[1])
try:
    fd = urllib2.urlopen(req)
except urllib2.URLError, e:        # covers network failures and HTTP error status codes
    print "Error retrieving data:", e
    sys.exit(1)
print "Retrieved", fd.geturl()
info = fd.info()
for key, value in info.items():
    print "%s=%s" % (key, value)
Output:
D:\python\python.exe E:/code/python/unit6/error_basic.py https://www.wunderground.com/cgi-bin/findweather/getForcecast
Error retrieving data: HTTP Error 404: Not Found
Process finished with exit code 1
(2)
Code:
#coding=utf-8
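# Usage: python error_basic.py URL
import sys, urllib2

req = urllib2.Request(sys.argv[1])
try:
    fd = urllib2.urlopen(req)
except urllib2.HTTPError, e:       # the server answered with an error status (404, 500, ...)
    print "Error retrieving data:", e
    print "Server error document follows:\n"
    print e.read()                 # call read(); without () the bound method itself is printed
    sys.exit(1)
except urllib2.URLError, e:        # no HTTP response at all (DNS failure, refused connection, ...)
    print "Error retrieving data:", e
    sys.exit(2)
print "Retrieved", fd.geturl()
info = fd.info()
for key, value in info.items():
    print "%s=%s" % (key, value)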
Output (recorded while the handler still used print e.read without parentheses, so the bound method object was printed instead of the server's error page):
D:\python\python.exe E:/code/python/unit6/error_basic.py https://www.wunderground.com/cgi-bin/findweather/getForcecast
Error retrieving data: HTTP Error 404: Not Found
Server error document follows:

<bound method _fileobject.read of <socket._fileobject object at 0x0216A5B0>>
Process finished with exit code 1
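Note: if an instance of urllib2.HTTPError is raised, the first handler catches it and prints the details. Any other urllib2.URLError instance is caught by the second handler, which prints the URLError message. The HTTPError clause must come first because HTTPError is a subclass of URLError.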
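The ordering rule can be checked without a server by building an HTTPError by hand. Its constructor takes url, code, msg, hdrs, fp, and fp supplies the error page that read() returns. Below is a minimal offline sketch that imports from urllib2 on Python 2 or urllib.error on Python 3. exit_code_for is a hypothetical helper that mirrors the exit codes 1 and 2 used in the script above.

```python
import io

try:                                   # Python 2
    from urllib2 import HTTPError, URLError
except ImportError:                    # Python 3
    from urllib.error import HTTPError, URLError


def exit_code_for(exc):
    """Map an exception to the exit codes used in the script above (1 or 2)."""
    try:
        raise exc
    except HTTPError:       # must come first: HTTPError is a subclass of URLError
        return 1
    except URLError:
        return 2


if __name__ == "__main__":
    # Hand-built 404 error; the BytesIO plays the role of the server's error page
    body = io.BytesIO(b"<html><title>Not Found</title></html>")
    err = HTTPError("http://example.com/missing", 404, "Not Found", None, body)
    print(issubclass(HTTPError, URLError))               # True
    print(err)                                           # HTTP Error 404: Not Found
    print(exit_code_for(err))                            # 1
    print(exit_code_for(URLError("no route to host")))   # 2
    print(err.read())   # the error page, read like a normal response body
```

If the two except clauses were swapped, exit_code_for would return 2 for the 404 as well.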
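Errors while reading data:
A communication error makes the socket module raise socket.error when read() is called (the error is passed up from the system level);
The document may be truncated even though no communication error is reported.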
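socket.error is raised by read(), not by urlopen(), so it has to be caught around the read loop. Below is a minimal offline sketch. FlakyResponse is a hypothetical stand-in for a response whose connection fails after some bytes have been delivered. read_body is a hypothetical helper that returns whatever arrived along with the error. On Python 3, socket.error is an alias of OSError. The sketch runs on Python 2.7 and 3.

```python
import io
import socket


class FlakyResponse(object):
    """Hypothetical stand-in for a response whose connection drops mid-transfer."""
    def __init__(self, payload, fail_after):
        self._buf = io.BytesIO(payload)
        self._fail_after = fail_after    # bytes delivered before the "connection" fails
        self._sent = 0

    def read(self, n):
        if self._sent >= self._fail_after:
            raise socket.error("connection reset by peer")
        chunk = self._buf.read(n)
        self._sent += len(chunk)
        return chunk


def read_body(fd, chunk_size=1024):
    """Read fd to EOF; return (data, error), where error is None on success."""
    chunks = []
    try:
        while True:
            chunk = fd.read(chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
    except socket.error as e:            # raised by read(), after urlopen() succeeded
        return b"".join(chunks), e
    return b"".join(chunks), None


if __name__ == "__main__":
    data, err = read_body(FlakyResponse(b"x" * 4096, fail_after=2048))
    print("%d bytes read, error: %s" % (len(data), err))
```

A truncated document that arrives without any socket.error can only be detected by comparing the byte count with the Content-Length header.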
(3)
Code:
#coding=utf-8
# Usage: python erroe_all.py URL
import sys, urllib2, socket

req = urllib2.Request(sys.argv[1])
try:
    fd = urllib2.urlopen(req)
except urllib2.HTTPError, e:
    print "Error retrieving data:", e
    print "Server error document follows:\n"
    print e.read()
    sys.exit(1)
except urllib2.URLError, e:
    print "Error retrieving data:", e
    sys.exit(2)

print "Retrieved", fd.geturl()
bytesread = 0
while 1:
    try:
        data = fd.read(1024)
    except socket.error, e:          # communication error while reading the body
        print "Error reading data:", e
        sys.exit(3)
    if not len(data):
        break
    bytesread += len(data)
    sys.stdout.write(data)

# Detect a truncated document by comparing the byte count with Content-Length
if fd.info().has_key('Content-Length') and long(fd.info()['Content-Length']) != long(bytesread):
    print "Expected a document of size %d, but read %d bytes" % (long(fd.info()['Content-Length']), bytesread)
    sys.exit(4)
Output:
D:\python\python.exe E:/code/python/unit6/erroe_all.py https://www.wunderground.com/cgi-bin/findweather/getForcecast
Error retrieving data: HTTP Error 404: Not Found
Server error document follows:


<!DOCTYPE html>
<!--[if IE 9]><html class="no-js ie9"> <![endif]-->
<!--[if gt IE 9]><!--> <html class="no-js "> <!--<![endif]-->
<head>
<title>Error | Weather Underground</title>
<link href="//icons.wxug.com/" rel="dns-prefetch" />
<link href="//api-ak.wunderground.com/" rel="dns-prefetch" />
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">