使用 Requests 库请求网址
在 Python 爬虫中,我们使用的最多的就是 requests 库, 截止到 2020年6月,request 库最新的版本为 v2.24.0。来看放放文档介绍:
Requests is an elegant and simple HTTP library for Python, built for human beings.
Requests 是 Python 中的一个简洁优雅的第三方库,且其比较符合人们的使用习惯,这也是大部分人会使用 Requests 来模拟 Http 请求的原因。接下来我们会从使用和源码角度来谈一谈 Requests 库。
1. Requests 库的使用
通常对于 Python 第三方模块的学习方式都是一样的。第一步都是先安装,然后是不断的使用和参考官方文档,待熟练掌握后便可以翻看其源码深入学习其实现原理,最后达到彻底掌握该模块的地步。
[store@server2 chap02]$ pip3 install requests -i http://pypi.douban.com/simple/
接下来我们参考官方文档的第一个实例进行测试,该实例主要是测试 requests 库的一些方法及其使用场景,后面我们会使用 requests 库对网页的数据进行手工爬取以比较和框架爬虫之间的区别。后续都将会在 CentOS7.8 和 Python 3 的环境下:
[store@server2 chap02]$ python3
Python 3.6.8 (default, Apr 2 2020, 13:34:55)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>>
接下来我们使用 requests 模块的 get()
方法模拟 http 的 get 请求,获取这样的页面结果:
>>> headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'}
>>> r1 = requests.get(url='https://baike.baidu.com/item/scrapy', headers=headers)
>>> r1.status_code
200
>>> r1.text[:1000]
'<!DOCTYPE html>\n<!--STATUS OK-->\n<html>\n\n\n\n<head>\n<meta charset="UTF-8">\n<meta http-equiv="X-UA-Compatible" content="IE=Edge" />\n<meta name="referrer" content="always" />\n<meta name="description" content="Scrapy是适用于Python的一个快速、高层次的屏幕抓取和web抓取框架,用于抓取web站点并从页面中提取结构化的数据。Scrapy用途广泛,可以用于数据挖掘、监测和自动化测试。Scrapy吸引人的地方在于它是一个框架,任何人都可以根据需求方便的修改。它也提供了多种类型爬虫的基类,如BaseSpider、sitemap爬虫等,最新版本又提供了web2.0爬虫的支持。...">\n<title>scrapy_百度百科</title>\n<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" />\n<link rel="icon" sizes="any" mask href="//www.baidu.com/img/baidu.svg">\n\n<meta name="keywords" content="scrapy scrapy基本功能 scrapyScrapy架构 scrapy如何开始">\n<meta name="image" content="https://bkssl.bdimg.com/cms/static/baike.png">\n<meta name="csrf-token" content="">\n<meta itemprop="dateUpdate" content="2020-03-19 08:23:19" />\n\n<!--[if lte IE 9]>\n<script>\r\n (function() {\r\n var e = "abbr,article,aside,audio,canvas,datalist,details,dialog,eventsource,figure,footer,header,hgroup,mark,menu,meter,nav,outpu
注意:这里 headers 非常重要,很多网站第一步会检查 headers,如果请求头中没有 User-Agent 就会直接判定为爬虫并采取相应措施进行限制。如下是没有加上 headers 的请求结果:
看到了么,简简单单的 get()
方法就能模拟 HTTP 的 get 请求,那么是不是还有 post()
、put()
、delete()
这些方法呢?答案是肯定的。
1.1 httpbin
httpbin 这个网站能测试 HTTP 请求和响应的各种信息,比如 cookie、ip、headers 和登录验证等,且支持 get、post、put、delete 等多种方法,对 Web 开发和测试很有帮助。接下来,我们就用 requests 在这个网站上测试下其他的 HTTP 请求:
>>> import requests
>>> r = requests.post('https://httpbin.org/post', data = {'key':'value'})
>>> r.text
'{\n "args": {}, \n "data": "", \n "files": {}, \n "form": {\n "key": "value"\n }, \n "headers": {\n "Accept": "*/*", \n "Accept-Encoding": "gzip, deflate", \n "Content-Length": "9", \n "Content-Type": "application/x-www-form-urlencoded", \n "Host": "httpbin.org", \n "User-Agent": "python-requests/2.24.0", \n "X-Amzn-Trace-Id": "Root=1-5ef4800b-da26cce71993bd5eb803d7c9"\n }, \n "json": null, \n "origin": "47.115.61.209", \n "url": "https://httpbin.org/post"\n}\n'
>>> r.json()
{'args': {'key1': 'value1', 'key2': ['value2', 'value3']}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.24.0', 'X-Amzn-Trace-Id': 'Root=1-5ef48767-49d16380b92523febb87f110'}, 'origin': '47.115.61.209', 'url': 'https://httpbin.org/get?key1=value1&key2=value2&key2=value3'}
>>>
在上面的示例代码中,我们使用 requests 库模拟发送了一个 post 请求,且带上了一个参数: key=value
。可以看到网站返回的结果是 json 形式的数据,包括了我们发生的数据、请求的头部、来源地址等。
1.2 带参数的 get 请求
我们再来看看 get 请求带参数的方式,示例代码如下:
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.get('https://httpbin.org/get', params=payload)
>>> r.url
'https://httpbin.org/get?key1=value1&key2=value2'
可以看到 get 请求所携带的参数就是在 url 后使用 ? 将参数的 key 和 value 组合起来,形成完整的请求 url。下面是 get 请求带参数的另一个例子,这里参数 key2 的值是一个列表。
>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
>>> r = requests.get('https://httpbin.org/get', params=payload)
>>> r.url
'https://httpbin.org/get?key1=value1&key2=value2&key2=value3'
来看看 request 库请求的结果:
>>> type(r)
<class 'requests.models.Response'>
>>> dir(r)
['__attrs__', '__bool__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__nonzero__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_content', '_content_consumed', '_next', 'apparent_encoding', 'close', 'connection', 'content', 'cookies', 'elapsed', 'encoding', 'headers', 'history', 'is_permanent_redirect', 'is_redirect', 'iter_content', 'iter_lines', 'json', 'links', 'next', 'ok', 'raise_for_status', 'raw', 'reason', 'request', 'status_code', 'text', 'url']
这里用的最多的有5个,分别为 enconding、status_code、text、content 和 url,它们的含义如下:
- encoding:当读取 r.text 时会使用该值进行编解码;
- status_code:请求返回状态码,200 表示正常;
- text:返回请求的内容,使用 unicode 编码;
- content:返回请求的内容,字节编码;
- url:最终请求的 url。
此外,对于所有的请求,可以带上 headers 参数,这样可以模拟成浏览器的行为。通常不带 headers 很容易就被识别为爬虫程序,通过百度网站的 get 请求就可以看到。带上正常的 header 和 不带或者带上错误的 header 得到的结果不一样:
>>> url = 'https://www.baidu.com'
>>> headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'}
>>> r = requests.get(url, headers=headers)
>>> r.text[:1000]
'<!DOCTYPE html><!--STATUS OK-->\n\n\n <html><head><meta http-equiv="Content-Type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><meta content="always" name="referrer"><meta name="theme-color" content="#2932e1"><meta name="description" content="全球最大的中文搜索引擎、致力于让网民更便捷地获取信息,找到所求。百度超过千亿的中文网页数据库,可以瞬间找到相关的搜索结果。"><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" /><link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="百度搜索" /><link rel="icon" sizes="any" mask href="//www.baidu.com/img/baidu_85beaf5496f291521eb75ba38eacbd87.svg"><link rel="dns-prefetch" href="//dss0.bdstatic.com"/><link rel="dns-prefetch" href="//dss1.bdstatic.com"/><link rel="dns-prefetch" href="//ss1.bdstatic.com"/><link rel="dns-prefetch" href="//sp0.baidu.com"/><link rel="dns-prefetch" href="//sp1.baidu.com"/><link rel="dns-prefetch" href="//sp2.baidu.com"/><title>百度一下,你就知道</title><style index="newi" type="text/css">
>>> headers = {'user-agent': 'my-app/0.0.1'}
>>> r = requests.get(url, headers=headers)
>>> r.text
'<html>\r\n<head>\r\n\t<script>\r\n\t\tlocation.replace(location.href.replace("https://","http://"));\r\n\t</script>\r\n</head>\r\n<body>\r\n\t<noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>\r\n</body>\r\n</html>'
前面我们也介绍过 requests 库的 post 请求,其参数通过 data 进行传递,下面继续看几个示例:
>>> payload_tuples = [('key1', 'value1'), ('key1', 'value2')]
>>> r1 = requests.post('https://httpbin.org/post', data=payload_tuples)
>>> payload_dict = {'key1': ['value1', 'value2']}
>>> r2 = requests.post('https://httpbin.org/post', data=payload_dict)
>>> print(r1.text)
{
"args": {},
"data": "",
"files": {},
"form": {
"key1": [
"value1",
"value2"
]
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "23",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.24.0",
"X-Amzn-Trace-Id": "Root=1-5ef49697-c3f6e2a809e33d4895ee6938"
},
"json": null,
"origin": "47.115.61.209",
"url": "https://httpbin.org/post"
}
上传文件
最后看一看 requests 库中如何上传文件:
>>> url = 'https://httpbin.org/post'
>>> files = {'file': open('/home/store/shen/start.sh', 'rb')}
>>> r = requests.post(url, files=files)
>>> r.text
'{\n "args": {}, \n "data": "", \n "files": {\n "file": "#!/bin/bash\\n########################################################\\n# author: spyinx (https://blog.csdn.net/qq_40085317) #\\n# email: 2894577759@qq.com #\\n# date: 2020/6/24 #\\n# function: start agent server on CentOS 7.7 #\\n########################################################\\nAGENT_PORT=8765\\n\\n# check the agent process first\\nmain_pid=$(pstree -ap|grep gunicorn|grep -v grep|awk \'NR==1{print}\'|grep -o \\"[0-9]*\\"|awk \'NR==1{print}\')\\nif [ -n \\"$main_pid\\" ]; then\\n echo \\"get the agent server\'s main pid: $main_pid\\"\\n sudo kill -9 $main_pid\\n echo \\"stop the server first\\"\\n sleep 15\\n process_num=$(ps -ef|grep gunicorn|grep -v grep|wc -l)\\n if [ $process_num -ne 0 ]; then\\n echo \\"close agent server failed\\uff0cexit!\\"\\n exit 1\\n fi\\nfi\\n\\n# start agent server\\nmaster_addr=$(cat /etc/hosts | grep `hostname` | awk \'{print $1}\')\\necho \\"start agent server\\"\\ngunicorn -w 4 -b $master_addr:$AGENT_PORT xstore_agent.agent:app --daemon\\nsleep 5\\nprocess_num=$(ps -ef|grep gunicorn|grep -v grep|wc -l)\\nif [ $process_num -eq 0 ]; then\\n echo \\"start agent server failed\\uff0cplease check it!\\"\\n exit 2\\nfi\\necho \\"start agent server success\\uff0cok!\\""\n }, \n "form": {}, \n "headers": {\n "Accept": "*/*", \n "Accept-Encoding": "gzip, deflate", \n "Content-Length": "1356", \n "Content-Type": "multipart/form-data; boundary=565e2040b1d37bad527477863e64ba6c", \n "Host": "httpbin.org", \n "User-Agent": "python-requests/2.24.0", \n "X-Amzn-Trace-Id": "Root=1-5ef49e5f-a02b3e64f58fe4a3ff51fa94"\n }, \n "json": null, \n "origin": "47.115.61.209", \n "url": "https://httpbin.org/post"\n}\n'
>>>
在 requests 库中,只需要将上传文件参数传递给 post()
方法即可,是不是非常简单?另外,我们还可以在请求中添加 cookie 或者在相应中获取相应的 cookie 信息。
另外,我们还可以使用 requests 的 Session 来维持会话,这在有登录需求的网站获取数据时会非常有用:
# 创建一个session对象,用来存储session信息
>>> s = requests.session()
>>> s.get("http://www.baidu.com")
如果在登录之后,继续使用 session 对象再请求该网站的其他页面的 url,就会带着 session 信息去与该网站进行交互,模拟登录后的访问。
2. Requests 库源码分析
熟悉了上面的基本操作之后,我带领大家简单看看 requests 库的源码。首先我们带着问题去看源码:
为什么对返回的结果直接使用 json()
方法就能将相应内容转换成 json 格式,它和下面的实现有无区别?
>>> r = requests.post('https://httpbin.org/post', data = {'key':'value'})
>>> import json
>>> json.loads(r.text)
requests.get()
背后的实现过程究竟是怎样的呢?
2.1 json() 方法
带着这两个问题我们来看看 requests 库的源码,可以看到 requests 模块的源码非常少,比较适合阅读。首先看第一个问题,就是要分析下 Response 结果的 json()
方法即可,比较容易找到:
# 源码位置:requests/models.py
# ...
class Response(object):
# ...
def json(self, **kwargs):
r"""Returns the json-encoded content of a response, if any.
:param \*\*kwargs: Optional arguments that ``json.loads`` takes.
:raises ValueError: If the response body does not contain valid json.
"""
if not self.encoding and self.content and len(self.content) > 3:
# No encoding set. JSON RFC 4627 section 3 states we should expect
# UTF-8, -16 or -32. Detect which one to use; If the detection or
# decoding fails, fall back to `self.text` (using chardet to make
# a best guess).
encoding = guess_json_utf(self.content)
if encoding is not None:
try:
return complexjson.loads(
self.content.decode(encoding), **kwargs
)
except UnicodeDecodeError:
# Wrong UTF codec detected; usually because it's not UTF-8
# but some other 8-bit codec. This is an RFC violation,
# and the server didn't bother to tell us what codec *was*
# used.
pass
return complexjson.loads(self.text, **kwargs)
# ...
上面的 json()
方法中最核心的只有一句:
complexjson.loads(self.content.decode(encoding), **kwargs)
而这句和我们前面的得到响应内容,然后使用 json.loads()
是一样的,不过这里使用的是 complexjson
。继续看看这个 complexjson
的定义:
# 源码位置:requests/models.py
from .compat import json as complexjson
# 源码位置:requests/compact.py
try:
import simplejson as json
except ImportError:
import json
可以看到,这个 complexjson
其实就是 Python 的第三方 json 模块或者是 Python 的内置 json 模块。因此,对于第一个问题就是显而易见了,使用 r.json()
和我们用 json.loads(r.text)
得到的结果基本是一致的。
2.2 get() 方法
接下来我们要追踪一下 requests.get()
请求的完整过程。首先是找到相应的 get()
方法:
# 源码位置: requests/api.py
from . import sessions
def request(method, url, **kwargs):
with sessions.Session() as session:
return session.request(method=method, url=url, **kwargs)
def get(url, params=None, **kwargs):
kwargs.setdefault('allow_redirects', True)
return request('get', url, params=params, **kwargs)
def options(url, **kwargs):
kwargs.setdefault('allow_redirects', True)
return request('options', url, **kwargs)
def head(url, **kwargs):
kwargs.setdefault('allow_redirects', False)
return request('head', url, **kwargs)
def post(url, data=None, json=None, **kwargs):
return request('post', url, data=data, json=json, **kwargs)
def put(url, data=None, **kwargs):
return request('put', url, data=data, **kwargs)
def patch(url, data=None, **kwargs):
return request('patch', url, data=data, **kwargs)
def delete(url, **kwargs):
return request('delete', url, **kwargs)
可以看到,所有的请求最后都是调用同一个 session.request()
方法,我们继续追进去:
# 源码位置:requests/sessions.py
# ...
class Session(SessionRedirectMixin):
# ...
# 有了这两个方法就可以使用 with 语句了:
# with Session() as session:
# pass
def __enter__(self):
return self
def __exit__(self, *args):
self.close()
# ...
def request(self, method, url,
params=None, data=None, headers=None, cookies=None, files=None,
auth=None, timeout=None, allow_redirects=True, proxies=None,
hooks=None, stream=None, verify=None, cert=None, json=None):
# Create the Request.
req = Request(
method=method.upper(),
url=url,
headers=headers,
files=files,
data=data or {},
json=json,
params=params or {},
auth=auth,
cookies=cookies,
hooks=hooks,
)
prep = self.prepare_request(req)
proxies = proxies or {}
settings = self.merge_environment_settings(
prep.url, proxies, stream, verify, cert
)
# Send the request.
send_kwargs = {
'timeout': timeout,
'allow_redirects': allow_redirects,
}
send_kwargs.update(settings)
# 核心地方,发送 http 请求
resp = self.send(prep, **send_kwargs)
return resp
# ...
我们不过多陷入细节,这些细节函数由读者自行去跟踪和调试。我们从上面的代码中可以看到核心发送 http 请求的代码如下:
resp = self.send(prep, **send_kwargs)
prep
是一个 PreparedRequest
类实例,它和 Request
类非常像。我们继续追踪这个 send()
方法的源码:
# 源码位置:requests/sessions.py:
# ...
class Session(SessionRedirectMixin):
# ...
def send(self, request, **kwargs):
"""Send a given PreparedRequest.
:rtype: requests.Response
"""
# Set defaults that the hooks can utilize to ensure they always have
# the correct parameters to reproduce the previous request.
kwargs.setdefault('stream', self.stream)
kwargs.setdefault('verify', self.verify)
kwargs.setdefault('cert', self.cert)
kwargs.setdefault('proxies', self.proxies)
# It's possible that users might accidentally send a Request object.
# Guard against that specific failure case.
if isinstance(request, Request):
raise ValueError('You can only send PreparedRequests.')
# Set up variables needed for resolve_redirects and dispatching of hooks
allow_redirects = kwargs.pop('allow_redirects', True)
stream = kwargs.get('stream')
hooks = request.hooks
# Get the appropriate adapter to use
adapter = self.get_adapter(url=request.url)
# Start time (approximately) of the request
start = preferred_clock()
# Send the request
r = adapter.send(request, **kwargs)
# Total elapsed time of the request (approximately)
elapsed = preferred_clock() - start
r.elapsed = timedelta(seconds=elapsed)
# Response manipulation hooks
r = dispatch_hook('response', hooks, r, **kwargs)
# Persist cookies
if r.history:
# If the hooks create history then we want those cookies too
for resp in r.history:
extract_cookies_to_jar(self.cookies, resp.request, resp.raw)
extract_cookies_to_jar(self.cookies, request, r.raw)
# Resolve redirects if allowed.
if allow_redirects:
# Redirect resolving generator.
gen = self.resolve_redirects(r, request, **kwargs)
history = [resp for resp in gen]
else:
history = []
# Shuffle things around if there's history.
if history:
# Insert the first (original) request at the start
history.insert(0, r)
# Get the last request made
r = history.pop()
r.history = history
# If redirects aren't being followed, store the response on the Request for Response.next().
if not allow_redirects:
try:
r._next = next(self.resolve_redirects(r, request, yield_requests=True, **kwargs))
except StopIteration:
pass
if not stream:
r.content
return r
代码会有点长,大家需要自行看看这个方法的逻辑,不要陷入细节。从上面的代码我们可以发现两个关键语句:
adapter = self.get_adapter(url=request.url)
:获取合适的请求适配器;r = adapter.send(request, **kwargs)
:发送请求,获取响应结果;
第一个 adapter 怎么来的呢?继续看那个 self.get_adapter()
方法:
# 源码位置:requests/sessions.py:
# ...
class Session(SessionRedirectMixin):
# ...
def __init__(self):
# ...
# Default connection adapters.
self.adapters = OrderedDict()
self.mount('https://', HTTPAdapter())
self.mount('http://', HTTPAdapter())
# ...
def get_adapter(self, url):
"""
Returns the appropriate connection adapter for the given URL.
:rtype: requests.adapters.BaseAdapter
"""
for (prefix, adapter) in self.adapters.items():
if url.lower().startswith(prefix.lower()):
return adapter
# Nothing matches :-/
raise InvalidSchema("No connection adapters were found for {!r}".format(url))
# ...
其实仔细在分析下,就可以知道我们在初始化 (__init__.py) 中添加了请求前缀 prefix (https://
和 http://)
对应的连接适配器 (HTTPAdapter()
),因此这里 adapter
对应的就是 HTTPAdapter
类实例。此时要找发送 http 请求的 send()
方法就需要去 ``HTTPAdapter` 中查找:
# 源码位置:requests/adapters.py
# ...
class BaseAdapter(object):
"""The Base Transport Adapter"""
def __init__(self):
super(BaseAdapter, self).__init__()
def send(self, request, stream=False, timeout=None, verify=True,
cert=None, proxies=None):
raise NotImplementedError
def close(self):
"""Cleans up adapter specific items."""
raise NotImplementedError
class HTTPAdapter(BaseAdapter):
# ...
def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
try:
conn = self.get_connection(request.url, proxies)
# 自行加上一个打印语句,查看conn类型
# print('conn:', type(conn))
except LocationValueError as e:
raise InvalidURL(e, request=request)
self.cert_verify(conn, request.url, verify, cert)
url = self.request_url(request, proxies)
self.add_headers(request, stream=stream, timeout=timeout, verify=verify, cert=cert, proxies=proxies)
chunked = not (request.body is None or 'Content-Length' in request.headers)
# ...
try:
if not chunked:
resp = conn.urlopen(
method=request.method,
url=url,
body=request.body,
headers=request.headers,
redirect=False,
assert_same_host=False,
preload_content=False,
decode_content=False,
retries=self.max_retries,
timeout=timeout
)
# Send the request.
else:
# ...
except (ProtocolError, socket.error) as err:
raise ConnectionError(err, request=request)
except MaxRetryError as e:
# ...
except ClosedPoolError as e:
raise ConnectionError(e, request=request)
except _ProxyError as e:
raise ProxyError(e)
except (_SSLError, _HTTPError) as e:
# ...
return self.build_response(request, resp)
就我们前面的请求而言,request.body
往往为 None,所以 chunked
一般为 False。那么最终的请求走的就是conn.urlopen()
方法。
注意:这里最关键的步骤是得到连接远端服务的信息 conn,后面发送数据都是通过 conn 走的。
# 源码位置:requests/adapters.py
# ...
class BaseAdapter(object):
"""The Base Transport Adapter"""
def get_connection(self, url, proxies=None):
"""Returns a urllib3 connection for the given URL. This should not be
called from user code, and is only exposed for use when subclassing the
:class:`HTTPAdapter <requests.adapters.HTTPAdapter>`.
:param url: The URL to connect to.
:param proxies: (optional) A Requests-style dictionary of proxies used on this request.
:rtype: urllib3.ConnectionPool
"""
proxy = select_proxy(url, proxies)
if proxy:
# 使用代理
# ...
else:
# Only scheme should be lower case
parsed = urlparse(url)
url = parsed.geturl()
conn = self.poolmanager.connection_from_url(url)
return conn
我们可以运行并打印这个 conn 变量。这里需要改源代码,在源码位置加上一行 print()
方法:
>>> import requests
>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
>>> r = requests.get('https://httpbin.org/get', params=payload)
conn: <class 'urllib3.connectionpool.HTTPSConnectionPool'>
>>>
我们终于看到,最后 requests 库其实就是封装 Python 内置的 urllib3 模块来完成 http 请求的。上面获取 conn 值的代码比较多且绕,有兴趣的读者可以自行跟踪下,限于篇幅,这里就不过多描述了。
3. 小结
本小节中我们首先从使用 Requests 库入手,介绍了其常用的类和方法;接下来从源码的角度来查看 Requests 库,挖掘一些库背后的实现原理,帮助我们更好的理解 Requests 库。今天的学习到此结束,大家有收获了吗?