-
import urllib.request
import http.cookiejar

url = "http://www.baidu.com"

# Method 1: call urlopen directly on the URL
print("Method 1")
response1 = urllib.request.urlopen(url)
print(response1.getcode())
print(len(response1.read()))

# Method 2: build a Request object and add a User-Agent header
print("Method 2")
request = urllib.request.Request(url)
request.add_header("user-agent", "Mozilla/5.0")
response2 = urllib.request.urlopen(request)
print(response2.getcode())
print(len(response2.read()))

# Method 3: install an opener that keeps cookies in a CookieJar
print("Method 3")
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
urllib.request.install_opener(opener)
response3 = urllib.request.urlopen(url)
print(response3.getcode())
print(cj)
print(response3.read())
- URL manager
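The note above only names the URL manager. A minimal sketch of what such a component usually does, assuming it keeps one set of URLs waiting to be crawled and one set of URLs already crawled (the class and method names here are illustrative, not taken from the original notes):

```python
class UrlManager:
    """Tracks URLs waiting to be crawled and URLs already crawled."""

    def __init__(self):
        self.new_urls = set()  # URLs not yet crawled
        self.old_urls = set()  # URLs already handed out for crawling

    def add_new_url(self, url):
        # Ignore empty values and anything already known.
        if not url:
            return
        if url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def add_new_urls(self, urls):
        for url in urls or []:
            self.add_new_url(url)

    def has_new_url(self):
        return len(self.new_urls) > 0

    def get_new_url(self):
        # Move one URL from the "new" set to the "old" set and return it.
        url = self.new_urls.pop()
        self.old_urls.add(url)
        return url
```

Using sets makes the duplicate check cheap, which is the main job of this component: the same page should not be downloaded twice, and pages that link to each other should not cause an endless loop.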
- Crawler run flow
- Run flow of a simple crawler architecture
- Basic crawler architecture
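The notes above list the architecture and run flow by title only. As a rough sketch, assuming the usual split into a scheduler loop, a downloader and a parser, the flow might look like this. The function `crawl` and its `download` and `parse` parameters are hypothetical names chosen for illustration, and the example runs against an in-memory fake site rather than the network:

```python
def crawl(root_url, download, parse, max_pages=10):
    """Run a simple crawl loop.

    download(url) returns page content, or None on failure (hypothetical).
    parse(url, content) returns (new_urls, data) (hypothetical).
    """
    pending = [root_url]  # URLs waiting to be crawled
    seen = {root_url}     # every URL ever queued, to avoid repeats
    collected = []        # data returned by the parser

    while pending and len(collected) < max_pages:
        url = pending.pop(0)
        content = download(url)
        if content is None:
            continue  # skip pages that failed to download
        new_urls, data = parse(url, content)
        collected.append(data)
        for new_url in new_urls:
            if new_url not in seen:
                seen.add(new_url)
                pending.append(new_url)
    return collected


# A fake "site": each page maps to the links it contains.
fake_site = {"a": ["b", "c"], "b": ["a"], "c": []}
pages = crawl(
    "a",
    download=lambda url: fake_site.get(url),
    parse=lambda url, links: (links, url),
)
print(pages)  # ['a', 'b', 'c']
```

Passing the downloader and parser in as functions keeps the loop itself independent of urllib or BeautifulSoup, so either part can be swapped or tested separately.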
- Python 3:
from urllib import request as urllib2
import http.cookiejar as cookielib

url = 'https://www.baidu.com'

# Method 1: call urlopen directly on the URL
response1 = urllib2.urlopen(url)
print(response1.getcode())

# Method 2: Request object with a custom User-Agent header
request = urllib2.Request(url)
request.add_header('user-agent', 'Mozilla/5.0')
# Pass the Request object, not the bare URL, so the header is actually sent
response2 = urllib2.urlopen(request)
print(response2.getcode())

# Method 3: install an opener that handles cookies
cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
urllib2.install_opener(opener)
response3 = urllib2.urlopen(url)
print(response3.getcode())
- 6.2 Installing the BeautifulSoup module: not installed successfully yet.
- Accessing node information
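To make "node information" concrete, here is a small sketch, assuming the note refers to reading a tag's name, attributes and text with BeautifulSoup. The HTML string is made up for the example, and the `bs4` package must be installed:

```python
from bs4 import BeautifulSoup

# Made-up HTML used only for this example.
html_doc = '<p class="title"><a href="http://example.com/page" class="link">Example link</a></p>'
soup = BeautifulSoup(html_doc, "html.parser")

node = soup.find("a")           # first <a> tag in the document
node_name = node.name           # tag name
node_href = node["href"]        # attribute access by key (KeyError if absent)
node_class = node.get("class")  # .get() returns None if absent; class is a list
node_text = node.get_text()     # text inside the tag

print(node_name, node_href, node_class, node_text)
```

`node["href"]` raises `KeyError` when the attribute is missing, while `node.get("href")` returns `None`, so `.get()` is the safer choice when scanning arbitrary pages.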
-
#!/usr/bin/python
# -*- coding: utf-8 -*-
from urllib import request
from bs4 import BeautifulSoup

response = request.urlopen("http://src.51elab.com")
html = response.read()
data = html.decode('utf-8')
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(data, 'html.parser')
# Print the text and href of every link whose body is a single string
for item in soup.find_all("a"):
    if item.string is None:
        continue
    print(item.string, ":", item.get("href"))
Fetching web page content in Python 3 and displaying it
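-
# -*- coding: utf-8 -*-
from urllib import request

url = "http://www.51elab.com"
response = request.urlopen(url)
content = response.read()
# Write the raw bytes to a local file; the with block closes it automatically
with open("test.htm", "w+b") as fp:
    fp.write(content)
Reading a web page in Python 3 and saving it to a local file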
- Chained find queries
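A short sketch of a chained query, assuming the note means calling `find` on the result of an earlier `find` to narrow the search to one part of the page. The HTML is made up for the example, and the `bs4` package must be installed:

```python
from bs4 import BeautifulSoup

# Made-up HTML with two sections that each contain a link.
html_doc = """
<div id="nav"><a href="/home">Home</a></div>
<div id="content"><p>Intro</p><a href="/next">Next</a></div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# Step by step: first locate the container, then search inside it only.
content_div = soup.find("div", id="content")
link = content_div.find("a")

# The same query written as one chain.
same_link = soup.find("div", id="content").find("a")

print(link["href"], link.get_text())
```

`find` returns `None` when nothing matches, so a chain raises `AttributeError` if an earlier step fails. On pages whose structure is not guaranteed, checking each intermediate result is safer than chaining.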
- Cookie handler
-
Pages that require login (cookie handling): HTTPCookieProcessor
Access through a proxy: ProxyHandler
HTTPS-encrypted pages: HTTPSHandler
Pages that redirect to one another: HTTPRedirectHandler
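The four handlers listed above can be combined into a single opener. This is a minimal sketch that only builds the opener and makes no request. The proxy address is a placeholder assumption, not a value from the notes:

```python
import http.cookiejar
import urllib.request

cookie_jar = http.cookiejar.CookieJar()
handlers = [
    urllib.request.HTTPCookieProcessor(cookie_jar),  # keeps cookies across requests
    urllib.request.ProxyHandler({"http": "http://127.0.0.1:8080"}),  # placeholder proxy
    urllib.request.HTTPSHandler(),         # HTTPS access
    urllib.request.HTTPRedirectHandler(),  # follows redirects
]
opener = urllib.request.build_opener(*handlers)

# opener.open(url) would send a request through these handlers;
# urllib.request.install_opener(opener) would make urlopen use them too.
print([type(h).__name__ for h in opener.handlers])
```

`build_opener` already adds default handlers for HTTPS and redirects, so in practice only the cookie and proxy handlers usually need to be passed explicitly.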
- Several kinds of handlers
- Simulating a Mozilla browser (by setting the User-Agent header)
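A small sketch of what presenting the crawler as a Mozilla browser amounts to with urllib: setting the User-Agent header on a Request object. No request is sent here; the example only inspects the header that would be sent. Note that urllib stores header names capitalized as `User-agent`:

```python
import urllib.request

request = urllib.request.Request("http://www.baidu.com")
request.add_header("User-Agent", "Mozilla/5.0")

# urllib normalizes the header name to "User-agent" internally.
user_agent = request.get_header("User-agent")
has_header = request.has_header("User-agent")

print(user_agent)  # Mozilla/5.0
```

Without this header urllib identifies itself as `Python-urllib/<version>`, which some sites reject.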