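Get the text of a p paragraph: find the tag by specifying its class, then read its content
print 'Get the p paragraph text'
p_node = soup.find('p', class_='title')  # class is a Python keyword, so the argument is class_
print p_node.name, p_node.get_text()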
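The note above depends on a `soup` object that is not shown. A minimal self-contained sketch in Python 3 follows; the `html_doc` string is a made-up sample document, not the page used in the course.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for illustration.
html_doc = """
<html><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters.</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# class is a reserved word in Python, so BeautifulSoup accepts class_ instead.
p_node = soup.find('p', class_='title')

p_name = p_node.name        # tag name of the matched node
p_text = p_node.get_text()  # text of the node and its children, tags stripped
print(p_name, p_text)
```

`find` returns `None` when nothing matches, so real code should check `p_node` before calling `get_text()`.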
2017-10-29
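BeautifulSoup supports regular-expression matching (fuzzy matching), for example as the href filter passed to find
print 'Regex matching'
href=re.compile(r'ill')
With a raw string r'..', a backslash in the regular expression only needs to be written once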
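The fragment above only shows the `href=` argument. A minimal sketch in Python 3 of how it might be used inside `find`, together with a small raw-string check, is given below; the `html_doc` links and the `digit_pattern` name are assumptions made up for this example.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample links, used only for illustration.
html_doc = """
<a href="http://example.com/elsie" id="link1">Elsie</a>
<a href="http://example.com/lacie" id="link2">Lacie</a>
<a href="http://example.com/tillie" id="link3">Tillie</a>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# A compiled pattern used as an attribute filter matches any href
# that contains 'ill' somewhere (fuzzy matching).
link_node = soup.find('a', href=re.compile(r'ill'))
link_href = link_node['href']
link_text = link_node.get_text()
print(link_node.name, link_href, link_text)

# Raw strings: r'\d+' and '\\d+' are the same pattern, but the raw
# form needs only one backslash.
same_pattern = (r'\d+' == '\\d+')
digit_pattern = re.compile(r'\d+')
digits = digit_pattern.findall('page12 item345')
print(same_pattern, digits)
```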
2017-10-29
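The scheduler sequence diagram shown here looks a lot like the ones used in operating-systems courses on multithreading, where file I/O and reads are scheduled against CPU and algorithm resources, each with its own priority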
2017-10-28
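# py3 example: list the images on the Baidu home page
import urllib.request
from bs4 import BeautifulSoup
url = "http://www.baidu.com/"
# Build the request and download the page
request = urllib.request.Request(url)
response = urllib.request.urlopen(request)
data = response.read()
# The response body is bytes; decode it before parsing
data = data.decode('utf-8')
soup = BeautifulSoup(data,'html.parser')
# Print every <img> tag found in the page
print(soup.find_all('img'))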
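`find_all('img')` returns whole tags. If only the image addresses are wanted, the `src` attribute can be read from each tag. A minimal offline sketch, assuming a made-up `html_doc` in place of the downloaded page:

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment standing in for the downloaded HTML.
html_doc = """
<div>
<img src="//www.example.com/img/logo.png" width="270">
<img alt="no source here">
<img src="/static/banner.gif">
</div>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Tag.get returns None when the attribute is missing, so tags
# without a src are skipped instead of raising KeyError.
img_srcs = [img.get('src') for img in soup.find_all('img') if img.get('src')]
print(img_srcs)
```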
2017-10-22
Python 3.6 code: https://github.com/Nana0606/PythonProject/tree/master/spider_me (modified to output information for 100 URLs)
2017-10-21