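If you run into problems, try changing these three lines of code first:
listurl = re.findall(r'//.+?\.jpg', buf)  # match the contents of src (no trailing * after jpg, so only a full .jpg suffix matches)
f = open('D:/picture/' + str(i) + '.jpg', 'wb')  # save the image into the picture folder on drive D (the folder must already exist)
req = urllib2.urlopen('http:' + url)  # fetch the image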
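To check what the pattern actually picks up before downloading anything, here is a minimal sketch that runs `//.+?\.jpg` against a made-up HTML fragment. The sample markup, the `img.example.com` host, and the helper name `extract_jpg_urls` are assumptions for illustration only; the real page will look different.

```python
import re

# Hypothetical HTML fragment; the real course page markup may differ.
sample_html = (
    '<img src="//img.example.com/a.jpg">'
    '<img src="//img.example.com/b.jpg">'
)

def extract_jpg_urls(html):
    """Return the protocol-relative .jpg URLs in html with 'http:' prepended."""
    # .+? is non-greedy, so each match stops at the first '.jpg' it reaches
    paths = re.findall(r'//.+?\.jpg', html)
    return ['http:' + p for p in paths]

urls = extract_jpg_urls(sample_html)
print(urls)
```

If the list comes back empty on the real page, the page markup has probably changed and the pattern needs adjusting.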
2018-01-18
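i = 0
old_url = ''
for _url in listurl:
    url = 'http:' + _url
    # Skip a URL that is identical to the previous one
    if url == old_url:
        continue
    old_url = url
    #print (url,'')
    req = request.urlopen(url)
    buf = req.read()
    # Open the file only after the duplicate check, so skipped URLs leave no empty file behind
    with open(str(i) + '.jpg', 'wb') as f:
        f.write(buf)
    i += 1
    print('download %s' % i)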
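The loop above only skips a URL when it equals the one immediately before it, so a repeat that shows up later in the list would still be downloaded again. A minimal sketch of one way to drop every repeat while keeping the original order; the helper name `unique_in_order` and the sample URLs are made up for illustration.

```python
def unique_in_order(items):
    """Return items with all duplicates removed, keeping first occurrences in order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

# Hypothetical sample data standing in for the real listurl
listurl = [
    '//img.example.com/a.jpg',
    '//img.example.com/b.jpg',
    '//img.example.com/a.jpg',  # non-adjacent repeat
]
unique_urls = unique_in_order(listurl)
print(unique_urls)
```

Looping over `unique_urls` instead of `listurl` would make the `old_url` comparison unnecessary.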
2018-01-07
Python 3.6 version
from urllib import request
import re
url = 'https://www.imooc.com/course/list'
req = request.urlopen(url)
buf = req.read()
buf = buf.decode('utf-8')
listurl = re.findall(r'\/\/img.+?\.jpg',buf)
#for _url in listurl:
# print(_url)
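On why the `decode('utf-8')` line matters in Python 3: `read()` returns bytes, and a str pattern cannot be applied to bytes. A small sketch with a made-up byte string standing in for the downloaded page (the sample markup is an assumption, no network access involved):

```python
import re

# Hypothetical stand-in for what req.read() returns: bytes, not str
raw = '<img src="//img.example.com/课程.jpg">'.encode('utf-8')

error = None
try:
    # A str pattern on bytes raises TypeError in Python 3
    re.findall(r'//img.+?\.jpg', raw)
except TypeError as exc:
    error = exc

# After decoding, the same pattern works on the resulting str
text = raw.decode('utf-8')
found = re.findall(r'//img.+?\.jpg', text)
print(found)
```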
2018-01-07
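Accepted answer / qq_爱吃羊的鲸鱼_0
\1 refers back to what the earlier group "([\w]+>)" matched. If you substitute that group in place of \1, you get ma=re.match(r'<([\w]+>)[\w]+</([\w]+>)','<book>python</book>'). The parentheses no longer do anything there; drop them and it becomes ma=re.match(r'<[\w]+>[\w]+</[\w]+>','<book>python</book>'). Seen that way it should make sense. The reason nothing matches when you add a 1 at the end is also because &...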
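A minimal sketch of the difference, assuming the pattern being asked about was `<([\w]+>)[\w]+</\1` (reconstructed from the answer above, not quoted from the question). The mismatched sample string is made up for illustration.

```python
import re

# Assumed original pattern: \1 must repeat exactly what group 1 captured
with_backref = r'<([\w]+>)[\w]+</\1'
# The rewritten pattern from the answer: the closing tag can be any word
without_backref = r'<[\w]+>[\w]+</[\w]+>'

ok = re.match(with_backref, '<book>python</book>')
captured = ok.group(1)  # the text group 1 matched, including the '>'

# Hypothetical input whose opening and closing tags differ
mismatched = '<book>python</page>'
strict = re.match(with_backref, mismatched)
loose = re.match(without_backref, mismatched)

print(captured, strict, loose is not None)
```

So the two patterns agree on `<book>python</book>`, but only the backreference version insists that the closing tag repeat the opening one.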
2017-12-25
Top-voted answer / 华灯初上丶
import re
import urllib.request
req = urllib.request.urlopen('http://www.imooc.com/course/list')
# Add decode() here; otherwise the data you pull down is all garbled
buf = req.read().decode("utf-8")
# The URLs used in the lecture have changed, so just adjust the regex
# listurl = re.findall(r'src=.+\.jpg', buf)
listurl = re.findall(r'//img.+?\.jpg', bu...
2017-12-11