URLs containing Chinese characters cannot be downloaded
No URL that contains Chinese characters can be downloaded. Can anyone help?
2019-01-12
import urllib.request
from urllib.parse import quote
import string


class HtmlDownloader(object):
    def download(self, url):
        if url is None:
            return None
        # Percent-encode non-ASCII characters (e.g. Chinese); printable ASCII is left as-is
        s = quote(url, safe=string.printable)
        response = urllib.request.urlopen(s)
        if response.getcode() != 200:
            return None
        return response.read()
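To check what the quote call in download() actually produces, here is a minimal sketch that runs without any network access. The helper name encode_url is made up for illustration, and the sample URL is the one quoted in the code comment further down the thread.

```python
import string
from urllib.parse import quote


def encode_url(url):
    """Percent-encode only the non-ASCII characters of url (hypothetical helper)."""
    # string.printable covers all printable ASCII, so ':', '/', '?' and '%' stay untouched
    return quote(url, safe=string.printable)


raw = "https://baike.baidu.com/item/阿姆斯特丹/2259975"
encoded = encode_url(raw)
print(encoded)
# https://baike.baidu.com/item/%E9%98%BF%E5%A7%86%E6%96%AF%E7%89%B9%E4%B8%B9/2259975

# '%' is in the safe set, so an already-encoded URL is not encoded a second time
print(encode_url(encoded) == encoded)  # True
```

One caveat with this choice of safe: string.printable also contains whitespace, so a URL with a literal space in it would pass through unencoded.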
urllib.quote: fixing how Python passes Chinese parameters in a URL
def _get_new_urls(self, page_url, soup):
    new_urls = set()
    #<a target="_blank" href="/item/%E9%98%BF%E5%A7%86%E6%96%AF%E7%89%B9%E4%B8%B9/2259975" data-lemmaid="2259975">阿姆斯特丹</a>
    #https: // baike.baidu.com / item / 阿姆斯特丹 / 2259975
    links = soup.find_all('a', href=re.compile(r"/item/"))
    for link in links:
        new_url = '/item/' + link.get_text()
        new_full_url = urlparse.urljoin(page_url, new_url)
        new_urls.add(new_full_url)
    return new_urls
I wrote it the same way. Is there something wrong with it?
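One thing that may be worth checking: _get_new_urls builds each URL from the link text ('/item/' + link.get_text()), which gives /item/阿姆斯特丹, with the Chinese characters unencoded and without the trailing /2259975 that the href in the comment carries. The href attribute is already percent-encoded, so joining it directly might avoid the problem. Below is a rough sketch of that idea. To keep it runnable without third-party packages it uses the standard library html.parser in place of BeautifulSoup; ItemLinkParser, get_new_urls and the page URL are assumptions made for illustration, not code from this thread.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class ItemLinkParser(HTMLParser):
    """Collects the href of every <a> tag whose href contains /item/ (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and re.search(r"/item/", href):
            self.hrefs.append(href)


def get_new_urls(page_url, html):
    """Join each collected href with page_url instead of rebuilding the path from link text."""
    parser = ItemLinkParser()
    parser.feed(html)
    return {urljoin(page_url, href) for href in parser.hrefs}


# The anchor tag from the comment in _get_new_urls; the page URL is an assumed example
sample = ('<a target="_blank" '
          'href="/item/%E9%98%BF%E5%A7%86%E6%96%AF%E7%89%B9%E4%B8%B9/2259975" '
          'data-lemmaid="2259975">阿姆斯特丹</a>')
new_urls = get_new_urls("https://baike.baidu.com/item/Python", sample)
print(new_urls)
```

With BeautifulSoup, the corresponding change would be to read link['href'] instead of link.get_text() before calling urljoin. The resulting URL is plain ASCII, so it can be passed to urlopen without further quoting.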