HTMLParser.HTMLParseError: malformed start tag when scraping a page
When I extract an HTML page with BeautifulSoup, I get the error HTMLParser.HTMLParseError: malformed start tag. How can I fix it?
2016-10-11
$ pip install beautifulsoup4
$ pip install html5lib
Python:
from bs4 import BeautifulSoup
import urllib2
url = 'http://www.example.com'
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read(), 'html5lib')
links = soup.findAll('a')
for link in links:
    # link.get('href') returns None for anchors without an href instead of raising KeyError
    print link.string, link.get('href')
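If installing html5lib is not an option, a standard-library fallback may be enough for simple link extraction. The sketch below assumes Python 3.5 or later, where HTMLParseError was removed and html.parser no longer raises on malformed markup. The names LinkCollector and sample_html are made up for illustration, and unlike html5lib this approach does not repair the document tree; it only reacts to the tags it sees.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


# Hypothetical sample markup with an unclosed <p> and a stray </div>.
sample_html = (
    '<p>Intro<a href="/a">First</a></div>'
    '<a name="x">anchor</a><a href="/b">Second</a>'
)

collector = LinkCollector()
collector.feed(sample_html)
collector.close()
for text, href in collector.links:
    print(text, href)
```

Anchors without an href (such as the name="x" one above) are skipped, which avoids the missing-attribute problem that link['href'] would hit.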