I am trying to scrape this Afghanistan page by extracting the cities and area codes from its table. Now, when I try to scrape this American Samoa page, findAll() cannot find any <td> elements, which is correct for that page. How do I catch this exception? This is my code:

    from bs4 import BeautifulSoup
    import urllib2
    import re

    url = "http://www.howtocallabroad.com/american-samoa"
    html_page = urllib2.urlopen(url)
    soup = BeautifulSoup(html_page)

    areatable = soup.find('table',{'id':'codes'})
    d = {}

    def chunks(l, n):
        return [l[i:i+n] for i in range(0, len(l), n)]

    li = dict(chunks([i.text for i in areatable.findAll('td')], 2))
    if li != []:
        print li
        for key in li:
            print key, ":", li[key]
    else:
        print "list is empty"

This is the error I get:

    Traceback (most recent call last):
      File "extract_table.py", line 15, in <module>
        li = dict(chunks([i.text for i in areatable.findAll('td')], 2))
    AttributeError: 'NoneType' object has no attribute 'findAll'

I also tried this, but it did not work either:

    def gettdtag(tag):
        return "empty" if areatable.findAll(tag) is None else tag

    all_td = gettdtag('td')
    print all_td
1 Answer
临摹微笑
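The error says that areatable is None. soup.find() returns None when the page has no table with id 'codes', so check for that before calling findAll() on it: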
    areatable = soup.find('table', {'id': 'codes'})
    # areatable = soup.find('table', id='codes')  # Also works
    if areatable is None:
        # find() found no matching table on this page
        print('Something happened')
        # Exit out
Also, I would use find_all instead of findAll, and get_text() instead of text.
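Putting the None check together with find_all and get_text(), the extraction could be wrapped roughly as follows. This is only a sketch: extract_codes and its table_id parameter are hypothetical names, not part of the original code. It assumes soup is an already parsed BeautifulSoup document, and it returns an empty dict when the table is missing instead of exiting.

```python
def chunks(items, n):
    """Split items into consecutive lists of length n (the last may be shorter)."""
    return [items[i:i + n] for i in range(0, len(items), n)]


def extract_codes(soup, table_id="codes"):
    """Return {city: area_code} from the table with the given id, or {} if absent.

    `soup` is assumed to be a parsed BeautifulSoup document (hypothetical helper).
    """
    areatable = soup.find("table", id=table_id)
    if areatable is None:
        # find() returns None when no matching table exists on the page.
        return {}
    cells = [td.get_text(strip=True) for td in areatable.find_all("td")]
    # Keep only complete (city, code) pairs so dict() never sees a short chunk.
    return dict(pair for pair in chunks(cells, 2) if len(pair) == 2)
```

With the code from the question this would be called as li = extract_codes(soup), and an empty result can then be tested with "if not li:" rather than comparing a dict against [].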