
Error when getting an element by class

#!/usr/bin/python
# -*- coding: UTF-8 -*-
# File name: test.py
from bs4 import BeautifulSoup
import re
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""


soup = BeautifulSoup(html_doc,'html.parser',from_encoding='utf-8')

print 'Get all links'
links = soup.find_all('a')
for link in links:
    print link.name,link['href'],link.get_text()

print 'Get the lacie link'
link_node = soup.find('a',href='http://example.com/lacie')
print link_node.name,link_node['href'],link_node.get_text()

print 'Get the regex match'
link_node = soup.find('a',href=re.compile(r"lli"))
print link_node.name,link_node['href'],link_node.get_text()

print 'Get the p paragraph'
p_node = soup.find('p',class_="title")
print p_node.name,p_node.get_text()

The error is as follows:

Get all links
a http://example.com/elsie Elsie
a http://example.com/lacie Lacie
a http://example.com/tillie Tillie
Get the lacie link
a http://example.com/lacie Lacie
Get the regex match
a http://example.com/tillie Tillie
Get the p paragraph
Traceback (most recent call last):
  File "test.py", line 38, in <module>
    print p_node.name,p_node.get_text()
AttributeError: 'NoneType' object has no attribute 'name'
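
The AttributeError means that find() matched nothing and returned None, so the failing line is only where the problem surfaces, not where it originates. A minimal sketch of guarding against that, assuming Python 3 and a recent bs4 (the helper name describe and the class no-such-class are made up for illustration):

```python
from bs4 import BeautifulSoup

html_doc = '<p class="title"><b>The Dormouse\'s story</b></p>'
soup = BeautifulSoup(html_doc, 'html.parser')


def describe(node):
    """Return 'name text' for a Tag, or a placeholder when find() matched nothing."""
    if node is None:
        return '<no match>'
    return '%s %s' % (node.name, node.get_text())


# A filter that matches returns a Tag; one that does not returns None.
found = describe(soup.find('p', class_='title'))
missing = describe(soup.find('p', class_='no-such-class'))

print(found)
print(missing)
```

Checking for None before touching .name or .get_text() turns the crash into a visible "no match", which makes it easier to see which filter is at fault.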

田心枫

2016-03-29

From: Python开发简单爬虫, section 6-4


7 answers

mobrary
2017-09-18

from bs4 import BeautifulSoup
from nt import link
import re
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

# Create the BeautifulSoup object
soup = BeautifulSoup(html_doc, 'html.parser',from_encoding='utf-8')

print('Get all links')
links = soup.find_all('a')  # find all nodes whose tag is a
for link in links :
    print (link.name,link['href'],link.get_text())  # (node name, href attribute, link text)

print('Get only the Lacie link')
link_node = soup.find('a',href='http://example.com/Lacie')
print (link_node.name,link_node['href'],link_node.get_text())


print('Regex match')
link_node = soup.find('a',href=re.compile("ill"))
print (link_node.name,link_node['href'],link_node.get_text())


print('Get the P paragraph text')
p_node = soup.find('p',class_="title")
print (p_node.name,p_node.get_text())

The code raises an error:

C:\Python34\lib\site-packages\bs4\__init__.py:146: UserWarning: You provided Unicode markup but also provided a value for from_encoding. Your from_encoding will be ignored.
  warnings.warn("You provided Unicode markup but also provided a value for from_encoding. Your from_encoding will be ignored.")
Get all links
a http://example.com/elsie Elsie
a http://example.com/lacie Lacie
a http://example.com/tillie Tillie
Get only the Lacie link
Traceback (most recent call last):
  File "I:\workspace\HelloPython\src\hello\test_bs4.py", line 29, in <module>
    print (link_node.name,link_node['href'],link_node.get_text())
AttributeError: 'NoneType' object has no attribute 'name'

I have read through the code several times and it looks fine to me.
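
One likely cause is the capital "L" in the href passed to find(): the document contains http://example.com/lacie, and a string filter has to match the attribute value exactly. A minimal sketch on a one-link document, assuming Python 3 and a recent bs4 (the pattern /lacie$ is chosen here for illustration):

```python
import re
from bs4 import BeautifulSoup

html_doc = '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>'
# from_encoding is left out: the markup is already a str, which is what the
# UserWarning in the output above is pointing at.
soup = BeautifulSoup(html_doc, 'html.parser')

# String filters are compared exactly, so a capital "L" does not match "lacie".
wrong_case = soup.find('a', href='http://example.com/Lacie')
exact = soup.find('a', href='http://example.com/lacie')
# A compiled pattern with re.IGNORECASE matches regardless of case.
any_case = soup.find('a', href=re.compile(r'/lacie$', re.IGNORECASE))

print(wrong_case)
print(exact['href'], exact.get_text())
print(any_case['href'], any_case.get_text())
```

The first lookup prints None, which is the value that later triggers the AttributeError when .name is read from it.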


mobrary

Could an expert please help?

2017-09-18

Kaysonchan
2017-07-03

#coding:utf-8
import re
from bs4 import BeautifulSoup
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
The Dormouse's story

Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.

...
"""

soup = BeautifulSoup(html_doc,'html.parser',from_encoding='utf-8')

print 'Get all links'
links = soup.find_all('a') # find all <a> tags
for link in links:
    print link.name,link['href'],link.get_text()

print 'Get the lacie link'
link_node = soup.find('a',href="http://example.com/lacie")
print link_node.name, link_node['href'], link_node.get_text()

print 'Regular expression match'
link_node = soup.find('a',href=re.compile(r'ti')) # regular expressions are powerful; the pattern inside r'...' can contain any characters
print link_node.name, link_node['href'], link_node.get_text()

print 'Get the P paragraph text'
p_node = soup.find('p',Class_= "story") # the trailing underscore in class_ avoids a clash with the Python keyword class
print p_node.name, p_node.get_text()

I do not see an error message, but the paragraph text is not printed!

Get all links
a http://example.com/elsie Elsie
a http://example.com/lacie Lacie
a http://example.com/tillie Tillie
Get the lacie link
a http://example.com/lacie Lacie
Regular expression match
a http://example.com/tillie Tillie
Get the P paragraph text

Process finished with exit code 1
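
Exit code 1 suggests the script probably did stop with an exception on the last line. Two things in the post could each make find() return None: the html_doc as pasted contains no <p> tags, and the keyword is spelled Class_ rather than class_. A minimal sketch of the second point, assuming Python 3 and a recent bs4 (the two-paragraph document is made up for illustration):

```python
from bs4 import BeautifulSoup

html_doc = '<p class="title">Title</p><p class="story">Once upon a time</p>'
soup = BeautifulSoup(html_doc, 'html.parser')

# Class_ is not special-cased: it is treated as an attribute literally named "Class_".
capitalised = soup.find('p', Class_='story')
# class_ (lowercase, trailing underscore) is the keyword bs4 maps to the class attribute.
lowercase = soup.find('p', class_='story')

print(capitalised)
print(lowercase.get_text())
```

Keyword argument names are case-sensitive in Python, so only the lowercase spelling reaches the class lookup.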


qq_乡_0
2017-01-23

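The problem is mainly that class needs a trailing underscore (class_). Written as below it does not raise an error; the instructor covered this.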

title_node=soup.find('dd',class_="lemmaWgt-lemmaTitle-title").find("h1")


侠客岛的含笑
2016-11-20

Why does mine work without any problem?


宇娃
2016-11-12

# coding:utf8

from bs4 import BeautifulSoup
import re

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""


soup = BeautifulSoup(html_doc, 'html.parser', from_encoding="utf-8")

print u'Get all links'
links = soup.find_all('a')
for link in links:
    print link.name,link['href'],link.get_text()

print 'Get only the Lacie link'
link_node = soup.find('a', href='http://example.com/lacie')
print link_node.name,link_node['href'],link_node.get_text()


print 'Regex match'
link_node = soup.find('a', href=re.compile(r"ill"))
print link_node.name,link_node['href'],link_node.get_text()


Spectop
2016-04-06

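This is the conclusion I reached after countless attempts, though I do not know whether the current version lacks support for it or whether there is some other cause:

title_node = soup.find('dd', attrs={'class':'lemmaWgt-lemmaTitle-title'}).find('h1')

This line was failing for me as well. After I replaced class_="???" with attrs={'class':'???'} it worked.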
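
A possible explanation for the version suspicion: the Beautiful Soup documentation says the class_ keyword is available as of version 4.1.2, so an older install would treat class_ as an ordinary attribute name and match nothing. A minimal sketch for checking the installed version and comparing the three spellings mentioned in this thread, assuming Python 3 and a recent bs4:

```python
import bs4
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="title"><b>The Dormouse\'s story</b></p>', 'html.parser')

# Three ways to filter on the CSS class.
by_keyword = soup.find('p', class_='title')          # class_ keyword
by_attrs = soup.find('p', attrs={'class': 'title'})  # explicit attrs dictionary
by_position = soup.find('p', 'title')                # second positional argument

print(bs4.__version__)
print(by_keyword is by_attrs is by_position)
```

On a recent version all three return the same Tag object; the attrs dictionary and the positional form do not depend on the class_ keyword being supported.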


田心枫 (asker)
2016-03-29

#!/usr/bin/python
# -*- coding: UTF-8 -*-
# File name: test.py
from bs4 import BeautifulSoup
import re
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc,'html.parser',from_encoding='utf-8')

print 'Get all links'
links = soup.find_all('a')
for link in links:
    print link.name,link['href'],link.get_text()

print 'Get the lacie link'
link_node = soup.find('a',href='http://example.com/lacie')
print link_node.name,link_node['href'],link_node.get_text()

print 'Get the regex match'
link_node = soup.find('a',href=re.compile(r"lli"))
print link_node.name,link_node['href'],link_node.get_text()

print 'Get the p paragraph'
p_node = soup.find("p", "title")
print p_node.name,p_node.get_text()

Solved by reading the documentation: drop the class_ keyword and pass the class name as the second positional argument.
