
Getting only some <p> tags from a website with BeautifulSoup

Python

繁花不似锦 2022-03-05 15:13:54

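I am trying to get the text of selected tags only. For example:

<div class="article-container">
<p>tekst 1</p>
<p>none</p>
<p>tekst 2</p>
<p>none</p>
<p>tekst 3</p>
<p>none</p>
<p>tekst 4</p>
</div>

From this I want to get 'tekst 1 tekst 2 tekst 3 tekst 4' (the text in the real tags is completely different; 'tekst 1' and so on are only examples). My simple Python function looks like this:

import requests
from bs4 import BeautifulSoup

def get_article(url):
    page = requests.get(str(url))
    soup = BeautifulSoup(page.text, 'html.parser')
    article = soup.find(class_='article-container')
    article_only = article.text
    return article_only

But it returns the whole text. Is there a way to use BS to get only the selected elements, as in the example above?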
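A minimal sketch of how the parsing step could be changed, assuming (as in the example) that the unwanted paragraphs contain exactly the text `none`. To keep it testable without a network request, the parsing lives in a hypothetical helper `extract_article_text` that takes an HTML string; the `skip` parameter is likewise an illustrative assumption.

```python
from bs4 import BeautifulSoup


def extract_article_text(html, skip=("none",)):
    """Join the text of the <p> tags inside .article-container,
    leaving out paragraphs whose text is listed in `skip` (hypothetical filter)."""
    soup = BeautifulSoup(html, "html.parser")
    article = soup.find(class_="article-container")
    if article is None:
        return ""
    parts = []
    for p in article.find_all("p"):
        text = p.get_text(strip=True)
        # Drop empty paragraphs and placeholder ones such as "none"
        if text and text not in skip:
            parts.append(text)
    return " ".join(parts)


html = """<div class="article-container">
<p>tekst 1</p><p>none</p><p>tekst 2</p><p>none</p>
<p>tekst 3</p><p>none</p><p>tekst 4</p>
</div>"""

print(extract_article_text(html))  # tekst 1 tekst 2 tekst 3 tekst 4
```

In `get_article`, the last three lines could then be replaced by `return extract_article_text(page.text)`.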


3 answers

狐的传说


So you only need elements 1, 3, 5 and 7. You can do it like this:

Code:

from bs4 import BeautifulSoup as soup

html = """<div class="article-intro">
<p>tekst 1</p>
<p>none</p>
<p>tekst 2</p>
<p>none</p>
<p>tekst 3</p>
<p>none</p>
<p>tekst 4</p>
</div>"""

page = soup(html, 'html.parser')
div = page.find('div', {'class': 'article-intro'})
ps = div.find_all('p')

# Elements 1, 3, 5, 7 have the 0-based indices 0, 2, 4, 6
for i in range(len(ps)):
    if i % 2 == 0:
        print(ps[i].text)

Output:

tekst 1
tekst 2
tekst 3
tekst 4
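Because the wanted paragraphs sit at the even indices 0, 2, 4 and 6, the index loop can also be written with an extended slice, `ps[::2]`. A minimal sketch, assuming the strict text/`none` alternation of the example also holds on the real page:

```python
from bs4 import BeautifulSoup

html = """<div class="article-intro">
<p>tekst 1</p><p>none</p><p>tekst 2</p><p>none</p>
<p>tekst 3</p><p>none</p><p>tekst 4</p>
</div>"""

page = BeautifulSoup(html, "html.parser")
ps = page.find("div", class_="article-intro").find_all("p")

# Every second <p>, starting from the first one
texts = [p.text for p in ps[::2]]
print(texts)            # ['tekst 1', 'tekst 2', 'tekst 3', 'tekst 4']
print(" ".join(texts))  # tekst 1 tekst 2 tekst 3 tekst 4
```

This relies entirely on position: if one `none` paragraph is missing on the real page, the wrong elements are picked, so a text-based filter is more robust there.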

Answered 2022-03-05

米脂


Use the regular-expression module re and search on the text.

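from bs4 import BeautifulSoup
import re

html = '''<div class="article-intro">
<p>tekst 1</p>
<p>none</p>
<p>tekst 2</p>
<p>none</p>
<p>tekst 3</p>
<p>none</p>
<p>tekst 4</p>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

# string= (called text= in older bs4 releases) matches each <p>'s .string against the pattern
for item in soup.find('div', class_='article-intro').find_all('p', string=re.compile('tekst')):
    print(item.text)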

Output:

tekst 1
tekst 2
tekst 3
tekst 4
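One caveat, shown with made-up HTML: `string=` is matched against each tag's `.string`, which is `None` when the `<p>` has more than one child, e.g. `<p><b>tekst</b> 3</p>`. Such paragraphs are silently skipped. A small sketch comparing it with running the same regex over `get_text()`:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the third wanted paragraph contains a nested <b> tag
html = """<div class="article-intro">
<p>tekst 1</p><p>none</p><p>tekst 2</p><p>none</p>
<p><b>tekst</b> 3</p><p>none</p><p>tekst 4</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="article-intro")
pattern = re.compile("tekst")

# string= only sees tags whose .string is a single text node
by_string = [p.text for p in div.find_all("p", string=pattern)]

# Searching get_text() also covers paragraphs with nested markup
by_text = [p.text for p in div.find_all("p") if pattern.search(p.get_text())]

print(by_string)  # ['tekst 1', 'tekst 2', 'tekst 4']
print(by_text)    # ['tekst 1', 'tekst 2', 'tekst 3', 'tekst 4']
```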

Or you can use a Python lambda function.

from bs4 import BeautifulSoup

html = '''<div class="article-intro">
<p>tekst 1</p>
<p>none</p>
<p>tekst 2</p>
<p>none</p>
<p>tekst 3</p>
<p>none</p>
<p>tekst 4</p>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

# The lambda is called for every tag; keep <p> tags whose text contains 'tekst'
for item in soup.find('div', class_='article-intro').find_all(lambda tag: tag.name == 'p' and 'tekst' in tag.text):
    print(item.text)

Output:

tekst 1
tekst 2
tekst 3
tekst 4
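Since the real paragraphs do not all contain `tekst`, the same lambda idea can be inverted to exclude the filler instead. A minimal sketch, assuming the unwanted paragraphs consist of exactly the word `none` (compared case-insensitively here, which is an extra assumption):

```python
from bs4 import BeautifulSoup

html = """<div class="article-intro">
<p>tekst 1</p><p>none</p><p>tekst 2</p><p>none</p>
<p>tekst 3</p><p>none</p><p>tekst 4</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="article-intro")

# Keep every <p> except placeholder ones whose whole text is "none"
wanted = div.find_all(
    lambda tag: tag.name == "p" and tag.get_text(strip=True).lower() != "none"
)
print(" ".join(p.get_text(strip=True) for p in wanted))
# tekst 1 tekst 2 tekst 3 tekst 4
```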

Answered 2022-03-05

qq_花开花谢_0


A few different options, depending on what you actually want to do. This uses bs4 4.7.1.

from bs4 import BeautifulSoup as bs

html = '''
<div class="article-container">
<p>tekst 1</p>
<p>none</p>
<p>tekst 2</p>
<p>none</p>
<p>tekst 3</p>
<p>none</p>
<p>tekst 4</p>
</div>
'''

soup = bs(html, 'lxml')  # requires the lxml package

# odd positions (1, 3, 5, 7)
items = [item.text for item in soup.select('.article-container p:nth-child(odd)')]
print(items)

# excluding "none"
items = [item.text for item in soup.select('.article-container p:not(:contains("none"))')]
print(items)

# including "tekst"
items = [item.text for item in soup.select('.article-container p:contains("tekst")')]
print(items)

# providing an explicit list of positions
items = [item.text for item in soup.select('.article-container p:nth-of-type(1), .article-container p:nth-of-type(3), .article-container p:nth-of-type(5), .article-container p:nth-of-type(7)')]
print(items)
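Newer releases of soupsieve (the library behind `select()`, 2.1 and later) deprecate the non-standard `:contains()` in favour of `:-soup-contains()`. A minimal sketch of the exclusion selector, assuming a current bs4/soupsieve and using the built-in `html.parser` so that `lxml` is not required:

```python
from bs4 import BeautifulSoup

html = """<div class="article-container">
<p>tekst 1</p><p>none</p><p>tekst 2</p><p>none</p>
<p>tekst 3</p><p>none</p><p>tekst 4</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# :-soup-contains() is soupsieve's current name for :contains()
items = [p.text for p in soup.select('.article-container p:not(:-soup-contains("none"))')]
print(items)  # ['tekst 1', 'tekst 2', 'tekst 3', 'tekst 4']
```

`:-soup-contains()` is a substring match, so a real paragraph that merely contains "none" inside a longer word would be excluded as well.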

Answered 2022-03-05
