3 Answers
Contributor with 1804 experience points · received over 3 likes
So you only need the 1st, 3rd, 5th, and 7th elements. You can do it like this:
Code:
from bs4 import BeautifulSoup as soup
html = """<div class="article-intro">
<p>tekst 1</p>
<p>none</p>
<p>tekst 2</p>
<p>none</p>
<p>tekst 3</p>
<p>none</p>
<p>tekst 4</p>
</div>"""
page = soup(html, 'html.parser')
div = page.find('div',{'class':'article-intro'})
ps = div.find_all('p')
# Index 0, 2, 4, 6 correspond to the 1st, 3rd, 5th, and 7th <p>
for i in range(len(ps)):
    if i % 2 == 0:
        print(ps[i].text)
Output:
tekst 1
tekst 2
tekst 3
tekst 4
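The same every-other-element selection can also be written with a slice instead of an index test. This is a minimal sketch, assuming the wanted paragraphs always sit at the even positions (0, 2, 4, ...) of the list; the helper name `every_other_text` is made up for illustration and is not part of the answer.

```python
from bs4 import BeautifulSoup

html = """<div class="article-intro">
<p>tekst 1</p>
<p>none</p>
<p>tekst 2</p>
<p>none</p>
<p>tekst 3</p>
<p>none</p>
<p>tekst 4</p>
</div>"""


def every_other_text(markup):
    """Return the text of the 1st, 3rd, 5th, ... <p> inside div.article-intro.

    Hypothetical helper: assumes the div exists and that the wanted
    paragraphs are exactly the ones at even list positions.
    """
    page = BeautifulSoup(markup, "html.parser")
    div = page.find("div", {"class": "article-intro"})
    # find_all returns a list-like result, so a slice with step 2 works
    return [p.text for p in div.find_all("p")[::2]]


result = every_other_text(html)
print(result)
```

A slice only looks at position, so it breaks as soon as the page inserts or drops a paragraph; filtering on the text itself is more robust when the content is predictable.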
Contributor with 1836 experience points · received over 3 likes
Use the re regular-expression module and match on the paragraph text.
from bs4 import BeautifulSoup
import re
html='''<div class="article-intro">
<p>tekst 1</p>
<p>none</p>
<p>tekst 2</p>
<p>none</p>
<p>tekst 3</p>
<p>none</p>
<p>tekst 4</p>
</div>'''
soup=BeautifulSoup(html,'html.parser')
# string= is the current name of the older text= argument (bs4 4.4.0+)
for item in soup.find('div', class_='article-intro').find_all('p', string=re.compile('tekst')):
    print(item.text)
Output:
tekst 1
tekst 2
tekst 3
tekst 4
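Note that `re.compile('tekst')` matches anywhere in the string, so any paragraph that merely mentions the word is kept. The sketch below contrasts it with an anchored pattern. The extra paragraph "see tekst above" is hypothetical, added only to make the difference visible, and the names `loose` and `strict` are illustrative.

```python
import re

from bs4 import BeautifulSoup

# The "see tekst above" paragraph is NOT in the original HTML; it is an
# assumption added to show how the two patterns differ.
html = """<div class="article-intro">
<p>tekst 1</p>
<p>none</p>
<p>see tekst above</p>
<p>tekst 2</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="article-intro")

# Unanchored: keeps every <p> whose string contains "tekst" somewhere
loose = [p.text for p in div.find_all("p", string=re.compile("tekst"))]

# Anchored: keeps only strings of the exact form "tekst <number>"
strict = [p.text for p in div.find_all("p", string=re.compile(r"^tekst \d+$"))]

print(loose)
print(strict)
```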
Alternatively, you can use a Python lambda function.
from bs4 import BeautifulSoup
html='''<div class="article-intro">
<p>tekst 1</p>
<p>none</p>
<p>tekst 2</p>
<p>none</p>
<p>tekst 3</p>
<p>none</p>
<p>tekst 4</p>
</div>'''
soup=BeautifulSoup(html,'html.parser')
# Keep only <p> tags whose text contains "tekst"
for item in soup.find('div', class_='article-intro').find_all(lambda tag: tag.name == 'p' and 'tekst' in tag.text):
    print(item.text)
Output:
tekst 1
tekst 2
tekst 3
tekst 4
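Both variants in this answer keep a paragraph because it contains "tekst". If the only thing known in advance is what the filler paragraphs look like, the test can be inverted. A minimal sketch, assuming every unwanted paragraph contains exactly the word "none" (ignoring surrounding whitespace); the variable name `wanted` is illustrative.

```python
from bs4 import BeautifulSoup

html = '''<div class="article-intro">
<p>tekst 1</p>
<p>none</p>
<p>tekst 2</p>
<p>none</p>
<p>tekst 3</p>
<p>none</p>
<p>tekst 4</p>
</div>'''

soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', class_='article-intro')

# Drop paragraphs whose stripped text is exactly "none"; keep everything else
wanted = [
    p.text
    for p in div.find_all('p')
    if p.get_text(strip=True) != 'none'
]

print(wanted)
```

Comparing for equality rather than using `in` avoids discarding a real paragraph that just happens to contain the letters "none".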
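Contributor with 1835 experience points · received over 7 likes
There are a few different options, depending on what you actually want to do. The following uses bs4 4.7.1.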
from bs4 import BeautifulSoup as bs
html = '''
<div class="article-container">
<p>tekst 1</p> <!-- this tag -->
<p>none</p>
<p>tekst 2</p> <!-- this tag -->
<p>none</p>
<p>tekst 3</p> <!-- this tag -->
<p>none</p>
<p>tekst 4</p> <!-- this tag -->
</div>
'''
soup = bs(html, 'lxml')
# odd positions: the 1st, 3rd, 5th, and 7th child
items = [item.text for item in soup.select('.article-container p:nth-child(odd)')]
print(items)
# excluding paragraphs that contain "none"
items = [item.text for item in soup.select('.article-container p:not(:contains("none"))')]
print(items)
# including only paragraphs that contain "tekst"
items = [item.text for item in soup.select('.article-container p:contains("tekst")')]
print(items)
# listing the wanted positions explicitly
items = [item.text for item in soup.select('.article-container p:nth-of-type(1), .article-container p:nth-of-type(3), .article-container p:nth-of-type(5), .article-container p:nth-of-type(7)')]
print(items)
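The answer does not show what the selectors print. As a sketch, the snippet below reruns two of them and keeps the results in separate variables. Two things here are assumptions rather than part of the answer: it uses the built-in `html.parser` so that lxml is not required, and it spells the text selector `:-soup-contains`, the name that Soup Sieve 2.1 and later use in place of the deprecated `:contains`.

```python
from bs4 import BeautifulSoup

html = '''
<div class="article-container">
<p>tekst 1</p> <!-- this tag -->
<p>none</p>
<p>tekst 2</p> <!-- this tag -->
<p>none</p>
<p>tekst 3</p> <!-- this tag -->
<p>none</p>
<p>tekst 4</p> <!-- this tag -->
</div>
'''

# html.parser is an assumption made here to avoid the lxml dependency
soup = BeautifulSoup(html, 'html.parser')

# Odd-numbered children of the container; HTML comments are not elements,
# so they do not shift the count
odd_items = [p.text for p in soup.select('.article-container p:nth-child(odd)')]

# Paragraphs whose text contains "tekst" (assumes Soup Sieve 2.1 or later)
tekst_items = [
    p.text
    for p in soup.select('.article-container p:-soup-contains("tekst")')
]

print(odd_items)
print(tekst_items)
```

For this HTML both lists come out as `['tekst 1', 'tekst 2', 'tekst 3', 'tekst 4']`. The positional selector depends on the layout staying the same, while the text selector depends on the wording staying the same.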