2 Answers

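Contributed 1752 experience points · received over 4 likes
For scraping static pages, the bs4 (BeautifulSoup) package is a good choice. With it you can get the result you want quite easily, as shown here: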
from bs4 import BeautifulSoup
source = """<div class="container">
<b>1</b>
<b>2</b>
<b>3</b>
</div>
<div class="container">
<b>4</b>
<b>5</b>
<b>6</b>
</div>"""
soup = BeautifulSoup(source, 'html.parser')  # parse the page source
containers = soup.find_all('div', {'class': 'container'})  # every <div> with class "container"; the attribute filter is optional
# Build one list of integers per container from its <b> children
print([[int(b.text) for b in div.find_all('b')] for div in containers])
Output:
[[1, 2, 3], [4, 5, 6]]

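Contributed 1829 experience points · received over 9 likes
You can use this XPath expression, which matches the two containers and gives you two strings: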
.//*[@class='container'] ➡ '1 2 3', '4 5 6'
If you want six separate strings instead:
.//*[@class='container']/b ➡ '1','2','3','4','5','6'
To get exactly the grouping you asked for, though, you have to split it into one XPath expression per container:
.//*[@class='container'][1]/b ➡ '1','2','3'
.//*[@class='container'][2]/b ➡ '4','5','6'
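The expressions above can be tried out with a short script. This is only a sketch: it assumes Python's standard-library `xml.etree.ElementTree`, which implements just a subset of XPath, and it wraps the two divs in a hypothetical `<root>` element so the snippet parses as well-formed XML. ElementTree's positional predicates count same-tag siblings, which agrees with full XPath here only because both containers are sibling `div` elements. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The <root> wrapper is an assumption added so the fragment is well-formed XML.
source = """<root>
<div class="container">
<b>1</b>
<b>2</b>
<b>3</b>
</div>
<div class="container">
<b>4</b>
<b>5</b>
<b>6</b>
</div>
</root>"""

root = ET.fromstring(source)

# One whitespace-normalised string per container
per_container = [
    " ".join("".join(div.itertext()).split())
    for div in root.findall(".//*[@class='container']")
]

# One string per <b> element, across both containers
all_b = [b.text for b in root.findall(".//*[@class='container']/b")]

# One expression per container, giving the nested grouping
first = [b.text for b in root.findall(".//*[@class='container'][1]/b")]
second = [b.text for b in root.findall(".//*[@class='container'][2]/b")]

print(per_container)
print(all_b)
print([first, second])
```

For real-world HTML that is not well-formed XML, a full XPath engine such as the one in lxml would be the more robust choice.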