
BeautifulSoup: get an attribute from every img inside a specific div

慕斯709654 2021-12-21 16:27:26
I am trying to get the links stored in the image-file attribute (relative links) of the img tags under the div with id previewImages. I do not want the src links.

This is the sample HTML:

<div id="previewImages">
  <div class="thumb"> <a><img src="https://example.com/s/15.jpg" image-file="/image/15.jpg" /></a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/2.jpg" image-file="/image/2.jpg" /> </a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/0.jpg" image-file="/image/0.jpg" /> </a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/3.jpg" image-file="/image/3.jpg" /> </a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/4.jpg" image-file="/image/4.jpg" /> </a> </div>
</div>

I tried the following, but it only gives me the first link rather than all of them:

import sys
import urllib2
from bs4 import BeautifulSoup

quote_page = sys.argv[1] # this should be the first argument on the command line
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
images_box = soup.find('div', attrs={'id': 'previewImages'})
if images_box.find('img'):
    imagesurl = images_box.find('img').get('image-file')
print imagesurl

How do I get the links in the image-file attribute of all img tags inside the div with id previewImages?

3 Answers

潇湘沐

Contributed 1816 experience points · over 6 upvotes

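Use .findAll (spelled find_all in current BeautifulSoup releases; findAll is the older alias for the same method). It returns every matching tag, whereas .find returns only the first one, which is why your code printed a single link.

Example: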


from bs4 import BeautifulSoup

html = """<div id="previewImages">
  <div class="thumb"> <a><img src="https://example.com/s/15.jpg" image-file="/image/15.jpg" /></a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/2.jpg" image-file="/image/2.jpg" /> </a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/0.jpg" image-file="/image/0.jpg" /> </a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/3.jpg" image-file="/image/3.jpg" /> </a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/4.jpg" image-file="/image/4.jpg" /> </a> </div>
</div>"""

soup = BeautifulSoup(html, "html.parser")
# Locate the container div by its id.
images_box = soup.find('div', attrs={'id': 'previewImages'})
# find_all (older alias: findAll) returns every img, not just the first.
for link in images_box.find_all("img"):
    print(link.get('image-file'))

Output:

/image/15.jpg
/image/2.jpg
/image/0.jpg
/image/3.jpg
/image/4.jpg

Answered 2021-12-21
萧十郎

Contributed 1815 experience points · over 13 upvotes

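I think it would be faster to combine the id with an attribute selector and pass that to select. The selector #previewImages [image-file] matches every descendant of the element with id previewImages that has an image-file attribute: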


from bs4 import BeautifulSoup as bs

html = '''
<div id="previewImages">
  <div class="thumb"> <a><img src="https://example.com/s/15.jpg" image-file="/image/15.jpg" /></a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/2.jpg" image-file="/image/2.jpg" /> </a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/0.jpg" image-file="/image/0.jpg" /> </a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/3.jpg" image-file="/image/3.jpg" /> </a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/4.jpg" image-file="/image/4.jpg" /> </a> </div>
</div>
'''

# The 'lxml' parser requires the lxml package; 'html.parser' also works here.
soup = bs(html, 'lxml')
# Collect the image-file value of every matching descendant of #previewImages.
links = [item['image-file'] for item in soup.select('#previewImages [image-file]')]
print(links)
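One detail of that selector is worth illustrating: [image-file] on its own matches any tag with the attribute, not only img. The sketch below is an illustration under assumptions, not part of the answer: the span carrying an image-file attribute is invented to make the difference visible, the sample is shortened, and html.parser is used so that no extra parser package is needed.

```python
from bs4 import BeautifulSoup

# Shortened sample; the span with image-file is a made-up element (assumption).
html = """
<div id="previewImages">
  <div class="thumb"><a><img src="https://example.com/s/15.jpg" image-file="/image/15.jpg" /></a></div>
  <div class="thumb"><a><img src="https://example.com/s/2.jpg" image-file="/image/2.jpg" /></a></div>
  <span image-file="/image/placeholder.jpg"></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Any descendant of #previewImages that has an image-file attribute.
any_tag = [tag["image-file"] for tag in soup.select("#previewImages [image-file]")]

# Same, but restricted to img tags.
img_only = [tag["image-file"] for tag in soup.select("#previewImages img[image-file]")]

print(any_tag)
print(img_only)
```

For the HTML in the question both selectors give the same result, because only img tags carry image-file there.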


Answered 2021-12-21
陪伴而非守候

Contributed 1757 experience points · over 8 upvotes

To add to the above, here is the same task done with lxml directly:


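import lxml.html

# sample is assumed to hold the HTML string from the question.
tree = lxml.html.fromstring(sample)
# XPath returns the image-file attribute value of every img in the document.
images = tree.xpath("//img/@image-file")
print(images)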

Output: ['/image/15.jpg', '/image/2.jpg', '/image/0.jpg', '/image/3.jpg', '/image/4.jpg']
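The XPath //img/@image-file collects the attribute from every img in the document, which matches the question's sample only because all of its images sit inside the target div. If a page has other images, the expression can probably be scoped to the div by id. The following is a sketch under assumptions: the sample string is shortened and the img outside the div is invented to show the difference.

```python
import lxml.html

# Shortened sample; the last img sits outside the div on purpose (assumption).
sample = """
<div id="previewImages">
  <div class="thumb"><a><img src="https://example.com/s/15.jpg" image-file="/image/15.jpg" /></a></div>
  <div class="thumb"><a><img src="https://example.com/s/2.jpg" image-file="/image/2.jpg" /></a></div>
</div>
<img src="https://example.com/s/9.jpg" image-file="/image/9.jpg" />
"""

tree = lxml.html.fromstring(sample)

# Every img in the document, regardless of where it sits.
all_imgs = list(tree.xpath("//img/@image-file"))

# Only img tags that are descendants of the div with id previewImages.
scoped = list(tree.xpath('//div[@id="previewImages"]//img/@image-file'))

print(all_imgs)
print(scoped)
```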


Answered 2021-12-21