Extract the src attribute

慕田峪7331174 2023-02-07 14:55:36
What I want to do: given this HTML code:

<img class="poster lazyload lazyloaded"
     data-src="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg"
     data-srcset="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 1x, https://image.tmdb.org/t/p/w188_and_h282_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 2x"
     alt="Hitman"
     src="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg"
     srcset="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 1x, https://image.tmdb.org/t/p/w188_and_h282_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 2x"
     data-loaded="true">

I want to extract the value of the "data-src" or "src" attribute (or of every attribute that contains the image URL).

What I tried:

Posters = soup.find("img")["src"]
print(Posters)

But this obviously returns the values of all the img tags, so every link is unrelated to the posters.

Output:

https://www.themoviedb.org/assets/2/v4/logos/v2/blue_short-8e7b30f73a4020692ccca9c88bafe5dcb6f8a62a4c6bc55cd9ba82bb2cd95f6c.SVG
https://www.themoviedb.org/assets/2/v4/logos/v2/blue_short-8e7b30f73a4020692ccca9c88bafe5dcb6f8a62a4c6bc55cd9ba82bb2cd95f6c.SVG

By posters I mean the movie posters (check this URL https://www.themoviedb.org/search?&query=Hitman:).

In summary: I want to extract the attribute values from the tags with the class ".lazyloaded".

I hope everything is clear. Thanks.
1 Answer

饮歌长啸

You can try filtering by class:

posters = soup.find_all("img", {"class": "lazyloaded"})
for poster in posters:
    print(poster["src"])

See the documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-by-css-class
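
A side note on the attempt in the question: `soup.find("img")` returns only the first `<img>` in the document, which is probably why a site logo was printed rather than a poster. The sketch below contrasts `find` with a class-based CSS selector and uses `Tag.get`, so a missing attribute yields `None` instead of raising `KeyError`. The markup and the `example.com` URLs are made up for illustration. It also assumes that, when a page is fetched without running JavaScript, a lazy-loaded image may carry only the `lazyload` class and keep its URL in `data-src`; that assumption is why the selector targets `poster` rather than `lazyloaded`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the URLs are placeholders.
HTML = """
<img class="logo" src="https://example.com/logo.svg">
<img class="poster lazyload lazyloaded"
     data-src="https://example.com/poster-a.jpg"
     src="https://example.com/poster-a.jpg">
<img class="poster lazyload" data-src="https://example.com/poster-b.jpg">
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns only the first matching tag, which here is the logo.
first_src = soup.find("img")["src"]

# The CSS selector matches every <img> whose class list contains "poster".
poster_urls = []
for img in soup.select("img.poster"):
    # Prefer src and fall back to data-src; get() returns None if absent.
    url = img.get("src") or img.get("data-src")
    if url:
        poster_urls.append(url)

print(first_src)
print(poster_urls)
```

If the real page does include `lazyloaded` in its raw HTML, `soup.select("img.lazyloaded")` should behave the same as the `find_all` call above.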


Edit: more explanation


Suppose you have the following file, demo.html:


<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Title</title>
</head>
<body>
<img class="logo" src="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg">
<img class="poster lazyload lazyloaded"
     data-src="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg"
     data-srcset="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 1x, https://image.tmdb.org/t/p/w188_and_h282_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 2x"
     alt="Hitman"
     src="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg"
     srcset="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 1x, https://image.tmdb.org/t/p/w188_and_h282_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 2x"
     data-loaded="true">
</body>
</html>

You can parse the "poster" images like this:


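from bs4 import BeautifulSoup

# Read the demo file and parse it with Python's built-in HTML parser.
with open("demo.html", encoding="utf8") as fd:
    soup = BeautifulSoup(fd.read(), features="html.parser")

# Keep only the <img> tags whose class list contains "lazyloaded".
posters = soup.find_all("img", {"class": "lazyloaded"})

for poster in posters:
    print(poster["src"])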

You get:


https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg
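
The question also asks about every attribute that contains an image URL. One possible way to cover that is sketched below. The helper name `image_urls` and the attribute list `URL_ATTRS` are assumptions made for this example, not part of Beautiful Soup, and the `srcset` handling is a deliberately simple split that assumes the URLs themselves contain no commas.

```python
from bs4 import BeautifulSoup

# The poster <img> from demo.html, inlined so the sketch runs on its own.
HTML = """
<img class="poster lazyload lazyloaded"
     data-src="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg"
     data-srcset="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 1x, https://image.tmdb.org/t/p/w188_and_h282_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 2x"
     alt="Hitman"
     src="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg"
     srcset="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 1x, https://image.tmdb.org/t/p/w188_and_h282_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 2x"
     data-loaded="true">
"""

# Assumed list of attributes that may hold image URLs (hypothetical name).
URL_ATTRS = ("src", "data-src", "srcset", "data-srcset")


def image_urls(tag):
    """Return the unique URLs found in the tag's image attributes, in order."""
    urls = []
    for attr in URL_ATTRS:
        value = tag.get(attr)  # None when the attribute is missing
        if not value:
            continue
        # srcset-style values are comma-separated "URL descriptor" pairs;
        # a plain src value is treated as a single pair without descriptor.
        for candidate in value.split(","):
            parts = candidate.split()
            if parts and parts[0] not in urls:
                urls.append(parts[0])
    return urls


soup = BeautifulSoup(HTML, "html.parser")
poster = soup.find("img", {"class": "lazyloaded"})
found = image_urls(poster)

for url in found:
    print(url)
```

For this tag the helper yields two distinct URLs, the `w94_and_h141` and the `w188_and_h282` variants, because the duplicates across the four attributes are dropped.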


Answered 2023-02-07