How to find a tag-and-text combination in BeautifulSoup

梵蒂冈之花 2021-08-24 17:33:02
I scraped HTML from a website and need to extract one specific tag from it. The problem is that the markup is confusingly structured, and I can't get the whole tag. Here is an example:

data = """<div class="Answer">1. BOUNDARIES - EPB &amp; APL&nbsp;<i>(inferior)</i>, EPL&nbsp;<i>(superior).&nbsp;</i><div>2. FLOOR (proximal to distal) - radial styloid =&gt; scaphoid =&gt; trapezium =&gt; 1st MC base.&nbsp;<br /><div>3. CONTENTS - cutaneous branches of radial nerve&nbsp;<i>(on the roof),</i>&nbsp;cephalic vein&nbsp;<i>(begins here),</i>&nbsp;&nbsp;radial artery&nbsp;<i>(on the floor).</i></div></div><div><br /></div><div><img src="paste-27a44c801f0776d91f5f6a16a963bff67f0e8ef3.jpg" /><br /></div><div><b>Image:&nbsp;</b>Case courtesy of Dr Sachintha Hapugoda, &lt;a href="https://radiopaedia.org/"&gt;Radiopaedia.org&lt;/a&gt;. From the case &lt;a href="https://radiopaedia.org/cases/52525"&gt;rID: 52525&lt;/a&gt; [Accessed 15 Nov. 2018].</div></div>"""

From the above, I only want this:

<div><b>Image:&nbsp;</b>Case courtesy of Dr Sachintha Hapugoda, &lt;a href="https://radiopaedia.org/"&gt;Radiopaedia.org&lt;/a&gt;. From the case &lt;a href="https://radiopaedia.org/cases/52525"&gt;rID: 52525&lt;/a&gt; [Accessed 15 Nov. 2018].</div>

I wrote the following code:

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, "html.parser")
image_link = soup.find('div').find('b').next.next
print(image_link)

But it only gives me the text:

Case courtesy of Dr Sachintha Hapugoda, <a href="https://radiopaedia.org/">Radiopaedia.org</a>. From the case <a href="https://radiopaedia.org/cases/52525">rID: 52525</a> [Accessed 15 Nov. 2018].

How do I get the whole tag?
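A minimal sketch of one way to get the whole element, assuming the `<b>Image:</b>` label is what reliably identifies the target block. The attempt above returns only text because `.next` steps through the parse in document order (it behaves like `next_element`), so `.next.next` from the `<b>` lands on the string that follows it rather than on the enclosing `<div>`. Instead, find the `<b>` whose text starts with `Image:` and climb to its wrapper with `find_parent()`. The helper name `find_image_credit` is hypothetical; `data` is the HTML from the question, split into adjacent string literals only for readability.

```python
from bs4 import BeautifulSoup

# The HTML from the question, unchanged, just split across literals.
data = (
    '<div class="Answer">1. BOUNDARIES - EPB &amp; APL&nbsp;<i>(inferior)</i>, '
    'EPL&nbsp;<i>(superior).&nbsp;</i><div>2. FLOOR (proximal to distal) - '
    'radial styloid =&gt; scaphoid =&gt; trapezium =&gt; 1st MC base.&nbsp;<br />'
    '<div>3. CONTENTS - cutaneous branches of radial nerve&nbsp;<i>(on the roof),</i>'
    '&nbsp;cephalic vein&nbsp;<i>(begins here),</i>&nbsp;&nbsp;radial artery&nbsp;'
    '<i>(on the floor).</i></div></div><div><br /></div>'
    '<div><img src="paste-27a44c801f0776d91f5f6a16a963bff67f0e8ef3.jpg" /><br /></div>'
    '<div><b>Image:&nbsp;</b>Case courtesy of Dr Sachintha Hapugoda, '
    '&lt;a href="https://radiopaedia.org/"&gt;Radiopaedia.org&lt;/a&gt;. '
    'From the case &lt;a href="https://radiopaedia.org/cases/52525"&gt;rID: 52525&lt;/a&gt; '
    '[Accessed 15 Nov. 2018].</div></div>'
)


def find_image_credit(html):
    """Return the <div> that wraps the <b>Image:</b> label, or None (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    for b in soup.find_all("b"):
        # &nbsp; is decoded to '\xa0', so match only on the visible "Image:" prefix.
        if b.get_text().startswith("Image:"):
            # Climb from the label to the nearest enclosing <div>: the whole element.
            return b.find_parent("div")
    return None


div = find_image_credit(data)
print(div)
```

Printing the returned `Tag` serializes the entire `<div>`, including the `<b>` label. With the default formatter, `&lt;`/`&gt;` are re-escaped but `&nbsp;` comes back as a literal non-breaking space character; `div.decode(formatter="html")` should restore named entities such as `&nbsp;` if the exact original markup matters.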