
Unable to extract the contents of a script tag with BeautifulSoup


子衿沉夜 2023-03-30 16:13:04
soup.find('script',type='application/ld+json').text returns an empty string. Why can't I extract the text?

>>> soup = BeautifulSoup(page.text,'lxml')
>>> soup.find('script',type='application/ld+json').text
''
>>> soup.find('script',type='application/ld+json')
<script type="application/ld+json">{"@context":"http://schema.org","@type":"Organization","name":"Hamilton Medical Group - Dunkeld","url":"https://www.healthdirect.gov.au/australian-health-services/23000130/hamilton-medical-group-dunkeld/services/dunkeld-3294-sterling","contactPoint":{"@type":"ContactPoint","telephone":"03 5572 2422","email":"","website":"http://www.hamiltonmedicalgroup.net.au","fax":"03 5571 1606"},"address":{"@type":"PostalAddress","streetAddress":"14 Sterling Street","addressLocality":"DUNKELD","addressRegion":"VIC","postalCode":"3294","addressCountry":"AU"}}</script>
>>> json.loads(soup.find('script',type='application/ld+json'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'json' is not defined
>>> import json
>>> json.loads(soup.find('script',type='application/ld+json'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\*******\Python38\lib\json\__init__.py", line 341, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not Tag

1 Answer

喵喔喔


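Use the .string attribute to get the <script> data. json.loads() needs a str, which is why passing the Tag itself raised the TypeError. The empty .text is most likely because Beautiful Soup 4.9.0 and later store embedded JavaScript as a separate Script string type that get_text() (and therefore .text) can skip by default, whereas .string returns the tag's single child string regardless of its type: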


import json
from bs4 import BeautifulSoup

html_text = '''<script type="application/ld+json">{"@context":"http://schema.org","@type":"Organization","name":"Hamilton Medical Group - Dunkeld","url":"https://www.healthdirect.gov.au/australian-health-services/23000130/hamilton-medical-group-dunkeld/services/dunkeld-3294-sterling","contactPoint":{"@type":"ContactPoint","telephone":"03 5572 2422","email":"","website":"http://www.hamiltonmedicalgroup.net.au","fax":"03 5571 1606"},"address":{"@type":"PostalAddress","streetAddress":"14 Sterling Street","addressLocality":"DUNKELD","addressRegion":"VIC","postalCode":"3294","addressCountry":"AU"}}</script>'''

soup = BeautifulSoup(html_text, 'html.parser')
parsed_data = json.loads(soup.find('script',type='application/ld+json').string)

# print parsed data to screen:
print(json.dumps(parsed_data, indent=4))

Prints:


{
    "@context": "http://schema.org",
    "@type": "Organization",
    "name": "Hamilton Medical Group - Dunkeld",
    "url": "https://www.healthdirect.gov.au/australian-health-services/23000130/hamilton-medical-group-dunkeld/services/dunkeld-3294-sterling",
    "contactPoint": {
        "@type": "ContactPoint",
        "telephone": "03 5572 2422",
        "email": "",
        "website": "http://www.hamiltonmedicalgroup.net.au",
        "fax": "03 5571 1606"
    },
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "14 Sterling Street",
        "addressLocality": "DUNKELD",
        "addressRegion": "VIC",
        "postalCode": "3294",
        "addressCountry": "AU"
    }
}
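
On a real page there may be several application/ld+json blocks, or none, and .string is None when a tag is empty or has more than one child. A minimal defensive sketch of the same .string approach follows; the helper name extract_json_ld and the sample_html markup are hypothetical, made up for illustration, and not part of the original page or of Beautiful Soup:

```python
import json

from bs4 import BeautifulSoup


def extract_json_ld(html):
    """Return a list of parsed JSON-LD objects found in the given HTML.

    Hypothetical helper for illustration; not a Beautiful Soup API.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for tag in soup.find_all("script", type="application/ld+json"):
        raw = tag.string  # None if the tag is empty or has several children
        if not raw or not raw.strip():
            continue
        try:
            results.append(json.loads(raw))
        except json.JSONDecodeError:
            # Skip blocks that are not valid JSON instead of crashing.
            continue
    return results


# Made-up sample markup: one valid block, one empty, one malformed,
# and one ordinary script that should not be matched at all.
sample_html = """
<html><head>
<script type="application/ld+json">{"@type": "Organization", "name": "Hamilton Medical Group - Dunkeld"}</script>
<script type="application/ld+json"></script>
<script type="application/ld+json">{not valid json}</script>
<script>var x = 1;</script>
</head></html>
"""

items = extract_json_ld(sample_html)
print(items[0]["name"])
```

Silently skipping malformed blocks is a design choice for scraping many pages; if a bad block should be treated as an error, re-raise or log inside the except branch instead.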


Answered 2023-03-30