How do I extract JSON from a script tag using Beautiful Soup?

慕田峪7331174 2023-12-25 15:55:10
I want to extract reviewCount from a script tag using Beautiful Soup. I have tried several approaches without success.

<script type="application/json" data-initial-state="review-filter">{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}</script>

3 Answers

jeck猫

Contributed 1909 experience points, received 7+ likes

This should work, though I am sure there is a more elegant way:


import json

from bs4 import BeautifulSoup

html = '''
<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>
'''

soup = BeautifulSoup(html, 'html.parser')

# Take the first <script> tag and parse its text content as JSON
res = soup.find('script')
json_object = json.loads(res.contents[0])

# Print each language's display name with its review count
for language in json_object['languages']:
    print('{}: {}'.format(language['displayName'], language['reviewCount']))

Output:

Toutes les langues: 573
français: 567
English: 6


Answered 2023-12-25
慕无忌1623718

Contributed 1744 experience points, received 4+ likes

Import json, load the tag's text with json.loads, then iterate over the result to get every reviewCount.


import json

from bs4 import BeautifulSoup  # this import was missing from the original snippet

html = '''<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>'''

soup = BeautifulSoup(html, "html.parser")

# Select the script tag by its data-initial-state attribute and take its text
raw_json = soup.select_one('script[data-initial-state="review-filter"]').text
jsondata = json.loads(raw_json)

for language in jsondata['languages']:
    print(language['reviewCount'])

Output:

573
567
6
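Building on the selector approach above, here is a minimal sketch of a reusable helper. The function name extract_review_counts and its state_name parameter are made up for illustration; it assumes the JSON keeps the "languages" / "isoCode" / "reviewCount" layout shown in the question, and it returns an empty dict when the tag is missing instead of raising.

```python
import json

from bs4 import BeautifulSoup


def extract_review_counts(html, state_name="review-filter"):
    """Return {isoCode: reviewCount as int} from the matching JSON script tag.

    Hypothetical helper: the name and the empty-dict fallback are
    illustrative choices, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    selector = 'script[type="application/json"][data-initial-state="{}"]'.format(state_name)
    tag = soup.select_one(selector)
    # No matching tag, or an empty tag: nothing to parse
    if tag is None or not tag.string:
        return {}
    data = json.loads(tag.string)
    # reviewCount is stored as a string in the page, so convert it to int
    return {lang["isoCode"]: int(lang["reviewCount"]) for lang in data.get("languages", [])}


html = '''<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>'''

print(extract_review_counts(html))  # {'all': 573, 'fr': 567, 'en': 6}
```

Matching on both type and data-initial-state keeps the lookup from picking up an unrelated script tag if the real page contains several.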


Answered 2023-12-25
慕妹3242003

Contributed 1824 experience points, received 6+ likes

import re

html = '''<script type="application/json" data-initial-state="review-filter">
{"languages":[{"isoCode":"all","displayName":"Toutes les langues","reviewCount":"573"},{"isoCode":"fr","displayName":"français","reviewCount":"567"},{"isoCode":"en","displayName":"English","reviewCount":"6"}],"selectedLanguages":["all"],"selectedStars":null,"selectedLocationId":null}
</script>'''

# Capture the quoted value that follows each reviewCount key, without parsing the JSON
match = [item.group(1) for item in re.finditer(r'reviewCount":"(.+?)"', html)]

print(match)

Output:

['573', '567', '6']
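One caveat with the regex route: the pattern depends on the exact byte layout of the JSON. The sketch below uses a hypothetical pretty-printed variant of the same data (not taken from the real page) to show that the pattern stops matching once there is a space after the colon, while parsing the tag content with json.loads is unaffected.

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical variant of the page: same kind of data, but pretty-printed JSON
html = '''<script type="application/json" data-initial-state="review-filter">
{
  "languages": [
    {"isoCode": "all", "reviewCount": "573"},
    {"isoCode": "fr", "reviewCount": "567"}
  ]
}
</script>'''

# The original pattern expects no whitespace between the colon and the value
regex_hits = re.findall(r'reviewCount":"(.+?)"', html)

# A whitespace-tolerant pattern still works, but remains tied to string values
tolerant_hits = re.findall(r'"reviewCount"\s*:\s*"(.+?)"', html)

# Parsing the JSON does not depend on formatting at all
tag = BeautifulSoup(html, "html.parser").select_one('script[data-initial-state="review-filter"]')
parsed_hits = [lang["reviewCount"] for lang in json.loads(tag.string)["languages"]]

print(regex_hits)     # []
print(tolerant_hits)  # ['573', '567']
print(parsed_hits)    # ['573', '567']
```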


Answered 2023-12-25