
Scrapy Python: scraping JSON from a web page

缥缈止盈 2023-07-27 15:52:14
I'm trying to figure out how to scrape a JSON response with Scrapy in Python. I was able to scrape JSON successfully from a different page on the same site. I would appreciate any help. How do I scrape the values inside "tournamentGroup" (i.e. id, name) as well as year, title, and so on?

Partial code:

start_url = 'https://api.wtatennis.com/tennis/tournaments/?page=0&pageSize=100&excludeLevels=ITF&from=2020-09-01&to=2020-09-30'

with urllib.request.urlopen(start_url) as start_url:
    json_obj = start_url.read()
    rank_list = json.loads(json_obj)

    for item in rank_list:
        rank_data = []
        tourney_id = item['content']['id']
        tourney_year = item['year']
        rank_data = [tourney_id, tourney_year]
        cur.execute("""insert into wta_rankings(tourney_id, tourney_year)
                     values(%s, %s)
                    ON CONFLICT DO NOTHING"""
                    ,(rank_data))
        conn.commit()

cur.close()

JSON:

{
   "pageInfo":{
      "page":0,
      "numPages":0,
      "pageSize":100,
      "numEntries":10
   },
   "content":[
      {
         "tournamentGroup":{
            "id":2023,
            "name":"Prague 125K",
            "level":"125K",
            "metadata":null
         },
         "year":2020,
         "title":"Prague Open",
         "startDate":"2020-08-29",
         "endDate":"2020-09-06",
         "surface":"Clay",
         "inOutdoor":"O",
         "city":"PRAGUE",
         "country":"Czech Republic",
         "singlesDrawSize":128,
         "doublesDrawSize":32,
         "prizeMoney":3125000,
         "prizeMoneyCurrency":"USD",
         "liveScoringId":"2023"
      },

Example URL: https://api.wtatennis.com/tennis/tournaments/?page=0&pageSize=100&excludeLevels=ITF&from=2020-09-01&to=2020-09-30

1 Answer

摇曳的蔷薇

Contributed 1793 experience points, received more than 6 likes

Try this:


import requests

url = "https://api.wtatennis.com/tennis/tournaments/?page=0&pageSize=100&excludeLevels=ITF&from=2020-09-01&to=2020-09-30"

# Fetch the endpoint and decode the JSON body into a dict.
response = requests.get(url).json()

# Each tournament is one entry in the top-level "content" list.
for item in response["content"]:
    print(f"{item['tournamentGroup']['name']} - {item['year']} - {item['title']}")

This gives you the following (it is only an example; you can pull out any field you want):


Prague 125K - 2020 - Prague Open
US OPEN - 2020 - US Open - New York, United States, NY
WARSAW - 2020 - BNP Paribas Warsaw Open - Warsaw, Poland
ISTANBUL - 2020 - TEB BNP Paribas Tennis Championship Istanbul - Istanbul, Turkey
MADRID - 2020 - Mutua Madrid Open - Madrid, Spain
HIROSHIMA - 2020 - Hana-cupid Japan Women's Open - Hiroshima, Japan
ROME - 2020 - Internazionali BNL d'Italia - Rome, Italy
STRASBOURG - 2020 - Internationaux de Strasbourg - Strasbourg, France
ROLAND GARROS - 2020 - Roland Garros - Paris, France
TASHKENT - 2020 - Tashkent Open - Tashkent, Uzbekistan
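
To store the fields the question asks about rather than print them, the same loop can collect them into flat rows. This is only a minimal sketch, assuming the response keeps the shape shown in the question. `parse_tournaments` and `SAMPLE` are hypothetical names, and the sample is trimmed from the question's JSON. Inside a Scrapy callback, the function could presumably be given `response.text` in place of the sample.

```python
import json

# Trimmed sample with the same shape as the JSON in the question (assumption).
SAMPLE = """
{
  "pageInfo": {"page": 0, "numPages": 0, "pageSize": 100, "numEntries": 10},
  "content": [
    {
      "tournamentGroup": {"id": 2023, "name": "Prague 125K", "level": "125K", "metadata": null},
      "year": 2020,
      "title": "Prague Open"
    }
  ]
}
"""


def parse_tournaments(body):
    """Yield one flat dict per tournament found under the "content" key."""
    data = json.loads(body)
    for item in data.get("content", []):
        # "tournamentGroup" is a nested object; fall back to {} if it is missing or null.
        group = item.get("tournamentGroup") or {}
        yield {
            "tourney_id": group.get("id"),
            "tourney_name": group.get("name"),
            "tourney_year": item.get("year"),
            "title": item.get("title"),
        }


rows = list(parse_tournaments(SAMPLE))
for row in rows:
    print(row)
```

Each row can then feed the question's INSERT, for example by passing `(row["tourney_id"], row["tourney_year"])` as the parameters for its two placeholders.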

If you have trouble "navigating" the JSON, just copy the response content into an online JSON formatter, click the wrench icon to repair it, and then click Format / Beautify.
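
The same inspection can also be done locally with the standard library. Here is a rough sketch: `key_paths` is a hypothetical helper, and `sample` is trimmed from the question's JSON. It pretty-prints the structure and lists the path to every leaf value, assuming the items in a list share one shape.

```python
import json

# Trimmed stand-in for the decoded API response (assumption).
sample = {
    "pageInfo": {"page": 0, "pageSize": 100},
    "content": [
        {"tournamentGroup": {"id": 2023, "name": "Prague 125K"}, "year": 2020}
    ],
}

# Pretty-print with indentation so the nesting is visible.
pretty = json.dumps(sample, indent=2, ensure_ascii=False)
print(pretty)


def key_paths(node, prefix=""):
    """Return the path to every leaf value, using [] to mark list elements."""
    if isinstance(node, dict):
        paths = []
        for key, value in node.items():
            paths.extend(key_paths(value, f"{prefix}.{key}" if prefix else key))
        return paths
    if isinstance(node, list):
        # Only the first element is inspected; items are assumed to share a shape.
        return key_paths(node[0], prefix + "[]") if node else [prefix + "[]"]
    return [prefix]


paths = key_paths(sample)
print(paths)
```

A path such as `content[].tournamentGroup.name` maps directly onto `item['tournamentGroup']['name']` inside a loop over `response["content"]`.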



Answered 2023-07-27