How do I scrape the contents of the "sorting_1" class with Python?

I was given a project to build a COVID tracker. I decided to scrape some elements from the website ( https://www.worldometers.info/coronavirus/ ). I am very new to Python, so I decided to use BeautifulSoup. I was able to scrape basic elements such as total cases, active cases, and so on. However, whenever I try to get the country names or their numbers, I get an empty list. Even though the class "sorting_1" exists on the page, the result is still an empty list. Can someone point out where I am going wrong?

This is what I want to grab:

<td style="font-weight: bold; text-align:right" class="sorting_1">4,918,420</td>

This is my current code:

import requests
import bs4

# making a request and a soup
res = requests.get('https://www.worldometers.info/coronavirus/')
soup = bs4.BeautifulSoup(res.text, 'lxml')

# scraping starts here
total_cases = soup.select('.maincounter-number')[0].text
total_deaths = soup.select('.maincounter-number')[1].text
total_recovered = soup.select('.maincounter-number')[2].text
active_cases = soup.select('.number-table-main')[0].text
country_cases = soup.find_all('td', {'class': 'sorting_1'})


2 Answers

浮云间

Contributed 1829 experience points, received over 4 likes

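You can't get the sorting_1 class because it is not present in the page source. Instead, you have to find all the rows in the table and then read the information from the columns you need.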

So, to get the total cases for each country, you can use the following code:

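import requests
import bs4

res = requests.get('https://www.worldometers.info/coronavirus/')
soup = bs4.BeautifulSoup(res.text, 'lxml')

# This returns an empty list: sorting_1 is not in the page source
country_cases = soup.find_all('td', {'class': 'sorting_1'})

# Select every row of the main table instead
rows = soup.select('table#main_table_countries_today tr')

# rows[8:18] is the slice used in this answer to skip the header and
# summary rows; it may need adjusting if the page layout changes
for row in rows[8:18]:
    tds = row.find_all('td')
    # Column 1 holds the country name, column 2 the total cases
    print(tds[1].text.strip(), '=', tds[2].text.strip())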
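The row-by-row approach can be tried offline against a small HTML fragment. The following is only a sketch under stated assumptions: the sample markup, the column positions, and the helper name `parse_country_cases` are made up for illustration and merely imitate the layout described in this answer, so the real page may differ. It uses Python's built-in `html.parser` so that lxml is not required.

```python
import bs4

# Hypothetical fragment imitating the table layout; not copied from the real page.
SAMPLE_HTML = """
<table id="main_table_countries_today">
  <thead><tr><th>#</th><th>Country</th><th>Total Cases</th></tr></thead>
  <tbody>
    <tr><td>1</td><td><a class="mt_a" href="#">USA</a></td><td>4,918,420</td></tr>
    <tr><td>2</td><td><a class="mt_a" href="#">Brazil</a></td><td>2,808,076</td></tr>
  </tbody>
</table>
"""


def parse_country_cases(html):
    """Return {country: total_cases}, read from columns 1 and 2 of each row."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    cases = {}
    for row in soup.select("table#main_table_countries_today tr"):
        tds = row.find_all("td")
        if len(tds) < 3:
            continue  # header rows contain <th> cells, so they have no <td>
        name = tds[1].text.strip()
        number = tds[2].text.strip().replace(",", "")
        if number.isdigit():
            cases[name] = int(number)  # "4,918,420" becomes 4918420
    return cases


print(parse_country_cases(SAMPLE_HTML))
```

Checking the number of cells before indexing avoids an IndexError on header or separator rows, which is a little more robust than relying on a fixed slice.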

2023-04-25

白板的微信

Contributed 1883 experience points, received over 3 likes

The sorting_X classes appear to be added by JavaScript, so they do not exist in the raw HTML.
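A quick way to see the effect is to parse markup that lacks the class and compare the two lookups. This is a minimal sketch, assuming the server sends the cell without `sorting_1`; the `RAW_HTML` fragment is hypothetical and only imitates the cell from the question.

```python
import bs4

# Hypothetical raw HTML as a server might send it: no sorting_1 class yet.
RAW_HTML = """
<table id="main_table_countries_today">
  <tr><td style="font-weight: bold; text-align:right">4,918,420</td></tr>
</table>
"""

soup = bs4.BeautifulSoup(RAW_HTML, "html.parser")

# Lookup by the class that only the browser's JavaScript would add
by_class = soup.find_all("td", {"class": "sorting_1"})

# Lookup through the table, which is present in the markup itself
by_table = soup.select("table#main_table_countries_today td")

print(by_class)                      # empty: the class is not in the markup
print([td.text for td in by_table])  # the cell is still reachable via the table
```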

However, the table itself does exist, so I suggest looping over the table rows, something like this:

table_rows = soup.find("table", id="main_table_countries_today").find_all("tr")

for row in table_rows:
    name = "unknown"
    # Find the country name
    for td in row.find_all("td"):
        # This kind of link apparently only exists in the "name" column
        if td.find("a", class_="mt_a"):
            name = td.find("a").text
    # Do some more scraping

Warning: I haven't used BeautifulSoup in a while, so this may not be 100% correct. But you get the idea.
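The name lookup can be checked offline in the same spirit. The sketch below assumes, as this answer does, that country names sit in links with the class `mt_a`; the sample markup and the helper name `country_names` are hypothetical, and rows without such a link (for example a totals row) are skipped rather than reported as "unknown".

```python
import bs4

# Hypothetical fragment; the "World" row has no mt_a link and is skipped.
SAMPLE_HTML = """
<table id="main_table_countries_today">
  <tr><th>#</th><th>Country</th><th>Total Cases</th></tr>
  <tr><td></td><td>World</td><td>18,000,000</td></tr>
  <tr><td>1</td><td><a class="mt_a" href="#">USA</a></td><td>4,918,420</td></tr>
  <tr><td>2</td><td><a class="mt_a" href="#">Brazil</a></td><td>2,808,076</td></tr>
</table>
"""


def country_names(html):
    """Collect the text of every a.mt_a link found in the table rows."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="main_table_countries_today")
    names = []
    for row in table.find_all("tr"):
        # Pass the class via class_; a bare string would be read as a tag name
        link = row.find("a", class_="mt_a")
        if link is not None:
            names.append(link.text.strip())
    return names


print(country_names(SAMPLE_HTML))
```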

2023-04-25
