
Python - BeautifulSoup - only one result is returned

Python

紫衣仙女 2023-03-16 10:59:41

I am trying to scrape sports schedule data from https://sport-tv-guide.live/live/darts using the following code:

import requests
from bs4 import BeautifulSoup

def makesoup(url):
    page = requests.get(url)
    return BeautifulSoup(page.text, "lxml")

def matchscrape(g_data):
    for match in g_data:
        datetimes = match.find('div', class_='main time col-sm-2 hidden-xs').text.strip()
        print("DateTimes; ", datetimes)
        print('-' * 80)

def matches():
    soup = makesoup(url="https://sport-tv-guide.live/live/darts")
    matchscrape(g_data=soup.findAll("div", {"class": "listData"}))

The problem is that only the first result is returned, when two values should be output.

I printed the result of the soup.findAll("div", {"class": "listData"}) call in matches(), and for some reason only the first result is present in the returned HTML. That would explain why only the first result is returned: it is the only one that can be found in the HTML that was received. What I don't understand is why BeautifulSoup isn't giving me the whole HTML so that all the results can be output.


4 answers

慕村225694

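Contributed 1880 experience points, received more than 4 likes

Your matchscrape function is wrong. match.find returns only the first matching element; you should use match.findAll instead, the same way the matches function does, and then iterate over the datetimes it finds, as in the example below.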

def matchscrape(g_data):
    for match in g_data:
        datetimes = match.findAll('div', class_='main time col-sm-2 hidden-xs')
        for datetime in datetimes:
            print("DateTimes; ", datetime.text.strip())
            print('-' * 80)

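The second point concerns parsing the page. The page is written in HTML, so you could use BeautifulSoup(page.text, 'html.parser') instead of lxml. This removes the dependency on the lxml package, although lxml also parses HTML, so the parser is unlikely to be the cause here.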
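To make the difference between the two calls concrete, here is a minimal sketch on a made-up HTML snippet. The markup, the constant names, and the two helper functions are assumptions for illustration and are not taken from the real page. find_all is the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described in the question;
# the real page's HTML may differ.
SAMPLE_HTML = """
<div class="listData">
  <div class="main time col-sm-2 hidden-xs">17:05</div>
  <div class="main time col-sm-2 hidden-xs">18:00</div>
</div>
"""

TIME_CLASS = "main time col-sm-2 hidden-xs"


def first_time(html):
    """Return the text of the first time cell only (what find does)."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("div", class_=TIME_CLASS).text.strip()


def all_times(html):
    """Return the text of every time cell (what find_all does)."""
    soup = BeautifulSoup(html, "html.parser")
    return [cell.text.strip() for cell in soup.find_all("div", class_=TIME_CLASS)]


print(first_time(SAMPLE_HTML))
print(all_times(SAMPLE_HTML))
```

With this sample, first_time yields a single string while all_times yields a list with one entry per time cell, which is the behaviour the corrected matchscrape relies on.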

Answered 2023-03-16

江户川乱折腾

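Contributed 1851 experience points, received more than 5 likes

I also get only 1 timestamp. There are other possible causes, though. Sites like this often serve dynamic content, and in some cases that content is not fully loaded by a plain request.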
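One way to check whether the missing rows are absent from the response itself, rather than lost during parsing, is to count the matching containers in the HTML before extracting anything. A minimal sketch, assuming the listData class from the question; count_containers is a hypothetical helper and both HTML strings are made up:

```python
from bs4 import BeautifulSoup


def count_containers(html, css_class="listData"):
    """Count <div> elements carrying css_class in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(f"div.{css_class}"))


# Made-up responses: one as a plain request might receive it,
# one as the page might look in a browser.
plain_html = '<div class="listData">17:05</div>'
browser_html = plain_html + '<div class="listData">18:00</div>'

print(count_containers(plain_html))
print(count_containers(browser_html))
```

In practice you would pass page.text from requests.get. If the count is already too low there, the parser is not at fault and the difference lies in what the server sent.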

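If you are sure the problem is that the plain request is not getting the full page, try requests_html (pip install requests-html). Its HTMLSession can render JavaScript-driven content, but only once you call render() on the response's html object (the first call downloads Chromium):

from bs4 import BeautifulSoup
from requests_html import HTMLSession

LINK = "https://sport-tv-guide.live/live/darts"  # URL from the question

session = HTMLSession()
response = session.get(LINK)
response.html.render()  # execute the page's JavaScript
html = BeautifulSoup(response.html.html, "html.parser")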

Answered 2023-03-16

陪伴而非守候

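Contributed 1757 experience points, received more than 8 likes

There is only one time listed for today, but you can get tomorrow's times by first making a POST request with the desired date and then reloading the page.

For example: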

import requests
from bs4 import BeautifulSoup

url = 'https://sport-tv-guide.live/live/darts'
select_date_url = 'https://sport-tv-guide.live/ajaxdata/selectdate'

with requests.session() as s:
    # print times for today:
    print('Times for today:')
    soup = BeautifulSoup(s.get(url).content, 'html.parser')
    for t in soup.select('.time'):
        print(t.get_text(strip=True, separator=' '))

    # select tomorrow:
    s.post(select_date_url, data={'d': '2020-07-19'})

    # print times for tomorrow:
    print('Times for 2020-07-19:')
    soup = BeautifulSoup(s.get(url).content, 'html.parser')
    for t in soup.select('.time'):
        print(t.get_text(strip=True, separator=' '))

Prints:

Times for today:
Darts 17:05
Times for 2020-07-19:
Darts 19:05
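The date in the example is hard-coded. A small sketch of how the form data could be built for any day instead; select_date_payload is a hypothetical helper, and the 'd' key and YYYY-MM-DD format are simply copied from the POST call in the example, so whether the endpoint accepts anything else is not known:

```python
from datetime import date, timedelta


def select_date_payload(today=None, days_ahead=1):
    """Build the form data for the date-selection POST.

    The 'd' key and the ISO date format are taken from the example;
    they are not verified against the site.
    """
    if today is None:
        today = date.today()
    target = today + timedelta(days=days_ahead)
    return {"d": target.isoformat()}


print(select_date_payload(today=date(2020, 7, 18)))  # {'d': '2020-07-19'}
```

The result can be passed as data= to s.post(select_date_url, ...) in place of the literal dictionary.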

Answered 2023-03-16

哈士奇WWW

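Contributed 1799 experience points, received more than 6 likes

After getting the helpful answers above, I was able to work out that the site stores a cookie holding the countries the user has selected, and that this cookie determines which sports schedule data is shown. In this example, an Australian channel has a listing at 18:00. Because the request sent by the requests module carried no cookie data, that listing did not show up in the output of my original code.

I was able to supply the necessary cookie information with the following code: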

import requests
from bs4 import BeautifulSoup

def makesoup(url):
    cookies = {'mycountries': '101,28,3,102,42,10,18,4,2'}  # pass cookie data
    r = requests.post(url, cookies=cookies)
    return BeautifulSoup(r.text, "html.parser")

The correct information is now output.


I am posting this answer in case it helps someone who runs into a similar problem in the future.
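To confirm that a cookie dictionary really ends up on the outgoing request, the request can be prepared without being sent and its Cookie header inspected. This is a sketch only: cookie_header is a hypothetical helper, and it uses GET for illustration whereas the code above uses requests.post. The cookie name and value are copied from that code.

```python
import requests

URL = "https://sport-tv-guide.live/live/darts"
# Cookie name and value copied from the answer above.
COOKIES = {"mycountries": "101,28,3,102,42,10,18,4,2"}


def cookie_header(url, cookies):
    """Return the Cookie header requests would send, without sending anything."""
    prepared = requests.Request("GET", url, cookies=cookies).prepare()
    return prepared.headers.get("Cookie")


print(cookie_header(URL, COOKIES))
```

No network traffic is generated here, so this is safe to run while experimenting with different country lists.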

Answered 2023-03-16
