2 Answers
Answer 1
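The website renders parts of the page dynamically with JavaScript after it loads, so you can use requests-html or Selenium, both of which can execute that JavaScript: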
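from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
options = Options()
options.add_argument('--headless')  # run Firefox without opening a window
driver = webdriver.Firefox(options=options)
try:
    driver.get(
        "https://www.weatherbug.com/weather-forecast/hourly/san-francisco-ca-94103")
    # find_elements_by_css_selector() was removed in newer Selenium releases;
    # find_elements(By.CSS_SELECTOR, ...) is the current equivalent
    data = driver.find_elements(By.CSS_SELECTOR, "div.temp.ng-binding")
    for item in data:
        print(item.text)
finally:
    driver.quit()  # always close the browser, even if an error occurs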
Output:
51°
52°
53°
54°
53°
53°
52°
51°
51°
50°
50°
49°
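Update (at the asker's request), using only requests and BeautifulSoup:
import requests
from bs4 import BeautifulSoup
r = requests.get(
    "https://www.weatherbug.com/weather-forecast/hourly/san-francisco-ca-94103")
soup = BeautifulSoup(r.text, 'html.parser')
for item in soup.select("div.hour-card__mobile__cond"):
    # contents[1] is assumed to be the inner temperature div (contents[0] being
    # whitespace text); [:-1] drops the trailing "°" before converting to int
    temp = int(item.contents[1].get_text(strip=True)[:-1])
    print(temp, type(temp))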
Output:
51 <class 'int'>
52 <class 'int'>
53 <class 'int'>
53 <class 'int'>
53 <class 'int'>
53 <class 'int'>
52 <class 'int'>
51 <class 'int'>
51 <class 'int'>
50 <class 'int'>
50 <class 'int'>
50 <class 'int'>
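The update's `int(item.contents[1].get_text(strip=True)[:-1])` relies on the text ending in exactly one "°" character. A slightly more defensive conversion can be sketched with the standard library alone; `parse_temperature` is a hypothetical helper (not part of requests or BeautifulSoup), and it assumes temperatures are whole numbers, possibly negative:

```python
import re

# Hypothetical helper: matches an optional minus sign followed by digits.
# Assumes the site shows whole-number temperatures such as "51°" or "-3°".
_TEMP_RE = re.compile(r"-?\d+")


def parse_temperature(text):
    """Return the first integer found in a scraped temperature string,
    or None if the string contains no digits (e.g. a "--" placeholder)."""
    match = _TEMP_RE.search(text)
    return int(match.group()) if match else None


if __name__ == "__main__":
    for raw in ["51°", "\n52°\n", " 53° ", "--"]:
        print(repr(raw), "->", parse_temperature(raw))
```

Inside the loop above it would replace the slicing, e.g. `temp = parse_temperature(item.get_text())`, and it tolerates surrounding whitespace or a missing degree sign instead of raising `ValueError`.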
Answer 2
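When you see class="temp ng-binding", it means the div has two classes, "temp" and "ng-binding", so searching for the combined string "temp ng-binding" does not work. Also, when I ran your script, the HTML of the temperature container looked like this: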
print(temp_containers[0])
<div class="temp">
51°
</div>
So I ran the following instead and got results:
import requests
from bs4 import BeautifulSoup
url = 'https://www.weatherbug.com/weather-forecast/hourly/san-francisco-ca-94103'
header = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
page = requests.get(url, headers=header)
soup = BeautifulSoup(page.text, 'html.parser')
# each hourly card wraps a <div class="temp"> holding the temperature
temp_containers = soup.find_all('div', class_='hour-card__mobile__cond')
print(type(temp_containers))
print(len(temp_containers))
for div in temp_containers:
    # class_='temp' matches any div whose class list contains "temp"
    a = div.find('div', class_='temp')
    if a is not None:  # skip cards without a temperature div
        print(a.get_text(strip=True))
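The point about multi-valued `class` attributes can be illustrated without BeautifulSoup. In HTML, `class="temp ng-binding"` is a whitespace-separated list of two classes, and a CSS selector such as `div.temp.ng-binding` requires both. The HTML that `requests` receives only carries `temp`; `ng-binding` appears to be added by AngularJS in the browser, which would explain why the Selenium version finds it. The sketch below uses the standard-library `html.parser`; `ClassFinder` and `find_div_texts` are hypothetical names for illustration, not how BeautifulSoup is implemented, and the two HTML strings are simplified stand-ins for the real page:

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collect the text of <div> elements whose class list contains every
    class in `required` (the rule a selector like div.a.b applies)."""

    def __init__(self, required):
        super().__init__()
        self.required = set(required)
        self.depth = 0       # > 0 while inside a matching div (counts nested divs)
        self.buffer = []
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a nested div inside a match
            return
        # class="a b" is a whitespace-separated set of class names
        classes = set((dict(attrs).get("class") or "").split())
        if self.required <= classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.matches.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def find_div_texts(html, required):
    finder = ClassFinder(required)
    finder.feed(html)
    finder.close()
    return finder.matches


# Simplified stand-ins: what requests sees vs. what the browser DOM shows.
server_html = '<div class="hour-card__mobile__cond"><div class="temp">\n51°\n</div></div>'
browser_html = '<div class="hour-card__mobile__cond"><div class="temp ng-binding">51°</div></div>'

print(find_div_texts(server_html, ["temp"]))                 # ['51°']
print(find_div_texts(server_html, ["temp", "ng-binding"]))   # []
print(find_div_texts(browser_html, ["temp", "ng-binding"]))  # ['51°']
```

In BeautifulSoup itself, according to its documentation, `class_='temp'` matches any tag that has `temp` among its classes, while `class_='temp ng-binding'` matches only that exact attribute string; requiring two classes in any order is done with a CSS selector such as `soup.select('div.temp.ng-binding')`.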