BeautifulSoup cannot find a table on a web page

慕村225694 2023-10-26 10:22:35
So I found my problem. It was earlier in the code: I had originally sliced change_details out of another DataFrame.

change_details = gdp_sched_today[['start_date', 'end_date']]

change_details.columns = ['Planned Start Date', 'Planned End Date']

change_details['Planned Start Date'] = change_details['Planned Start Date'].dt.strftime('%d/%m/%Y %H:%M')

change_details['Planned End Date'] = change_details['Planned End Date'].dt.strftime('%d/%m/%Y %H:%M')

I was able to fix this by adding .copy() to the first line, which makes sure pandas knows I intend this to be a copy and not a view.

change_details = gdp_sched_today[['start_date', 'end_date']].copy()

change_details.columns = ['Planned Start Date', 'Planned End Date']

change_details['Planned Start Date'] = change_details['Planned Start Date'].dt.strftime('%d/%m/%Y %H:%M')

change_details['Planned End Date'] = change_details['Planned End Date'].dt.strftime('%d/%m/%Y %H:%M')

It would be nice if the warning were clearer about what triggered it :)

2 Answers

隔江千里

1906 experience points contributed, more than 10 likes received

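The table is inside an iframe, so you need to switch to the iframe first before you can access the table.


Use WebDriverWait() with frame_to_be_available_and_switch_to_it() and the locator below to switch into the frame.


Then use WebDriverWait() with visibility_of_element_located() and the locator below to get the table.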


driver.get("https://learn.microsoft.com/en-us/windows/release-information/")

WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"winrelinfo_iframe")))

table=WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"table.cells-centered")))

You need the following imports.


from selenium.webdriver.support.ui import WebDriverWait

from selenium.webdriver.common.by import By

from selenium.webdriver.support import expected_conditions as EC

Alternatively, you can use the code below, which locates the table by XPath.


driver.get("https://learn.microsoft.com/en-us/windows/release-information/")

WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"winrelinfo_iframe")))

table=WebDriverWait(driver,10).until(EC.presence_of_element_located((By.XPATH,'//*[@id="winrelinfo_container"]/table[1]')))

You can then load the table data into a pandas DataFrame and export it to a CSV file. This requires pandas.


driver.get("https://learn.microsoft.com/en-us/windows/release-information/")

WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"winrelinfo_iframe")))

table=WebDriverWait(driver,10).until(EC.presence_of_element_located((By.XPATH,'//*[@id="winrelinfo_container"]/table[1]'))).get_attribute('outerHTML')

df=pd.read_html(str(table))[0]

print(df)

df.to_csv("path/to/csv")

Install pandas: pip install pandas


Then add the following import:


import pandas as pd
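
If pandas is not available, the same "table HTML to CSV" step can be approximated with the standard library alone. The following is only a rough sketch of that alternative, not the method used in this answer: it swaps pd.read_html for Python's html.parser and csv modules, and the names TableRows, table_html_to_csv and the sample markup are made up for illustration. In practice the input would be the outerHTML string obtained from the table element above.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the cell currently being read

    def _finish_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()  # tolerate a missing closing tag
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag == "tr" and self._row is not None:
            self._finish_cell()
            self.rows.append(self._row)
            self._row = None


def table_html_to_csv(table_html):
    """Return the rows of an HTML table as CSV text."""
    parser = TableRows()
    parser.feed(table_html)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Hypothetical sample markup standing in for the table's outerHTML.
sample = """
<table>
  <tr><th>Version</th><th>OS build</th></tr>
  <tr><td>2004</td><td>19041.546</td></tr>
  <tr><td>1909</td><td>18363.1110</td></tr>
</table>
"""
csv_text = table_html_to_csv(sample)
print(csv_text)
```

This sketch ignores nested tables, colspan and rowspan, which pandas handles for you, so it is best treated as a fallback for simple tables.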


2023-10-26

撒科打诨

1934 experience points contributed, more than 2 likes received

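The table is inside an <iframe>, so BeautifulSoup does not see it in the original page. Request the iframe's src URL and parse that document instead: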


import requests 

from bs4 import BeautifulSoup



url = 'https://learn.microsoft.com/en-us/windows/release-information/'

soup = BeautifulSoup(requests.get(url).content, 'html.parser')

soup = BeautifulSoup(requests.get(soup.select_one('iframe')['src']).content, 'html.parser')


for row in soup.select('table tr'):

    print(row.get_text(strip=True, separator='\t'))

Prints:


Version Servicing option    Availability date   OS build    Latest revision date    End of service: Home, Pro, Pro Education, Pro for Workstations and IoT Core End of service: Enterprise, Education and IoT Enterprise

2004    Semi-Annual Channel 2020-05-27  19041.546   2020-10-01  2021-12-14  2021-12-14  Microsoft recommends

1909    Semi-Annual Channel 2019-11-12  18363.1110  2020-09-16  2021-05-11  2022-05-10

1903    Semi-Annual Channel 2019-05-21  18362.1110  2020-09-16  2020-12-08  2020-12-08

1809    Semi-Annual Channel 2019-03-28  17763.1490  2020-09-16  2020-11-10  2021-05-11

1809    Semi-Annual Channel (Targeted)  2018-11-13  17763.1490  2020-09-16  2020-11-10  2021-05-11

1803    Semi-Annual Channel 2018-07-10  17134.1726  2020-09-08  End of service  2021-05-11


...and so on.
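
The snippet in this answer passes the iframe's src straight to requests.get, which works when src is an absolute URL. As a hedged sketch of the more general case, the code below finds iframe sources in an outer page and resolves them against the page URL with urljoin. It uses only the standard library so it runs without network access; the HTML string, the example.com URLs and the names IframeSrcFinder and iframe_urls are illustrative assumptions, not taken from the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcFinder(HTMLParser):
    """Collects the src attribute of every <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(page_url, html):
    """Return absolute URLs for all iframes found in html."""
    finder = IframeSrcFinder()
    finder.feed(html)
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    return [urljoin(page_url, src) for src in finder.sources]


# Hypothetical outer page: the table is not here, only a reference to it.
outer_html = """
<html><body>
  <h1>Release information</h1>
  <iframe id="winrelinfo_iframe" src="/frames/release-table.html"></iframe>
</body></html>
"""

urls = iframe_urls("https://example.com/docs/release-information/", outer_html)
print(urls)
```

Each resulting URL could then be fetched and parsed in the same way as the second requests.get call above.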


2023-10-26