
How to scrape data from a flexbox element/container using Python and Beautiful Soup

Asked by 万千封印 · 2022-12-20 15:09:04
I'm trying to scrape data from a utility company's website using Python, Beautiful Soup, and Selenium. The data I'm after is things like the time, cause, and status of each outage. When I run a typical page request, parse the page, look up the data I'm after (the contents of id="OutageListTable"), and print it, the div and its text are nowhere to be found. When I inspect the page in the browser the data is there, but it sits inside a flex container. This is the code I'm using:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import urllib3
from selenium import webdriver

my_url = 'https://www.pse.com/outage/outage-map'

browser = webdriver.Firefox()
browser.get(my_url)
html = browser.page_source

page_soup = soup(html, features='lxml')
outage_list = page_soup.find(id='OutageListTable')
print(outage_list)

browser.quit()

How do you retrieve the information inside a flex/flexbox container? I haven't found any resources online that address this.
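Note: the outage list is rendered by the page's JavaScript after the initial load, so page_source read immediately after browser.get() may not yet contain it. A minimal sketch of the same script with an explicit wait added (the 10-second timeout is an arbitrary assumption; if the wait still times out, the OutageListTable id may simply never exist in the rendered DOM, which is what the answers below get at):

from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

my_url = 'https://www.pse.com/outage/outage-map'

browser = webdriver.Firefox()
browser.get(my_url)
try:
    # wait (up to 10 s) for the element to be rendered before reading the page source
    WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.ID, 'OutageListTable')))
    page_soup = soup(browser.page_source, features='lxml')
    print(page_soup.find(id='OutageListTable'))
finally:
    browser.quit()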

2 Answers

慕哥6287543 (1831 experience points · 10+ upvotes)

You're overthinking the problem. To start with, there is no flexbox issue here at all; it's a simple case of targeting the right div class. You should look at the div with class_='col-xs-12 col-sm-6 col-md-4 listView-container'.


from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from time import sleep

# create object for chrome options
chrome_options = Options()
base_url = 'https://www.pse.com/outage/outage-map'

chrome_options.add_argument('disable-notifications')
chrome_options.add_argument('start-maximized')
chrome_options.add_argument('user-data-dir=C:\\Users\\username\\AppData\\Local\\Google\\Chrome\\User Data\\Default')
# To disable the message, "Chrome is being controlled by automated test software"
chrome_options.add_argument('--disable-infobars')
# Pass the argument 1 to allow and 2 to block notifications
chrome_options.add_experimental_option("prefs", {
    "profile.default_content_setting_values.notifications": 2
    })

# invoke the webdriver
browser = webdriver.Chrome(executable_path = r'C:/Users/username/Documents/playground_python/chromedriver.exe',
                           options = chrome_options)
browser.get(base_url)
delay = 5  # seconds

while True:
    try:
        # wait until at least one outage card (listView-container div) has been rendered
        WebDriverWait(browser, delay).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, 'div.listView-container')))
        print("Page is ready")
        sleep(5)
        html = browser.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
        soup = BeautifulSoup(html, "html.parser")
        for item_n in soup.find_all('div', class_='col-xs-12 col-sm-6 col-md-4 listView-container'):
            for item_n_text in item_n.find_all(name="span"):
                print(item_n_text.text)
        break  # done scraping, leave the retry loop
    except TimeoutException:
        print("Loading took too much time! - Try again")

# close the automated browser
browser.close()


Cause:
Accident
Status:
Crew assigned
Last updated:
06/02 11:00 PM
9. Woodinville
Start time:
06/02 08:29 PM
Est. restoration time:
06/03 03:30 AM
Customers impacted:
2
Cause:
Under Investigation
Status:
Crew assigned
Last updated:
06/03 12:15 AM
Page is ready
1. Bellingham
Start time:
06/02 06:09 PM
Est. restoration time:
06/03 06:30 AM
Customers impacted:
1
Cause:
Trees/Vegetation
Status:
Crew assigned
Last updated:
06/02 11:50 PM
2. Deming
Start time:
06/02 07:10 PM
Est. restoration time:
06/03 03:30 AM
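If structured records are more convenient than a flat stream of span texts, the spans of each card can be grouped into a dict. A minimal sketch, assuming each listView-container div holds one outage and that its spans alternate between a "Label:" span and the corresponding value (which is what the output above suggests):

from bs4 import BeautifulSoup

def parse_outage_cards(html):
    # Group each outage card's "Label:" / value span pairs into a dict.
    page = BeautifulSoup(html, "html.parser")
    cards = []
    for card in page.find_all('div', class_='col-xs-12 col-sm-6 col-md-4 listView-container'):
        texts = [s.get_text(strip=True) for s in card.find_all('span')]
        record = {}
        i = 0
        while i < len(texts):
            if texts[i].endswith(':') and i + 1 < len(texts):
                # a label span followed by its value span, e.g. "Cause:" -> "Accident"
                record[texts[i].rstrip(':')] = texts[i + 1]
                i += 2
            else:
                # a span without a trailing colon, e.g. the card title "1. Bellingham"
                record.setdefault('Title', texts[i])
                i += 1
        cards.append(record)
    return cards

# usage with the html string captured above:
# for outage in parse_outage_cards(html):
#     print(outage)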


BIG阳 (1859 experience points · 6+ upvotes)

The data is loaded dynamically via JavaScript. You can use the requests module to fetch it directly.


For example:


import json
import requests

# endpoint behind the outage map that returns the list view data as JSON
url = 'https://www.pse.com/api/sitecore/OutageMap/AnonymoussMapListView'

data = requests.get(url).json()

# uncomment this to print all data:
#print(json.dumps(data, indent=4))

for d in data['PseMap']:
    # each entry holds a point of interest (title/type) plus a list of name/value attributes
    print('{} - {}'.format(d['DataProvider']['PointOfInterest']['Title'], d['DataProvider']['PointOfInterest']['MapType']))
    for info in d['DataProvider']['Attributes']:
        print(info['Name'], info['Value'])
    print('-' * 80)

Prints:


Bellingham - Outage
Start time 06/02 06:09 PM
Est. restoration time 06/03 06:30 AM
Customers impacted 1
Cause Trees/Vegetation
Status Crew assigned
Last updated 06/02 11:50 PM
--------------------------------------------------------------------------------
Deming - Outage
Start time 06/02 07:10 PM
Est. restoration time 06/03 03:30 AM
Customers impacted 568
Cause Accident
Status Repair crew onsite
Last updated 06/02 11:50 PM
--------------------------------------------------------------------------------
Everest - Outage
Start time 06/02 10:42 AM
Customers impacted 4
Cause Scheduled Outage
Status Repair crew onsite
Last updated 06/02 10:50 AM
--------------------------------------------------------------------------------
Kenmore - Outage
Start time 06/02 09:59 PM
Est. restoration time 05/29 01:00 AM
Customers impacted 2
Cause Scheduled Outage
Status Repair crew onsite
Last updated 06/02 10:05 PM
--------------------------------------------------------------------------------
Kent - Outage
Start time 06/02 06:43 PM
Est. restoration time To Be Determined
Customers impacted 26
Cause Car/Equip Accident
Status Waiting for repairs
Last updated 06/02 10:15 PM
--------------------------------------------------------------------------------
Kent - Outage
Start time 06/02 10:09 PM
Est. restoration time To Be Determined
Customers impacted 13
Cause Under Investigation
Status Repair crew onsite
Last updated 06/02 10:15 PM
--------------------------------------------------------------------------------
Northwest Bellevue - Outage
Start time 06/02 11:28 PM
Est. restoration time To Be Determined
Customers impacted 14
Cause Under Investigation
Status Repair crew onsite
Last updated 06/02 11:30 PM
--------------------------------------------------------------------------------
Pacific - Outage
Start time 06/02 06:19 PM
Est. restoration time 06/03 02:30 AM
Customers impacted 3
Cause Accident
Status Crew assigned
Last updated 06/02 11:00 PM
--------------------------------------------------------------------------------
Woodinville - Outage
Start time 06/02 08:29 PM
Est. restoration time 06/03 03:30 AM
Customers impacted 2
Cause Under Investigation
Status Crew assigned
Last updated 06/03 12:15 AM
--------------------------------------------------------------------------------
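If you need the records in tabular form, the same JSON can be flattened and written to CSV. A minimal sketch building on the response structure used above (the field paths come from the code in this answer; the outages.csv filename is just a hypothetical choice):

import csv
import requests

url = 'https://www.pse.com/api/sitecore/OutageMap/AnonymoussMapListView'
data = requests.get(url).json()

# flatten each outage into a single dict: title/type plus its name/value attributes
rows = []
for d in data['PseMap']:
    poi = d['DataProvider']['PointOfInterest']
    row = {'Title': poi['Title'], 'MapType': poi['MapType']}
    for info in d['DataProvider']['Attributes']:
        row[info['Name']] = info['Value']
    rows.append(row)

# collect every column name that appears in any row, preserving first-seen order
fieldnames = []
for row in rows:
    for key in row:
        if key not in fieldnames:
            fieldnames.append(key)

with open('outages.csv', 'w', newline='') as f:  # hypothetical output file
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)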


