
Scraping table rows that need to be associated with the preceding element


当年话下 2023-07-18 17:46:06
I want to scrape the table from this website: https://www.oddsportal.com/moving-margins/

I need the data inside the table #moving_margins_content_overall. I tried the code below, but some games have several class="odd" rows, and I don't know how to associate the class="odd" data with the class="dark" data.

```python
import requests
from bs4 import BeautifulSoup
import time
import json
import csv
from selenium import webdriver

u = 'https://www.oddsportal.com/moving-margins/'
driver = webdriver.Chrome(executable_path=r"C:\chromedriver.exe")
driver.get(u)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
driver.implicitly_wait(60) # seconds
time.sleep(2)
elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("innerHTML")
soup = BeautifulSoup(source_code, 'html.parser')
for k in soup.select('#moving_margins_content_overall .table-main tbody tr'):
    sport = k.select_one('tr.dark th > a').get_text(strip=True) #sport
    country = soup.select_one('tr.dark th a:nth-child(3) span').get_text(strip=True) #country
    competition = soup.select_one('tr.dark th a:nth-child(5)').get_text(strip=True) #competition
```
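At its core this is a grouping problem. Assuming that, in document order, each class="dark" row is followed by the class="odd" rows that belong to it, a single pass is enough: open a new group at every dark row and append each odd row to the most recent group. The sketch below assumes the rows have already been extracted as `(row_class, cell_texts)` pairs; `group_rows` and the sample data are hypothetical and not taken from the site.

```python
def group_rows(rows):
    """Fold each 'odd' row into the most recent 'dark' row.

    rows: iterable of (row_class, cells) pairs in document order.
    Returns a list of dicts: {"header": [...], "odds": [[...], ...]}.
    """
    matches = []
    for row_class, cells in rows:
        if row_class == "dark":
            # A dark row opens a new match group
            matches.append({"header": list(cells), "odds": []})
        elif row_class == "odd" and matches:
            # An odd row belongs to the last dark row seen so far
            matches[-1]["odds"].append(list(cells))
        # Odd rows that appear before any dark row are ignored
    return matches


if __name__ == "__main__":
    # Hypothetical sample rows, not real site data
    sample = [
        ("dark", ["Rugby League", "Australia", "NRL"]),
        ("odd", ["1.85", "2.05"]),
        ("odd", ["1.90", "2.00"]),
        ("dark", ["Soccer", "England", "Premier League"]),
        ("odd", ["2.10", "3.40", "3.60"]),
    ]
    for match in group_rows(sample):
        print(match["header"], len(match["odds"]))
```

With BeautifulSoup, a pair could be built from each `tr` as `(" ".join(tr.get("class", [])), [c.get_text(strip=True) for c in tr.find_all(["th", "td"])])`; with Selenium, `tr.get_attribute("class")` and `tr.find_elements(By.XPATH, "./th|./td")` play the same roles.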

1 Answer

PIPIONE


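You can use the code below to store all the data in a list of lists, where each match on the page (a class="dark" row together with the class="odd" rows that follow it) is stored as one list.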


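```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

u = 'https://www.oddsportal.com/moving-margins/'

# Selenium 4: the chromedriver path is passed through a Service object
driver = webdriver.Chrome(service=Service(r"C:\chromedriver.exe"))
driver.maximize_window()
driver.get(u)

# Use an explicit wait so the script continues as soon as the table is present
WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "#moving_margins_content_overall")))

driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")

# Every dark (match) row and odd (odds) row, in document order
table_data = driver.find_elements(
    By.XPATH, "//div[@id='moving_margins_content_overall']//tr[@class='odd' or @class='dark']")

# List of lists: each inner list holds one dark row plus the odd rows that follow it
table = []
row = None

for data in table_data:
    if data.get_attribute("class") == "dark":
        # A dark row starts a new match: collect the text of its header links
        row = [a.text.replace("\n", " ") for a in data.find_elements(By.XPATH, ".//th//a")]
        # Add the text of the next th with class 'first2'
        row.append(data.find_element(By.XPATH, "./following-sibling::tr//th[@class='first2']").text)
        table.append(row)
    elif row is not None:
        # An odd row belongs to the most recent dark row
        cells = data.find_elements(By.XPATH, ".//td")
        row.extend(td.text.replace("\n", " ") for td in cells)
        # Add the bookmaker name (title of the link in the last cell), if present
        links = cells[-1].find_elements(By.XPATH, ".//a") if cells else []
        if links:
            row.append(links[0].get_attribute("title"))

for t in table:
    print(t)
```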

Output: as you can see, the rugby league match has two sets of odds, so its list is longer than the others.

(Output screenshot: //img1.sycdn.imooc.com//64b65f9e0001026d16490185.jpg)
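Because a match with more than one odds row produces a longer list, the rows end up with different lengths. If the data is going to be saved (the question already imports `csv`), one option is to write one record per odds row and repeat the match columns on each, so every record has a predictable shape. This is a minimal sketch, assuming the matches are already grouped as `(header_cells, odds_rows)` pairs; `to_csv` and the sample values are hypothetical.

```python
import csv
import io


def to_csv(matches):
    """Flatten grouped matches into CSV text, one record per odds row.

    matches: list of (header_cells, odds_rows) pairs, where odds_rows is a
    list of cell lists. The header columns are repeated on every record, so a
    match with two odds rows simply produces two records.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for header, odds_rows in matches:
        for odds in odds_rows:
            writer.writerow(list(header) + list(odds))
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical grouped data, not real site values
    grouped = [
        (["Rugby League", "Australia", "NRL"], [["1.85", "2.05"], ["1.90", "2.00"]]),
        (["Soccer", "England", "Premier League"], [["2.10", "3.40", "3.60"]]),
    ]
    print(to_csv(grouped), end="")
```

Under this layout a match with no odds rows produces no records; write its header on its own if such matches need to be kept.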

Answered 2023-07-18