Unable to scrape the correct wikitable with BeautifulSoup4 (beginner)

Python

POPMUISE 2022-10-06 16:12:00

I'm a complete beginner here... I'm trying to scrape the constituents table from this Wikipedia page, but the table I get back is the annual returns table (the first table) rather than the constituents table I need (the second table). Could someone help me work out whether I can target the specific table I want with BeautifulSoup4?

import bs4 as bs
import pickle
import requests

def save_klci_tickers():
    resp = requests.get('https://en.wikipedia.org/wiki/FTSE_Bursa_Malaysia_KLCI')
    soup = bs.BeautifulSoup(resp.text)
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = []
    for row in table.findAll('tr')[1:]:
        ticker = row.findAll('td')[0].text
        tickers.append(ticker)
    with open("klcitickers.pickle", "wb") as f:
        pickle.dump(tickers, f)
    print(tickers)
    return tickers

save_klci_tickers()

1 Answer

慕盖茨4494581


Try the pandas library; it can pull the tabular data from that page into a CSV file in the blink of an eye:

import pandas as pd

url = 'https://en.wikipedia.org/wiki/FTSE_Bursa_Malaysia_KLCI'

# read_html returns a list of DataFrames, one per matching table
df = pd.read_html(url, attrs={"class": "wikitable"})[1]  # change the index to get the table you need from that page

# keep only the columns of interest and write them to a CSV file
new = pd.DataFrame(df, columns=["Constituent Name", "Stock Code", "Sector"])
new.to_csv("wiki_data.csv", index=False)

print(df)
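If the number or order of tables on the page changes, a hard-coded index can silently pick the wrong one. One possible alternative is the `match` argument of `pd.read_html`, which keeps only the tables whose text matches a given string or regex. The sketch below runs against a small inline HTML sample (the `SAMPLE_HTML` content, company names, and codes are made up for illustration, not taken from the real page) and assumes an HTML parser that pandas supports, such as lxml, is installed:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the Wikipedia page: two tables with the "wikitable" class.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Year</th><th>Annual Return</th></tr>
  <tr><td>2020</td><td>2.4</td></tr>
</table>
<table class="wikitable sortable">
  <tr><th>Constituent Name</th><th>Stock Code</th><th>Sector</th></tr>
  <tr><td>Example Bank</td><td>1111</td><td>Finance</td></tr>
  <tr><td>Example Telecom</td><td>2222</td><td>Telecommunications</td></tr>
</table>
"""

# match= keeps only tables containing text that matches "Stock Code",
# so the annual returns table is filtered out before indexing.
tables = pd.read_html(StringIO(SAMPLE_HTML), match="Stock Code")
constituents = tables[0]

print(constituents)
```

Against the live page you would pass the URL instead of the `StringIO` object, provided the column header text there really is "Stock Code".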

If you would still rather stick with BeautifulSoup, the following should do the job:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://en.wikipedia.org/wiki/FTSE_Bursa_Malaysia_KLCI")
soup = BeautifulSoup(res.text, "lxml")

# [1] picks the second table with the "wikitable" class (the constituents table)
for items in soup.select("table.wikitable")[1].select("tr"):
    data = [item.get_text(strip=True) for item in items.select("th,td")]
    print(data)

If you want to use .find_all() instead of .select(), try the following:

for items in soup.find_all("table", class_="wikitable")[1].find_all("tr"):
    data = [item.get_text(strip=True) for item in items.find_all(["th", "td"])]
    print(data)
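The original `soup.find('table', ...)` call returns only the first matching table, which is why the annual returns table came back. As a rough sketch of another way to target the right one, you could look for the table whose header row contains a known column name rather than relying on position. The helper names `find_table_by_header` and `extract_column` and the `SAMPLE_HTML` content are hypothetical, made up for illustration; the example uses the built-in `html.parser` so it runs without a network connection or lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia page: two tables with the "wikitable" class.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Year</th><th>Annual Return</th></tr>
  <tr><td>2020</td><td>2.4</td></tr>
</table>
<table class="wikitable sortable">
  <tr><th>Constituent Name</th><th>Stock Code</th><th>Sector</th></tr>
  <tr><td>Example Bank</td><td>1111</td><td>Finance</td></tr>
  <tr><td>Example Telecom</td><td>2222</td><td>Telecommunications</td></tr>
</table>
"""


def find_table_by_header(soup, header_text):
    """Return the first wikitable with a <th> equal to header_text, or None."""
    for table in soup.find_all("table", class_="wikitable"):
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        if header_text in headers:
            return table
    return None


def extract_column(table, header_text):
    """Return the text of every cell under the column named header_text."""
    rows = table.find_all("tr")
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    idx = headers.index(header_text)
    values = []
    for row in rows[1:]:
        cells = row.find_all(["th", "td"])
        if len(cells) > idx:  # skip rows that are too short
            values.append(cells[idx].get_text(strip=True))
    return values


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = find_table_by_header(soup, "Stock Code")
tickers = extract_column(table, "Stock Code")
print(tickers)
```

The resulting `tickers` list could then be pickled exactly as in the question. This assumes the first row of the table is the header row and that no cells span multiple columns.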

2022-10-06
