Try the pandas library. In just a few lines it can pull the table data from that page and save it to a CSV file:
import pandas as pd
url = 'https://en.wikipedia.org/wiki/FTSE_Bursa_Malaysia_KLCI'
df = pd.read_html(url, attrs={"class": "wikitable"})[1]  # change the index to pick the table you need from that page
new = pd.DataFrame(df, columns=["Constituent Name", "Stock Code", "Sector"])
new.to_csv("wiki_data.csv", index=False)
print(df)
If you would still rather stick with BeautifulSoup, the following should do the job:
import requests
from bs4 import BeautifulSoup
res = requests.get("https://en.wikipedia.org/wiki/FTSE_Bursa_Malaysia_KLCI")
soup = BeautifulSoup(res.text,"lxml")
for items in soup.select("table.wikitable")[1].select("tr"):
    data = [item.get_text(strip=True) for item in items.select("th,td")]
    print(data)
If you want to use .find_all() instead of .select(), try the following:
for items in soup.find_all("table",class_="wikitable")[1].find_all("tr"):
    data = [item.get_text(strip=True) for item in items.find_all(["th","td"])]
    print(data)
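The BeautifulSoup loops only print each row. If you also want a CSV file from them, as in the pandas version, one possible approach is to collect the `data` lists and pass them to the standard `csv` module. This is a minimal sketch, not part of the original answer: `write_rows` and `sample_rows` are hypothetical names, and the sample values are placeholders rather than data from the Wikipedia page.

```python
import csv
import io


def write_rows(rows, handle):
    """Write an iterable of row lists to an open text handle as CSV."""
    writer = csv.writer(handle)
    for row in rows:
        # Skip rows with no cells (e.g. a <tr> that held no th/td).
        if row:
            writer.writerow(row)


# Placeholder rows standing in for the `data` lists built in the loop;
# these values are made up for illustration, not taken from the page.
sample_rows = [
    ["Constituent Name", "Stock Code", "Sector"],
    ["Example Bhd", "0000", "Example sector"],
    [],
]

# An in-memory buffer keeps the sketch self-contained.
buffer = io.StringIO()
write_rows(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with open("wiki_data.csv", "w", newline="", encoding="utf-8"); the newline="" argument is what the csv module documentation recommends so that no extra blank lines appear on Windows.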