I'm new to Python and I'm struggling to get web-scraped data into a clean Excel table. This is the table I'm trying to scrape and reproduce in Python: HTML Table. Here is what the HTML of the page looks like:
</div>
<section id="first" style="display:none" aria-label="Power situation graph section">
  <div class="gridModule-2up">
    <div class="prognos_controls hidden" data-proggraph="1">
      Show data for:
      <button value="1" onclick="this.blur();" type="button" class="btn btn--secondary prognosdaybutton"><span class="fa fa-clock-o" aria-hidden="true"></span> Yesterday</button>
      <button value="2" onclick="this.blur();" type="button" class="btn btn--tertiary prognosdaybutton"><span class="fa fa-clock-o" aria-hidden="true"></span> Today</button>
      <button value="3" onclick="this.blur();" type="button" class="btn btn--secondary prognosdaybutton"><span class="fa fa-clock-o" aria-hidden="true"></span> Tomorrow</button>
    </div>
    <table summary="Consumption" id="prognos_datatable_total" class="prognos_datatable scrollable">
      <thead>
        <tr>
          <th data-sheets-numberformat="[null,1]"></th>
          <th data-sheets-value="[null,2,'17/02/2020']" data-sheets-numberformat="[null,1]" scope="col">2020-02-17</th>
          <th data-sheets-numberformat="[null,1]"></th>
          <th data-sheets-value="[null,2,'18/02/2020']" data-sheets-numberformat="[null,1]" scope="col">2020-02-18</th>
          <th data-sheets-numberformat="[null,1]"></th>
          <th data-sheets-value="[null,2,'19/02/2020']" data-sheets-numberformat="[null,1]" scope="col">2020-02-19</th>
        </tr>
2 Answers
呼如林
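The problem is the escape characters: the text of each cell still contains the newlines and spaces from the HTML source layout. Stripping them gives clean values.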
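A quick way to see the effect: text pulled out of HTML cells usually carries the newlines and indentation of the page source. The sample strings below are made up for illustration. `str.strip()` only trims the ends, while `" ".join(s.split())` also collapses whitespace runs inside a value.

```python
# Made-up cell texts, as they might come out of .text on <th>/<td> elements
raw_cells = ["\n 2020-02-17 \n", "  Yesterday", "Total\n   consumption\t"]

# str.strip() removes only leading and trailing whitespace
stripped = [cell.strip() for cell in raw_cells]

# " ".join(cell.split()) also collapses whitespace runs inside the text
collapsed = [" ".join(cell.split()) for cell in raw_cells]

print(stripped)
print(collapsed)
```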
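from bs4 import BeautifulSoup

with open("sample.html", "r", encoding="utf-8") as f:
    contents = f.read()

# "lxml" needs the lxml package installed; "html.parser" is the built-in alternative
soup = BeautifulSoup(contents, "lxml")
# first <table> in the file; soup.find("table", id="prognos_datatable_total") targets it by id
extract = soup.find("table")

# strip() removes the leading and trailing whitespace (newlines, spaces) around each cell's text
table = [[item.text.strip() for item in row_data.select("th,td")]
         for row_data in extract.select("tr")]

for item in table:
    print(" ".join(item))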
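If bs4 or lxml are not installed, the same row extraction can be sketched with only the standard library: `html.parser.HTMLParser` collects the text of every `th`/`td` per `tr`, and the `csv` module writes a file Excel can open. This swaps BeautifulSoup for a hand-written parser; `TableRowParser`, `write_rows_csv` and `table.csv` are hypothetical names, the sample HTML is a trimmed copy of the header from the question, and the parser assumes every cell and row has an explicit closing tag and that tables are not nested.

```python
import csv
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently open, else None
        self._cell = None  # text pieces of the <th>/<td> currently open, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            # Join the text pieces and collapse layout whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (e.g. indentation between tags) is ignored
        if self._cell is not None:
            self._cell.append(data)


def write_rows_csv(rows, path):
    # newline="" is required by the csv module for files it writes;
    # utf-8-sig adds a BOM so Excel detects the encoding
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(rows)


html = """
<table summary="Consumption" id="prognos_datatable_total">
  <thead>
    <tr>
      <th></th>
      <th scope="col">2020-02-17</th>
      <th></th>
      <th scope="col">2020-02-18</th>
    </tr>
  </thead>
</table>
"""

parser = TableRowParser()
parser.feed(html)
print(parser.rows)  # [['', '2020-02-17', '', '2020-02-18']]
write_rows_csv(parser.rows, "table.csv")
```

Opening `table.csv` in Excel shows one row per `tr`; empty header cells stay as empty columns.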
烙印99
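Try pandas here. pandas.read_html parses with lxml by default and falls back to BeautifulSoup (with html5lib) if that fails. I couldn't test it on your URL because you didn't provide one.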
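import pandas as pd

url = 'myURLlink'  # placeholder: replace with the page's URL
# read_html returns a list of DataFrames, one per <table>; [1] picks the second one.
# pd.read_html(url, attrs={"id": "prognos_datatable_total"}) would select the table by its id instead.
df = pd.read_html(url)[1]
df.to_csv('file.csv', index=False)  # df.to_excel('file.xlsx', index=False) writes .xlsx (needs openpyxl)
print(df.to_string())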
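The two answers can also be combined: rows already scraped as a list of lists (such as the `table` list from the BeautifulSoup answer) can be handed to pandas, which then writes CSV or Excel. A minimal sketch; the header comes from the question's dates, but the data row is a made-up placeholder. `to_excel` additionally needs openpyxl, so the sketch writes CSV and leaves that call commented out.

```python
import pandas as pd

# First row = header cells, remaining rows = data.
# The data row is a made-up placeholder, not real consumption figures.
rows = [
    ["", "2020-02-17", "2020-02-18"],
    ["placeholder", "1", "2"],
]

df = pd.DataFrame(rows[1:], columns=rows[0])

csv_text = df.to_csv(index=False)  # returns a string when no path is given
print(csv_text)
# df.to_excel("file.xlsx", index=False)  # writes .xlsx; requires openpyxl
```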