I can't get Pandas to export some web-scraped data in the format I want. I want to visit each URL in URLs, pull various elements from that page, and put them into an Excel spreadsheet with specified column names. Then I want to visit the next URL in URLs and put its data on the next row of the Excel sheet, so that I end up with a sheet of 6 columns and three rows of data, one per plant (each plant is at a separate URL).

At the moment I get this error:

    ValueError: Length mismatch: Expected axis has 18 elements, new values have 6 elements

The new records are being placed side by side horizontally instead of on new rows in Excel, which Pandas isn't expecting. Can anyone help? Thanks

    import csv
    import pandas as pd
    from pandas import ExcelWriter
    from pandas import ExcelFile
    import numpy as np
    from urllib2 import urlopen
    import bs4
    from bs4 import BeautifulSoup

    URLs = ["http://adbioresources.org/map/ajax-single/27881",
            "http://adbioresources.org/map/ajax-single/27967",
            "http://adbioresources.org/map/ajax-single/27880"]

    mylist = []
    for plant in URLs:
        soup = BeautifulSoup(urlopen(plant),'lxml')
        table = soup.find_all('td')
        for td in table:
            mylist.append(td.text)
        heading2 = soup.find_all('h2')
        for h2 in heading2:
            mylist.append(h2.text)
        para = soup.find_all('p')
        for p in para:
            mylist.append(p.text)

    df = pd.DataFrame(mylist)
    transposed_df = df.T
    transposed_df.columns = ['Status','Type','Capacity','Feedstock','Address1','Address2']
    writer = ExcelWriter('Pandas-Example.xlsx')
    transposed_df.to_excel(writer,'Sheet1',index=False)
    writer.save()
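One likely cause of the length mismatch is that `mylist` is a single flat list shared by all three pages, so it ends up with 18 values and the transposed frame has 18 columns rather than 6. A minimal sketch of one way around this is to collect a separate list per page and hand Pandas a list of rows. The helper names `extract_row` and `build_frame` and the `SAMPLE_PAGES` HTML are hypothetical stand-ins so the example runs without network access; it also assumes every page yields exactly six values (four `td`, one `h2`, one `p`), which is inferred from the error message rather than confirmed.

```python
import pandas as pd
from bs4 import BeautifulSoup

COLUMNS = ['Status', 'Type', 'Capacity', 'Feedstock', 'Address1', 'Address2']


def extract_row(html):
    """Return the text of every <td>, then <h2>, then <p> on one page."""
    soup = BeautifulSoup(html, 'html.parser')
    row = []
    for tag_name in ('td', 'h2', 'p'):
        # strip=True trims surrounding whitespace (the original used .text as-is)
        row.extend(tag.get_text(strip=True) for tag in soup.find_all(tag_name))
    return row


def build_frame(pages):
    """Build a DataFrame with one row per page instead of one flat list."""
    rows = [extract_row(html) for html in pages]
    # Raises ValueError if a page does not yield exactly len(COLUMNS) values
    return pd.DataFrame(rows, columns=COLUMNS)


# Hypothetical stand-in pages; the real page layout is an assumption.
SAMPLE_PAGES = [
    "<table><tr><td>Operational</td><td>Farm</td><td>500 kW</td>"
    "<td>Slurry</td></tr></table><h2>Plant A</h2><p>1 Example Road</p>",
    "<table><tr><td>Planned</td><td>Waste</td><td>1 MW</td>"
    "<td>Food waste</td></tr></table><h2>Plant B</h2><p>2 Example Lane</p>",
]

df = build_frame(SAMPLE_PAGES)
print(df)
```

With real pages, each element of `pages` would be the HTML returned for one URL (for example `urlopen(url).read()`), and the frame could then be written with `df.to_excel('Pandas-Example.xlsx', sheet_name='Sheet1', index=False)`. No transpose is needed, because each inner list is already one row.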