I am trying to scrape prayer times from the website www.hujjat.org. This is the HTML for the section I am interested in (you may notice that the class attributes are the same for all 4 prayers):

    <table width="100%">
      <tbody>
        <tr>
          <td class="NamaazTimes">
            <div class="NamaazTimeName">Fajr</div>
            <div class="NamaazTime">04:42</div>
          </td>
          <td class="NamaazTimes">
            <div class="NamaazTimeName">Sunrise</div>
            <div class="NamaazTime">06:32</div>
          </td>
          <td class="NamaazTimes">
            <div class="NamaazTimeName">Zohr</div>
            <div class="NamaazTime">13:02</div>
          </td>
          <td class="NamaazTimes">
            <div class="NamaazTimeName">Maghrib</div>
            <div class="NamaazTime">19:33</div>
          </td>
        </tr>
      </tbody>
    </table>

So far I have written the following code:

    # import libraries
    import json
    import urllib2
    from bs4 import BeautifulSoup

    # specify the url
    quote_page = 'http://www.hujjat.org/'

    # query the website and return the html to the variable 'page'
    page = urllib2.urlopen(quote_page)

    # parse the html using Beautiful Soup and store in variable 'soup'
    soup = BeautifulSoup(page, 'html.parser')

    table = soup.find("div", class_="NamaazTimeName", text="Fajr").find_previous("table")
    for row in table.find_all("tr"):
        a = row.find_all("td")
        # print(row.find_all("td"))
    print (a)

My result is:

    [<td class="NamaazTimes">\n<div class="NamaazTimeName">Fajr</div>\n<div class="NamaazTime">04:42</div>\n</td>, <td class="NamaazTimes">\n<div class="NamaazTimeName">Sunrise</div>\n<div class="NamaazTime">06:32</div>\n</td>, <td class="NamaazTimes">\n<div class="NamaazTimeName">Zohr</div>\n<div class="NamaazTime">13:02</div>\n</td>, <td class="NamaazTimes">\n<div class="NamaazTimeName">Maghrib</div>\n<div class="NamaazTime">19:33</div>\n</td>]

All I want from my code is the time for each prayer. For example, for the "Fajr" prayer the output should be "04:42". I then want to save this "04:42" in a text file. Can anyone help me?
3 answers
慕工程0101907
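from bs4 import BeautifulSoup
import pandas as pd

# html holds the page source, e.g. the table shown in the question
data = BeautifulSoup(html, 'html.parser')

# Names and times appear in the same order, so they can be paired by position
NamaazName = data.find_all('div', {'class': 'NamaazTimeName'})
NamaazTime = data.find_all('div', {'class': 'NamaazTime'})

# Map each prayer name to its time
coll = {}
for i in range(len(NamaazName)):
    coll[NamaazName[i].text] = NamaazTime[i].text

# Build a two-column table from the mapping
master_data = pd.DataFrame()
master_data['NamaazName'] = list(coll.keys())
master_data['NamaazTime'] = list(coll.values())
print(master_data)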
Output

  NamaazName NamaazTime
0       Fajr      04:42
1    Sunrise      06:32
2       Zohr      13:02
3    Maghrib      19:33
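The question also asks for a single time (for example "04:42" for Fajr) to be saved in a text file, which the code above does not do. Here is a minimal sketch of that last step, assuming a name-to-time mapping like `coll` has already been built. The helper `save_prayer_time` and the file name `fajr_time.txt` are made up for illustration, and the sample values are copied from the question rather than scraped.

```python
from pathlib import Path


def save_prayer_time(times, prayer, path):
    """Write the time for one prayer to a text file and return it."""
    # Raises KeyError if the prayer name is not in the mapping
    time_text = times[prayer].strip()
    Path(path).write_text(time_text + "\n", encoding="utf-8")
    return time_text


# Sample mapping with the values from the question (hypothetical stand-in
# for the scraped `coll` dictionary)
coll = {"Fajr": "04:42", "Sunrise": "06:32", "Zohr": "13:02", "Maghrib": "19:33"}

saved = save_prayer_time(coll, "Fajr", "fajr_time.txt")
print(saved)
```

Looking the time up by name instead of by position should keep working if the site ever reorders the prayers, as long as the names themselves stay the same.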