I am trying to scrape various websites using BeautifulSoup in Python. Suppose I have the following HTML excerpt:

<div class="member_biography">
<h3>Biography</h3>
<span class="sub_heading">District:</span> AnyState - At Large<br/>
<span class="sub_heading">Political Highlights:</span> AnyTown City Council, 19XX-XX<br/>
<span class="sub_heading">Born:</span> June X, 19XX; AnyTown, Calif.<br/>
<span class="sub_heading">Residence:</span> Some Town<br/>
<span class="sub_heading">Religion:</span> Episcopalian<br/>
<span class="sub_heading">Family:</span> Wife, Some Name; two children<br/>
<span class="sub_heading">Education:</span> Some State College, A.A. 19XX; Some Other State College, B.A. 19XX<br/>
<span class="sub_heading">Elected:</span> 19XX<br/>
</div>

I need the results in the following format:

District: AnyState - At Large
Political Highlights: AnyTown City Council, 19XX-XX
Born: June X, 19XX; AnyTown, Calif.
Residence: Some Town
Religion: Episcopalian
Family: Wife, Some Name; two children
Education: Some State College, A.A. 19XX; Some Other State College, B.A. 19XX
Elected: 19XX

However, so far I have only been able to get this:

District:
Political Highlights:
Born:
Residence:
Religion:
Family:
Education:
Elected:

using the following code:

import urllib.request
import sys
from bs4 import BeautifulSoup

def main(url):
    fp = urllib.request.urlopen(url)
    site_bytearray = fp.read()
    fp.close()
    #bs_data = BeautifulSoup(site_str,features="html.parser")
    bs_data = BeautifulSoup(site_bytearray,'lxml')
    tmplist = bs_data.find_all('span',{'class':'sub_heading'})
    for item in tmplist:
        print(item.text)
    sys.exit(0)

if __name__ == "__main__":
    main(sys.argv[1])

In short, how do I extract both "District:" and "AnyState - At Large" from

<span class="sub_heading">District:</span> AnyState - At Large<br/>

so that the results can be accumulated in a list for further processing?
2 Answers

慕桂英546537
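Replace your print statement with one that also prints item.next_sibling, the text node that immediately follows each <span>. The <25 format spec left-aligns the label in a 25-character column.
Python 3.6+:
print(f'{item.text:<25} {item.next_sibling}')
Python 3 - 3.5:
print('{:<25} {}'.format(item.text, item.next_sibling))
Output: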
District:                  AnyState - At Large
Political Highlights:      AnyTown City Council, 19XX-XX
Born:                      June X, 19XX; AnyTown, Calif.
Residence:                 Some Town
Religion:                  Episcopalian
Family:                    Wife, Some Name; two children
Education:                 Some State College, A.A. 19XX; Some Other State College, B.A. 19XX
Elected:                   19XX
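The question also asks for the results to be accumulated in a list for further processing. Here is a minimal sketch of one way to do that with the same next_sibling idea. The function name extract_fields, the shortened HTML sample, and the choice of the built-in "html.parser" (instead of lxml) are assumptions made for illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup

# Shortened sample based on the excerpt in the question (illustrative only).
HTML = (
    '<div class="member_biography"><h3>Biography</h3>'
    '<span class="sub_heading">District:</span> AnyState - At Large<br/>'
    '<span class="sub_heading">Born:</span> June X, 19XX; AnyTown, Calif.<br/>'
    '<span class="sub_heading">Elected:</span> 19XX<br/></div>'
)


def extract_fields(html):
    """Return a list of (label, value) tuples, one per sub_heading span."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for span in soup.find_all("span", class_="sub_heading"):
        # Label text without the trailing colon, e.g. "District".
        label = span.get_text(strip=True).rstrip(":")
        # The value is the bare text node right after the span.
        # If the next node is a tag (or missing), fall back to "".
        sibling = span.next_sibling
        value = sibling.strip() if isinstance(sibling, str) else ""
        pairs.append((label, value))
    return pairs


fields = extract_fields(HTML)
for label, value in fields:
    print(f"{label}: {value}")
```

Because the pairs are kept in a list, they can be turned into a dictionary with dict(fields) or written out as rows later, rather than only printed.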