
Getting the text between spans with BeautifulSoup


慕的地6264312 2021-07-05 05:11:27
I am trying to scrape various websites with BeautifulSoup in Python. Suppose I have the following HTML excerpt:

    <div class="member_biography">
    <h3>Biography</h3>
    <span class="sub_heading">District:</span> AnyState - At Large<br/>
    <span class="sub_heading">Political Highlights:</span> AnyTown City Council, 19XX-XX<br/>
    <span class="sub_heading">Born:</span> June X, 19XX; AnyTown, Calif.<br/>
    <span class="sub_heading">Residence:</span> Some Town<br/>
    <span class="sub_heading">Religion:</span> Episcopalian<br/>
    <span class="sub_heading">Family:</span> Wife, Some Name; two children<br/>
    <span class="sub_heading">Education:</span> Some State College, A.A. 19XX; Some Other State College, B.A. 19XX<br/>
    <span class="sub_heading">Elected:</span> 19XX<br/>
    </div>

I need the result in the following format:

    District:              AnyState - At Large
    Political Highlights:  AnyTown City Council, 19XX-XX
    Born:                  June X, 19XX; AnyTown, Calif.
    Residence:             Some Town
    Religion:              Episcopalian
    Family:                Wife, Some Name; two children
    Education:             Some State College, A.A. 19XX; Some Other State College, B.A. 19XX
    Elected:               19XX

So far, however, I have only managed to get this:

    District:
    Political Highlights:
    Born:
    Residence:
    Religion:
    Family:
    Education:
    Elected:

using the following code:

    import urllib.request
    import sys
    from bs4 import BeautifulSoup

    def main(url):
        fp = urllib.request.urlopen(url)
        site_bytearray = fp.read()
        fp.close()
        #bs_data = BeautifulSoup(site_str,features="html.parser")
        bs_data = BeautifulSoup(site_bytearray,'lxml')
        tmplist = bs_data.find_all('span',{'class':'sub_heading'})
        for item in tmplist:
            print(item.text)
        sys.exit(0)

    if __name__ == "__main__":
        main(sys.argv[1])

In short, how do I extract both District: and AnyState - At Large from <span class="sub_heading">District:</span> AnyState - At Large<br/>, so that I can accumulate the results in a list for further processing?

2 Answers

慕桂英546537


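Replace your print statement with one of the following. The value you want is the text node that immediately follows each span, which item.next_sibling returns.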


Python 3.6+:


print(f'{item.text:<25} {item.next_sibling}') 

Earlier Python 3 versions (up to 3.5):


print('{:<25} {}'.format(item.text, item.next_sibling))

Output:


District:                  AnyState - At Large
Political Highlights:      AnyTown City Council, 19XX-XX
Born:                      June X, 19XX; AnyTown, Calif.
Residence:                 Some Town
Religion:                  Episcopalian
Family:                    Wife, Some Name; two children
Education:                 Some State College, A.A. 19XX; Some Other State College, B.A. 19XX
Elected:                   19XX
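
The question also asks about accumulating the results in a list for further processing. A minimal sketch of one way to do that is shown here, assuming the markup looks like the excerpt in the question. The function name extract_pairs and the abbreviated HTML constant are illustrative and not part of the original post. It uses the built-in html.parser instead of lxml to avoid an extra dependency, and it strips the leading space that next_sibling carries over from the source.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Abbreviated copy of the markup from the question (illustrative only).
HTML = (
    '<div class="member_biography"><h3>Biography</h3>'
    '<span class="sub_heading">District:</span> AnyState - At Large<br/>'
    '<span class="sub_heading">Born:</span> June X, 19XX; AnyTown, Calif.<br/>'
    '<span class="sub_heading">Elected:</span> 19XX<br/>'
    '</div>'
)


def extract_pairs(html):
    """Return a list of (label, value) tuples, one per sub_heading span."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for span in soup.find_all("span", class_="sub_heading"):
        sibling = span.next_sibling
        # next_sibling can be None or a tag (such as <br/>) when no text
        # follows the span, so only text nodes are treated as values.
        value = sibling.strip() if isinstance(sibling, NavigableString) else ""
        pairs.append((span.get_text(strip=True), value))
    return pairs


if __name__ == "__main__":
    for label, value in extract_pairs(HTML):
        print(f"{label:<25} {value}")
```

Because the values are stripped before they are stored, the printed columns line up one character further left than in the output above. The list of tuples can be passed to dict() if lookup by label is more convenient.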


2021-07-06