I want to scrape a web page such as https://www.glassdoor.com/Overview/Working-at-Apple-EI_IE1138.11,16.htm and return the result in the following format:

Website        Headquarters   Size              Revenue                       Type
www.apple.com  Cupertino, CA  10000+ employees  $10+ billion (USD) per year   Company - Public (AAPL)

I then used the following BeautifulSoup code to get this:

all_href = com_soup.find_all('span', {'class': re.compile('value')})
all_href = list(set(all_href))

It returns the results still wrapped in <span> tags. It also does not include the <label> tags, as shown below:

[<span class="value"> Computer Hardware & Software</span>, <span class="value"> Company - Public (AAPL) </span>, <span class="value">10000+ employees</span>, <span class="value"> $10+ billion (USD) per year</span>, <span class="value-title" title="4.0"></span>, <span class="value">Cupertino, CA</span>, <span class="value"> 1976</span>, <span class="value-title" title="5.0"></span>, <span class="value website"><a class="link" href="http://www.apple.com" rel="nofollow noreferrer" target="_blank">www.apple.com</a></span>]
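One possible approach, offered as a minimal sketch rather than a confirmed fix: call `get_text()` on each span to drop the tags, and look up the `<label>` that precedes each span to get the field name. The markup in `SAMPLE_HTML` is an assumption (the question does not show how the labels are laid out on the page); it supposes each `<label>` is a sibling placed just before its `<span class="value">` inside a shared wrapper. The wrapper class `infoEntity` and the helper name `extract_overview` are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the label/value pairing below is an assumption,
# not taken from the real page. Adjust the lookup if the page differs.
SAMPLE_HTML = """
<div class="infoEntity"><label>Website</label><span class="value website"><a class="link" href="http://www.apple.com">www.apple.com</a></span></div>
<div class="infoEntity"><label>Headquarters</label><span class="value">Cupertino, CA</span></div>
<div class="infoEntity"><label>Size</label><span class="value">10000+ employees</span></div>
<div class="infoEntity"><label>Revenue</label><span class="value"> $10+ billion (USD) per year</span></div>
<div class="infoEntity"><label>Type</label><span class="value"> Company - Public (AAPL) </span></div>
<span class="value-title" title="4.0"></span>
"""


def extract_overview(soup):
    """Map each label's text to the text of the value span that follows it."""
    result = {}
    # class_='value' matches the exact class token "value", so
    # "value website" is included while "value-title" is skipped.
    for span in soup.find_all('span', class_='value'):
        label = span.find_previous_sibling('label')
        if label is None:
            continue  # no label next to this span; nothing to name it by
        # get_text(strip=True) returns only the text, without the tags
        result[label.get_text(strip=True)] = span.get_text(strip=True)
    return result


com_soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
overview = extract_overview(com_soup)
for name, value in overview.items():
    print(name, value, sep=': ')
```

Two details worth noting: `re.compile('value')` also matches `value-title`, which is why the empty rating spans appear in the original output, and `list(set(...))` discards document order, so the sketch avoids it and relies on `find_all` returning elements in page order.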