
Trying to grab certain elements


慕虎7371278 2022-01-18 15:41:08
I'm new to the lxml module in Python. I'm trying to parse data from a website: https://weather.com/weather/tenday/l/USCA1037:1:US

I'm trying to get the following:

<span classname="narrative" class="narrative">  Cloudy. Low 49F. Winds WNW at 10 to 20 mph.</span>

However, I've got my XPath mixed up. To be exact, the location of this line is:

//*[@id="twc-scrollabe"]/table/tbody/tr[4]/td[2]/span

I tried the following:

import requests
import lxml.html
from lxml import etree

html = requests.get("https://weather.com/weather/tenday/l/USCA1037:1:US")
# Parse the response bytes; the resulting element tree is rooted at <html>.
element_object = lxml.html.fromstring(html.content)
table = element_object.xpath('//div[@class="twc-table-scroller"]')[0]
day_of_week = table.xpath('.//span[@class="date-time"]/text()')  # list of items from "date-time"
dates = table.xpath('.//span[@class="day-detail clearfix"]/text()')
td = table.xpath('.//tbody/tr/td/span[contains(@class, "narrative")]')
print(td)  # prints an empty list

I'd like my program to also parse "Cloudy. Low 49F. Winds WNW at 10 to 20 mph."

2 Answers

交互式爱情

Contributed 1712 experience points, received over 3 likes

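Some <td> elements carry the description in their title attribute.

import requests
import lxml.html

# Download the ten-day forecast page and parse the raw response bytes.
html = requests.get("https://weather.com/weather/tenday/l/USCA1037:1:US")
element_object = lxml.html.fromstring(html.content)

# Narrow the search to the forecast table wrapper.
table = element_object.xpath('//div[@class="twc-table-scroller"]')[0]

# Read the title attribute of each sticky-column cell.
td = table.xpath('.//tr/td[@class="twc-sticky-col"]/@title')
print(td)

Result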


['Mostly cloudy skies early, then partly cloudy after midnight. Low 48F. Winds SSW at 5 to 10 mph.', 

 'Mainly sunny. High 66F. Winds WNW at 5 to 10 mph.', 

 'Sunny. High 71F. Winds NW at 5 to 10 mph.', 

 'A mainly sunny sky. High 69F. Winds W at 5 to 10 mph.', 

 'Some clouds in the morning will give way to mainly sunny skies for the afternoon. High 67F. Winds WSW at 5 to 10 mph.', 

 'Considerable clouds early. Some decrease in clouds later in the day. High 67F. Winds WSW at 5 to 10 mph.', 

 'Partly cloudy. High near 65F. Winds WSW at 5 to 10 mph.', 

 'Cloudy skies early, then partly cloudy in the afternoon. High 61F. Winds WSW at 10 to 20 mph.', 

 'Sunny skies. High 62F. Winds WNW at 10 to 20 mph.', 

 'Mainly sunny. High 61F. Winds WNW at 10 to 20 mph.', 

 'Sunny along with a few clouds. High 64F. Winds WNW at 10 to 15 mph.', 

 'Mostly sunny skies. High around 65F. Winds WNW at 10 to 15 mph.', 

 'Mostly sunny skies. High 66F. Winds WNW at 10 to 20 mph.', 

 'Mainly sunny. High around 65F. Winds WNW at 10 to 20 mph.', 

 'A mainly sunny sky. High around 65F. Winds WNW at 10 to 20 mph.']
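The same attribute lookup can be tried offline, without fetching the live page. The snippet below is only a minimal sketch: `SAMPLE` is a hypothetical, heavily simplified stand-in for the forecast markup, reusing the class names quoted in this thread, and the real page may be structured differently.

```python
import lxml.html

# Hypothetical, simplified stand-in for the forecast table (an assumption,
# not a copy of the real page).
SAMPLE = (
    '<div class="twc-table-scroller"><table>'
    '<tr><td class="twc-sticky-col" title="Mainly sunny. High 66F.">'
    '<span>Mon</span></td><td class="temp"><span>66</span></td></tr>'
    '<tr><td class="twc-sticky-col" title="Sunny. High 71F.">'
    '<span>Tue</span></td><td class="temp"><span>71</span></td></tr>'
    '</table></div>'
)

root = lxml.html.fromstring(SAMPLE)
table = root.xpath('//div[@class="twc-table-scroller"]')[0]

# "/@title" selects the attribute value itself rather than the <td> element,
# so the result is a list of strings.
titles = table.xpath('.//tr/td[@class="twc-sticky-col"]/@title')
print(titles)
```

Cells without a `title` attribute, such as the `temp` cells here, simply contribute nothing to the list.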

The HTML has no <tbody>, but a web browser may show one in DevTools, so don't use tbody in the XPath.
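A small offline check makes the difference visible. This is a sketch under assumptions: `RAW_HTML` is a made-up fragment, and it relies on lxml's HTML parser leaving the tree as written rather than inserting a `<tbody>` the way a browser's DOM does.

```python
import lxml.html

# Made-up fragment whose table has no <tbody>, as in the raw page source.
RAW_HTML = (
    '<div id="wrap"><table><tr><td>'
    '<span class="narrative">Cloudy. Low 49F.</span>'
    '</td></tr></table></div>'
)
root = lxml.html.fromstring(RAW_HTML)

# The path copied from DevTools includes tbody and matches nothing here.
with_tbody = root.xpath('//table/tbody/tr/td/span/text()')

# Dropping tbody matches the markup that was actually parsed.
without_tbody = root.xpath('//table/tr/td/span/text()')

# "//tr" works whether or not a <tbody> sits in between.
any_depth = root.xpath('//table//tr/td/span/text()')

print(with_tbody, without_tbody, any_depth)
```

Writing `//table//tr` is the more forgiving choice, since it also keeps working on pages whose source does contain an explicit `<tbody>`.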


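Some of the text sits directly in <span></span>, but some is nested as <span><span></span></span>.

import requests
import lxml.html

# Download the ten-day forecast page and parse the raw response bytes.
html = requests.get("https://weather.com/weather/tenday/l/USCA1037:1:US")
element_object = lxml.html.fromstring(html.content)

# Narrow the search to the forecast table wrapper.
table = element_object.xpath('//div[@class="twc-table-scroller"]')[0]

# "//span" matches spans at any depth below each <td>, so nested spans are included.
td = table.xpath('.//tr/td//span/text()')
print(td)

Result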


['Tonight', 'APR 21', 'Partly Cloudy', '--', '48', '10', '%', 'SSW 7 mph ', '85', '%', 

 'Mon', 'APR 22', 'Sunny', '66', '51', '10', '%', 'WNW 9 mph ', '67', '%', 

 'Tue', 'APR 23', 'Sunny', '71', '53', '0', '%', 'NW 8 mph ', '59', '%', 

 'Wed', 'APR 24', 'Sunny', '69', '52', '10', '%', 'W 9 mph ', '71', '%', 

 'Thu', 'APR 25', 'Partly Cloudy', '67', '51', '10', '%', 'WSW 9 mph ', '71', '%', 

 'Fri', 'APR 26', 'Partly Cloudy', '67', '51', '10', '%', 'WSW 9 mph ', '69', '%', 

 'Sat', 'APR 27', 'Partly Cloudy', '65', '50', '10', '%', 'WSW 9 mph ', '71', '%',   

 'Sun', 'APR 28', 'AM Clouds/PM Sun', '61', '49', '20', '%', 'WSW 13 mph ', '75', '%', 

 'Mon', 'APR 29', 'Sunny', '62', '48', '10', '%', 'WNW 14 mph ', '63', '%', 

 'Tue', 'APR 30', 'Sunny', '61', '49', '0', '%', 'WNW 14 mph ', '61', '%', 

 'Wed', 'MAY 1', 'Mostly Sunny', '64', '50', '0', '%', 'WNW 12 mph ', '60', '%', 

 'Thu', 'MAY 2', 'Mostly Sunny', '65', '50', '0', '%', 'WNW 12 mph ', '61', '%', 

 'Fri', 'MAY 3', 'Mostly Sunny', '66', '51', '0', '%', 'WNW 13 mph ', '61', '%', 

 'Sat', 'MAY 4', 'Sunny', '65', '51', '0', '%', 'WNW 14 mph ', '62', '%', 

 'Sun', 'MAY 5', 'Sunny', '65', '51', '0', '%', 'WNW 14 mph ', '63', '%']
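The nesting is why the path uses `td//span` rather than `td/span`. The sketch below compares the two on a made-up fragment (`SAMPLE` is hypothetical) and also shows `text_content()`, which is one possible way to get one joined string per cell instead of a flat list of pieces.

```python
import lxml.html

# Made-up row: one cell with a plain span, one with nested spans.
SAMPLE = (
    '<table><tr>'
    '<td><span>Sunny</span></td>'
    '<td><span><span>10</span><span>%</span></span></td>'
    '</tr></table>'
)
root = lxml.html.fromstring(SAMPLE)

# Child axis: only text sitting directly in a span that is a child of <td>.
direct = root.xpath('//tr/td/span/text()')

# Descendant axis: text of spans at any depth below <td>.
nested = root.xpath('//tr/td//span/text()')

# text_content() concatenates all text below each cell.
per_cell = [td.text_content() for td in root.xpath('//tr/td')]

print(direct)
print(nested)
print(per_cell)
```

With the child axis the nested "10" and "%" are missed, because the outer span in that cell has no text of its own.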


Answered 2022-01-18
繁星coding

Contributed 1797 experience points, received over 4 likes

If you want to grab text like "Sunny. High 66F. Winds WNW at 5 to 10 mph.", you can read it from the title attribute of the <td>.

This should work:

td = table.xpath('.//tbody/tr/td[@class="description"]/@title')
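Reading `@title` on its own yields the descriptions without the day each one belongs to. One way to keep them paired is to walk the table row by row and run relative XPath expressions on each `<tr>`. The following is a minimal sketch on a hypothetical fragment: `SAMPLE` and its layout are assumptions, and the `description` and `date-time` class names are taken from this thread rather than checked against the live page.

```python
import lxml.html

# Hypothetical, simplified rows; the header row has no matching cells.
SAMPLE = (
    '<table>'
    '<tr><th>Day</th><th>Description</th></tr>'
    '<tr><td><span class="date-time">Mon</span></td>'
    '<td class="description" title="Mainly sunny. High 66F."><span>Sunny</span></td></tr>'
    '<tr><td><span class="date-time">Tue</span></td>'
    '<td class="description" title="Sunny. High 71F."><span>Sunny</span></td></tr>'
    '</table>'
)
root = lxml.html.fromstring(SAMPLE)

forecast = []
for tr in root.xpath('//tr'):
    # string(...) returns the text of the first match, or '' if none exists.
    day = tr.xpath('string(.//span[@class="date-time"])').strip()
    narrative = tr.xpath('./td[@class="description"]/@title')
    # Skip rows (such as the header) that lack either piece.
    if day and narrative:
        forecast.append((day, narrative[0]))

print(forecast)
```

Note that the row loop uses `//tr` with no `tbody` step, so it does not depend on whether the parsed source contains a `<tbody>`.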


Answered 2022-01-18