3 Answers

Contributor with 1799 experience points, over 6 upvotes
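You will need the dateutil parser (the python-dateutil package). There is apparently no way to tell which <td> holds a date, so iterate over every td in the matched tr and try to parse its text as a datetime; if parsing succeeds, append the result to the list of dates for that row's ID. Once you have all the dates for each ID, just take the latest one.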
from collections import defaultdict

from bs4 import BeautifulSoup as BS
from dateutil import parser as du_parser

data = "<tr><td class=\"success\"></td><td class=\"truncate\">ABCD</td><td>12/18/2018 21:45</td><td>12/18/2018 21:46</td><td>10</td><td>10</td><td>100.0</td><td><span class=\"label success\">Success</span></td><td>SMS</td><td><a data-id=\"134717\" class=\"btn\" title=\"Go\">View</a></td></tr>"
b1 = BS(data, "html.parser")
# rows (tr) that contain a td whose text is exactly "ABCD"
tr_that_contain_our_td = [x.parent for x in b1.find_all("td", string="ABCD")]
ids_dict = defaultdict(list)
# iterate over the matched tr's to collect their dates
for tr in tr_that_contain_our_td:
    extracted_id = tr.find("a")["data-id"]
    for td in tr.find_all("td"):
        # get_text() also works for cells whose first child is a tag (<span>, <a>)
        text = td.get_text(strip=True)
        # skip empty cells and bare numbers: dateutil would read "10" as a day of the current month
        if not text or text.replace(".", "", 1).isdigit():
            continue
        try:
            ids_dict[extracted_id].append(du_parser.parse(text))
        except (ValueError, OverflowError):
            pass  # not a date, nothing to do here
# keep only the latest date for each ID
ids_dict = {k: max(v) for k, v in ids_dict.items()}
print(ids_dict)
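dateutil's parser is lenient: a bare number such as 10 is accepted as a day of the current month, which can end up being picked as the "latest" date. If the date cells always use one layout, parsing with an explicit format is stricter. The following is only a minimal sketch under that assumption: the format string "%m/%d/%Y %H:%M" is inferred from the sample row, and the helper name latest_date is hypothetical.

```python
from datetime import datetime

# Assumed layout of the date cells, e.g. "12/18/2018 21:45".
DATE_FORMAT = "%m/%d/%Y %H:%M"


def latest_date(cell_texts, date_format=DATE_FORMAT):
    """Return the latest datetime among the cells matching date_format, or None."""
    parsed = []
    for text in cell_texts:
        try:
            parsed.append(datetime.strptime(text.strip(), date_format))
        except ValueError:
            continue  # this cell is not a date in the assumed format
    return max(parsed, default=None)


# Cell texts of the sample row, in document order.
cells = ["", "ABCD", "12/18/2018 21:45", "12/18/2018 21:46",
         "10", "10", "100.0", "Success", "SMS", "View"]
print(latest_date(cells))  # 2018-12-18 21:46:00
```

With BeautifulSoup, the cell texts of a row could be collected as [td.get_text(strip=True) for td in tr.find_all("td")] and passed to this helper once per matched tr.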