I'm trying to parse 3 different RSS feeds. These are the sources:

https://www.nba.com/bucks/rss.xml
http://www.espn.com/espn/rss/ncb/news
http://rss.nytimes.com/services/xml/rss/nyt/ProBasketball.xml

For the most part, all three sources have a similar structure, apart from the URL. I'm trying to parse them into the following Feed object:

    class Feed(Base):
        title = models.CharField(db_index=True, unique=True, max_length=255)
        link = models.CharField(db_index=True, max_length=255, )
        summary = models.TextField(null=True)
        author = models.CharField(null=True, max_length=255)
        url = models.CharField(max_length=512, null=True)
        published = models.DateTimeField()
        source = models.ForeignKey(Source, on_delete=models.CASCADE, null=True)

This is the Source object:

    class Source(Base):
        name = models.CharField(db_index=True, max_length=255)
        link = models.CharField(db_index=True, max_length=255, unique=True)

This is the code I use for parsing:

    import logging
    import xml.etree.ElementTree as ET

    import requests
    import maya
    from django.utils import timezone

    from aggregator.models import Feed


    class ParseFeeds:
        @staticmethod
        def parse(source):
            logger = logging.getLogger(__name__)
            logger.info("Starting {}".format(source.name))
            root = ET.fromstring(requests.get(source.link).text)
            items = root.findall(".//item")
            for item in items:
                title = ''
                if item.find('title'):
                    title = item.find('title').text
                link = ''
                if item.find('link'):
                    link = item.find('link').text
                description = ''
                if item.find('description'):
                    description = item.find('description').text
                author = ''
                if item.find('author'):
                    author = item.find('author').text
                published = timezone.now()

Although I can parse each of these sources in the Python console, the Feed objects created here end up with all fields set to None or their defaults. What am I doing wrong?
1 Answer
慕哥9229398
You should use:
    for item in items:
        title = ''
        if item.find('title') is not None:  # The "is not None" part is critical here.
            title = item.find('title').text
        # And so on ...
If you try this in a terminal:
bool(item.find('title')) # This is False
item.find('title') is not None # while this is True
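The truth value of an Element depends on whether it has child elements, not on whether it exists. A <title> element that contains only text has no children, so it evaluates to False, the if branch never runs, and every field keeps its default value. Whenever you want to check whether something is None, use the explicit "if something is None" (or "if something is not None") construct.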
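As a minimal sketch of the same idea, the per-field checks can also be replaced with `Element.findtext`, which returns a default when the child element is missing. The sample XML and the `parse_items` helper below are hypothetical, made up for illustration; they are not the real feeds or the asker's actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, standing in for a downloaded RSS feed.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <item>
    <title>Bucks win</title>
    <link>https://example.com/bucks-win</link>
    <description>Game recap</description>
  </item>
  <item>
    <title>No link or author here</title>
  </item>
</channel></rss>"""


def parse_items(xml_text):
    """Return one dict per <item>, using '' for any missing field."""
    root = ET.fromstring(xml_text)
    parsed = []
    for item in root.findall(".//item"):
        parsed.append({
            # findtext returns the child's text, or the default if the
            # child element does not exist, so no truthiness test is needed.
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
            "author": item.findtext("author", default=""),
        })
    return parsed


items = parse_items(SAMPLE_RSS)

# The element exists, but it has zero child elements, which is why a
# plain "if item.find('title'):" check fails.
first_title = ET.fromstring(SAMPLE_RSS).find(".//item/title")
title_exists = first_title is not None
title_child_count = len(first_title)
```

Here `title_exists` is True while `title_child_count` is 0, which is the mismatch behind the original bug. Whether `findtext` or an explicit `is not None` check reads better is a matter of taste; both avoid relying on the truth value of an element.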