Using Python to extract the text between tags that contain a given word

跃然一笑 2021-10-19 16:53:45
I have some text from an XML document, and I am trying to extract the text of the tags that contain certain words.

For example, search('adverse') should return the text of every tag that contains the word "adverse":

Out:
  [
    "<item>The most common adverse reactions reported in subjects receiving coadministered dutasteride and tamsulosin were impotence, decreased libido, breast disorders (including breast enlargement and tenderness), ejaculation disorders, and dizziness.</item>"
  ]

And search('clinical') should return two results, because two tags contain the word:

Out:
  [
    "<title>6.1 Clinical Trials Experience</title>",
    "<paragraph id="ID41">The clinical efficacy and safety of coadministered dutasteride and tamsulosin, which are individual components of dutasteride and tamsulosin hydrochloride capsules, have been evaluated in a multicenter, randomized, double-blind, parallel group trial (the Combination with Alpha-Blocker Therapy, or CombAT, trial) </paragraph>"
  ]

Which tools should I use for this? Regular expressions? BS4? Any suggestions are much appreciated.

Sample text:

  </highlight>
  </excerpt>
  <component>
  <section id="ID40">
  <id root="fbc21d1a-2fb2-47b1-ac53-f84ed1428bb4"></id>
  <title>6.1 Clinical Trials Experience</title>
  <text>
  <paragraph id="ID41">The clinical efficacy and safety of coadministered dutasteride and tamsulosin, which are individual components of dutasteride and tamsulosin hydrochloride capsules, have been evaluated in a multicenter, randomized, double-blind, parallel group trial (the Combination with Alpha-Blocker Therapy, or CombAT, trial) </paragraph>
  <list id="ID42" listtype="unordered" stylecode="Disc">
  <item>The most common adverse reactions reported in subjects receiving coadministered dutasteride and tamsulosin were impotence, decreased libido, breast disorders (including breast enlargement and tenderness), ejaculation disorders, and dizziness.</item>
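
One possible approach, sketched here with only the standard-library re module: because the sample is a fragment rather than a well-formed document, a strict XML parser may reject it, so this sketch matches "leaf" elements (an opening tag, text with no nested tags, and the matching closing tag) and keeps those whose text contains the word. The two-argument form search(word, text), the LEAF_ELEMENT name, and the shortened sample string are assumptions made for illustration; they are not from the question. Regular expressions over markup are fragile (for instance, a ">" inside an attribute value would break the pattern), so for complete documents a real parser such as BeautifulSoup or xml.etree.ElementTree is likely the safer choice.

```python
import re

# A "leaf" element: opening tag, text containing no nested tags, matching closing tag.
LEAF_ELEMENT = re.compile(r"<(\w+)\b[^>]*>[^<]*</\1>")


def search(word, text):
    """Return each leaf element in `text` whose content contains `word`.

    The match is case-insensitive and on whole words only.
    """
    word_re = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    results = []
    for match in LEAF_ELEMENT.finditer(text):
        element = match.group(0)
        # Test only the text between the tags, so tag and attribute
        # names are never mistaken for a hit.
        inner = element[element.index(">") + 1 : element.rindex("</")]
        if word_re.search(inner):
            results.append(element)
    return results


# Shortened stand-in for the sample text in the question (sentences abbreviated).
sample = (
    '<section id="ID40"> <id root="fbc21d1a-2fb2-47b1-ac53-f84ed1428bb4"></id> '
    "<title>6.1 Clinical Trials Experience</title> <text> "
    '<paragraph id="ID41">The clinical efficacy and safety of coadministered '
    "dutasteride and tamsulosin have been evaluated. </paragraph> "
    '<list id="ID42" listtype="unordered" stylecode="Disc"> '
    "<item>The most common adverse reactions were impotence and dizziness.</item>"
)

if __name__ == "__main__":
    print(search("adverse", sample))
    print(search("clinical", sample))
```

With this sample, search("adverse", sample) yields the single item element and search("clinical", sample) yields the title and paragraph elements, each returned with its tags and attributes intact, as in the desired output above. Elements that contain child tags (such as text or list here) are deliberately skipped; returning those as well would need a parser that tracks nesting.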