
Jsoup: selecting the text that follows a tag when there are many tags


牧羊人nacy 2021-12-01 15:45:39
I want to use jsoup to extract the text that comes right after each tag (the <strong> labels in the sample below). Is there a way to select it? Sample code:

<div class="content">
<div name="panel-summary" id="summary">
    <p>
    <strong>A: </strong>*thank you* **I want to retrieve this text**<br>
    <strong>B: </strong>*Bla..bla* *I don't want this text*<br>
    <strong>C: </strong>*what ever text* *I dont want this*
        <strong>D: </strong>*anythinh text* *I want this*<br>
        <strong>E: </strong>*Bla..bla* *I don't want this text*t<br>
        <strong>F: </strong>*anythinh text* *I want this*<br>
    </p>
    <p>I want this</p>

When it finishes, it creates an automatic ID, for example id=123.

1 Answer

青春有我


If we can assume that every <strong> element you are looking for always contains A:, D:, or F:, we can select those elements with strong:matchesOwn(regex), where the regex is A:|D:|F:.
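Before relying on the pattern, it can help to check it against the label texts on their own. The sketch below uses only java.util.regex and assumes that :matchesOwn applies the pattern with find-style (substring) matching against the element's own text, which is my understanding of jsoup's behavior. The class name LabelPatternSketch and the stricter anchored pattern are hypothetical, added only for illustration.

```java
import java.util.regex.Pattern;

class LabelPatternSketch {
    // Same alternation as in the selector; find() looks for it anywhere in the text.
    static final Pattern WANTED = Pattern.compile("A:|D:|F:");

    // Hypothetical stricter variant: the label must be the whole text, ignoring surrounding whitespace.
    static final Pattern WANTED_STRICT = Pattern.compile("^\\s*[ADF]:\\s*$");

    static boolean looseMatch(String ownText) {
        return WANTED.matcher(ownText).find();
    }

    static boolean strictMatch(String ownText) {
        return WANTED_STRICT.matcher(ownText).find();
    }

    public static void main(String[] args) {
        System.out.println(looseMatch("A: "));      // true
        System.out.println(looseMatch("B: "));      // false
        System.out.println(looseMatch("FAQ A: "));  // true, because "A:" occurs inside the text
        System.out.println(strictMatch("FAQ A: ")); // false, the anchors reject the extra prefix
    }
}
```

If labels such as "FAQ A:" could appear in the real page, the anchored form is the safer choice for the selector.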


Once the <strong> elements are handled, we can move on to the second <p> and read its content with text().


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        String html = "<div class=\"content\">\n" +
                "<div name=\"panel-summary\" id=\"summary\">\n" +
                "    <p>\n" +
                "    <strong>A: </strong>*thank you* **I want to retrieve this text**<br>\n" +
                "    <strong>B: </strong>*Bla..bla* *I don't want this text*<br>\n" +
                "    <strong>C: </strong>*what ever text* *I dont want this*                         \n" +
                "        <strong>D: </strong>*anythinh text* *I want this*<br>\n" +
                "        <strong>E: </strong>*Bla..bla* *I don't want this text*t<br>\n" +
                "        <strong>F: </strong>*anythinh text* *I want this*<br>\n" +
                "    </p>\n" +
                "\n" +
                "    <p>I want this</p>";

        Document doc = Jsoup.parse(html);
        // Both <p> elements inside the element with id="summary"
        Elements pElements = doc.select("#summary p");
        // <strong> elements in the first <p> whose own text contains A:, D: or F:
        Elements strongElements = pElements.first().select("strong:matchesOwn(A:|D:|F:)");
        for (Element strong : strongElements) {
            // Next sibling node, which may be a text node rather than an element
            System.out.println(strong.nextSibling());
        }
        System.out.println("---");
        // Text content of <p>I want this</p>
        System.out.println(pElements.get(1).text());
    }
}

Output:

*thank you* **I want to retrieve this text**
*anythinh text* *I want this*
*anythinh text* *I want this*
---
I want this
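If the values are needed as strings rather than printed nodes, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the class and method names (TextAfterLabelSketch, textAfterLabels) are hypothetical, the shortened HTML in main is made up for the demo, and selectFirst requires a reasonably recent jsoup release. It also skips any <strong> whose next sibling is not a text node, which the loop above does not check.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

class TextAfterLabelSketch {
    // Collects the text that directly follows each matching <strong> in the first "#summary p".
    static List<String> textAfterLabels(String html, String labelRegex) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        Element firstP = doc.selectFirst("#summary p");
        if (firstP == null) {
            return result; // no matching paragraph, nothing to collect
        }
        for (Element strong : firstP.select("strong:matchesOwn(" + labelRegex + ")")) {
            Node next = strong.nextSibling();
            // nextSibling() may be null or another element (for example <br>), so check the type
            if (next instanceof TextNode) {
                result.add(((TextNode) next).text().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up, shortened input for the demo
        String html = "<div id=\"summary\"><p>"
                + "<strong>A: </strong>keep me<br>"
                + "<strong>B: </strong>skip me<br>"
                + "<strong>D: </strong>keep me too"
                + "</p></div>";
        System.out.println(textAfterLabels(html, "A:|D:|F:")); // prints [keep me, keep me too]
    }
}
```

Returning a list keeps the extraction separate from the printing, so the caller decides what to do with the values.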

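If you would rather not depend on the content of the <strong> elements and only on their position, select all of them, for example:

Elements allStrongElements = doc.select("#summary p strong");

and then simply pick the ones you need by index (remember that indexes start at 0), for example:

System.out.println(allStrongElements.get(0).nextSibling());
System.out.println(allStrongElements.get(3).nextSibling());
System.out.println(allStrongElements.get(5).nextSibling());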
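Index-based access throws if the page has fewer <strong> elements than expected, so a guarded version may be useful. The following is a minimal sketch, not part of the original answer: TextByIndexSketch and textAfterStrongAt are hypothetical names, the sample HTML is made up, and out-of-range indexes are silently skipped, which is one possible policy among several.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

class TextByIndexSketch {
    // Returns the text following the <strong> elements at the given 0-based positions.
    static List<String> textAfterStrongAt(String html, int... indexes) {
        Elements strongs = Jsoup.parse(html).select("#summary p strong");
        List<String> result = new ArrayList<>();
        for (int i : indexes) {
            if (i < 0 || i >= strongs.size()) {
                continue; // assumption: ignore positions that do not exist instead of failing
            }
            Node next = strongs.get(i).nextSibling();
            if (next instanceof TextNode) {
                result.add(((TextNode) next).text().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up input with three labels
        String html = "<div id=\"summary\"><p>"
                + "<strong>A: </strong>first<br>"
                + "<strong>B: </strong>second<br>"
                + "<strong>C: </strong>third"
                + "</p></div>";
        System.out.println(textAfterStrongAt(html, 0, 2, 9)); // prints [first, third]
    }
}
```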


Answered 2021-12-01