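If we can assume that the <strong> elements you are looking for will always contain A:, D:, or F:, we can select them with strong:matchesOwn(regex), where the regex is A:|D:|F:.
Once the <strong> elements are handled, we can move on to the second <p> and read its content with text().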
String html = "<div class=\"content\">\n" +
"<div name=\"panel-summary\" id=\"summary\">\n" +
" <p>\n" +
" <strong>A: </strong>*thank you* **I want to retrieve this text**<br>\n" +
" <strong>B: </strong>*Bla..bla* *I don't want this text*<br>\n" +
" <strong>C: </strong>*what ever text* *I dont want this* \n" +
" <strong>D: </strong>*anythinh text* *I want this*<br>\n" +
" <strong>E: </strong>*Bla..bla* *I don't want this text*t<br>\n" +
" <strong>F: </strong>*anythinh text* *I want this*<br>\n" +
" </p>\n" +
"\n" +
" <p>I want this</p>";
Document doc = Jsoup.parse(html);
Elements pElements = doc.select("#summary p");
Elements strongElements = pElements.first().select("strong:matchesOwn(A:|D:|F:)");
for (Element strong : strongElements) {
    // nextSibling() returns the next node, which here is a text node rather than an element
    System.out.println(strong.nextSibling());
}
System.out.println("---");
System.out.println(pElements.get(1).text());//textual content of <p>I want this</p>
Output:
*thank you* **I want to retrieve this text**
*anythinh text* *I want this*
*anythinh text* *I want this*
---
I want this
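To see which labels the A:|D:|F: pattern picks out, here is a minimal sketch that uses only java.util.regex. It assumes that matchesOwn looks for the pattern anywhere in the element's own text (a find-style match rather than a full match). The class LabelRegexDemo and its method wanted are hypothetical names for illustration, not part of Jsoup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Jsoup.
final class LabelRegexDemo {
    // Same pattern as the one passed to strong:matchesOwn(...).
    static final Pattern WANTED = Pattern.compile("A:|D:|F:");

    // Keeps the labels in which the pattern is found anywhere
    // (find(), not matches(); assumed to mirror matchesOwn).
    static List<String> wanted(List<String> labels) {
        List<String> result = new ArrayList<>();
        for (String label : labels) {
            if (WANTED.matcher(label).find()) {
                result.add(label);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("A: ", "B: ", "C: ", "D: ", "E: ", "F: ");
        System.out.println(wanted(labels)); // prints [A: , D: , F: ]
    }
}
```

Because the pattern is not anchored, a label such as "AA: " would also be kept. If that matters for your data, anchoring the regex (for example ^[ADF]:) is one way to tighten it.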
If you would rather not depend on the content of the <strong> elements, only on their position, select all of them, for example:
Elements allStrElemens = doc.select("#summary p strong");
and then pick the ones you need by index (remember that indices start at 0), for example:
System.out.println(allStrElemens.get(0).nextSibling());
System.out.println(allStrElemens.get(3).nextSibling());
System.out.println(allStrElemens.get(5).nextSibling());
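The indices 0, 3 and 5 above correspond to the labels A, D and F. As a small sketch of that mapping, assuming the labels are consecutive uppercase letters starting at A and appear in document order (as in the sample HTML), a hypothetical helper could compute the index from the letter:

```java
// Hypothetical helper for illustration only; not part of Jsoup.
final class LabelIndexDemo {
    // Maps 'A' -> 0, 'B' -> 1, ... under the assumption that labels
    // are consecutive uppercase letters starting at 'A'.
    static int indexOf(char label) {
        if (label < 'A' || label > 'Z') {
            throw new IllegalArgumentException("expected an uppercase letter: " + label);
        }
        return label - 'A';
    }

    public static void main(String[] args) {
        for (char label : new char[] {'A', 'D', 'F'}) {
            // prints "A -> 0", "D -> 3", "F -> 5" on separate lines
            System.out.println(label + " -> " + indexOf(label));
        }
    }
}
```

The index-based approach breaks silently if the page adds, removes, or reorders a <strong> element, so it only fits markup whose order is stable.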