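I am using jsoup to parse HTML and want to extract the inner HTML of the body tag. So far I have tried document.body().children().outerHtml(), but it only returns the HTML elements and skips the floating text inside the body (text that is not wrapped in any HTML tag).

private String getBodyTag(final Document document) {
    return document.body().children().outerHtml();
}

Input:

<!DOCTYPE html><html lang="de">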
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="stylesheet" type="text/css" href="assets/style.css">
</head>
<body>
<div>questions to improve formatting and clarity.</div>
<h3>Guided Mode</h3>
some sample raw/floating text </body></html>

Expected:

<div>questions to improve formatting and clarity.</div><h3>Guided Mode</h3> some sample raw/floating text

Actual:

<div>questions to improve formatting and clarity.</div><h3>Guided Mode</h3>
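
For reference, the difference seems to come from children() returning only child elements, while the floating text is a text node rather than an element. A minimal sketch of one possible approach, assuming Element.html() (which serializes all child nodes, text nodes included) is acceptable here. The class name BodyInnerHtml and the method name getBodyInnerHtml are made up for illustration, and the input is a shortened version of the one above:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BodyInnerHtml {

    // Hypothetical helper: returns the inner HTML of <body>.
    // Element.html() walks all child nodes, so text that is not
    // wrapped in any tag is included alongside the elements.
    static String getBodyInnerHtml(final Document document) {
        return document.body().html();
    }

    public static void main(String[] args) {
        // Shortened version of the input from the question.
        String input = "<!DOCTYPE html><html lang=\"de\"><head></head><body>\n"
                + "<div>questions to improve formatting and clarity.</div>\n"
                + "<h3>Guided Mode</h3>\n"
                + "some sample raw/floating text </body></html>";

        Document document = Jsoup.parse(input);
        System.out.println(getBodyInnerHtml(document));
    }
}
```

Note that jsoup pretty-prints by default, so line breaks and indentation in the result may differ from the input. If the exact whitespace matters, calling document.outputSettings().prettyPrint(false) before serializing is worth trying.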