1 Answer
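Sorry to hear that you do not want to follow best practices: parsing HTML with regular expressions is fraught with problems. However, if you want a quick and dirty workaround, use: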
<script type="application\/ld\+json">((?:(?!<\/?script)[\w\W])*?"@type":\s*"NewsArticle"[\w\W]*?)<\/script>
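As a rough sketch of how this pattern might be applied, the Python snippet below runs it with the standard `re` module and decodes the captured group with `json`. The function name `extract_news_article` and the sample page are made up for illustration; only the pattern itself comes from the answer.

```python
import json
import re

# The pattern from the answer, unchanged, split over three lines for readability.
NEWS_ARTICLE_RE = re.compile(
    r'<script type="application\/ld\+json">'
    r'((?:(?!<\/?script)[\w\W])*?"@type":\s*"NewsArticle"[\w\W]*?)'
    r'<\/script>'
)


def extract_news_article(html):
    """Return the first NewsArticle JSON-LD object as a dict, or None.

    Hypothetical helper, not part of the original answer.
    """
    match = NEWS_ARTICLE_RE.search(html)
    if match is None:
        return None
    # Group 1 holds everything between the opening and closing script tags.
    return json.loads(match.group(1))


# Hypothetical sample page with two JSON-LD blocks; only the second one
# is a NewsArticle.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">{"@type": "WebSite", "name": "Example"}</script>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "NewsArticle", "headline": "Hello"}</script>
</head></html>"""

article = extract_news_article(SAMPLE_HTML)
print(article["headline"])
```

Note that the pattern expects the `type` attribute to be written exactly as shown (double quotes, no other attributes on the tag), so real pages may need a looser opening tag.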
Explanation
--------------------------------------------------------------------------------
  <script type="application  '<script type="application'
--------------------------------------------------------------------------------
  \/                         '/'
--------------------------------------------------------------------------------
  ld                         'ld'
--------------------------------------------------------------------------------
  \+                         '+'
--------------------------------------------------------------------------------
  json">                     'json">'
--------------------------------------------------------------------------------
  (                          group and capture to \1:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the least amount
                             possible)):
--------------------------------------------------------------------------------
      (?!                    look ahead to see if there is not:
--------------------------------------------------------------------------------
        <                    '<'
--------------------------------------------------------------------------------
        \/?                  '/' (optional (matching the most
                             amount possible))
--------------------------------------------------------------------------------
        script               'script'
--------------------------------------------------------------------------------
      )                      end of look-ahead
--------------------------------------------------------------------------------
      [\w\W]                 any character of: word characters (a-z,
                             A-Z, 0-9, _), non-word characters
                             (all but a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
    )*?                      end of grouping
--------------------------------------------------------------------------------
    "@type":                 '"@type":'
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    "NewsArticle"            '"NewsArticle"'
--------------------------------------------------------------------------------
    [\w\W]*?                 any character of: word characters (a-z,
                             A-Z, 0-9, _), non-word characters (all
                             but a-z, A-Z, 0-9, _) (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                          end of \1
--------------------------------------------------------------------------------
  <                          '<'
--------------------------------------------------------------------------------
  \/                         '/'
--------------------------------------------------------------------------------
  script>                    'script>'
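To see what the negative look-ahead `(?!<\/?script)` does, it may help to compare the pattern with a variant that drops it. The sketch below is only an illustration: the variant `WITHOUT_LOOKAHEAD` and the two-block sample `PAGE` are assumptions made up for this comparison, not part of the answer.

```python
import re

# The pattern from the answer: each character consumed before "@type"
# must not start "<script" or "</script".
WITH_LOOKAHEAD = re.compile(
    r'<script type="application\/ld\+json">'
    r'((?:(?!<\/?script)[\w\W])*?"@type":\s*"NewsArticle"[\w\W]*?)'
    r'<\/script>'
)

# Hypothetical variant with the look-ahead removed, for comparison only.
WITHOUT_LOOKAHEAD = re.compile(
    r'<script type="application\/ld\+json">'
    r'([\w\W]*?"@type":\s*"NewsArticle"[\w\W]*?)'
    r'<\/script>'
)

# Made-up page: a WebSite block followed by a NewsArticle block.
PAGE = (
    '<script type="application/ld+json">{"@type": "WebSite"}</script>\n'
    '<script type="application/ld+json">{"@type": "NewsArticle"}</script>'
)

tempered = WITH_LOOKAHEAD.search(PAGE).group(1)
naive = WITHOUT_LOOKAHEAD.search(PAGE).group(1)

print(tempered)  # only the NewsArticle block
print(naive)     # starts in the WebSite block and runs across the tag boundary
```

Without the look-ahead, the match starts at the first JSON-LD block and runs through its closing tag until it finds `"NewsArticle"`, so the captured text is no longer valid JSON. With it, the engine gives up on the first block and retries at the second.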