What is the best way to match HTML text outside of HTML tags in Go?

小唯快跑啊 2022-04-26 10:48:37

I have a batch of HTML that I'm parsing, and I need to remove <a> elements if they contain certain text. Normally I would use Goquery, but the text I'm searching for is often not inside the HTML tag itself. For example, given this HTML:

<html><body>This is the start. <a href="http://example.com/path">We don't want to match this text.</a><a href="http://www.example.com/another/path" style="font-family:Arial, Helvetica, 'sans-serif'; color:#838383;font-size:12px; line-height:14px"></a> match this text.<a href="blah">We also don't want to match this text</a></body></html>

I'm using this regular expression, but it fails and matches text I don't want to match:

(?is)<a[^>]+href=["'](?P<link>.*?)["']*.?> match this text\.

https://regex101.com/r/iEXpqc/1

1 Answer

回首忆惘然

Like this, using XPath (this is not Go, but the logic can be reimplemented in Go):

xmlstarlet ed -d '//a[contains(text(), "want to match")]' file.html

Output

<?xml version="1.0"?>
<html>
<body>
This is the start.
<a href="http://www.example.com/another/path" style="font-family:Arial, Helvetica, 'sans-serif'; color:#838383;font-size:12px; line-height:14px"/> match this text.
</body>
</html>

Note

Add the -L switch if you want to edit the file in place.
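For readers who want the same logic in Go, here is a minimal sketch using only the standard library. It is not part of the original answer. It assumes, as the xmlstarlet command does, that the input is well-formed XML; real-world tag soup would need an HTML parser such as golang.org/x/net/html instead. The names removeAnchors and sample are hypothetical, chosen for illustration. One difference from the XPath expression: contains(text(), ...) tests only the first text node of each <a>, while this sketch tests all text inside the element.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// sample is the HTML from the question.
const sample = `<html><body>This is the start. <a href="http://example.com/path">We don't want to match this text.</a><a href="http://www.example.com/another/path" style="font-family:Arial, Helvetica, 'sans-serif'; color:#838383;font-size:12px; line-height:14px"></a> match this text.<a href="blah">We also don't want to match this text</a></body></html>`

// removeAnchors (hypothetical helper) returns src without the <a> elements
// whose text contains needle. It cuts byte ranges out of the original input
// instead of re-encoding it, so all other markup is left exactly as written.
// Assumption: src is well-formed XML.
func removeAnchors(src, needle string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	dec.Entity = xml.HTMLEntity // accept HTML entities such as &nbsp;

	var out, text strings.Builder
	depth := 0  // greater than 0 while inside an <a> element
	start := 0  // byte offset where the current outermost <a> begins
	copied := 0 // input before this offset has already been kept or dropped

	for {
		before := int(dec.InputOffset()) // offset where the next token begins
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if strings.EqualFold(t.Name.Local, "a") {
				if depth == 0 {
					start = before
					text.Reset()
				}
				depth++
			}
		case xml.CharData:
			if depth > 0 {
				text.Write(t) // collect the text inside the <a>
			}
		case xml.EndElement:
			if depth > 0 && strings.EqualFold(t.Name.Local, "a") {
				depth--
				if depth == 0 && strings.Contains(text.String(), needle) {
					out.WriteString(src[copied:start]) // keep what precedes the <a>
					copied = int(dec.InputOffset())    // skip past its closing tag
				}
			}
		}
	}
	out.WriteString(src[copied:]) // keep the remainder of the input
	return out.String(), nil
}

func main() {
	cleaned, err := removeAnchors(sample, "want to match")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(cleaned)
}
```

Slicing the original bytes by decoder offset avoids re-serializing the document, which would otherwise change attribute quoting and escaping. If the goal is instead to find the <a> that sits directly before a piece of text, the same token loop could be adapted to remember the last closed <a> and inspect the character data that follows it.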

Answered 2022-04-26
