
Using a regex wildcard to get a header value without the surrounding text

慕运维8079593 2023-07-26 10:06:55

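I am trying to get the "done" value below, which is in the byte slice returned at the end of a chunked HTTP stream:

X-sync-status: done\r\n

This is the Go regex I have so far:

syncStatusRegex = regexp.MustCompile("(?i)X-sync-status:(.*)\r\n")

I only want it to return the (.*) part. This is the code that gets the status:

syncStatus := strings.TrimSpace(string(syncStatusRegex.Find(body)))
fmt.Println(syncStatus)

How do I make it return only "done", without the header name?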
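To see why the question's code prints the whole header: `Find` returns only the entire match, while `FindSubmatch` returns the entire match at index 0 followed by each capturing group. A minimal sketch, assuming a hypothetical body that holds only the header line; `statusFromBody` is an illustrative helper name, not part of any library:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// The pattern from the question: group 1 is everything between
// "X-sync-status:" and the trailing "\r\n".
var syncStatusRegex = regexp.MustCompile("(?i)X-sync-status:(.*)\r\n")

// statusFromBody (hypothetical helper) returns capturing group 1,
// or false if the body contains no matching header.
func statusFromBody(body []byte) (string, bool) {
	m := syncStatusRegex.FindSubmatch(body)
	if m == nil {
		return "", false
	}
	return string(m[1]), true
}

func main() {
	// Hypothetical body; in the question it comes from the chunked stream.
	body := []byte("X-sync-status: done\r\n")

	// Find returns the whole match, so the header name comes along.
	fmt.Printf("%q\n", syncStatusRegex.Find(body))

	// Group 1 still has the leading space; TrimSpace removes it.
	if status, ok := statusFromBody(body); ok {
		fmt.Println(strings.TrimSpace(status))
	}
}
```

So the minimal change to the original code is to switch from `Find` to `FindSubmatch` and read index 1.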


1 Answer

慕少森


What you want is to access a capturing group. I prefer named capturing groups, and a very simple helper function can handle them:

package main

import (
	"fmt"
	"regexp"
)

// Our example input
const input = "X-sync-status: done\r\n"

// We anchor the regex to the beginning of the input with "^"
// (add the (?m) flag to anchor at the beginning of any line instead).
// Then we have a fixed string until our capturing group begins.
// Within our capturing group, \w* matches all consecutive word characters
// ([0-9A-Za-z_]), so it stops before the trailing "\r\n".
const regexString = `(?i)^X-sync-status: (?P<status>\w*)`

// MustCompile panics if the regexp is invalid, so an invalid
// pattern fails at startup rather than at first use.
var syncStatusRegexp = regexp.MustCompile(regexString)

// The helper function...
func namedResults(re *regexp.Regexp, in string) map[string]string {
	// ... does the matching
	match := re.FindStringSubmatch(in)
	result := make(map[string]string)
	// If nothing matched, return an empty map instead of indexing nil.
	if match == nil {
		return result
	}
	// and puts the value for each named capturing group
	// into the result map
	for i, name := range re.SubexpNames() {
		if i != 0 && name != "" {
			result[name] = match[i]
		}
	}
	return result
}

func main() {
	fmt.Println(namedResults(syncStatusRegexp, input)["status"])
}

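When only one named group is needed, the map-building helper can be replaced by `(*regexp.Regexp).SubexpIndex` (available since Go 1.15), which returns the index of a named group or -1 if there is none. A sketch, assuming Go 1.15+; `statusByName` is a hypothetical name, and the `(?m)` flag is an added assumption so that `^` also matches when the header follows other data on earlier lines:

```go
package main

import (
	"fmt"
	"regexp"
)

// (?i): case-insensitive; (?m): "^" matches at the start of any line.
var syncStatusRegexp = regexp.MustCompile(`(?im)^X-sync-status: (?P<status>\w*)`)

// statusByName (hypothetical helper) looks up the "status" group by name.
func statusByName(re *regexp.Regexp, in string) (string, bool) {
	idx := re.SubexpIndex("status") // -1 if the pattern has no such group
	m := re.FindStringSubmatch(in)
	if idx < 0 || m == nil {
		return "", false
	}
	return m[idx], true
}

func main() {
	status, ok := statusByName(syncStatusRegexp, "X-sync-status: done\r\n")
	fmt.Println(status, ok)
}
```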

Note that your current regex is slightly off, because it also captures the space after the colon: with it, the captured value is " done" rather than "done".
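One way to keep the space out of the group is to match the whitespace outside it. A minimal sketch comparing the question's pattern with a hypothetical variant (`[ \t]*` before the group, `\S+` inside it, so the trailing `\r` is excluded too); the names `questionRegex`, `trimmedRegex`, and `group1` are illustrative:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// The question's pattern: (.*) also captures the space after the colon.
	questionRegex = regexp.MustCompile("(?i)X-sync-status:(.*)\r\n")
	// Hypothetical variant: [ \t]* consumes the whitespace outside the
	// group, and \S+ captures only the non-whitespace value.
	trimmedRegex = regexp.MustCompile(`(?i)X-sync-status:[ \t]*(\S+)`)
)

// group1 returns the first capturing group as a string, or "" on no match.
func group1(re *regexp.Regexp, body []byte) string {
	m := re.FindSubmatch(body)
	if m == nil {
		return ""
	}
	return string(m[1])
}

func main() {
	body := []byte("X-sync-status: done\r\n")
	fmt.Printf("%q\n", group1(questionRegex, body)) // " done"
	fmt.Printf("%q\n", group1(trimmedRegex, body))  // "done"
}
```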

Edit: Of course, you can do this more cheaply without a regex:

fmt.Print(strings.Trim(strings.Split(input, ":")[1], " \r\n"))

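`strings.Split(input, ":")[1]` panics when the input has no colon, and on a longer body it splits at the first colon anywhere, not necessarily the header's. If the header can follow other data (as at the end of the chunked stream in the question), a line-by-line scan is a safer no-regex option. A sketch, assuming Go 1.18+ (for `bytes.Cut`) and working on the `[]byte` body directly; `syncStatus` is a hypothetical helper:

```go
package main

import (
	"bytes"
	"fmt"
)

// syncStatus (hypothetical helper) scans body line by line for an
// "X-sync-status" header, compares the name case-insensitively, and
// returns the trimmed value. It reports false if no such line exists.
func syncStatus(body []byte) (string, bool) {
	for _, line := range bytes.Split(body, []byte("\n")) {
		name, value, found := bytes.Cut(line, []byte(":"))
		if !found {
			continue // no colon on this line
		}
		if bytes.EqualFold(bytes.TrimSpace(name), []byte("X-sync-status")) {
			// TrimSpace drops the leading space and the trailing "\r".
			return string(bytes.TrimSpace(value)), true
		}
	}
	return "", false
}

func main() {
	body := []byte("some chunk data\r\nX-sync-status: done\r\n")
	status, ok := syncStatus(body)
	fmt.Println(status, ok)
}
```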

Edit 2: I was curious how much cheaper the split approach is, so I put together a very rough comparison:

package main

import (
	"fmt"
	"log"
	"regexp"
	"strings"
)

// Our example input
const input = "X-sync-status: done\r\n"

// We anchor the regex to the beginning of the input with "^"
// (add the (?m) flag to anchor at the beginning of any line instead).
// Then we have a fixed string until our capturing group begins.
// Within our capturing group, \w* matches all consecutive word characters
// ([0-9A-Za-z_]), so it stops before the trailing "\r\n".
const regexString = `(?i)^X-sync-status: (?P<status>\w*)`

// MustCompile panics if the regexp is invalid.
var syncStatusRegexp = regexp.MustCompile(regexString)

func statusBySplit(in string) string {
	// Split the parameter, not the package-level input constant.
	return strings.Trim(strings.Split(in, ":")[1], " \r\n")
}

func statusByRegexp(re *regexp.Regexp, in string) string {
	// Index 1 is the "status" group; this panics if nothing matches.
	return re.FindStringSubmatch(in)[1]
}

[...]

And a small benchmark:

package main

import "testing"

func BenchmarkRegexp(b *testing.B) {
	for i := 0; i < b.N; i++ {
		statusByRegexp(syncStatusRegexp, input)
	}
}

func BenchmarkSplit(b *testing.B) {
	for i := 0; i < b.N; i++ {
		statusBySplit(input)
	}
}

Then I ran each of them 5 times with 1, 2, and 4 available CPUs. IMHO, the results are quite conclusive:

go test -run=^$ -test.bench=. -test.benchmem -test.cpu 1,2,4 -test.count=5

goos: darwin
goarch: amd64
pkg: github.com/mwmahlberg/so-regex
BenchmarkRegexp 5000000 383 ns/op 32 B/op 1 allocs/op
BenchmarkRegexp 5000000 382 ns/op 32 B/op 1 allocs/op
BenchmarkRegexp 5000000 384 ns/op 32 B/op 1 allocs/op
BenchmarkRegexp-2 5000000 384 ns/op 32 B/op 1 allocs/op
BenchmarkRegexp-2 5000000 382 ns/op 32 B/op 1 allocs/op
BenchmarkRegexp-2 5000000 384 ns/op 32 B/op 1 allocs/op
BenchmarkRegexp-2 5000000 382 ns/op 32 B/op 1 allocs/op
BenchmarkRegexp-4 5000000 382 ns/op 32 B/op 1 allocs/op
BenchmarkRegexp-4 5000000 380 ns/op 32 B/op 1 allocs/op
BenchmarkRegexp-4 5000000 377 ns/op 32 B/op 1 allocs/op
BenchmarkSplit 10000000 161 ns/op 80 B/op 3 allocs/op
BenchmarkSplit 10000000 164 ns/op 80 B/op 3 allocs/op
BenchmarkSplit 10000000 165 ns/op 80 B/op 3 allocs/op
BenchmarkSplit 10000000 162 ns/op 80 B/op 3 allocs/op
BenchmarkSplit-2 10000000 159 ns/op 80 B/op 3 allocs/op
BenchmarkSplit-2 10000000 167 ns/op 80 B/op 3 allocs/op
BenchmarkSplit-2 10000000 161 ns/op 80 B/op 3 allocs/op
BenchmarkSplit-2 10000000 159 ns/op 80 B/op 3 allocs/op
BenchmarkSplit-4 10000000 159 ns/op 80 B/op 3 allocs/op
BenchmarkSplit-4 10000000 161 ns/op 80 B/op 3 allocs/op
BenchmarkSplit-4 10000000 159 ns/op 80 B/op 3 allocs/op
BenchmarkSplit-4 10000000 160 ns/op 80 B/op 3 allocs/op
PASS
ok github.com/mwmahlberg/so-regex 61.340s

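This clearly shows that, for extracting the header value, split is more than twice as fast as the precompiled regex (about 160 ns/op versus about 380 ns/op), even though it allocates more per operation (3 allocations and 80 B versus 1 allocation and 32 B). For your use case, I would clearly go with split.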

2023-07-26
