
XML parsing returns strings with newlines

MM们 2023-06-05 17:04:27

I'm trying to parse the XML of a sitemap in Go and then loop over the addresses to fetch the details of each post. But I get this strange error: `first path segment in URL cannot contain colon`

Here is the code snippet:

```go
type SitemapIndex struct {
	Locations []Location `xml:"sitemap"`
}

type Location struct {
	Loc string `xml:"loc"`
}

func (l Location) String() string {
	return fmt.Sprintf(l.Loc)
}

func main() {
	resp, _ := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
	bytes, _ := ioutil.ReadAll(resp.Body)
	var s SitemapIndex
	xml.Unmarshal(bytes, &s)
	for _, Location := range s.Locations {
		fmt.Printf("Location: %s", Location.Loc)
		resp, err := http.Get(Location.Loc)
		fmt.Println("resp", resp)
		fmt.Println("err", err)
	}
}
```

Output:

```
Location: https://www.washingtonpost.com/news-sitemaps/politics.xml
resp <nil>
err parse https://www.washingtonpost.com/news-sitemaps/politics.xml: first path segment in URL cannot contain colon
Location: https://www.washingtonpost.com/news-sitemaps/opinions.xml
resp <nil>
err parse https://www.washingtonpost.com/news-sitemaps/opinions.xml: first path segment in URL cannot contain colon
...
```

My guess is that Location.Loc returns a newline before and after the actual address, for example: `\nLocation: https://www.washingtonpost.com/news-sitemaps/politics.xml\n`

I suspect this because a hard-coded URL works as expected:

```go
for _, Location := range s.Locations {
	fmt.Printf("Location: %s", Location.Loc)
	test := "https://www.washingtonpost.com/news-sitemaps/politics.xml"
	resp, err := http.Get(test)
	fmt.Println("resp", resp)
	fmt.Println("err", err)
}
```

But I'm new to Go, so I don't know what's going wrong. Can you tell me where I made a mistake?
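The failure can be reproduced offline, assuming the decoded `<loc>` text is the URL with one newline on each side (the hypothetical `raw` value below). `http.Get` parses its argument with `net/url.Parse`, so calling `url.Parse` directly shows the same rejection without any network access. The exact message depends on the Go version: older releases report the "first path segment in URL cannot contain colon" error quoted above, while recent releases reject the string earlier with "net/url: invalid control character in URL".

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseResult returns the error url.Parse reports for s (nil if s parses).
func parseResult(s string) error {
	_, err := url.Parse(s)
	return err
}

func main() {
	// Hypothetical value shaped like the decoded <loc> text: newline, URL, newline.
	raw := "\nhttps://www.washingtonpost.com/news-sitemaps/politics.xml\n"

	// Untrimmed: url.Parse (and therefore http.Get) rejects it.
	fmt.Println("raw error:", parseResult(raw))

	// Trimmed: the same URL parses cleanly.
	fmt.Println("trimmed error:", parseResult(strings.TrimSpace(raw)))
}
```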


2 Answers

子衿沉夜


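You are right: the problem comes from the newlines. As you can see, you call Printf without adding any \n yourself, yet the output has one newline at the beginning and one at the end.

You can use strings.Trim to remove these newlines. Here is an example using the sitemap you are trying to parse. Once the string is trimmed, you can pass it to http.Get without any error.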

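```go
func main() {
	// bytes holds the body of index.xml, read as in the question
	var s SitemapIndex
	xml.Unmarshal(bytes, &s)
	for _, Location := range s.Locations {
		// remove the newline before and after the URL
		loc := strings.Trim(Location.Loc, "\n")
		fmt.Printf("Location: %s\n", loc)
	}
}
```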
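This loop can be exercised without the network. The sketch below assumes a small, hypothetical sitemap index embedded as a string constant (`sampleIndex`) whose `<loc>` text is wrapped in newlines; the helper name `trimmedLocs` is also hypothetical. It decodes with the question's `SitemapIndex` and `Location` types, prints each raw value with `%q` so the newlines become visible, and applies `strings.Trim(..., "\n")` as this answer does.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Types from the question.
type SitemapIndex struct {
	Locations []Location `xml:"sitemap"`
}

type Location struct {
	Loc string `xml:"loc"`
}

// sampleIndex is a hypothetical stand-in for the downloaded index.xml.
const sampleIndex = `<sitemapindex>
<sitemap>
<loc>
https://www.washingtonpost.com/news-sitemaps/politics.xml
</loc>
</sitemap>
<sitemap>
<loc>
https://www.washingtonpost.com/news-sitemaps/opinions.xml
</loc>
</sitemap>
</sitemapindex>`

// trimmedLocs decodes data and returns every <loc> value with newlines trimmed.
func trimmedLocs(data []byte) ([]string, error) {
	var s SitemapIndex
	if err := xml.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	locs := make([]string, 0, len(s.Locations))
	for _, l := range s.Locations {
		fmt.Printf("raw: %q\n", l.Loc) // %q makes the surrounding "\n" visible
		locs = append(locs, strings.Trim(l.Loc, "\n"))
	}
	return locs, nil
}

func main() {
	locs, err := trimmedLocs([]byte(sampleIndex))
	if err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}
	for _, loc := range locs {
		fmt.Printf("Location: %s\n", loc)
	}
}
```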

As expected, this code prints the locations correctly, without any newlines:

```
Location: https://www.washingtonpost.com/news-sitemaps/politics.xml
Location: https://www.washingtonpost.com/news-sitemaps/opinions.xml
Location: https://www.washingtonpost.com/news-sitemaps/local.xml
Location: https://www.washingtonpost.com/news-sitemaps/sports.xml
Location: https://www.washingtonpost.com/news-sitemaps/national.xml
Location: https://www.washingtonpost.com/news-sitemaps/world.xml
Location: https://www.washingtonpost.com/news-sitemaps/business.xml
Location: https://www.washingtonpost.com/news-sitemaps/technology.xml
Location: https://www.washingtonpost.com/news-sitemaps/lifestyle.xml
Location: https://www.washingtonpost.com/news-sitemaps/entertainment.xml
Location: https://www.washingtonpost.com/news-sitemaps/goingoutguide.xml
```

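The reason the Location.Loc field contains these newlines is the XML this URL returns. Its entries take this form:

```xml
<sitemap>
<loc>
https://www.washingtonpost.com/news-sitemaps/goingoutguide.xml
</loc>
</sitemap>
```

As you can see, the content of the loc element has a newline both before and after the URL, and encoding/xml keeps that character data as-is.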
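Since the whitespace comes from the XML itself, another option (not used by either answer) is to trim once at decode time. `encoding/xml` passes an element's character data to `UnmarshalText` when the field's type implements `encoding.TextUnmarshaler`, so a small string type can strip the whitespace for every entry. The names `trimmedString`, `locEntry`, `sitemapIndex`, and `decodeIndex` below are hypothetical; this is a sketch of the technique, not a drop-in replacement for the question's types.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// trimmedString is a hypothetical helper type. Because *trimmedString
// implements encoding.TextUnmarshaler, encoding/xml hands it the element's
// character data, and surrounding whitespace is stripped at decode time.
type trimmedString string

func (t *trimmedString) UnmarshalText(text []byte) error {
	*t = trimmedString(strings.TrimSpace(string(text)))
	return nil
}

// Hypothetical variants of the question's types using trimmedString.
type locEntry struct {
	Loc trimmedString `xml:"loc"`
}

type sitemapIndex struct {
	Locations []locEntry `xml:"sitemap"`
}

// decodeIndex unmarshals a sitemap index; every Loc comes back already trimmed.
func decodeIndex(data []byte) (sitemapIndex, error) {
	var s sitemapIndex
	err := xml.Unmarshal(data, &s)
	return s, err
}

func main() {
	data := []byte(`<sitemapindex>
<sitemap>
<loc>
https://www.washingtonpost.com/news-sitemaps/goingoutguide.xml
</loc>
</sitemap>
</sitemapindex>`)

	s, err := decodeIndex(data)
	if err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}
	for _, l := range s.Locations {
		// string(l.Loc) is ready to pass to http.Get
		fmt.Printf("%q\n", string(l.Loc))
	}
}
```

The trade-off is that the raw text is no longer available after decoding; if you ever need it, keep a plain `string` field and trim at the call site as the answers do.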

2023-06-05

BIG阳


The comments embedded in the modified code describe the problem and fix it:

```go
func main() {
	resp, _ := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
	bytes, _ := ioutil.ReadAll(resp.Body)
	resp.Body.Close()

	var s SitemapIndex
	xml.Unmarshal(bytes, &s)

	for _, Location := range s.Locations {
		// Wrapping the value in parentheses shows that Location.Loc
		// really does begin and end with a newline
		fmt.Printf("Location: (%v)", Location.Loc)

		// solution: use strings.TrimSpace to remove the newlines
		// (and any other surrounding whitespace) from Location.Loc
		resp, err := http.Get(strings.TrimSpace(Location.Loc))
		fmt.Println("resp", resp)
		fmt.Println("err", err)
		if err == nil {
			resp.Body.Close() // release the connection
		}
	}
}
```
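The two answers differ only in the trimming call. `strings.Trim(s, "\n")` removes only newline characters, while `strings.TrimSpace` removes all leading and trailing white space, including `\r`, spaces, and tabs. The inputs below are hypothetical variations (a CRLF-wrapped value and an indented one) chosen to show where the difference matters; whether the real sitemap ever contains such whitespace is not established here. `trimBoth` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// trimBoth returns the result of each trimming approach for one input.
func trimBoth(in string) (newlineOnly, allSpace string) {
	return strings.Trim(in, "\n"), strings.TrimSpace(in)
}

func main() {
	// Hypothetical <loc> contents with different surrounding whitespace.
	inputs := []string{
		"\nhttps://www.washingtonpost.com/news-sitemaps/politics.xml\n",
		"\r\nhttps://www.washingtonpost.com/news-sitemaps/politics.xml\r\n",
		"\n    https://www.washingtonpost.com/news-sitemaps/politics.xml\n",
	}
	for _, in := range inputs {
		a, b := trimBoth(in)
		fmt.Printf("Trim:      %q\n", a)
		fmt.Printf("TrimSpace: %q\n", b)
	}
}
```

For the first input both calls agree; for the other two only `strings.TrimSpace` yields a URL that `http.Get` will accept, which is why it is the more defensive choice.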

2023-06-05
