Multiple goroutines accessing/modifying a list/map

猛跑小猪 2021-12-13 18:45:09

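I'm trying to implement a multithreaded crawler in Go as a sample task to learn the language. It should scan pages, follow links, and save them to a database. To avoid duplicates, I tried using a map to hold all the URLs I have already saved. The synchronous version works fine, but I ran into trouble when I tried to use goroutines. I'm trying to use a mutex as the synchronization object for the map and channels as a way to coordinate the goroutines, but obviously I don't have a clear understanding of them. The problem is that I get many duplicate entries, so my map store/check doesn't work properly. Can someone explain to me how to do this correctly?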


1 Answer

紫衣仙女


Well, you have two options. For a simple implementation, I would suggest separating the operations on the map into a separate struct.

import "sync"

// Index is a shared page index.
type Index struct {
	access sync.Mutex
	pages  map[string]bool
}

// Mark records that a site has been visited.
// The pointer receiver matters: a value receiver would copy the mutex
// on every call, so the lock would protect nothing.
func (i *Index) Mark(name string) {
	i.access.Lock()
	i.pages[name] = true
	i.access.Unlock()
}

// Visited reports whether a site has been visited.
func (i *Index) Visited(name string) bool {
	i.access.Lock()
	defer i.access.Unlock()
	return i.pages[name]
}
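One caveat with separate check and mark methods: a caller that checks first and marks afterwards takes the lock twice, so two goroutines can both see "not visited" for the same URL before either one marks it, and both will save it. That gap is a likely source of the duplicates described in the question. A minimal sketch of one way to close it, assuming a combined method (hypothetically named `MarkIfNew`, not part of the answer above) that checks and writes inside a single critical section; `NewIndex` and `countWinners` are likewise illustrative helpers:

```go
package main

import (
	"fmt"
	"sync"
)

// Index is a shared page index guarded by a mutex.
type Index struct {
	mu    sync.Mutex
	pages map[string]bool
}

// NewIndex returns an empty index (hypothetical helper).
func NewIndex() *Index {
	return &Index{pages: make(map[string]bool)}
}

// MarkIfNew (hypothetical name) records url and reports whether this call
// was the first to record it. The check and the write share one critical
// section, so exactly one caller gets true for a given url.
func (i *Index) MarkIfNew(url string) bool {
	i.mu.Lock()
	defer i.mu.Unlock()
	if i.pages[url] {
		return false
	}
	i.pages[url] = true
	return true
}

// countWinners starts n goroutines that all try to claim the same URL and
// returns how many of them were told the URL was new.
func countWinners(n int) int {
	idx := NewIndex()
	var wg sync.WaitGroup
	var mu sync.Mutex // protects winners
	winners := 0
	for k := 0; k < n; k++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if idx.MarkIfNew("https://example.com/") {
				mu.Lock()
				winners++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return winners
}

func main() {
	fmt.Println("goroutines that saw the URL as new:", countWinners(100))
}
```

With this shape, a crawler would save a page only when `MarkIfNew` returns true, so the database write happens at most once per URL no matter how many goroutines discover the same link.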

Then, add another struct like this:

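// Crawler is a web spider :D
type Crawler struct {
	index *Index // pointer, so every crawler locks the same mutex
	/* ... other important stuff like visited sites ... */
}

// Crawl looks for content
func (c *Crawler) Crawl(site string) {
	// Implement your logic here
	// For example:
	if !c.index.Visited(site) {
		c.index.Mark(site) // When marked
	}
	// Caution: Visited and Mark lock separately, so two goroutines can
	// both pass the check for the same site before either marks it.
}

This keeps things clear and well separated. It may take a little more code, but it is definitely more readable. You need to instantiate the crawlers like this:

sameIndex := &Index{pages: make(map[string]bool)}
asManyAsYouWant := Crawler{index: sameIndex} // They will share sameIndex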
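To show how several crawlers sharing one index might be started and waited for, here is a small sketch. It assumes the crawler holds a pointer to the index so that all goroutines use the same mutex and map; `Len` and `crawlAll` are hypothetical helpers, and `Crawl` only marks the site instead of fetching anything.

```go
package main

import (
	"fmt"
	"sync"
)

// Index is a mutex-guarded set of visited sites.
type Index struct {
	access sync.Mutex
	pages  map[string]bool
}

// Mark records that a site has been visited.
func (i *Index) Mark(name string) {
	i.access.Lock()
	defer i.access.Unlock()
	i.pages[name] = true
}

// Len returns the number of distinct sites recorded (hypothetical helper).
func (i *Index) Len() int {
	i.access.Lock()
	defer i.access.Unlock()
	return len(i.pages)
}

// Crawler holds a pointer to the index, so all crawlers share one mutex
// and one map.
type Crawler struct {
	index *Index
}

// Crawl is a placeholder: a real crawler would fetch the page here.
func (c *Crawler) Crawl(site string) {
	c.index.Mark(site)
}

// crawlAll (hypothetical helper) runs the given number of crawlers
// concurrently over the same site list and returns how many distinct
// sites ended up in the shared index.
func crawlAll(sites []string, workers int) int {
	shared := &Index{pages: make(map[string]bool)}
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		c := &Crawler{index: shared}
		wg.Add(1)
		go func() {
			defer wg.Done()
			for _, s := range sites {
				c.Crawl(s)
			}
		}()
	}
	wg.Wait() // block until every crawler goroutine has finished
	return shared.Len()
}

func main() {
	sites := []string{"a.example", "b.example", "a.example"}
	fmt.Println("distinct sites:", crawlAll(sites, 4))
}
```

Running a program like this with `go run -race` is a quick way to check that the map is never touched without the lock held.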

If you want to go deeper with a more advanced solution, then I would recommend a producer/consumer architecture.
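As a rough illustration of what a producer/consumer layout could look like here (one possible design, not necessarily what the answer had in mind): a single coordinator goroutine owns the visited map, so no mutex is needed, while workers consume URLs from one channel and produce lists of discovered links on another. The `fakeWeb` map is made-up data standing in for real HTTP fetching, and all function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// fakeWeb stands in for real page fetching (hypothetical data).
var fakeWeb = map[string][]string{
	"/":  {"/a", "/b"},
	"/a": {"/b", "/c"},
	"/b": {"/", "/c"},
	"/c": {},
}

// worker consumes URLs from jobs and produces the links found on each page.
func worker(jobs <-chan string, found chan<- []string) {
	for url := range jobs {
		found <- fakeWeb[url]
	}
}

// crawl returns every URL reachable from start, sorted. Only this goroutine
// reads or writes visited, so the map needs no lock.
func crawl(start string, workers int) []string {
	jobs := make(chan string)
	found := make(chan []string)
	for w := 0; w < workers; w++ {
		go worker(jobs, found)
	}

	visited := map[string]bool{start: true}
	queue := []string{start}
	pending := 0 // URLs handed to workers whose results are still outstanding

	for len(queue) > 0 || pending > 0 {
		// A nil channel never becomes ready in a select, so the send case
		// is only enabled while there is something queued.
		var send chan string
		var next string
		if len(queue) > 0 {
			send = jobs
			next = queue[0]
		}
		select {
		case send <- next:
			queue = queue[1:]
			pending++
		case links := <-found:
			pending--
			for _, l := range links {
				if !visited[l] {
					visited[l] = true
					queue = append(queue, l)
				}
			}
		}
	}
	close(jobs) // all work is done; let the workers exit

	urls := make([]string, 0, len(visited))
	for u := range visited {
		urls = append(urls, u)
	}
	sort.Strings(urls)
	return urls
}

func main() {
	fmt.Println(crawl("/", 3))
}
```

Because deduplication happens in one place before a URL is ever queued, each page is handed to a worker at most once, which is the property the mutex-based version has to work harder to guarantee.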

Answered 2021-12-13
