I am using a machine that reports 8 cores (a Mac with a "2.8 GHz Intel Core i7" processor), which I can see by running fmt.Println(runtime.NumCPU()).

I have implemented a very simple worker pool model to process requests coming into the pool concurrently. The work is CPU-intensive, and I wanted to get a feel for how much performance improves as Go is given more cores. The code is as follows:

    package pool

    import (
        "runtime"
        "sync"
    )

    func Run(poolSize int, workSize int, loopSize int, maxCores int) {
        runtime.GOMAXPROCS(maxCores)

        var wg sync.WaitGroup
        wg.Add(poolSize)
        defer wg.Wait()

        // this is the channel where we write the requests for work to be performed by the pool
        workStream := make(chan int)

        // cpuIntensiveWork simulates a CPU-intensive process
        cpuIntensiveWork := func(input int) {
            res := input
            for i := 0; i < loopSize; i++ {
                res = res + i
            }
        }

        // worker is the function that gets fired by the pool
        worker := func(wg *sync.WaitGroup, workStream chan int, id int) {
            defer wg.Done()
            for req := range workStream {
                cpuIntensiveWork(req)
            }
        }

        // launch the goroutines of the pool
        for i := 0; i < poolSize; i++ {
            go worker(&wg, workStream, i)
        }

        // feed the workStream until the end and then close the channel
        for workItemNo := 0; workItemNo < workSize; workItemNo++ {
            workStream <- workItemNo
        }
        close(workStream)
    }

The benchmarks are these:

    package pool

    import "testing"

    var numberOfWorkers = 100
    var numberOfRequests = 1000
    var loopSize = 100000

    func Benchmark_1Core(b *testing.B) {
        for i := 0; i < b.N; i++ {
            Run(numberOfWorkers, numberOfRequests, loopSize, 1)
        }
    }

    func Benchmark_2Cores(b *testing.B) {
        for i := 0; i < b.N; i++ {
            Run(numberOfWorkers, numberOfRequests, loopSize, 2)
        }
    }

    func Benchmark_4Cores(b *testing.B) {
        for i := 0; i < b.N; i++ {
            Run(numberOfWorkers, numberOfRequests, loopSize, 4)
        }
    }

    func Benchmark_8Cores(b *testing.B) {
        for i := 0; i < b.N; i++ {
            Run(numberOfWorkers, numberOfRequests, loopSize, 8)
        }
    }

Running the benchmarks, I noticed that performance grows almost linearly from 1 core to 2 cores to 4 cores. From 4 cores to 8 cores, however, the difference is very limited.

Is this expected behavior? If so, what is the root cause?
1 Answer
一只名叫tom的猫
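With multiple cores, things get interesting. The most likely explanation is that you do not have eight physical cores, but four cores with Hyper-Threading. runtime.NumCPU() counts logical CPUs, so it reports 8 on such a machine, yet the extra four hardware threads give you far less speedup than real cores would, and sometimes none at all.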
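Another possible explanation to check is that each thread uses a lot of memory and you are running out of cache. Or you have reached the point where memory bandwidth is saturated, at which point no number of processors will help you.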