3 Answers
Contributor: 1900 experience points · 5+ upvotes
>> A = rand(1024); gA = gpuArray(A);
% warm up by executing the operations a couple of times, and then:
>> tic, C = A * A; toc
Elapsed time is 0.075396 seconds.
>> tic, gC = gA * gA; toc
Elapsed time is 0.008621 seconds.
gpuArray
Update using R2014a, with the timeit and gputimeit functions:
>> A = rand(1024); gA = gpuArray(A);
>> timeit(@()A*A)
ans =
    0.0324
>> gputimeit(@()gA*gA)
ans =
    0.0022
Update using R2018b:
>> timeit(@()A*A)
ans =
    0.0229
>> gputimeit(@()gA*gA)
ans =
    4.8019e-04
Contributor: 1863 experience points · 2+ upvotes
History:
BLAS:
Striving for better performance:
Technical details of matrix multiplication:
dgemm
matice2[m][k]
matice2[0][k]
matice2[1][k]
matice2
8*1024*1024
timer.start();
float temp = 0;
//transpose matice2
for (int p = 0; p < rozmer; p++)
{
    for (int q = 0; q < rozmer; q++)
    {
        tempmat[p][q] = matice2[q][p];
    }
}
for (int j = 0; j < rozmer; j++)
{
    for (int k = 0; k < rozmer; k++)
    {
        temp = 0;
        for (int m = 0; m < rozmer; m++)
        {
            temp = temp + matice1[j][m] * tempmat[k][m];
        }
        matice3[j][k] = temp;
    }
}
timer.stop();
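The loop above can be compared against the untransposed triple loop in a self-contained sketch. This is only an illustration, not the original poster's full program: the names Matrix, multiply_naive and multiply_transposed are hypothetical, and the matrices are assumed to be square and stored row-major in a flat std::vector<float>. Both functions add the products in the same order, so they should return identical results. The only difference is that the transposed version reads both operands with stride 1 in the inner loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: square n x n matrices stored row-major in a flat vector.
using Matrix = std::vector<float>;

// Straightforward triple loop. The inner loop reads b with stride n
// (walking down a column), which is unfriendly to the cache for large n.
Matrix multiply_naive(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t k = 0; k < n; ++k) {
            float temp = 0.0f;
            for (std::size_t m = 0; m < n; ++m) {
                temp += a[j * n + m] * b[m * n + k];
            }
            c[j * n + k] = temp;
        }
    }
    return c;
}

// Same product, but b is transposed first so that the inner loop
// reads both a and bt contiguously (stride 1).
Matrix multiply_transposed(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix bt(n * n);
    for (std::size_t p = 0; p < n; ++p) {
        for (std::size_t q = 0; q < n; ++q) {
            bt[p * n + q] = b[q * n + p];
        }
    }
    Matrix c(n * n, 0.0f);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t k = 0; k < n; ++k) {
            float temp = 0.0f;
            for (std::size_t m = 0; m < n; ++m) {
                temp += a[j * n + m] * bt[k * n + m];
            }
            c[j * n + k] = temp;
        }
    }
    return c;
}
```

Calling multiply_transposed(a, b, n) on two n*n vectors returns the product as a new vector. Timing it against multiply_naive for n = 1024 is one way to see the effect of the access pattern on a given machine.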
dgemm