3 Answers
>> A = rand(1024); gA = gpuArray(A);
% warm up by executing the operations a couple of times, and then:
>> tic, C = A * A; toc
Elapsed time is 0.075396 seconds.
>> tic, gC = gA * gA; toc
Elapsed time is 0.008621 seconds.
The second timing uses a gpuArray, so that multiplication runs on the GPU.
Update using R2014a, with the timeit and gputimeit functions:
>> A = rand(1024); gA = gpuArray(A);
>> timeit(@()A*A)
ans = 0.0324
>> gputimeit(@()gA*gA)
ans = 0.0022
Update using R2018b:
>> timeit(@()A*A)
ans = 0.0229
>> gputimeit(@()gA*gA)
ans = 4.8019e-04
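The sessions above warm up before measuring, and timeit/gputimeit automate that by calling the function several times and reporting a median. If you want the same discipline when benchmarking the C++ loop discussed further down, a minimal sketch could look like this. The helper name median_time and its default warm-up and run counts are hypothetical choices, not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper: call f a few times unmeasured (warm-up), then time
// `runs` calls and return the median duration in seconds.
double median_time(const std::function<void()>& f, int warmup = 2, int runs = 7) {
    assert(runs > 0);
    for (int i = 0; i < warmup; ++i) f();  // warm-up calls are not measured
    std::vector<double> samples;
    samples.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        f();
        auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }
    // nth_element puts the median in the middle slot (upper median for even counts).
    std::nth_element(samples.begin(), samples.begin() + runs / 2, samples.end());
    return samples[runs / 2];
}
```

A median is less sensitive than a single tic/toc to one-off stalls such as page faults on first touch, which is the same reason the answer warms up before timing.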
History:
BLAS:
Striving for better performance:
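One well-known technique behind fast matrix multiplication is cache blocking (loop tiling): the product is computed tile by tile so that the pieces of A, B and C being worked on stay in cache while they are reused. The sketch below illustrates the idea only. It assumes square, dense, row-major matrices of doubles, and the function name blocked_matmul and the default tile size of 64 are hypothetical; tuned BLAS libraries choose block sizes per CPU and add further optimizations (packing, SIMD kernels, threading) that are not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: C = A * B for n x n row-major matrices, computed tile by tile.
// bs is an illustrative tile size, not a tuned value.
std::vector<double> blocked_matmul(const std::vector<double>& A,
                                   const std::vector<double>& B,
                                   std::size_t n, std::size_t bs = 64) {
    assert(A.size() == n * n && B.size() == n * n && bs > 0);
    std::vector<double> C(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += bs)
        for (std::size_t kk = 0; kk < n; kk += bs)
            for (std::size_t jj = 0; jj < n; jj += bs) {
                const std::size_t iend = std::min(ii + bs, n);
                const std::size_t kend = std::min(kk + bs, n);
                const std::size_t jend = std::min(jj + bs, n);
                // Multiply one tile of A by one tile of B into one tile of C.
                for (std::size_t i = ii; i < iend; ++i)
                    for (std::size_t k = kk; k < kend; ++k) {
                        const double a = A[i * n + k];
                        for (std::size_t j = jj; j < jend; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
    return C;
}
```

Edge tiles are handled with std::min, so n does not need to be a multiple of the tile size.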
Technical details of matrix multiplication:
dgemm is the BLAS routine for double-precision general matrix multiplication.
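For reference, dgemm computes C := alpha*op(A)*op(B) + beta*C. The sketch below spells out only that arithmetic, under simplifying assumptions: no transpose options, dense row-major storage, and no leading-dimension parameters (real BLAS dgemm is column-major and takes all of these). The name reference_gemm is hypothetical, and unlike the BLAS specification it does not special-case beta == 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified reference of the dgemm operation C := alpha*A*B + beta*C.
// A is m x k, B is k x n, C is m x n, all dense and row-major.
void reference_gemm(std::size_t m, std::size_t n, std::size_t k, double alpha,
                    const std::vector<double>& A, const std::vector<double>& B,
                    double beta, std::vector<double>& C) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double acc = 0.0;  // dot product of row i of A with column j of B
            for (std::size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
}
```

This is the same naive triple loop as the C++ code below, so it defines what dgemm must produce, not how an optimized library computes it.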
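In the original inner loop, matice2[m][k] is read down a column: consecutive elements such as matice2[0][k] and matice2[1][k] lie a whole row apart in memory, so nearly every access misses the cache, and matice2 as a whole (8*1024*1024 bytes) is typically too large to stay in the faster caches. Transposing matice2 first lets the inner loop read both operands contiguously: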
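    timer.start();
    float temp = 0;
    // transpose matice2 into tempmat so the inner loop reads it row-wise
    for (int p = 0; p < rozmer; p++)
    {
        for (int q = 0; q < rozmer; q++)
        {
            tempmat[p][q] = matice2[q][p];
        }
    }
    for (int j = 0; j < rozmer; j++)
    {
        for (int k = 0; k < rozmer; k++)
        {
            temp = 0;
            // row j of matice1 times row k of tempmat (= column k of matice2)
            for (int m = 0; m < rozmer; m++)
            {
                temp = temp + matice1[j][m] * tempmat[k][m];
            }
            matice3[j][k] = temp;
        }
    }
    timer.stop();

Even so, this triple loop is typically still much slower than an optimized dgemm.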
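An alternative to building a transposed copy is to reorder the loops from i-j-k to i-k-j. The innermost loop then walks a row of matice2 and a row of matice3 with unit stride, so the extra tempmat buffer is not needed. This is a minimal sketch assuming square, row-major matrices of doubles stored in flat vectors; the function name matmul_ikj is hypothetical and the storage layout differs from the thread's 2-D arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: C = A * B for n x n row-major matrices using i-k-j loop order.
std::vector<double> matmul_ikj(const std::vector<double>& A,
                               const std::vector<double>& B, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n);
    std::vector<double> C(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double a = A[i * n + k];  // reused across the whole inner loop
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[k * n + j];  // unit-stride access to B and C
        }
    return C;
}
```

Like the transposed version, this only fixes the memory-access pattern; it still lacks the blocking, vectorization and threading that make library dgemm fast.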
