I am trying to use NumbaPro's CUDA extension to multiply large array matrices. What I ultimately want is to multiply an NxN matrix by a diagonal matrix that is supplied as a 1D array (that is, a.dot(numpy.diagflat(b)), which I have found gives the same result as a * b). However, I get an assertion error that provides no information. I can only avoid this assertion error by multiplying two 1D arrays, but that is not what I want.

from numbapro import vectorize, cuda
from numba import f4, f8
import numpy as np

def generate_input(n):
    import numpy as np
    A = np.array(np.random.sample((n, n)))
    B = np.array(np.random.sample(n) + 10)
    return A, B

def product(a, b):
    return a * b

def main():
    cu_product = vectorize([f4(f4, f4), f8(f8, f8)], target='gpu')(product)
    N = 1000
    A, B = generate_input(N)
    D = np.empty(A.shape)
    stream = cuda.stream()
    with stream.auto_synchronize():
        dA = cuda.to_device(A, stream)
        dB = cuda.to_device(B, stream)
        dD = cuda.to_device(D, stream, copy=False)
        cu_product(dA, dB, out=dD, stream=stream)
        dD.to_host(stream)

if __name__ == '__main__':
    main()

This is what my terminal prints:

Traceback (most recent call last):
  File "cuda_vectorize.py", line 32, in <module>
    main()
  File "cuda_vectorize.py", line 28, in main
    cu_product(dA, dB, out=dD, stream=stream)
  File "/opt/anaconda1anaconda2anaconda3/lib/python2.7/site-packages/numbapro/_cudadispatch.py", line 109, in __call__
  File "/opt/anaconda1anaconda2anaconda3/lib/python2.7/site-packages/numbapro/_cudadispatch.py", line 191, in _arguments_requirement
AssertionError
2 Answers
沧海一幻觉
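Just to add to these considerations: I also wanted to implement some matrix computations in CUDA, but then I heard about the numpy.einsum function. It turns out that einsum is very fast. Here is the code for this case, although it can be applied to many kinds of computation.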
G = np.einsum('ij,j -> ij',A, B)
In terms of speed, here are the results for N = 10000:
Numpy took 8.387756 seconds
CUDA JIT took 0.218394 seconds, 38.41x speedup
EINSUM took 0.131751 seconds, 63.66x speedup
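As a quick sanity check on the equivalence discussed above, the following is a minimal sketch that computes the product three ways on a small matrix and compares the results. The helper name scale_columns and the size N = 4 are illustrative assumptions, not part of the original post, and this sketch does not reproduce the timings listed above.

```python
import numpy as np

def scale_columns(a, b):
    """Multiply matrix a by diag(b) in three equivalent ways.

    scale_columns is a hypothetical helper name used only for this sketch.
    """
    dense = a.dot(np.diagflat(b))            # builds the full diagonal matrix
    broadcast = a * b                        # broadcasting: column j scaled by b[j]
    einsum = np.einsum('ij,j->ij', a, b)     # same operation written with einsum
    return dense, broadcast, einsum

if __name__ == '__main__':
    N = 4  # small size chosen for illustration only
    A = np.random.sample((N, N))
    B = np.random.sample(N) + 10
    dense, broadcast, einsum = scale_columns(A, B)
    # All three formulations should agree up to floating-point rounding.
    print(np.allclose(dense, broadcast), np.allclose(broadcast, einsum))
```

The dense version allocates an NxN diagonal matrix and performs a full matrix product, so for large N the broadcasting and einsum forms are usually the more economical way to express the same computation.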