In Numba, how do I copy an array to constant memory when targeting CUDA?

I have sample code that illustrates the problem:

    import numpy as np
    from numba import cuda, types
    import configs

    def main():
        arr = np.empty(0, dtype=np.uint8)
        stream = cuda.stream()
        d_arr = cuda.to_device(arr, stream=stream)
        kernel[configs.BLOCK_COUNT, configs.THREAD_COUNT, stream](d_arr)

    @cuda.jit(types.void(
        types.Array(types.uint8, 1, 'C'),
    ), debug=configs.CUDA_DEBUG)
    def kernel(d_arr):
        arr = cuda.const.array_like(d_arr)

    if __name__ == "__main__":
        main()

When I run this code with cuda-memcheck, I get:

    numba.errors.ConstantInferenceError: Failed in nopython mode pipeline (step: nopython rewrites)
    Constant inference not possible for: arg(0, name=d_arr)

This seems to suggest that the array I passed in is not a constant, so it cannot be copied to constant memory. Is that the case? If so, how do I copy an array that is given to the kernel as input into constant memory?


1 Answer

守着一只汪


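You don't copy to a constant array from an array that was passed to the kernel as input. An input array of that kind is already on the device, and device code cannot write to constant memory.

Constant memory can only be written from host code, and the constant syntax (cuda.const.array_like) expects its argument to be a host array.

Here is an example: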

    $ cat t32.py
    import numpy as np
    from numba import cuda, types, int32, int64

    # Host array defined at module level, so Numba can treat it as a constant.
    a = np.ones(3,dtype=np.int32)

    @cuda.jit
    def generate_mutants(b):
        # Copy of the host array 'a' placed in constant memory.
        c_a = cuda.const.array_like(a)
        b[0] = c_a[0]

    if __name__ == "__main__":
        b = np.zeros(3,dtype=np.int32)
        generate_mutants[1, 1](b)
        print(b)
    $ python t32.py
    [1 0 0]

Note that the implementation of constant memory in Numba CUDA has some behavioral differences compared with CUDA C/C++; this issue highlights some of them.

Answered 2023-05-09
