Can I use multithreading in Python for compute-intensive tasks?

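Update: to save you time, here is the answer up front. If your code is written in pure Python, Python cannot use multiple CPU cores at the same time. However, Python can use multiple cores simultaneously when it calls functions or packages written in C, such as NumPy.

I have heard that "multithreading in Python is not real multithreading because of the GIL". I have also heard that "Python multithreading can handle IO-bound tasks but not compute-bound tasks, because only one thread runs at a time". But my experience made me rethink this. It shows that even for a compute-bound task, Python multithreading can speed up the computation considerably: the program below took about 300 seconds without multithreading and about 100 seconds with it.

A screenshot of the CPU monitor showed that, with 5 threads created using the `threading` package under the CPython interpreter, almost all CPU cores were at 100%. I think this shows that 5 CPU cores were running at the same time.

So can anyone explain this to me? Can I use multithreading in Python for compute-bound tasks? Or can multiple threads/cores run simultaneously in Python?

My code:

```python
import threading
import time

import numpy as np
from scipy import interpolate

number_list = list(range(10))
# One lock shared by all threads. Writing `with threading.Lock():` inside the
# loop would create a new, unshared lock each time and protect nothing.
list_lock = threading.Lock()


def image_interpolation():
    while True:
        number = None
        with list_lock:
            if len(number_list):
                number = number_list.pop()
        if number is not None:
            # Make a fake image - you can use yours.
            image = np.ones((20000, 20000))
            # Make your orig array (skipping the extra dimensions).
            orig = np.random.rand(12800, 16000)
            # Make its coordinates; x is horizontal.
            x = np.linspace(0, image.shape[1], orig.shape[1])
            y = np.linspace(0, image.shape[0], orig.shape[0])
            # Make the interpolator function.
            # Note: interp2d is deprecated and was removed in SciPy 1.14.
            f = interpolate.interp2d(x, y, orig, kind='linear')
        else:
            return 1


workers = 5
thd_list = []
t1 = time.time()
for i in range(workers):
    thd = threading.Thread(target=image_interpolation)
    thd.start()
    thd_list.append(thd)
for thd in thd_list:
    thd.join()
t2 = time.time()
print("total time cost with multithreading: " + str(t2 - t1))

# Refill the work list and process it again on the main thread only.
number_list = list(range(10))
for i in range(10):
    image_interpolation()
t3 = time.time()
print("total time cost without multithreading: " + str(t3 - t2))
```

The output is:

```
total time cost with multithreading: 112.71922039985657
total time cost without multithreading: 328.45561170578003
```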

2 Answers

慕森王

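As you mentioned, Python has a "Global Interpreter Lock" (GIL) that prevents two threads from executing Python code at the same time. The reason multithreading can speed up IO-bound tasks is that Python releases the GIL while it listens on a network socket or waits for a disk read. So the GIL does not prevent the computer from doing two batches of work at once; it prevents two Python threads in the same Python process from running Python code at the same time.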
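A minimal sketch of the IO case, using `time.sleep` as a stand-in for a blocking socket or disk wait (an assumption for illustration; `fake_io` and `timed` are made-up names). Because a sleeping thread does not hold the GIL, the waits of several threads overlap.

```python
import threading
import time


def fake_io(delay: float) -> None:
    """Hypothetical stand-in for blocking I/O; time.sleep releases the GIL while waiting."""
    time.sleep(delay)


def timed(n_threads: int, delay: float) -> float:
    """Run n_threads concurrent fake_io calls and return the elapsed wall time."""
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Four 0.5 s waits overlap, so this should take roughly 0.5 s, not 2 s.
    print(f"4 threads x 0.5 s sleep: {timed(4, 0.5):.2f} s")
```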

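In your example you use numpy and scipy. These are largely written in C and rely on libraries written in C/Fortran/assembly (BLAS, LAPACK, etc.). Many operations on numpy arrays behave like listening on a socket in this respect: the GIL is released while they run. With the GIL released, numpy decides how to carry out the work, and the BLAS subroutines it calls may spawn additional threads of their own. If you compile numpy from source, you can configure at build time exactly whether and how this happens.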
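As a rough illustration (not the asker's workload), this sketch splits one NumPy computation across a thread pool. The function names are made up, and how much it actually speeds up depends on your NumPy build and the array size; the point is only that the heavy lifting happens in compiled code that normally runs without the GIL.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def chunk_sqrt_sum(chunk: np.ndarray) -> float:
    # np.sqrt and .sum() on a float64 array run their inner loops in compiled
    # code; for numeric dtypes NumPy normally releases the GIL there, so
    # several threads can do this work at the same time.
    return float(np.sqrt(chunk).sum())


def parallel_sqrt_sum(data: np.ndarray, workers: int = 4) -> float:
    # Split the array into roughly equal pieces and hand one to each thread.
    chunks = np.array_split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_sqrt_sum, chunks))


if __name__ == "__main__":
    data = np.random.default_rng(0).random(20_000_000)
    print(parallel_sqrt_sum(data), float(np.sqrt(data).sum()))
```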

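So, in short, you have found an exception to the rule. If you repeat the experiment using only pure Python functions, you will get completely different results.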
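To try the pure-Python version of the experiment, here is a sketch with a CPU-bound loop (`busy_sum` is a hypothetical workload). On a standard CPython build with the GIL, the threaded run should take about as long as the serial one, or slightly longer, because only one thread can execute the loop's bytecode at a time.

```python
import threading
import time


def busy_sum(n: int) -> int:
    """Pure-Python CPU-bound loop: it holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_serial(jobs):
    return [busy_sum(n) for n in jobs]


def run_threaded(jobs):
    results = [None] * len(jobs)

    def worker(idx, n):
        results[idx] = busy_sum(n)

    threads = [threading.Thread(target=worker, args=(i, n)) for i, n in enumerate(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    jobs = [1_000_000] * 4
    for name, fn in (("serial", run_serial), ("threaded", run_threaded)):
        start = time.perf_counter()
        fn(jobs)
        print(f"{name}: {time.perf_counter() - start:.2f} s")
```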

2023-09-19

四季花海

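Python threads are real threads; it is just that two threads cannot be inside the interpreter at the same time (that is what the GIL means). Native parts of the code can run in parallel on multiple threads without contention; only when they go into the interpreter do they have to serialize with each other.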
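A small standard-library illustration of native code running in parallel: the `hashlib` documentation states that the GIL is released while hashing data larger than 2047 bytes, so SHA-256 over large buffers can overlap across threads. The helper names below are hypothetical.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def digest(block: bytes) -> str:
    # hashlib releases the GIL while hashing buffers larger than 2047 bytes,
    # so this native work can proceed on several threads at once.
    return hashlib.sha256(block).hexdigest()


def digests_threaded(blocks, workers: int = 4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, blocks))


if __name__ == "__main__":
    blocks = [os.urandom(8 * 1024 * 1024) for _ in range(4)]
    print(len(digests_threaded(blocks)), "digests computed")
```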

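The mere fact that all CPU cores are loaded to 100% does not prove that you are using the machine "efficiently". You need to make sure the CPU usage is not caused by context switching.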
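One way to check is to record wall-clock time, total CPU time, and context-switch counts around the workload and compare them with a single-threaded run. This sketch uses the Unix-only `resource` module; the `measure` helper and its field names are invented for this example.

```python
import resource  # Unix-only standard-library module
import time


def measure(fn, *args):
    """Run fn(*args) and report wall time, process CPU time and context switches."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn(*args)
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0  # summed over all threads of the process
    after = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "wall_s": wall,
        "cpu_s": cpu,
        # Above 1 means more than one core was busy on average; compare
        # wall_s with a serial run to see whether that work was useful.
        "cpu_per_wall": cpu / wall if wall > 0 else 0.0,
        "voluntary_switches": after.ru_nvcsw - before.ru_nvcsw,
        "involuntary_switches": after.ru_nivcsw - before.ru_nivcsw,
    }


if __name__ == "__main__":
    for key, value in measure(sum, range(10_000_000)).items():
        print(f"{key}: {value}")
```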

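If you switch to multiprocessing instead of threading (the two APIs are very similar), you don't have to second-guess, but you have to marshal the payload when passing it between processes.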
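A minimal sketch of the multiprocessing route, using `concurrent.futures.ProcessPoolExecutor` with a deliberately pure-Python function (`count_primes` is a hypothetical stand-in for real work). Each worker is a separate process with its own interpreter and GIL; arguments and results are pickled to cross the process boundary, which is the marshalling cost mentioned above.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    """Count primes below `limit` by trial division (intentionally pure Python)."""
    count = 0
    for n in range(2, limit):
        d = 2
        while d * d <= n:
            if n % d == 0:
                break
            d += 1
        else:
            count += 1  # no divisor found, so n is prime
    return count


def parallel_counts(limits, workers: int = 4):
    # Each limit is pickled and sent to a worker process; each int result is
    # pickled and sent back.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # The __main__ guard is required where workers are started with "spawn".
    print(parallel_counts([50_000] * 4))
```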

Either way, you need to measure everything.

2023-09-19
