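You have the basic idea. Be careful when you say "save to memory". NumPy arrays are kept in memory (RAM). HDF5 data is saved on disk (not in memory/RAM!) and then accessed from there; how much memory is used depends on how you access it. In the first step, you create chunks of data and write them to disk. In the second step, you read the data back from disk in chunks. A working example is provided at the end.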
There are 2 ways to read data with h5py:
Return a NumPy array:
myArrayNP = myArray[:,:,:]
Return an h5py dataset object, which behaves like a NumPy array:
myArrayDS = myArray
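Here is a minimal sketch of the two read styles side by side. It assumes a small throwaway file in a temporary directory. The file name `demo.h5` and the data are illustrative only.

```python
import os
import tempfile

import h5py
import numpy as np

# Illustrative file in a temp directory; "myArray" mirrors the dataset name used above.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("myArray", data=np.arange(24, dtype="f8").reshape(6, 2, 2))

with h5py.File(path, "r") as f:
    myArray = f["myArray"]

    # Option 1: slicing with [:,:,:] reads everything into RAM as a NumPy array.
    myArrayNP = myArray[:, :, :]

    # Option 2: keeping the reference gives an h5py Dataset; no data has been read yet.
    myArrayDS = myArray

    print(type(myArrayNP).__name__)          # ndarray
    print(type(myArrayDS).__name__)          # Dataset
    # Shape and dtype come from the file's metadata, without reading the values.
    print(myArrayDS.shape, myArrayDS.dtype)  # (6, 2, 2) float64
```

`myArrayNP` stays usable after the file is closed. `myArrayDS` is tied to the open file.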
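The difference: an h5py dataset object is not read into memory all at once. You slice it to read just the part you need. Continuing from above, this is a valid way to get a subset of the data (the result is a NumPy array):
myArrayChunkNP = myArrayDS[(i*chunkSize):((i+1)*chunkSize),:,:]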
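This sketch reads the dataset back one slice at a time. The file name and the `np.arange` test data are assumptions, chosen so that the result can be checked.

```python
import os
import tempfile

import h5py
import numpy as np

chunkSize = 4
numberOfChunks = 3
path = os.path.join(tempfile.mkdtemp(), "slice_demo.h5")
with h5py.File(path, "w") as f:
    data = np.arange(numberOfChunks * chunkSize * 4, dtype="f8").reshape(-1, 2, 2)
    f.create_dataset("myArray", data=data)

chunks = []
with h5py.File(path, "r") as f:
    myArrayDS = f["myArray"]  # h5py Dataset, still on disk
    for i in range(numberOfChunks):
        # The returned NumPy array holds only rows i*chunkSize .. (i+1)*chunkSize - 1.
        myArrayChunkNP = myArrayDS[(i * chunkSize):((i + 1) * chunkSize), :, :]
        chunks.append(myArrayChunkNP)

print([c.shape for c in chunks])  # [(4, 2, 2), (4, 2, 2), (4, 2, 2)]
```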
My example also fixes a small error in your chunk index expression. You had:
myArray[(i*chunkSize):(i*(chunkSize+1)),:,:] = myArrayChunk
You want:
myArray[(i*chunkSize):((i+1)*chunkSize),:,:] = myArrayChunk
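To see why the original expression is wrong, print the start/stop indices that each formula produces. The helper names `wrong_bounds` and `right_bounds` are hypothetical and exist only for this illustration.

```python
chunkSize = 4
numberOfChunks = 3


def wrong_bounds(i, chunkSize):
    # Original expression: stop = i*(chunkSize+1)
    return (i * chunkSize, i * (chunkSize + 1))


def right_bounds(i, chunkSize):
    # Corrected expression: stop = (i+1)*chunkSize
    return (i * chunkSize, (i + 1) * chunkSize)


for i in range(numberOfChunks):
    print(i, wrong_bounds(i, chunkSize), right_bounds(i, chunkSize))
# 0 (0, 0) (0, 4)
# 1 (4, 5) (4, 8)
# 2 (8, 10) (8, 12)
```

With the original formula, the first slice is empty and the others hold only 1 and 2 rows. None of them matches the `(chunkSize, 2, 2)` shape of `myArrayChunk`. The corrected formula gives consecutive, non-overlapping blocks of `chunkSize` rows.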
Working example (write and read):
import h5py
import numpy as np

# Make the file
with h5py.File("SO_61173314.h5", "w") as h5w:
    numberOfChunks = 3
    chunkSize = 4
    print('WRITING %d chunks w/ chunkSize=%d' % (numberOfChunks, chunkSize))
    # Write dataset to disk
    h5Array = h5w.create_dataset("myArray", (numberOfChunks*chunkSize, 2, 2), compression="gzip")
    for i in range(numberOfChunks):
        h5ArrayChunk = np.random.random(chunkSize*2*2).reshape(chunkSize, 2, 2)
        print(h5ArrayChunk)
        # Write rows i*chunkSize .. (i+1)*chunkSize - 1
        h5Array[(i*chunkSize):((i+1)*chunkSize), :, :] = h5ArrayChunk

with h5py.File("SO_61173314.h5", "r") as h5r:
    print('\nREADING %d chunks w/ chunkSize=%d\n' % (numberOfChunks, chunkSize))
    # Access myArray dataset - Note: This is NOT a NumPy array
    myArray = h5r['myArray']
    for i in range(numberOfChunks):
        # Read a chunk into memory (as a NumPy array)
        myArrayChunk = myArray[(i*chunkSize):((i+1)*chunkSize), :, :]
        # ... Do some calculation on myArrayChunk
        print(myArrayChunk)
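Here is one possible version of the "Do some calculation" step. This sketch computes the per-element mean chunk by chunk and checks it against a full read. Two settings here are assumptions and are not part of the example above. `dtype="f8"` stores doubles. `chunks=(chunkSize, 2, 2)` sets the HDF5 storage chunks to the same size as the slices being written and read. The temporary file name is also illustrative.

```python
import os
import tempfile

import h5py
import numpy as np

numberOfChunks = 3
chunkSize = 4
path = os.path.join(tempfile.mkdtemp(), "running_mean.h5")

rng = np.random.default_rng(0)
with h5py.File(path, "w") as h5w:
    # Assumption: storage chunks aligned with the block size used below.
    h5Array = h5w.create_dataset("myArray", (numberOfChunks * chunkSize, 2, 2),
                                 dtype="f8", chunks=(chunkSize, 2, 2), compression="gzip")
    for i in range(numberOfChunks):
        h5Array[(i * chunkSize):((i + 1) * chunkSize), :, :] = rng.random((chunkSize, 2, 2))

total = np.zeros((2, 2))
count = 0
with h5py.File(path, "r") as h5r:
    myArray = h5r["myArray"]
    for i in range(numberOfChunks):
        myArrayChunk = myArray[(i * chunkSize):((i + 1) * chunkSize), :, :]
        total += myArrayChunk.sum(axis=0)  # accumulate per-element sums
        count += myArrayChunk.shape[0]     # rows seen so far
    fullMean = myArray[()].mean(axis=0)    # full read, only to verify the result

chunkedMean = total / count
print(np.allclose(chunkedMean, fullMean))  # True
```

Only the running `total` and one chunk need to be in memory at a time. The `myArray[()]` full read is there only to verify the result. You would drop it for data that does not fit in RAM.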