3 Answers
Contributor with 1871 experience points, 13+ upvotes
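You can estimate the remaining time from the consumption of compressed data rather than the production of uncompressed data. If the data is relatively homogeneous, the result will be about the same. (If it is not, then neither the input nor the output will give an accurate estimate anyway.)
You can easily find the size of the compressed file, then use the time spent on the compressed data so far to estimate the time needed for the remaining compressed data.
Here is a simple example that uses a BZ2Decompressor object to work on the input one chunk at a time and shows the read progress (Python 3, taking the file name from the command line):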
# Decompress a bzip2 file, showing progress based on consumed input.
import sys
import os
import bz2
import time

def proc(input):
    """Decompress and process a piece of a compressed stream"""
    dat = dec.decompress(input)
    got = len(dat)
    if got != 0:    # 0 is common -- waiting for a bzip2 block
        # process dat here
        pass
    return got

# Get the size of the compressed bzip2 file.
path = sys.argv[1]
size = os.path.getsize(path)

# Decompress CHUNK bytes at a time.
CHUNK = 16384
totin = 0
totout = 0
prev = -1
dec = bz2.BZ2Decompressor()
start = time.time()
with open(path, 'rb') as f:
    for chunk in iter(lambda: f.read(CHUNK), b''):
        # feed chunk to decompressor
        got = proc(chunk)

        # handle case of concatenated bz2 streams
        if dec.eof:
            rem = dec.unused_data
            dec = bz2.BZ2Decompressor()
            got += proc(rem)

        # show progress
        totin += len(chunk)
        totout += got
        if got != 0:    # only if a bzip2 block emitted
            frac = round(1000 * totin / size)
            if frac != prev:
                left = (size / totin - 1) * (time.time() - start)
                print(f'\r{frac / 10:.1f}% (~{left:.1f}s left) ', end='')
                prev = frac

# Show the resulting size.
print(end='\r')
print(totout, 'uncompressed bytes')
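The remaining-time line in the code above is a plain linear extrapolation: if totin of size compressed bytes took some elapsed time, the rest is assumed to proceed at the same rate. A minimal sketch of that arithmetic in isolation follows; the function name estimate_remaining and its parameter names are made up for illustration and do not appear in the answer.

```python
def estimate_remaining(consumed, total, elapsed):
    """Linearly extrapolate the seconds left from the compressed bytes consumed.

    Hypothetical helper: consumed and total are compressed byte counts,
    elapsed is the number of seconds spent so far.
    """
    if consumed <= 0:
        return None  # nothing consumed yet, so there is no rate to extrapolate
    # Same formula as in the loop above: (size / totin - 1) * elapsed
    return (total / consumed - 1) * elapsed


# A quarter of the input consumed in 10 seconds leaves about 30 seconds.
print(estimate_remaining(25, 100, 10.0))
```

Guarding against consumed being zero avoids a division by zero before the first chunk has been read; the original loop sidesteps this by only printing after totin has been incremented.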
Contributor with 1806 experience points, 8+ upvotes
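With the help of the other answer, I finally found a solution. The idea is to use the amount of the compressed file processed so far, the total size of the compressed file, and the time used to estimate the remaining time. To achieve this: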
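1. Read the compressed file into memory as a bytes object, byte_data, which is quite fast.
2. Compute the size of byte_data with total_size = len(byte_data).
3. Wrap byte_data as byte_f = io.BytesIO(byte_data).
4. Wrap byte_f as bz2f = bz2.BZ2File(byte_f).
5. During processing, use pos = byte_f.tell() to get the current position in the compressed file.
6. Compute the fraction processed as percent = pos/total_size.
7. Record the time used and compute the remaining time.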
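The steps above could be sketched roughly as follows. This is not the answerer's actual code, which the post does not show: the function name process_with_progress, the handle_line callback, the line-by-line iteration, and the interval parameter are all assumptions made for illustration. Only byte_data, total_size, byte_f, bz2f, pos, and percent come from the answer. Note that BZ2File pulls compressed data from byte_f in buffered blocks, so pos advances in steps and can run slightly ahead of what has actually been processed.

```python
import bz2
import io
import time


def process_with_progress(path, handle_line=lambda line: None, interval=2.0):
    """Process a bzip2 file line by line, reporting progress from the
    position reached in the compressed data (hypothetical helper)."""
    # Read the whole compressed file into memory (assumes it fits in RAM).
    with open(path, 'rb') as f:
        byte_data = f.read()
    total_size = len(byte_data)

    # Wrap the bytes so the compressed read position can be queried.
    byte_f = io.BytesIO(byte_data)
    bz2f = bz2.BZ2File(byte_f)

    start = time.time()
    last = start
    count = 0
    for line in bz2f:
        handle_line(line)  # placeholder for the real per-line work
        count += 1
        now = time.time()
        if now - last >= interval:
            last = now
            pos = byte_f.tell()        # compressed bytes read so far
            percent = pos / total_size
            elapsed = now - start
            # Linear extrapolation from the compressed fraction consumed.
            remaining = elapsed / percent - elapsed
            print(f'{percent:.2%} processed, {elapsed:.2f}s elapsed, '
                  f'{remaining:.2f}s remaining...')
    bz2f.close()
    return count
```

Called as process_with_progress('data.bz2', my_handler), where both the file name and my_handler are placeholders, this prints one progress line roughly every interval seconds.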
After a few seconds, the estimate becomes quite accurate:
0.01% processed, 2.00s elapsed, 17514.27s remaining...
0.02% processed, 4.00s elapsed, 20167.48s remaining...
0.03% processed, 6.00s elapsed, 21239.60s remaining...
0.04% processed, 8.00s elapsed, 21818.91s remaining...
0.05% processed, 10.00s elapsed, 22180.76s remaining...
0.05% processed, 12.00s elapsed, 22427.78s remaining...
0.06% processed, 14.00s elapsed, 22661.80s remaining...
0.07% processed, 16.00s elapsed, 22840.45s remaining...
0.08% processed, 18.00s elapsed, 22937.07s remaining...
....
99.97% processed, 22704.28s elapsed, 6.27s remaining...
99.98% processed, 22706.28s elapsed, 4.40s remaining...
99.99% processed, 22708.28s elapsed, 2.45s remaining...
100.00% processed, 22710.28s elapsed, 0.54s remaining...