Elegant pandas pre-fill using date_range and various possible freq settings

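I am trying to pre-fill a DataFrame similar to the following. In the example, I randomly removed some rows to highlight the challenge. I am trying to compute the dti value *elegantly*. The dti value would be 0 for the first row (even if that first row is removed by the script), but because gaps occur in the sequence, dti needs to skip over the missing rows. One logical approach is to divide dt by delta to get a unique integer representing each bucket, but nothing I have tried feels or looks elegant.

Some code to help simulate the problem:

from datetime import datetime, timedelta
import pandas as pd
import numpy as np
start = datetime.now()
nin = 24
delta='4H'
df = pd.date_range(start, periods=nin, freq=delta, name='dt')
# remove some random data points
frac_points = 8/24 # Fraction of points to retain
r = np.random.rand(nin)
df = df[r <= frac_points] # reduce the number of points
df = df.to_frame(index=False) # reindex
df['dti'] = ...

Thanks in advance,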


1 Answer

万千封印


One solution is to divide the time difference between consecutive rows by the timedelta, then take the cumulative sum:

from datetime import datetime
import pandas as pd
import numpy as np

start = datetime.now()
nin = 24
delta = '4H'  # pandas >= 2.2 prefers the lowercase alias '4h'

df = pd.date_range(start, periods=nin, freq=delta, name='dt')
# Round to nearest ten minutes for better readability
df = df.round('10min')

# Ensure reproducibility
np.random.seed(1)

# remove some random data points
frac_points = 8/24  # Fraction of points to retain
r = np.random.rand(nin)
df = df[r <= frac_points]  # reduce the number of points
df = df.to_frame(index=False)  # reindex

# Gap between consecutive rows, measured in units of delta
df['dti'] = df['dt'].diff() / pd.to_timedelta(delta)
# The first row has no predecessor (NaN), so it starts at 0; then accumulate
df['dti'] = df['dti'].fillna(0).cumsum().astype(int)

                   dt  dti
0 2019-03-17 18:10:00    0
1 2019-03-17 22:10:00    1
2 2019-03-18 02:10:00    2
3 2019-03-18 06:10:00    3
4 2019-03-18 10:10:00    4
5 2019-03-19 10:10:00   10
6 2019-03-19 18:10:00   12
7 2019-03-20 10:10:00   16
8 2019-03-20 14:10:00   17
9 2019-03-21 02:10:00   20
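Note that the diff/cumsum approach always gives 0 to the first retained row. If the first row of the original range may itself be dropped and dti should still count from the original start (one reading of the question), a floor division against that start may be closer to the "divide dt by delta" idea. The following is only a sketch under assumptions: the fixed `start` timestamp and the `keep_positions` list are hypothetical stand-ins for `datetime.now()` and the random mask, chosen so the result is deterministic.

```python
import pandas as pd

# Hypothetical fixed start so the example is reproducible
# (the original code uses datetime.now()).
start = pd.Timestamp("2019-03-17 18:10:00")
nin = 24
delta = pd.Timedelta(hours=4)  # same spacing as the '4H' frequency

full = pd.date_range(start, periods=nin, freq=delta, name="dt")

# Hypothetical stand-in for the random mask: keep only these positions.
# Position 0 is deliberately dropped.
keep_positions = [1, 2, 5, 6, 10]
df = full[keep_positions].to_frame(index=False)

# Bucket index relative to the original start, not the first retained row.
df["dti"] = (df["dt"] - start) // delta

# Cross-check: look up each retained timestamp in the full range.
positions = full.get_indexer(pd.DatetimeIndex(df["dt"]))

print(df["dti"].tolist())  # [1, 2, 5, 6, 10]
print(list(positions))     # [1, 2, 5, 6, 10]
```

For the same rows, the diff/cumsum version would give 0, 1, 4, 5, 9, since it measures offsets from the first retained row. Which of the two is wanted depends on how dti is meant to be anchored.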

Answered 2021-12-08
