
pandas: merging stock data that falls within a specific time window of a DataFrame

Python

阿晨1998 2023-07-05 15:51:29

I have per-minute stock data from 2017 to 2019. I only want to keep each day's data from 9:16 onward, so I want to fold any data between 9:00 and 9:16 into the 9:16 row. That is, the 09:16 row should hold:

open: the value of the first row between 9:00 and 9:16, here 116.00
high: the highest value between 9:00 and 9:16, here 117.80
low: the lowest value between 9:00 and 9:16, here 116.00
close: the value at 9:16, here 113.00

                       open    high     low   close
date
2017-01-02 09:08:00  116.00  116.00  116.00  116.00
2017-01-02 09:16:00  116.10  117.80  117.00  113.00
2017-01-02 09:17:00  115.50  116.20  115.50  116.20
2017-01-02 09:18:00  116.05  116.35  116.00  116.00
2017-01-02 09:19:00  116.00  116.00  115.60  115.75
...                     ...     ...     ...     ...
2029-12-29 15:56:00  259.35  259.35  259.35  259.35
2019-12-29 15:57:00  260.00  260.00  260.00  260.00
2019-12-29 15:58:00  260.00  260.00  259.35  259.35
2019-12-29 15:59:00  260.00  260.00  260.00  260.00
2019-12-29 16:36:00  259.35  259.35  259.35  259.35

This is what I tried:

# Get data from 9:00 to 9:16 and create only one data item
convertPreTrade = df.between_time("09:00", "09:16")  # 09:00 - 09:16
# combine modified value to original data
df.loc[df.index.strftime("%H:%M") == "09:16", ["open", "high", "low", "close"]] = [convertPreTrade["open"][0], convertPreTrade["high"].max(), convertPreTrade["low"].min(), convertPreTrade['close'][-1]]

But this does not give me accurate data.
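One likely reason the attempt in the question misbehaves: `between_time` selects the 9:00 to 9:16 rows of every day at once, so each aggregate is computed a single time over the whole history and then written into every 09:16 row. A minimal sketch on made-up two-day data (the values and the names `pre`, `global_high` and `per_day_high` are illustrative, not from the question) shows the difference between one global value and one value per day:

```python
import pandas as pd

idx = pd.to_datetime([
    "2017-01-02 09:08", "2017-01-02 09:16", "2017-01-02 09:17",
    "2017-01-03 09:08", "2017-01-03 09:16", "2017-01-03 09:17",
])
df = pd.DataFrame(
    {
        "open": [116.0, 116.1, 115.5, 259.35, 260.0, 261.0],
        "high": [116.0, 117.8, 116.2, 259.35, 260.0, 261.0],
        "low": [116.0, 117.0, 115.5, 259.35, 260.0, 261.0],
        "close": [116.0, 113.0, 116.2, 259.35, 260.0, 261.0],
    },
    index=idx,
)

# The window contains rows from BOTH days, not just one.
pre = df.between_time("09:00", "09:16")

# One number for the whole history: what the attempt assigns everywhere.
global_high = pre["high"].max()

# One number per calendar day: what the question actually asks for.
per_day_high = pre.groupby(pre.index.date)["high"].max()

print(global_high)
print(per_day_high)
```

Grouping by the calendar date before aggregating is the step the attempt is missing.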


3 Answers

杨__羊羊

Contributed 1943 experience points, received over 7 likes

import pandas as pd

# aggregation applied to each day's 09:00-09:16 rows
d = {'date': 'last', 'open': 'last',
     'high': 'max', 'low': 'min', 'close': 'last'}

# df.index = pd.to_datetime(df.index)
s1 = df.between_time('09:00:00', '09:16:00')
s2 = s1.reset_index().groupby(s1.index.date).agg(d).set_index('date')
df1 = pd.concat([df.drop(s1.index), s2]).sort_index()

Details:

Use DataFrame.between_time to filter the rows of the dataframe df that fall between 09:00 and 09:16:

print(s1)

                      open   high    low  close
date
2017-01-02 09:08:00  116.0  116.0  116.0  116.0
2017-01-02 09:16:00  116.1  117.8  117.0  113.0

Use DataFrame.groupby to group this filtered dataframe s1 by date and aggregate it with the dictionary d:

print(s2)

                      open   high    low  close
date
2017-01-02 09:16:00  116.1  117.8  116.0  113.0

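Use DataFrame.drop to remove the rows between 09:00 and 09:16 from the original dataframe df, then use pd.concat to concatenate the result with s2, and finally use DataFrame.sort_index to sort the index: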

print(df1)

                       open    high     low   close
date
2017-01-02 09:16:00  116.10  117.80  116.00  113.00
2017-01-02 09:17:00  115.50  116.20  115.50  116.20
2017-01-02 09:18:00  116.05  116.35  116.00  116.00
2017-01-02 09:19:00  116.00  116.00  115.60  115.75
2019-12-29 15:57:00  260.00  260.00  260.00  260.00
2019-12-29 15:58:00  260.00  260.00  259.35  259.35
2019-12-29 15:59:00  260.00  260.00  260.00  260.00
2019-12-29 16:36:00  259.35  259.35  259.35  259.35
2029-12-29 15:56:00  259.35  259.35  259.35  259.35
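The question asks for the open of the merged bar to be the first value in the window, whereas the dictionary d in this answer takes the last one. A hedged variant, assuming a DatetimeIndex named date and using a small made-up frame, might use 'first' for open and also stamp the merged row at 09:16 even on days that have no 09:16 bar. The helper name `merge_pre_open` is hypothetical, and stamping such days at 09:16 is an assumption about the desired behaviour, not something the question states:

```python
import pandas as pd


def merge_pre_open(df, start="09:00", end="09:16"):
    """Collapse each day's start..end bars into one bar stamped at `end`."""
    window = df.between_time(start, end)
    # Group by calendar day (midnight timestamps) and aggregate OHLC.
    merged = window.groupby(window.index.normalize()).agg(
        {"open": "first", "high": "max", "low": "min", "close": "last"}
    )
    # Move each merged row from midnight to `end` on the same day.
    merged.index = merged.index + pd.Timedelta(end + ":00")
    merged.index.name = df.index.name
    # Replace the window rows with the merged bars.
    return pd.concat([df.drop(window.index), merged]).sort_index()


idx = pd.DatetimeIndex(
    pd.to_datetime([
        "2017-01-02 09:08", "2017-01-02 09:16", "2017-01-02 09:17",
        "2017-01-03 09:05", "2017-01-03 09:22",
    ]),
    name="date",
)
df = pd.DataFrame(
    {
        "open": [116.0, 116.1, 115.5, 260.0, 259.35],
        "high": [116.0, 117.8, 116.2, 260.0, 259.35],
        "low": [116.0, 117.0, 115.5, 260.0, 259.35],
        "close": [116.0, 113.0, 116.2, 260.0, 259.35],
    },
    index=idx,
)

out = merge_pre_open(df)
print(out)
```

Grouping on the normalized index keeps the result indexed by real timestamps, so shifting each merged row to 09:16 is a single addition.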

Answered 2023-07-05

喵喵时光机

Contributed 1846 experience points, received over 7 likes

Using @r-beginners' data with a few extra rows added:

import io

import numpy as np
import pandas as pd

data = '''
datetime open high low close
"2017-01-02 09:08:00" 116.00 116.00 116.00 116.00
"2017-01-02 09:16:00" 116.10 117.80 117.00 113.00
"2017-01-02 09:17:00" 115.50 116.20 115.50 116.20
"2017-01-02 09:18:00" 116.05 116.35 116.00 116.00
"2017-01-02 09:19:00" 116.00 116.00 115.60 115.75
"2017-01-03 09:08:00" 259.35 259.35 259.35 259.35
"2017-01-03 09:09:00" 260.00 260.00 260.00 260.00
"2017-01-03 09:16:00" 260.00 260.00 260.00 260.00
"2017-01-03 09:17:00" 261.00 261.00 261.00 261.00
"2017-01-03 09:18:00" 262.00 262.00 262.00 262.00
"2017-12-03 09:18:00" 260.00 260.00 259.35 259.35
"2017-12-04 09:05:00" 260.00 260.00 260.00 260.00
"2017-12-04 09:22:00" 259.35 259.35 259.35 259.35
'''

# raw string so that \s is not treated as a string escape
df = pd.read_csv(io.StringIO(data), sep=r'\s+')

The code below does the whole job. It may not be the best approach, but it is a quick and dirty one:

df['datetime'] = pd.to_datetime(df['datetime'])
df = df.set_index('datetime')
df['date'] = df.index.date
dates = np.unique(df.index.date)

# first bar at or after 9:16 for each day
first_rows = df.between_time('9:16', '00:00').reset_index().groupby('date').first().set_index('datetime')
first_rows['date'] = first_rows.index.date

dffs = []
for d in dates:
    df_day = df[df['date'] == d].sort_index()
    first_bar_of_the_day = first_rows[first_rows['date'] == d].copy()
    if first_bar_of_the_day.empty:
        continue  # no bar at or after 9:16 on this day
    first_ts = first_bar_of_the_day.index[0]
    # fold every bar up to and including the first bar into that bar
    bars_until_first = df_day.loc[df_day.index <= first_ts]
    first_bar_of_the_day['open'] = bars_until_first['open'].values[0]
    first_bar_of_the_day['high'] = bars_until_first['high'].max()
    first_bar_of_the_day['low'] = bars_until_first['low'].min()
    first_bar_of_the_day['close'] = bars_until_first['close'].values[-1]
    bars_after_first = df_day.loc[df_day.index > first_ts]
    if not bars_after_first.empty:
        dff = pd.concat([first_bar_of_the_day, bars_after_first])
    else:
        dff = first_bar_of_the_day.copy()
    print(dff)
    dffs.append(dff)

combined_df = pd.concat(dffs)
print(combined_df)

The printed dff for each date looks like this:

                       open    high    low   close        date
datetime
2017-01-02 09:16:00  116.00  117.80  116.0  113.00  2017-01-02
2017-01-02 09:17:00  115.50  116.20  115.5  116.20  2017-01-02
2017-01-02 09:18:00  116.05  116.35  116.0  116.00  2017-01-02
2017-01-02 09:19:00  116.00  116.00  115.6  115.75  2017-01-02

                       open   high     low  close        date
datetime
2017-01-03 09:16:00  259.35  260.0  259.35  260.0  2017-01-03
2017-01-03 09:17:00  261.00  261.0  261.00  261.0  2017-01-03
2017-01-03 09:18:00  262.00  262.0  262.00  262.0  2017-01-03

                      open   high     low   close        date
datetime
2017-12-03 09:18:00  260.0  260.0  259.35  259.35  2017-12-03

                      open   high     low   close        date
datetime
2017-12-04 09:22:00  260.0  260.0  259.35  259.35  2017-12-04

And this is combined_df:

                       open    high     low   close        date
datetime
2017-01-02 09:16:00  116.00  117.80  116.00  113.00  2017-01-02
2017-01-02 09:17:00  115.50  116.20  115.50  116.20  2017-01-02
2017-01-02 09:18:00  116.05  116.35  116.00  116.00  2017-01-02
2017-01-02 09:19:00  116.00  116.00  115.60  115.75  2017-01-02
2017-01-03 09:16:00  259.35  260.00  259.35  260.00  2017-01-03
2017-01-03 09:17:00  261.00  261.00  261.00  261.00  2017-01-03
2017-01-03 09:18:00  262.00  262.00  262.00  262.00  2017-01-03
2017-12-03 09:18:00  260.00  260.00  259.35  259.35  2017-12-03
2017-12-04 09:22:00  260.00  260.00  259.35  259.35  2017-12-04

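Side note: I am not sure your way of cleaning the data is the best one. You could consider simply ignoring everything before 9:16 each day, or even analyze the volatility of the first 15 minutes of data before deciding.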
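The two options in the side note could be sketched roughly as follows. This is only an illustration on made-up data: the high-low range as a percentage of the low is an assumed, crude stand-in for "volatility", and the names `regular`, `pre` and `pre_range` are hypothetical.

```python
import pandas as pd

idx = pd.DatetimeIndex(
    pd.to_datetime([
        "2017-01-02 09:08", "2017-01-02 09:16", "2017-01-02 09:17",
        "2017-01-03 09:08", "2017-01-03 09:09",
        "2017-01-03 09:16", "2017-01-03 09:17",
    ]),
    name="datetime",
)
df = pd.DataFrame(
    {
        "open": [116.0, 116.1, 115.5, 259.35, 260.0, 260.0, 261.0],
        "high": [116.0, 117.8, 116.2, 259.35, 260.0, 260.0, 261.0],
        "low": [116.0, 117.0, 115.5, 259.35, 260.0, 260.0, 261.0],
        "close": [116.0, 113.0, 116.2, 259.35, 260.0, 260.0, 261.0],
    },
    index=idx,
)

# Option 1: simply discard everything before 09:16.
regular = df.between_time("09:16", "23:59")

# Option 2: measure how wide the pre-09:16 price range is on each day
# (assumed proxy for volatility) before deciding whether to merge or drop.
pre = df.between_time("09:00", "09:15")
pre_range = pre.groupby(pre.index.date).agg(high=("high", "max"), low=("low", "min"))
pre_range["range_pct"] = (pre_range["high"] - pre_range["low"]) / pre_range["low"] * 100

print(regular)
print(pre_range)
```

Days with a wide pre-open range are the ones where merging those bars into the 09:16 bar changes the high and low the most.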

Answered 2023-07-05

元芳怎么了

Contributed 1798 experience points, received over 7 likes

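The rows between 9:00 and 9:16 are extracted. The data frame is then grouped by year, month, and day, and the OHLC values are computed for each group. The logic follows your code. Finally, a date column with the 9:16 timestamp is added. Since we do not have all of the data, some considerations may have been missed, but the basic form stays the same.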

import io

import pandas as pd

data = '''
date open high low close
"2017-01-02 09:08:00" 116.00 116.00 116.00 116.00
"2017-01-02 09:16:00" 116.10 117.80 117.00 113.00
"2017-01-02 09:17:00" 115.50 116.20 115.50 116.20
"2017-01-02 09:18:00" 116.05 116.35 116.00 116.00
"2017-01-02 09:19:00" 116.00 116.00 115.60 115.75
"2017-01-03 09:08:00" 259.35 259.35 259.35 259.35
"2017-01-03 09:09:00" 260.00 260.00 260.00 260.00
"2017-12-03 09:18:00" 260.00 260.00 259.35 259.35
"2017-12-04 09:05:00" 260.00 260.00 260.00 260.00
"2017-12-04 09:22:00" 259.35 259.35 259.35 259.35
'''

df = pd.read_csv(io.StringIO(data), sep=r'\s+')
df.reset_index(drop=True, inplace=True)
df['date'] = pd.to_datetime(df['date'])

# 9:00-9:16 (inclusive)
df_start = df[(df['date'].dt.hour == 9) & (df['date'].dt.minute <= 16)]

# calculate OHLC per calendar day
df_new = (df_start.groupby([df_start['date'].dt.year, df_start['date'].dt.month, df_start['date'].dt.day])
          .agg(open_first=('open', lambda x: x.iloc[0]),
               high_max=('high', 'max'),
               low_min=('low', 'min'),
               close_shift=('close', lambda x: x.iloc[-1])))

df_new.index.names = ['year', 'month', 'day']
df_new.reset_index(inplace=True)
df_new['date'] = df_new['year'].astype(str)+'-'+df_new['month'].astype(str)+'-'+df_new['day'].astype(str)+' 09:16:00'
print(df_new)

   year  month  day  open_first  high_max  low_min  close_shift                date
0  2017      1    2      116.00     117.8   116.00        113.0   2017-1-2 09:16:00
1  2017      1    3      259.35     260.0   259.35        260.0   2017-1-3 09:16:00
2  2017     12    4      260.00     260.0   260.00        260.0  2017-12-4 09:16:00
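This answer stops at df_new, with the date stored as a string. One possible way to fold such a result back into the full data is sketched below, assuming made-up stand-ins for df and df_new with the same column names as above. The names `merged`, `in_window` and `result` are hypothetical.

```python
import pandas as pd

# Made-up stand-ins shaped like the answer's df and df_new.
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-02 09:08", "2017-01-02 09:16", "2017-01-02 09:17"]),
    "open": [116.0, 116.1, 115.5],
    "high": [116.0, 117.8, 116.2],
    "low": [116.0, 117.0, 115.5],
    "close": [116.0, 113.0, 116.2],
})
df_new = pd.DataFrame({
    "year": [2017], "month": [1], "day": [2],
    "open_first": [116.0], "high_max": [117.8],
    "low_min": [116.0], "close_shift": [113.0],
})

# Assemble a real timestamp from year/month/day, then shift it to 09:16.
df_new["date"] = pd.to_datetime(df_new[["year", "month", "day"]]) + pd.Timedelta(hours=9, minutes=16)

# Rename the aggregated columns back to plain OHLC names.
merged = df_new.rename(columns={
    "open_first": "open", "high_max": "high",
    "low_min": "low", "close_shift": "close",
})[["date", "open", "high", "low", "close"]]

# Drop the original 9:00-9:16 rows and put the merged bars in their place.
in_window = (df["date"].dt.hour == 9) & (df["date"].dt.minute <= 16)
result = (pd.concat([df[~in_window], merged])
          .sort_values("date")
          .reset_index(drop=True))
print(result)
```

Keeping date as a datetime column rather than a string lets the merged rows sort correctly alongside the original ones.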

Answered 2023-07-05
