3 Answers
Answered by a user with 1943 experience points, 7+ upvotes
# aggregation rules: keep the last timestamp/open/close, the max high and the min low
d = {'date': 'last', 'open': 'last',
     'high': 'max', 'low': 'min', 'close': 'last'}

# df.index = pd.to_datetime(df.index)
# rows whose time falls between 09:00 and 09:16 (inclusive)
s1 = df.between_time('09:00:00', '09:16:00')
# one aggregated row per calendar day, indexed by the window's last timestamp
s2 = s1.reset_index().groupby(s1.index.date).agg(d).set_index('date')
# drop the raw 09:00-09:16 rows, append the aggregated ones, restore chronological order
df1 = pd.concat([df.drop(s1.index), s2]).sort_index()
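The snippet assumes df already has a DatetimeIndex named date (the commented-out pd.to_datetime line hints at this). A minimal setup sketch, using a few of the question's sample rows as assumed input, could look like:

import io
import pandas as pd

# assumed sample input, taken from the question's example rows
data = '''date,open,high,low,close
2017-01-02 09:08:00,116.00,116.00,116.00,116.00
2017-01-02 09:16:00,116.10,117.80,117.00,113.00
2017-01-02 09:17:00,115.50,116.20,115.50,116.20
2017-01-02 09:18:00,116.05,116.35,116.00,116.00
2017-01-02 09:19:00,116.00,116.00,115.60,115.75
'''
df = pd.read_csv(io.StringIO(data), parse_dates=['date'], index_col='date')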
Details:

Use DataFrame.between_time to filter the rows of the DataFrame df whose times fall between 09:00 and 09:16:
print(s1)
                      open   high    low  close
date
2017-01-02 09:08:00  116.0  116.0  116.0  116.0
2017-01-02 09:16:00  116.1  117.8  117.0  113.0
Then use DataFrame.groupby to group this filtered DataFrame s1 by date and aggregate it with the dictionary d:
print(s2)
                      open   high    low  close
date
2017-01-02 09:16:00  116.1  117.8  116.0  113.0
Finally, use DataFrame.drop to remove the 09:00-09:16 rows from the original DataFrame df, join the result with s2 via pd.concat, and sort the index with DataFrame.sort_index:
print(df1)
                       open    high     low   close
date
2017-01-02 09:16:00  116.10  117.80  116.00  113.00
2017-01-02 09:17:00  115.50  116.20  115.50  116.20
2017-01-02 09:18:00  116.05  116.35  116.00  116.00
2017-01-02 09:19:00  116.00  116.00  115.60  115.75
2019-12-29 15:57:00  260.00  260.00  260.00  260.00
2019-12-29 15:58:00  260.00  260.00  259.35  259.35
2019-12-29 15:59:00  260.00  260.00  260.00  260.00
2019-12-29 16:36:00  259.35  259.35  259.35  259.35
2029-12-29 15:56:00  259.35  259.35  259.35  259.35
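For reuse, the three steps above can be folded into a small helper; this is only a sketch of the same approach, where combine_opening_bars is a hypothetical name and the default window is the answer's 09:00-09:16:

def combine_opening_bars(df, start='09:00:00', end='09:16:00'):
    # hypothetical wrapper around the steps above, not part of the original answer
    d = {'date': 'last', 'open': 'last',
         'high': 'max', 'low': 'min', 'close': 'last'}
    s1 = df.between_time(start, end)
    s2 = s1.reset_index().groupby(s1.index.date).agg(d).set_index('date')
    return pd.concat([df.drop(s1.index), s2]).sort_index()

df1 = combine_opening_bars(df)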
Answered by a user with 1846 experience points, 7+ upvotes
Using the data from @r-beginners with a few extra rows added:
import pandas as pd
import numpy as np
import io
data = '''
datetime open high low close
"2017-01-02 09:08:00" 116.00 116.00 116.00 116.00
"2017-01-02 09:16:00" 116.10 117.80 117.00 113.00
"2017-01-02 09:17:00" 115.50 116.20 115.50 116.20
"2017-01-02 09:18:00" 116.05 116.35 116.00 116.00
"2017-01-02 09:19:00" 116.00 116.00 115.60 115.75
"2017-01-03 09:08:00" 259.35 259.35 259.35 259.35
"2017-01-03 09:09:00" 260.00 260.00 260.00 260.00
"2017-01-03 09:16:00" 260.00 260.00 260.00 260.00
"2017-01-03 09:17:00" 261.00 261.00 261.00 261.00
"2017-01-03 09:18:00" 262.00 262.00 262.00 262.00
"2017-12-03 09:18:00" 260.00 260.00 259.35 259.35
"2017-12-04 09:05:00" 260.00 260.00 260.00 260.00
"2017-12-04 09:22:00" 259.35 259.35 259.35 259.35
'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
The code below runs the whole process. It may not be the best approach, but it is quick and dirty:
df['datetime'] = pd.to_datetime(df['datetime'])
df = df.set_index('datetime')
df['date'] = df.index.date
dates = np.unique(df.index.date)

# first bar at or after 09:16 for each day (between_time wraps around midnight here)
first_rows = df.between_time('9:16', '00:00').reset_index().groupby('date').first().set_index('datetime')
first_rows['date'] = first_rows.index.date

dffs = []
for d in dates:
    df_day = df[df['date'] == d].sort_index()
    first_bar_of_the_day = first_rows[first_rows['date'] == d].copy()
    bars_until_first = df_day.loc[df_day.index <= first_bar_of_the_day.index.values[0]]
    if not first_bar_of_the_day.empty:
        # fold every bar up to and including the first post-09:16 bar into one candle
        first_bar_of_the_day['open'] = bars_until_first['open'].values[0]
        first_bar_of_the_day['high'] = bars_until_first['high'].max()
        first_bar_of_the_day['low'] = bars_until_first['low'].min()
        first_bar_of_the_day['close'] = bars_until_first['close'].values[-1]
    bars_after_first = df_day.loc[df_day.index > first_bar_of_the_day.index.values[0]]
    if len(bars_after_first) > 1:
        dff = pd.concat([first_bar_of_the_day, bars_after_first])
    else:
        dff = first_bar_of_the_day.copy()
    print(dff)
    dffs.append(dff)

combined_df = pd.concat(dffs)
print(combined_df)
The printed dff for each day looks like this:
                       open    high    low   close        date
datetime
2017-01-02 09:16:00  116.00  117.80  116.0  113.00  2017-01-02
2017-01-02 09:17:00  115.50  116.20  115.5  116.20  2017-01-02
2017-01-02 09:18:00  116.05  116.35  116.0  116.00  2017-01-02
2017-01-02 09:19:00  116.00  116.00  115.6  115.75  2017-01-02

                       open   high     low  close        date
datetime
2017-01-03 09:16:00  259.35  260.0  259.35  260.0  2017-01-03
2017-01-03 09:17:00  261.00  261.0  261.00  261.0  2017-01-03
2017-01-03 09:18:00  262.00  262.0  262.00  262.0  2017-01-03

                      open   high     low   close        date
datetime
2017-12-03 09:18:00  260.0  260.0  259.35  259.35  2017-12-03

                      open   high     low   close        date
datetime
2017-12-04 09:22:00  260.0  260.0  259.35  259.35  2017-12-04
And combined_df:
                       open    high     low   close        date
datetime
2017-01-02 09:16:00  116.00  117.80  116.00  113.00  2017-01-02
2017-01-02 09:17:00  115.50  116.20  115.50  116.20  2017-01-02
2017-01-02 09:18:00  116.05  116.35  116.00  116.00  2017-01-02
2017-01-02 09:19:00  116.00  116.00  115.60  115.75  2017-01-02
2017-01-03 09:16:00  259.35  260.00  259.35  260.00  2017-01-03
2017-01-03 09:17:00  261.00  261.00  261.00  261.00  2017-01-03
2017-01-03 09:18:00  262.00  262.00  262.00  262.00  2017-01-03
2017-12-03 09:18:00  260.00  260.00  259.35  259.35  2017-12-03
2017-12-04 09:22:00  260.00  260.00  259.35  259.35  2017-12-04
Side note: I'm not sure that the way you clean the data is the best option. You might consider whether to ignore everything before 09:16 each day entirely, or even analyse the volatility of the first 15 minutes of data to decide.
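If you want to act on that side note, a rough sketch of such a check (my own addition, not part of this answer) could compute a per-day high-low range of the pre-09:16 bars as a crude volatility proxy, assuming the df built above:

# Sketch only: high-low range of each day's bars before 09:16 (the 09:15 cut-off
# and the idea of a threshold are assumptions, not part of the original answer)
early = df.between_time('09:00', '09:15')
rng = early.groupby(early.index.date).agg(high=('high', 'max'), low=('low', 'min'))
opening_range = rng['high'] - rng['low']
print(opening_range)
# days with a large opening range could then be merged as above,
# while quiet days might simply drop the early bars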
Answered by a user with 1798 experience points, 7+ upvotes
The rows between 9:00 and 9:16 are extracted. The DataFrame is grouped by year, month, and day, and the OHLC values are computed per group; the logic follows your code. Finally, a date column set to 09:16 is added. Since we don't have all of your data, some considerations may be missing, but the basic form stays the same.
import pandas as pd
import numpy as np
import io
data = '''
date open high low close
"2017-01-02 09:08:00" 116.00 116.00 116.00 116.00
"2017-01-02 09:16:00" 116.10 117.80 117.00 113.00
"2017-01-02 09:17:00" 115.50 116.20 115.50 116.20
"2017-01-02 09:18:00" 116.05 116.35 116.00 116.00
"2017-01-02 09:19:00" 116.00 116.00 115.60 115.75
"2017-01-03 09:08:00" 259.35 259.35 259.35 259.35
"2017-01-03 09:09:00" 260.00 260.00 260.00 260.00
"2017-12-03 09:18:00" 260.00 260.00 259.35 259.35
"2017-12-04 09:05:00" 260.00 260.00 260.00 260.00
"2017-12-04 09:22:00" 259.35 259.35 259.35 259.35
'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
df.reset_index(drop=True, inplace=True)
df['date'] = pd.to_datetime(df['date'])
# 9:00-9:16
df_start = df[((df['date'].dt.hour == 9) & (df['date'].dt.minute >= 0)) &
              ((df['date'].dt.hour == 9) & (df['date'].dt.minute <= 16))]
# calculate
df_new = (df_start.groupby([df['date'].dt.year, df['date'].dt.month, df['date'].dt.day])
          .agg(open_first=('open', lambda x: x.iloc[0]),
               high_max=('high', 'max'),
               low_min=('low', 'min'),
               close_shift=('close', lambda x: x.iloc[-1])))
df_new.index.names = ['year', 'month', 'day']
df_new.reset_index(inplace=True)
df_new['date'] = df_new['year'].astype(str)+'-'+df_new['month'].astype(str)+'-'+df_new['day'].astype(str)+' 09:16:00'
   year  month  day  open_first  high_max  low_min  close_shift               date
0  2017      1    2      116.00     117.8   116.00        113.0   2017-1-2 09:16:00
1  2017      1    3      259.35     260.0   259.35        260.0   2017-1-3 09:16:00
2  2017     12    4      260.00     260.0   260.00        260.0  2017-12-4 09:16:00
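If the assembled string column then needs to be a real timestamp again (for example to set it as the index, as in the other answers), a small follow-up sketch, assuming the df_new from above:

# assumption: convert the assembled string back into a datetime index
df_new['date'] = pd.to_datetime(df_new['date'])
df_new = df_new.drop(columns=['year', 'month', 'day']).set_index('date')
print(df_new)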