
How to find consecutive occurrences of a value under a condition in Python

Python

动漫人物 2022-07-05 17:02:39

I have the following DataFrame in pandas:

code  tank  date        time      no_operation_flag
123   1     01-01-2019  00:00:00  1
123   1     01-01-2019  00:30:00  1
123   1     01-01-2019  01:00:00  0
123   1     01-01-2019  01:30:00  1
123   1     01-01-2019  02:00:00  1
123   1     01-01-2019  02:30:00  1
123   1     01-01-2019  03:00:00  1
123   1     01-01-2019  03:30:00  1
123   1     01-01-2019  04:00:00  1
123   1     01-01-2019  05:00:00  1
123   1     01-01-2019  14:00:00  1
123   1     01-01-2019  14:30:00  1
123   1     01-01-2019  15:00:00  1
123   1     01-01-2019  15:30:00  1
123   1     01-01-2019  16:00:00  1
123   1     01-01-2019  16:30:00  1
123   2     02-01-2019  00:00:00  1
123   2     02-01-2019  00:30:00  0
123   2     02-01-2019  01:00:00  0
123   2     02-01-2019  01:30:00  0
123   2     02-01-2019  02:00:00  1
123   2     02-01-2019  02:30:00  1
123   2     02-01-2019  03:00:00  1
123   2     03-01-2019  03:30:00  1
123   2     03-01-2019  04:00:00  1
123   1     03-01-2019  14:00:00  1
123   2     03-01-2019  15:00:00  1
123   2     03-01-2019  00:30:00  1
123   2     04-01-2019  11:00:00  1
123   2     04-01-2019  11:30:00  0
123   2     04-01-2019  12:00:00  1
123   2     04-01-2019  13:30:00  1
123   2     05-01-2019  03:00:00  1
123   2     05-01-2019  03:30:00  1
123   2     05-01-2019  04:00:00  1

What I want to do is flag the rows where no_operation_flag is 1 more than 5 times in a row, at the tank level and day level, but the times must also be consecutive (the time is at half-hour granularity). The DataFrame is already sorted by tank, date and time.


3 Answers

守着星空守着你

Contributed 1799 experience points, received more than 8 likes

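I think this is a very straightforward and somewhat dirty approach, but it is easy to understand.

1. Loop over the rows and check whether the time 4 rows later is exactly 2 hours later.
2. (If 1 is true) check that all five corresponding values of df['no_operation_flag'] are 1.
3. (If 2 is true) set the corresponding five values of df['final_flag'] to 1.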

import datetime

import pandas as pd

# start with every row unflagged
df['final_flag'] = 0
flag_col = df.columns.get_loc('final_flag')

# slide a 5-row window over the frame
for i in range(len(df) - 4):
    j = i + 4
    ts1 = pd.to_datetime(df['date'].iloc[i] + ' ' + df['time'].iloc[i])
    ts2 = pd.to_datetime(df['date'].iloc[j] + ' ' + df['time'].iloc[j])
    # do the 5 rows span exactly 2 hours?
    if ts2 - ts1 == datetime.timedelta(hours=2):
        # are all 5 no_operation_flag values 1?
        if (df['no_operation_flag'].iloc[i:j + 1] == 1).all():
            # positional assignment avoids chained indexing
            df.iloc[i:j + 1, flag_col] = 1
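The sliding-window idea above can also be tried on a small frame. The following is only a sketch of that idea, not the answerer's code: the function name `flag_windows`, its `window` and `step` parameters, the toy `sample` frame, and the day-first date format are all assumptions made for illustration. It parses the timestamps once before the loop instead of on every iteration.

```python
import pandas as pd


def flag_windows(df, window=5, step="30min"):
    """Flag every block of `window` rows that spans exactly
    (window - 1) * step and whose no_operation_flag values are all 1."""
    # Assumption: dates are day-first, e.g. 02-01-2019 is 2 January 2019.
    ts = pd.to_datetime(df["date"] + " " + df["time"],
                        format="%d-%m-%Y %H:%M:%S")
    span = pd.Timedelta(step) * (window - 1)
    flags = [0] * len(df)
    for i in range(len(df) - window + 1):
        j = i + window  # exclusive end of the window
        same_span = ts.iloc[j - 1] - ts.iloc[i] == span
        all_ones = (df["no_operation_flag"].iloc[i:j] == 1).all()
        if same_span and all_ones:
            flags[i:j] = [1] * window
    return pd.Series(flags, index=df.index, name="final_flag")


# Hypothetical toy data: five consecutive half-hour 1s, a 0, then a gap.
sample = pd.DataFrame({
    "date": ["01-01-2019"] * 7,
    "time": ["00:00:00", "00:30:00", "01:00:00", "01:30:00",
             "02:00:00", "02:30:00", "04:00:00"],
    "no_operation_flag": [1, 1, 1, 1, 1, 0, 1],
})
sample["final_flag"] = flag_windows(sample)
print(sample)
```

Because the rows are sorted and unique at half-hour granularity, a window of five rows that spans exactly two hours cannot contain a gap. Note that this check alone does not compare the tank column, so a window could in principle straddle two tanks.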

Answered 2022-07-05

慕尼黑5688855

Contributed 1848 experience points, received more than 2 likes

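You can use a solution like this: build a helper DataFrame in which every missing datetime is added for each group, flag the runs of consecutive datetimes there, and finally use merge to add the new column to the original DataFrame: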

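import pandas as pd

# combine date and time into a single timestamp column
df['datetimes'] = pd.to_datetime(df['date'].astype(str) + ' ' + df['time'].astype(str))

# helper DataFrame: put each (code, tank, date) group on a 30-minute grid,
# so that missing half-hour slots show up as NaN rows
df1 = (df.set_index('datetimes')
         .groupby(['code', 'tank', 'date'])['no_operation_flag']
         .resample('30min')
         .first()
         .reset_index())

# label runs of equal consecutive values within each group
shifted1 = df1.groupby(['code', 'tank', 'date'])['no_operation_flag'].shift()
g1 = df1['no_operation_flag'].ne(shifted1).cumsum()
# keep only runs of 1s that are longer than 5 rows
mask1 = g1.map(g1.value_counts()).gt(5) & df1['no_operation_flag'].eq(1)
df1['final_flag'] = mask1.astype(int)

# bring the flag back onto the original rows
df = df.merge(df1[['code', 'tank', 'datetimes', 'final_flag']]).drop('datetimes', axis=1)
print(df)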

    code  tank        date      time  no_operation_flag  final_flag
0    123     1  01-01-2019  00:00:00                  1           0
1    123     1  01-01-2019  00:30:00                  1           0
2    123     1  01-01-2019  01:00:00                  0           0
3    123     1  01-01-2019  01:30:00                  1           1
4    123     1  01-01-2019  02:00:00                  1           1
5    123     1  01-01-2019  02:30:00                  1           1
6    123     1  01-01-2019  03:00:00                  1           1
7    123     1  01-01-2019  03:30:00                  1           1
8    123     1  01-01-2019  04:00:00                  1           1
9    123     1  01-01-2019  05:00:00                  1           0
10   123     1  01-01-2019  14:00:00                  1           1
11   123     1  01-01-2019  14:30:00                  1           1
12   123     1  01-01-2019  15:00:00                  1           1
13   123     1  01-01-2019  15:30:00                  1           1
14   123     1  01-01-2019  16:00:00                  1           1
15   123     1  01-01-2019  16:30:00                  1           1
16   123     2  02-01-2019  00:00:00                  1           0
17   123     2  02-01-2019  00:30:00                  0           0
18   123     2  02-01-2019  01:00:00                  0           0
19   123     2  02-01-2019  01:30:00                  0           0
20   123     2  02-01-2019  02:00:00                  1           0
21   123     2  02-01-2019  02:30:00                  1           0
22   123     2  02-01-2019  03:00:00                  1           0
23   123     2  03-01-2019  03:30:00                  1           0
24   123     2  03-01-2019  04:00:00                  1           0
25   123     1  03-01-2019  14:00:00                  1           0
26   123     2  03-01-2019  15:00:00                  1           0
27   123     2  03-01-2019  00:30:00                  1           0
28   123     2  04-01-2019  11:00:00                  1           0
29   123     2  04-01-2019  11:30:00                  0           0
30   123     2  04-01-2019  12:00:00                  1           0
31   123     2  04-01-2019  13:30:00                  1           0
32   123     2  05-01-2019  03:00:00                  1           0
33   123     2  05-01-2019  03:30:00                  1           0
34   123     2  05-01-2019  04:00:00                  1           0
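The run-labelling step in this answer (compare each value with the previous one, then take a cumulative sum) may be easier to follow on a tiny Series. This is a minimal sketch on made-up data: the Series `s` and the threshold of 2 are illustrative assumptions, not values from the question.

```python
import pandas as pd

# Hypothetical toy flags
s = pd.Series([1, 1, 0, 1, 1, 1, 0, 0])

# A new run starts wherever the value differs from the previous row.
run_id = s.ne(s.shift()).cumsum()
# Map every row to the length of the run it belongs to.
run_len = run_id.map(run_id.value_counts())
# Keep only runs of 1s longer than the illustrative threshold of 2.
flag = (run_len.gt(2) & s.eq(1)).astype(int)

print(pd.DataFrame({"value": s, "run_id": run_id,
                    "run_len": run_len, "flag": flag}))
```

In the answer the shift is taken per group, so a run can never continue across a change of code, tank or date, and the NaN rows introduced by resampling break a run at every missing half-hour slot.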

Answered 2022-07-05

江户川乱折腾

Contributed 1851 experience points, received more than 5 likes

Use:

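import pandas as pd

df['final_flag'] = (
    df.groupby([df['no_operation_flag'].ne(1).cumsum(),  # new block after every non-1 row
                'tank',
                'date',
                pd.to_datetime(df['time'].astype(str))
                  .diff()
                  .ne(pd.Timedelta(minutes=30))
                  .cumsum(),                             # new block at every gap other than 30 minutes
                'no_operation_flag'])['no_operation_flag']
      .transform('size')   # length of the block each row belongs to
      .gt(5)
      .astype('uint8')     # boolean to 0/1
)
print(df)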

Output:

    code  tank        date      time  no_operation_flag  final_flag
0    123     1  01-01-2019  00:00:00                  1           0
1    123     1  01-01-2019  00:30:00                  1           0
2    123     1  01-01-2019  01:00:00                  0           0
3    123     1  01-01-2019  01:30:00                  1           1
4    123     1  01-01-2019  02:00:00                  1           1
5    123     1  01-01-2019  02:30:00                  1           1
6    123     1  01-01-2019  03:00:00                  1           1
7    123     1  01-01-2019  03:30:00                  1           1
8    123     1  01-01-2019  04:00:00                  1           1
9    123     1  01-01-2019  05:00:00                  1           0
10   123     1  01-01-2019  14:00:00                  1           1
11   123     1  01-01-2019  14:30:00                  1           1
12   123     1  01-01-2019  15:00:00                  1           1
13   123     1  01-01-2019  15:30:00                  1           1
14   123     1  01-01-2019  16:00:00                  1           1
15   123     1  01-01-2019  16:30:00                  1           1
16   123     2  02-01-2019  00:00:00                  1           0
17   123     2  02-01-2019  00:30:00                  0           0
18   123     2  02-01-2019  01:00:00                  0           0
19   123     2  02-01-2019  01:30:00                  0           0
20   123     2  02-01-2019  02:00:00                  1           0
21   123     2  02-01-2019  02:30:00                  1           0
22   123     2  02-01-2019  03:00:00                  1           0
23   123     2  03-01-2019  03:30:00                  1           0
24   123     2  03-01-2019  04:00:00                  1           0
25   123     1  03-01-2019  14:00:00                  1           0
26   123     2  03-01-2019  15:00:00                  1           0
27   123     2  03-01-2019  00:30:00                  1           0
28   123     2  04-01-2019  11:00:00                  1           0
29   123     2  04-01-2019  11:30:00                  0           0
30   123     2  04-01-2019  12:00:00                  1           0
31   123     2  04-01-2019  13:30:00                  1           0
32   123     2  05-01-2019  03:00:00                  1           0
33   123     2  05-01-2019  03:30:00                  1           0
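The time-gap key used in this groupby can be looked at in isolation. The sketch below uses a made-up list of times (an assumption for illustration only) to show how `diff().ne(...).cumsum()` turns every gap other than 30 minutes into a new block id.

```python
import pandas as pd

# Hypothetical times: three consecutive half-hours, a one-hour gap, then two more.
times = pd.Series(pd.to_datetime(
    ["00:00:00", "00:30:00", "01:00:00", "02:00:00", "02:30:00"],
    format="%H:%M:%S",
))

# True wherever the gap to the previous row is not exactly 30 minutes.
# The first row has no previous row, so its NaT gap also counts as a break.
is_break = times.diff().ne(pd.Timedelta(minutes=30))
# The cumulative sum turns the breaks into a block id usable as a groupby key.
block_id = is_break.cumsum()
# Size of the block each row belongs to.
block_size = block_id.groupby(block_id).transform("size")

print(pd.DataFrame({"time": times.dt.time, "block_id": block_id,
                    "block_size": block_size}))
```

Combining this key with 'tank', 'date' and the flag-based key, as the answer does, means a block is only counted as one run when the tank, the day, the flag value and the half-hour spacing all stay unbroken.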

Answered 2022-07-05
