1 Answer
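Use DataFrameGroupBy.shift to shift the columns stage and Check Out Date within each Number, then reshape with DataFrame.unstack, so that in the last step the shifted column can be subtracted with DataFrame.sub: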
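import pandas as pd

# Parse both timestamp columns so they can be subtracted
df['Check In Date'] = pd.to_datetime(df['Check In Date'])
df['Check Out Date'] = pd.to_datetime(df['Check Out Date'])

g = df.groupby('Number')
# Shift within each Number: every row gets the previous stage and its check-out time
df = (df.assign(shifted = g['Check Out Date'].shift(),
                stage = g['stage'].shift() + ' -> ' + df['stage'])
        .set_index(['stage','Number'])[['Check In Date','shifted']]
        .unstack()
        .dropna()
     )
# Check-in of the current stage minus check-out of the previous stage
df = df['Check In Date'].sub(df['shifted'])
print (df)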
Number        1               2
stage
a -> b 04:02:00 1 days 05:43:00
b -> c 00:01:00 0 days 00:01:00
c -> d 00:00:00 0 days 00:01:00
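The question's input data is not shown here, so the following is only a minimal sketch on made-up data of what the shift and unstack steps do. The `sample` frame, its timestamps, and the column names `transition` and `gap` are assumptions for illustration, not the asker's real data.

```python
import pandas as pd

# Hypothetical sample data (not the data from the question)
sample = pd.DataFrame({
    "Number": [1, 1, 1, 2, 2, 2],
    "stage": ["a", "b", "c", "a", "b", "c"],
    "Check In Date": pd.to_datetime([
        "2021-01-01 08:00", "2021-01-01 10:30", "2021-01-01 12:00",
        "2021-01-02 09:00", "2021-01-02 09:45", "2021-01-02 11:00",
    ]),
    "Check Out Date": pd.to_datetime([
        "2021-01-01 09:00", "2021-01-01 11:00", "2021-01-01 13:00",
        "2021-01-02 09:15", "2021-01-02 10:00", "2021-01-02 12:00",
    ]),
})

g = sample.groupby("Number")
# shift() moves values down one row inside each group, so every row sees the
# previous stage and its check-out time; the first row of each group gets NaN/NaT
prev_out = g["Check Out Date"].shift()
prev_stage = g["stage"].shift()

gaps = sample.assign(
    transition=prev_stage + " -> " + sample["stage"],
    gap=sample["Check In Date"] - prev_out,
).dropna(subset=["transition"])

# unstack() turns the Number level of the index into columns (long to wide)
wide = gaps.set_index(["transition", "Number"])["gap"].unstack()
print(wide)
```

Computing the difference before reshaping, as in this sketch, gives the same kind of table as subtracting after unstack; which order to use is mostly a matter of taste.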
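Edit:
For all combinations, use a cross join within each Number (a self-merge on Number) and keep only the rows whose stage pair is in the list of all combinations: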
import pandas as pd
from itertools import combinations

df['Check In Date'] = pd.to_datetime(df['Check In Date'])
df['Check Out Date'] = pd.to_datetime(df['Check Out Date'])

# All stage pairs, in order of first appearance in the stage column
c = [f'{a} -> {b}' for a, b in combinations(df['stage'].unique(), 2)]
print (c)
['a -> b', 'a -> c', 'a -> d', 'b -> c', 'b -> d', 'c -> d']
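# Self-merge on Number pairs every stage with every stage of the same Number
df = (df.merge(df, on='Number')
        .assign(stage = lambda x: x.pop('stage_x') + ' -> ' + x.pop('stage_y'))
        .query('stage in @c')  # keep only the pairs listed in c
        .set_index(['stage','Number'])[['Check In Date_y','Check Out Date_x']]
        .unstack())
# Alternative to .query: filter with df[df['stage'].isin(c)] before set_index
# Check-in of the later stage minus check-out of the earlier stage
df = df['Check In Date_y'].sub(df['Check Out Date_x'])
print (df)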
Number        1               2
stage
a -> b 04:02:00 1 days 05:43:00
a -> c 07:25:00 1 days 05:44:00
a -> d 07:25:00 1 days 08:03:00
b -> c 00:01:00 0 days 00:01:00
b -> d 00:01:00 0 days 02:20:00
c -> d 00:00:00 0 days 00:01:00
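To see why the filter on c is needed, here is a small sketch using only the standard library. It assumes the four stage labels a, b, c, d appear in that order; the names `stages`, `all_pairs`, `forward`, and `dropped` are illustrative. A self-merge behaves like `itertools.product` (every stage paired with every stage, including itself and reversed pairs), while `combinations` keeps only pairs whose first element comes earlier in the input.

```python
from itertools import combinations, product

# Assumed stage labels, in order of first appearance
stages = ["a", "b", "c", "d"]

# What the self-merge produces per Number: every stage paired with every stage
all_pairs = [f"{x} -> {y}" for x, y in product(stages, repeat=2)]

# What combinations() keeps: only pairs where x comes before y in the input
forward = [f"{x} -> {y}" for x, y in combinations(stages, 2)]

# Pairs the filter removes: self-pairs such as 'a -> a' and reversed pairs such as 'b -> a'
dropped = [p for p in all_pairs if p not in forward]

print(len(all_pairs), len(forward), len(dropped))
```

Because `combinations` follows the input order, the result depends on the stages first appearing in their natural order in the stage column; if they might not, sorting the unique values first would be one way to make this explicit.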