1 Answer
You can use DataFrame.drop_duplicates to improve performance slightly:
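# Values for the synthetic "Home" row added at the start of each visit
d = {'HitTime': 0, 'HitNumber': 0, 'PagePath': 'Home'}
# Keep the first row of each (VisitorID, EpochTime) visit and overwrite its hit columns
df_first = df.drop_duplicates(['VisitorID', 'EpochTime']).assign(**d)
# Append the Home rows; HitNumber 0 sorts ahead of the real hits within each visit
df_final = (pd.concat([df, df_first], ignore_index=True)
              .sort_values(['VisitorID', 'EpochTime', 'HitNumber'])
              .reset_index(drop=True))
print(df_final)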
   VisitorID   EpochTime  HitTime  HitNumber             PagePath
0       1000  1521333510        0          0                 Home
1       1000  1521333510      990         14       orders/details
2       1000  1521333510     4149         29       orders/payment
3       1000  1521333510     6450         54      orders/myorders
4       1000  1554888560        0          0                 Home
5       1000  1554888560     1400         23       orders/details
6       1000  1554888560     5340         54       orders/payment
7       1000  1554888560     7034         55  orders/afterpayment
8       1000  1554888560    11034         65      orders/myorders
9       1000  1554888560    13059        110         customercare
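To try the approach end to end, here is a minimal self-contained sketch. The sample `df` is an assumption: it is reconstructed from the printed output, not copied from the question, and `add_home_rows` is a hypothetical helper name. Note that the sort only puts the Home row first if every real `HitNumber` in a visit is greater than 0.

```python
import pandas as pd

# Sample data reconstructed from the printed output (an assumption;
# the question's actual DataFrame may differ).
df = pd.DataFrame({
    'VisitorID': [1000] * 8,
    'EpochTime': [1554888560] * 5 + [1521333510] * 3,
    'HitTime': [1400, 5340, 7034, 11034, 13059, 990, 4149, 6450],
    'HitNumber': [23, 54, 55, 65, 110, 14, 29, 54],
    'PagePath': ['orders/details', 'orders/payment', 'orders/afterpayment',
                 'orders/myorders', 'customercare',
                 'orders/details', 'orders/payment', 'orders/myorders'],
})


def add_home_rows(frame):
    """Hypothetical helper: add one synthetic 'Home' hit per visit."""
    d = {'HitTime': 0, 'HitNumber': 0, 'PagePath': 'Home'}
    # One row per (VisitorID, EpochTime) visit, with the hit columns overwritten
    first = frame.drop_duplicates(['VisitorID', 'EpochTime']).assign(**d)
    # HitNumber 0 sorts before the real hits, assuming those are all > 0
    return (pd.concat([frame, first], ignore_index=True)
              .sort_values(['VisitorID', 'EpochTime', 'HitNumber'])
              .reset_index(drop=True))


df_final = add_home_rows(df)
print(df_final)
```

Because the result is sorted by `EpochTime`, the visits come out in chronological order regardless of their order in the input.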
Another idea is to shift the index values of df_first by subtracting 0.5 and then sort by index at the end:
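d = {'HitTime': 0, 'HitNumber': 0, 'PagePath': 'Home'}
df_first = df.drop_duplicates(['VisitorID', 'EpochTime']).assign(**d)
# Shift each Home row's index label down by 0.5 so it sorts just before its visit's first hit
df_first.index -= 0.5
# Sorting by index keeps the original row order and slots each Home row into place
df_final = pd.concat([df, df_first]).sort_index().reset_index(drop=True)
print(df_final)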
   VisitorID   EpochTime  HitTime  HitNumber             PagePath
0       1000  1554888560        0          0                 Home
1       1000  1554888560     1400         23       orders/details
2       1000  1554888560     5340         54       orders/payment
3       1000  1554888560     7034         55  orders/afterpayment
4       1000  1554888560    11034         65      orders/myorders
5       1000  1554888560    13059        110         customercare
6       1000  1521333510        0          0                 Home
7       1000  1521333510      990         14       orders/details
8       1000  1521333510     4149         29       orders/payment
9       1000  1521333510     6450         54      orders/myorders
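The index trick relies on two things that the snippet does not check: the index of `df` must consist of unique integer labels, so that `i - 0.5` falls strictly between `i - 1` and `i`, and the rows of each visit must already be contiguous and in order. A minimal sketch under those assumptions follows; the sample data is reconstructed from the printed output, and the `reset_index` guard is an addition that is not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the printed output (an assumption).
df = pd.DataFrame({
    'VisitorID': [1000] * 8,
    'EpochTime': [1554888560] * 5 + [1521333510] * 3,
    'HitTime': [1400, 5340, 7034, 11034, 13059, 990, 4149, 6450],
    'HitNumber': [23, 54, 55, 65, 110, 14, 29, 54],
    'PagePath': ['orders/details', 'orders/payment', 'orders/afterpayment',
                 'orders/myorders', 'customercare',
                 'orders/details', 'orders/payment', 'orders/myorders'],
})

# Guard (added here, not in the answer): force a clean 0..n-1 integer index
df = df.reset_index(drop=True)

d = {'HitTime': 0, 'HitNumber': 0, 'PagePath': 'Home'}
df_first = df.drop_duplicates(['VisitorID', 'EpochTime']).assign(**d)
# Labels 0 and 5 become -0.5 and 4.5, just ahead of each visit's first row
df_first.index = df_first.index - 0.5

df_final = pd.concat([df, df_first]).sort_index().reset_index(drop=True)
print(df_final)
```

Unlike the `sort_values` version, this keeps the visits in their original input order, which is why the 1554888560 visit appears first in the output.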