3 answers

Answer from a user with 1757 contributions and over 7 likes
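Assuming df is indexed by (id, year) as in your example, we can groupby on the first level of the index (id), flag the rows where event equals 1 with eq, and then take a running total within each group with cumsum. Casting the flags to int first turns True into 1 and False into 0:
df['status'] = df['event'].eq(1).astype(int).groupby(level=0).cumsum()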
Output
         event  status
id year
1  2013      1       1
   2014      0       1
   2015      0       1
   2016      0       1
   2017      0       1
2  2014      0       0
   2015      0       0
   2016      1       1
   2017      0       1
3  2016      1       1
   2017      0       1
4  2013      0       0
   2014      1       1
   2015      0       1
5  2014      0       0
   2015      0       0
   2016      0       0
   2017      1       1
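For reference, a self-contained sketch of this answer's approach: it rebuilds the sample data (the same values used in the other answers), sets the (id, year) index that groupby(level=0) relies on, and casts the boolean flags to int before the grouped cumsum so that status comes out as integers.

```python
import pandas as pd

# Sample data from the question, indexed by (id, year)
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5],
    "year":  [2013, 2014, 2015, 2016, 2017, 2014, 2015, 2016, 2017,
              2016, 2017, 2013, 2014, 2015, 2014, 2015, 2016, 2017],
    "event": [1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1],
}).set_index(["id", "year"])

# Flag rows where the event occurs (True/False), cast to 1/0,
# then take a running total within each id (index level 0).
# cumsum is a transform, so the result keeps df's (id, year) index.
df["status"] = df["event"].eq(1).astype(int).groupby(level=0).cumsum()

print(df)
```

Because the grouped result is aligned on the same MultiIndex, it can be assigned straight back as a column.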

Answer from a user with 1780 contributions and over 5 likes
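The key is to use cumsum within a groupby: eq(1) flags the event rows, mul(1) turns True/False into 1/0, and groupby(x['id']).cumsum() takes the running total per id.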
import pandas as pd
df = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,5],
                   'year': [2013,2014,2015,2016,2017,2014,2015,2016,2017,
                            2016,2017,2013,2014,2015,2014,2015,2016,2017],
                   'event': [1,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,1]})
# flag event rows as 1/0, take a running sum per id, then index by (id, year)
(df.assign(status=lambda x: x.event.eq(1).mul(1).groupby(x['id']).cumsum())
   .set_index(['id', 'year']))
Output
         event  status
id year
1  2013      1       1
   2014      0       1
   2015      0       1
   2016      0       1
   2017      0       1
2  2014      0       0
   2015      0       0
   2016      1       1
   2017      0       1
3  2016      1       1
   2017      0       1
4  2013      0       0
   2014      1       1
   2015      0       1
5  2014      0       0
   2015      0       0
   2016      0       0
   2017      1       1
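One caveat that applies to the cumsum approach: the running sum counts events, so status only stays at 0/1 here because every id in the sample has at most one event. The sketch below uses a small hypothetical frame (illustrative data, not from the question) in which id 1 has two events. A grouped cummax (running maximum) is one way to keep the flag at 1 instead of counting; n_events is a column name chosen here for illustration.

```python
import pandas as pd

# Hypothetical data: id 1 has events in both 2014 and 2016
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2],
    "year":  [2013, 2014, 2015, 2016, 2013, 2014],
    "event": [0, 1, 0, 1, 0, 0],
})

# True/False -> 1/0, as in the answer above
flag = df["event"].eq(1).mul(1)

# Running count of events per id: reaches 2 for id 1
df["n_events"] = flag.groupby(df["id"]).cumsum()
# Running maximum per id: 0 until the first event, then 1 for good
df["status"] = flag.groupby(df["id"]).cummax()

print(df.set_index(["id", "year"]))
```

Both cumsum and cummax follow row order inside each group, so each id's rows should already be in year order.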

Answer from a user with 1856 contributions and over 11 likes
A basic answer, explained step by step in the comments:
import pandas as pd
df = pd.DataFrame({'id' : [1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,5],
                   'year' : [2013,2014,2015,2016,2017,2014,2015,2016,2017,
                             2016,2017,2013,2014,2015,2014,2015,2016,2017],
                   'event' : [1,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,1]})
# sort so each ID's rows are contiguous and in chronological order;
# the results list is assigned back by position, so the order matters
df = df.sort_values(["id", "year"]).reset_index(drop=True)
# extract unique IDs in order of appearance (a set has no guaranteed order)
ids = df["id"].unique()
# initialize a list to keep the results
list_event_years = []
# loop over the IDs
for id_ in ids:
    # no event seen yet for this ID
    event_happened = 0
    # loop over the rows belonging to the current ID
    for index, row in df[df["id"] == id_].iterrows():
        # once the event has happened, the flag stays at 1
        if row["event"] == 1:
            event_happened = 1
        # add the flag for this row to the list of results
        list_event_years.append(event_happened)
# add the list of results as a DataFrame column
df["event-happened"] = list_event_years
### OUTPUT
>>> df
    id  year  event  event-happened
0    1  2013      1               1
1    1  2014      0               1
2    1  2015      0               1
3    1  2016      0               1
4    1  2017      0               1
5    2  2014      0               0
6    2  2015      0               0
7    2  2016      1               1
8    2  2017      0               1
9    3  2016      1               1
10   3  2017      0               1
11   4  2013      0               0
12   4  2014      1               1
13   4  2015      0               1
14   5  2014      0               0
15   5  2015      0               0
16   5  2016      0               0
17   5  2017      1               1
If you need them indexed as in your example, do the following:
df.set_index(['id', 'year'], inplace=True)
df.sort_index(inplace=True)
### OUTPUT
>>> df
         event  event-happened
id year
1  2013      1               1
   2014      0               1
   2015      0               1
   2016      0               1
   2017      0               1
2  2014      0               0
   2015      0               0
   2016      1               1
   2017      0               1
3  2016      1               1
   2017      0               1
4  2013      0               0
   2014      1               1
   2015      0               1
5  2014      0               0
   2015      0               0
   2016      0               0
   2017      1               1
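The nested loop flags a row once its id has had an event in that year or an earlier one. As an alternative sketch (a different technique from the loop, not this answer's own method), the same column can be derived by finding each id's first event year and comparing every row's year against it. Because it compares year values instead of walking rows, it does not depend on row order. first_event_year is a name chosen here for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5],
    "year":  [2013, 2014, 2015, 2016, 2017, 2014, 2015, 2016, 2017,
              2016, 2017, 2013, 2014, 2015, 2014, 2015, 2016, 2017],
    "event": [1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1],
})

# Year of the first event for each id (ids with no event are absent)
first_event_year = df.loc[df["event"].eq(1)].groupby("id")["year"].min()

# A row is flagged once its year reaches that id's first event year.
# ids without any event map to NaN, and comparisons with NaN are False.
df["event-happened"] = (
    df["year"] >= df["id"].map(first_event_year)
).astype(int)

print(df)
```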