我有一个数据框,我想将重影行(现有行的副本)附加到该数据框。 id month as_of_date1 turn age 119 5712 201401 2014-01-01 9 0120 5712 201402 2014-02-01 9 1121 5712 201403 2014-03-01 9 2122 5712 201404 2014-04-01 9 3123 5712 201405 2014-05-01 9 4124 5712 201406 2014-06-01 9 5125 9130 201401 2014-01-01 9 0126 9130 201402 2014-02-01 9 1127 9130 201403 2014-03-01 9 2128 9130 201404 2014-04-01 9 3129 9130 201405 2014-05-01 9 4幻影行是根据条件选择的:如果年龄小于转弯,我们需要附加最新的行,直到age== turn of或as_of_date1 == now()现在我正在使用以下代码,但是由于数据很大,大约200k行,包含100个字段,因此永远都需要tdf1=tdf.loc[(tdf['age']<tdf['turn'])]tdf2=tdf1.drop_duplicates(subset=['id'],keep='last')leads=tdf2.index.tolist()for lead in leads: ttdf=tdf.loc[[lead]] diff1 = relativedelta.relativedelta(datetime.datetime(2018,6,1),tdf.loc[lead,'as_of_date1']).months diff2=tdf.loc[lead,'turn']-tdf.loc[lead,'age'] diff=min(diff1,diff2) for i in range(0,diff): tdf = tdf.append(ttdf, ignore_index=True)预期结果: id month as_of_date1 turn age 119 5712 201401 2014-01-01 9 0120 5712 201402 2014-02-01 9 1121 5712 201403 2014-03-01 9 2122 5712 201404 2014-04-01 9 3123 5712 201405 2014-05-01 9 4124 5712 201406 2014-06-01 9 5125 9130 201401 2014-01-01 9 0126 9130 201402 2014-02-01 9 1127 9130 201403 2014-03-01 9 2128 9130 201404 2014-04-01 9 3129 9130 201405 2014-05-01 9 4130 5712 201406 2014-06-01 9 5131 5712 201406 2014-06-01 9 5132 5712 201406 2014-06-01 9 5133 5712 201406 2014-06-01 9 5134 9130 201405 2014-05-01 9 4135 9130 201405 2014-05-01 9 4136 9130 201405 2014-05-01 9 4137 9130 201405 2014-05-01 9 4138 9130 201405 2014-05-01 9 4如果有人知道更快的算法,我将不胜感激
1 回答
元芳怎么了
TA贡献1798条经验 获得超7个赞
在附加到数据帧的评论中提到的那样确实很消耗内存,并且根本不建议在循环中执行此操作。所以我使用了以下令人难以置信的提高速度的方法
a=[]
for lead in leads:
ttdf=tdf.loc[[lead]]
diff1 = relativedelta.relativedelta(datetime.datetime(2018,6,1),tdf.loc[lead,'as_of_date1']).months
diff2=tdf.loc[lead,'turn']-tdf.loc[lead,'age']
diff=min(diff1,diff2)
for i in range(0,diff):
a.append(ttdf)
tdf = tdf.append(a, ignore_index=True)
添加回答
举报
0/150
提交
取消