2 回答
TA贡献1830条经验 获得超3个赞
起始 df
data = [
['10/01/2015',0 ,57 ,91,0.66],
['11/01/2015',0, 678, 8,0.99],
['02/01/2016',0,68,12,0.12],
['10/01/2015' ,1, 78,79,0.22],
['11/01/2015' ,1 ,99,56, 0.67],
['1/01/2016', 1 ,789,67,0.78],
['10/01/2015', 3,678, 178, 0.780],
['11/01/2015' ,3, 2880,578,0.678]
]
df = pd.DataFrame(data, columns=['Start_Month', 'Bucket', 'Count', 'Complete','Partial'])#.set_index('Start_Month')
df['Start_Month'] = pd.to_datetime(df['Start_Month'])
Start_Month Bucket Count Complete Partial
0 2015-10-01 0 57 91 0.660
1 2015-11-01 0 678 8 0.990
2 2016-02-01 0 68 12 0.120
3 2015-10-01 1 78 79 0.220
4 2015-11-01 1 99 56 0.670
5 2016-01-01 1 789 67 0.780
6 2015-10-01 3 678 178 0.780
7 2015-11-01 3 2880 578 0.678
为存储桶 0 制作一个包含完整日期的单独 df
df0 = pd.DataFrame([pd.date_range('2015-10-01', '2016-02-01', freq='MS'),
[0]*5]).T.rename(columns={0: 'Start_Month', 1:'Bucket'})
Start_Month Bucket
0 2015-10-01 0
1 2015-11-01 0
2 2015-12-01 0
3 2016-01-01 0
4 2016-02-01 0
按相应日期和桶过滤原始 df 并将结果与 df0 合并
df_filt = df[(df['Start_Month'].isin(df0['Start_Month'])) & (df['Bucket'] == 0)]
df0 = pd.merge(df0, df_filt, left_on='Start_Month', right_on='Start_Month', how='outer')
df0 = df0.drop('Bucket_y', axis=1).rename(columns={'Bucket_x': 'Bucket'})
Start_Month Bucket Count Complete Partial
0 2015-10-01 0 57.0 91.0 0.66
1 2015-11-01 0 678.0 8.0 0.99
2 2015-12-01 0 NaN NaN NaN
3 2016-01-01 0 NaN NaN NaN
4 2016-02-01 0 68.0 12.0 0.12
对存储桶 1、2 和 3 重复该过程,创建 df1、df2、df3。
(由于重复而没有显示这一点......当然你可以循环执行此操作)。然后将所有 4 个 df 连接在一起,并用零填充 na。
# Concat
df_final = pd.concat([df0, df1, df2, df3], axis=0).fillna(0)
Start_Month Bucket Count Complete Partial
0 2015-10-01 0 57.0 91.0 0.660
1 2015-11-01 0 678.0 8.0 0.990
2 2015-12-01 0 0.0 0.0 0.000
3 2016-01-01 0 0.0 0.0 0.000
4 2016-02-01 0 68.0 12.0 0.120
0 2015-10-01 1 78.0 79.0 0.220
1 2015-11-01 1 99.0 56.0 0.670
2 2015-12-01 1 0.0 0.0 0.000
3 2016-01-01 1 789.0 67.0 0.780
4 2016-02-01 1 0.0 0.0 0.000
0 2015-10-01 2 0.0 0.0 0.000
1 2015-11-01 2 0.0 0.0 0.000
2 2015-12-01 2 0.0 0.0 0.000
3 2016-01-01 2 0.0 0.0 0.000
4 2016-02-01 2 0.0 0.0 0.000
0 2015-10-01 3 678.0 178.0 0.780
1 2015-11-01 3 2880.0 578.0 0.678
2 2015-12-01 3 0.0 0.0 0.000
3 2016-01-01 3 0.0 0.0 0.000
4 2016-02-01 3 0.0 0.0 0.000
更新:显示完全循环的代码,并在评论中回答您的问题。
def get_separate_df(df, bucket_num):
df_bucket = pd.DataFrame([pd.date_range('2015-10-01', '2016-02-01', freq='MS'),
[bucket_num]*5]).T.rename(columns={0: 'Start_Month', 1:'Bucket'})
df_filt = df[(df['Start_Month'].isin(df_bucket['Start_Month'])) & \
(df['Bucket'] == bucket_num)]
df_bucket = pd.merge(df_bucket, df_filt, left_on='Start_Month', right_on='Start_Month', how='outer')
df_bucket = df_bucket.drop('Bucket_y', axis=1).rename(columns={'Bucket_x': 'Bucket'})
return df_bucket
dfs = [get_separate_df(df, i) for i in range(4)]
# Concat
df_final = pd.concat(dfs, axis=0).fillna(0)
至于评论中的问题,您可以获得一个空数据框,其中包含重复的日期和存储桶序列,如下所示:
bucket_list = [ele for ele in [0,1,2,3] for i in range(5)]
dates = list(pd.date_range('2015-10-01', '2016-02-01', freq='MS'))*4
df = pd.DataFrame(data=[dates, bucket_list]).T.rename(columns={0:'Start_Month', 1:'Bucket'})
Output:
Start_Month Bucket
0 2015-10-01 0
1 2015-11-01 0
2 2015-12-01 0
3 2016-01-01 0
4 2016-02-01 0
5 2015-10-01 1
6 2015-11-01 1
7 2015-12-01 1
8 2016-01-01 1
9 2016-02-01 1
10 2015-10-01 2
11 2015-11-01 2
12 2015-12-01 2
13 2016-01-01 2
14 2016-02-01 2
15 2015-10-01 3
16 2015-11-01 3
17 2015-12-01 3
18 2016-01-01 3
19 2016-02-01 3
TA贡献1878条经验 获得超4个赞
写了一篇类似的文章,但只是概括了一点
import pandas as pd
import numpy as np
# converting date string to date
df['Start_Month'] = pd.to_datetime(df['Start_Month'])
# finding the the date range and increasin by 1 month start
rng = pd.date_range(df['Start_Month'].min(),df['Start_Month'].max(), freq='MS')
# creating date dataframe
df1 = pd.DataFrame({ 'Start_Month': rng})
# Converting bucket field to integer
df['Bucket'] = df['Bucket'].astype(int)
# finding the bucket values max and min
Bucket=np.arange(df['Bucket'].min(),df['Bucket'].max()+1,1)
# Repeating the date range for every bucket
df1=pd.concat([df1]*len(Bucket))
# repeating bucket values to each date
df1['Bucket']=np.repeat(Bucket, len(rng))
# merging to the previous dataframe and filling it with 0
merged_left = pd.merge(left=df1, right=df, how='left', on=['Start_Month','Bucket']).fillna(0)
添加回答
举报